Create a NumPy Array¶
The learning objectives of this section are:
- Understand the advantages of vectorised NumPy code over standard Python approaches
- Create NumPy arrays
- Convert lists and tuples to NumPy arrays
- Create (initialise) arrays
- Compare computation times in NumPy and standard Python lists
NumPy Basics¶
NumPy is a library written for scientific computing and data analysis. Its name stands for Numerical Python.
The most basic object in NumPy is the ndarray, or simply an array, which is an n-dimensional, homogeneous array. By homogeneous, we mean that all the elements in a NumPy array have to be of the same data type, which is commonly numeric (float or integer).
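A minimal sketch of what homogeneity means in practice: when the input mixes types, NumPy picks one common data type for the whole array. The variable names are illustrative only.

```python
import numpy as np

# Mixing integers and a float: every element is stored as a float
mixed_numbers = np.array([1, 2, 3.5])
print(mixed_numbers, mixed_numbers.dtype)

# Mixing a number and a string: every element is stored as a string
mixed_text = np.array([1, "a"])
print(mixed_text, mixed_text.dtype)
```

The second case is usually a sign that the data should be cleaned before it is converted to an array.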
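Create an Array from an Iterable¶
np.array() accepts sequence-like iterables such as:
- list
- tuple
- range
Notice that not every iterable can be converted element by element this way: passing a set, a dict, or a plain iterator to np.array() produces a zero-dimensional object array instead of an array of the elements. For an iterator, np.fromiter() can be used instead.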
#np is simply an alias, you may use any other alias, though np is quite standard
import numpy as np
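The sketch below illustrates the caveat about sets and iterators. It shows one possible workaround for each case (sorting the set into a list, and np.fromiter() for a generator); the variable names are made up for the example.

```python
import numpy as np

# A set is wrapped as a single Python object, not converted element by element
from_set = np.array({1, 2, 3})
print(from_set.shape, from_set.dtype)

# Converting the set to a list first gives a regular 1-D array
from_sorted = np.array(sorted({3, 1, 2}))
print(from_sorted)

# np.fromiter consumes an iterator or generator; it needs an explicit dtype
from_gen = np.fromiter((x * x for x in range(5)), dtype=np.int64)
print(from_gen)
```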
Create a 1D Array¶
# Creating a 1-D array using a list
arr = np.array([1,2,3,4,5])
print(arr)
print(type(arr))
# Creating a 1-D array using a tuple
arr = np.array((1,2,3,4,5))
print(arr)
arr = np.array(range(10))
print(arr)
Create a 2D Array with Specified Data Type¶
arr = np.array([[1,2,3], [4,5,6]], dtype='int')
print(arr)
print('Data Type:',arr.dtype)
Create a 3D Array¶
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
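To check that the nesting of the input lists produced the intended number of dimensions, the ndim, shape and size attributes can be inspected. This is a small sketch reusing the same kind of inputs as the cells above.

```python
import numpy as np

arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# ndim: number of dimensions, shape: length along each axis, size: total elements
for name, arr in [("1D", arr_1d), ("2D", arr_2d), ("3D", arr_3d)]:
    print(name, arr.ndim, arr.shape, arr.size)
```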
Create an array within a specified range¶
np.arange() can be used in place of np.array(range()).
# np.arange(start, stop, step)
arr = np.arange(0, 20, 2)
print(arr)
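A short sketch of two ways np.arange() goes beyond range(): it accepts a fractional step, and, like range(), it excludes the stop value. The variable names are illustrative.

```python
import numpy as np

# For integer arguments the result matches np.array(range(...))
same = np.array_equal(np.arange(10), np.array(range(10)))
print(same)

# Unlike range(), np.arange() accepts a non-integer step; stop is excluded
quarters = np.arange(0, 1, 0.25)
print(quarters)

# A negative step counts down
countdown = np.arange(5, 0, -1)
print(countdown)
```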
The other common way is to initialise arrays. You do this when you know the size of the array beforehand.
The following functions are commonly used:
- np.linspace(): Create an array of fixed length
- np.random.rand(): Create an array of random values in the range [0,1)
- np.ones(): Create an array of 1s
- np.zeros(): Create an array of 0s
- np.random.random(): Create an array of random numbers
- np.arange(): Create an array with increments of a fixed step size
Create an array of evenly spaced numbers within specified range¶
np.linspace(start, stop, num_of_elements, endpoint=True, retstep=False) takes the following 5 parameters:
- start: start number (inclusive)
- stop: end number (inclusive unless endpoint is set to False)
- num_of_elements: number of elements contained in the array (this parameter is named num in NumPy)
- endpoint: boolean value representing whether the stop number is inclusive or not
- retstep: boolean value representing whether to return the step size
arr, step_size = np.linspace(0, 5, 8, endpoint=False, retstep=True)
print(arr)
print('The step size is ' + str(step_size))
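As a sketch of how endpoint changes the result, the same interval can be split into five points with and without the stop value included. The interval [0, 1] and the count of 5 are arbitrary choices for the example.

```python
import numpy as np

# stop is included: the step is (stop - start) / (num - 1)
closed, closed_step = np.linspace(0, 1, 5, retstep=True)
print(closed, closed_step)

# stop is excluded: the step is (stop - start) / num
half_open, open_step = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(half_open, open_step)
```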
Create an array of random values of given shape¶
np.random.rand() returns random values drawn uniformly from the range [0,1).
np.random.rand()
np.random.rand(5)
arr = np.random.rand(3, 3)
print(arr)
np.random.rand(3,3)
# Create a 4 x 4 random array of integers ranging from 0 to 99 (the upper bound 100 is exclusive)
np.random.randint(0, 100, (4,4))
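The random cells above give different values on every run. If repeatable output is needed, one option is NumPy's Generator interface, created with np.random.default_rng(). This is a sketch only: the text above does not use this interface, and the seed 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same sequence on every run
rng = np.random.default_rng(42)

# Floats in [0, 1), similar in spirit to np.random.rand(3, 3)
floats = rng.random((3, 3))
print(floats)

# Integers from 0 to 9; the upper bound 10 is exclusive
ints = rng.integers(0, 10, size=(4, 4))
print(ints)

# Re-creating the generator with the same seed repeats the first draw
repeat = np.random.default_rng(42).random((3, 3))
print(np.array_equal(floats, repeat))
```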
Create an array of zeros of given shape¶
- np.zeros(): create an array of all zeros in the given shape
- np.zeros_like(): create an array of all zeros with the same shape and data type as the given input array
zeros = np.zeros((2,3))
print(zeros)
np.zeros_like()¶
arr = np.array([[1,2], [3,4],[5,6]])
arr
zeros = np.zeros_like(arr)
print(zeros)
print('Data Type:',zeros.dtype)
Create an array of ones of given shape¶
- np.ones(): create an array of all ones in the given shape
- np.ones_like(): create an array of all ones with the same shape and data type as the given input array
ones = np.ones((3,2))
print(ones)
arr = [[1,2,3], [4,5,6]]
ones = np.ones_like(arr)
print(ones)
print('Data Type: ' + str(ones.dtype))
Create an empty array of given shape¶
- np.empty(): create an uninitialised array of the given shape
- np.empty_like(): create an uninitialised array with the same shape and data type as the given input array
Notice that the initial values are not necessarily set to zeros.
They are whatever values happen to be in the allocated memory, so they should be overwritten before use.
empty = np.empty((5,5))
print(empty)
arr = np.array([[1,2,3], [4,5,6]], dtype=np.int64)
empty = np.empty_like(arr)
print(empty)
print('Data Type: ' + str(empty.dtype))
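A typical use of np.empty() is to allocate a result array and then fill every slot before reading from it. The sketch below uses a plain loop only to make the filling step explicit; the name squares is made up for the example.

```python
import numpy as np

# Allocate without initialising; the contents are arbitrary at this point
squares = np.empty(5, dtype=np.int64)

# Overwrite every element before the array is used
for i in range(squares.size):
    squares[i] = i ** 2

print(squares)
```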
Create an array of constant values of given shape¶
- np.full(): create an array of constant values in the given shape
- np.full_like(): create an array of constant values with the same shape and data type as the given input array
full = np.full((4,4), 5)
print(full)
arr = np.array([[1,2], [3,4]], dtype=np.float64)
full = np.full_like(arr, 5)
print(full)
print('Data Type: ' + str(full.dtype))
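Because the _like functions copy the data type of the input array, the fill value is converted to that type. The sketch below shows what this means for a fractional fill value; the value 2.7 and the variable names are arbitrary.

```python
import numpy as np

int_template = np.array([[1, 2], [3, 4]])
float_template = np.array([[1, 2], [3, 4]], dtype=np.float64)

# The integer template keeps its integer dtype, so 2.7 is truncated
filled_int = np.full_like(int_template, 2.7)
print(filled_int, filled_int.dtype)

# The float template stores the value as given
filled_float = np.full_like(float_template, 2.7)
print(filled_float, filled_float.dtype)

# Passing dtype explicitly overrides the template's data type
filled_override = np.full_like(int_template, 2.7, dtype=np.float64)
print(filled_override, filled_override.dtype)
```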
Create an array in a repetitive manner¶
- np.repeat(iterable, reps, axis=None): repeat each element n times
  - iterable: input array
  - reps: number of repetitions
  - axis: which axis to repeat along; the default is None, which flattens the input array and then repeats
- np.tile(): repeat the whole array n times
  - iterable: input array
  - reps: number of repetitions; it can be a tuple giving the number of repetitions along each axis
# No axis specified, then flatten the input array first and repeat
arr = [[0, 1, 2], [3, 4, 5]]
print(np.repeat(arr, 3))
# An example of repeating along axis 0 (each row is repeated)
arr = [[0, 1, 2], [3, 4, 5]]
print(np.repeat(arr, 3, axis=0))
# An example of repeating along axis 1 (each column is repeated)
arr = [[0, 1, 2], [3, 4, 5]]
print(np.repeat(arr, 3, axis=1))
# Repeat the whole array by a specified number of times
arr = [0, 1, 2]
print(np.tile(arr, 3))
# Repeat along specified axes
print(np.tile(arr, (2,2)))
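The difference between the two functions is easiest to see side by side on the same input. The sketch below also shows that the repetition count of np.repeat() can be a list with one count per element; the variable names are illustrative.

```python
import numpy as np

values = [0, 1, 2]

# np.repeat repeats element by element
repeated = np.repeat(values, 2)
print(repeated)

# One repetition count per element
per_element = np.repeat(values, [1, 2, 3])
print(per_element)

# np.tile repeats the array as a whole
tiled = np.tile(values, 2)
print(tiled)

# A tuple tiles along each axis: 2 copies down, 2 copies across
tiled_2d = np.tile(values, (2, 2))
print(tiled_2d.shape)
```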
Create an identity matrix of given size¶
- np.eye(size, k=0): create an identity matrix of the given size
  - size: the size of the identity matrix
  - k: the diagonal offset
- np.identity(): same as np.eye(), but does not accept the diagonal offset k
identity_matrix = np.eye(5)
print(identity_matrix)
# An example of diagonal offset
identity_matrix = np.eye(5, k=-1)
print(identity_matrix)
identity_matrix = np.identity(5)
print(identity_matrix)
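A brief sketch comparing the two functions. It also uses the optional second argument of np.eye(), which sets the number of columns and is not covered in the parameter list above.

```python
import numpy as np

# With the default offset, np.eye and np.identity agree
same = np.array_equal(np.eye(4), np.identity(4))
print(same)

# A positive k moves the ones above the main diagonal
upper = np.eye(3, k=1)
print(upper)

# np.eye can also build a rectangular matrix (2 rows, 3 columns)
rectangular = np.eye(2, 3)
print(rectangular)
```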
Create an array with given values on the diagonal¶
arr = np.random.rand(5,5)
print(arr)
# Extract values on the diagonal
print('Values on the diagonal: ' + str(np.diag(arr)))
# The input does not have to be a square matrix
arr = np.random.rand(10,3)
print(arr)
# Extract values on the diagonal
print('Values on the diagonal: ' + str(np.diag(arr)))
# Create a matrix given values on the diagonal
# All non-diagonal values set to zeros
arr = np.diag([1,2,3,4,5])
print(arr)
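np.diag() works in both directions: a 1-D input builds a matrix, and a 2-D input extracts a diagonal. The sketch below shows the round trip and the optional offset k, using small fixed inputs so that the results are easy to check by eye.

```python
import numpy as np

# 1-D input: build a matrix with the values on the main diagonal
matrix = np.diag([1, 2, 3])
print(matrix)

# 2-D input: extract the diagonal again
diagonal = np.diag(matrix)
print(diagonal)

# k shifts the diagonal; the matrix grows to make room for the offset
shifted = np.diag([1, 2], k=1)
print(shifted)

# For a non-square input, the diagonal stops at the shorter dimension
short_diagonal = np.diag(np.arange(6).reshape(2, 3))
print(short_diagonal)
```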
Advantages of NumPy¶
What is the use of arrays over lists, specifically for data analysis? Put crudely, it is convenience and speed:
- You can write vectorised code on NumPy arrays, but not on lists. Vectorised code is concise and convenient to read and write.
- NumPy is much faster than the standard Python ways of doing computations.
Vectorised code typically does not contain explicit looping and indexing etc. (all of this happens behind the scenes, in precompiled C code), and thus it is much more concise.
Let’s see an example of convenience first; we’ll see one for speed later.
Say you have two lists of numbers, and want to calculate the element-wise product. The standard Python list way would need you to map a lambda function (or worse, write a for loop), whereas with NumPy, you simply multiply the arrays.
list_1 = [3, 6, 7, 5]
list_2 = [4, 5, 1, 7]
# the list way to do it: map a function to the two lists
product_list = list(map(lambda x, y: x*y, list_1, list_2))
print(product_list)
using array¶
# The numpy array way to do it: simply multiply the two arrays
array_1 = np.array(list_1)
array_2 = np.array(list_2)
array_3 = array_1*array_2
print(array_3)
print(type(array_3))
As you can see, the NumPy way is clearly more concise.
Even simple mathematical operations on lists require an explicit loop or comprehension, unlike with arrays. For example, to calculate the square of every number in a list:
# Square a list
list_squared = [i**2 for i in list_1]
# Square a numpy array
array_squared = array_1**2
print(list_squared)
print(array_squared)
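One reason arithmetic on lists needs a loop is that the * and + operators already mean something else for lists: repetition and concatenation. The sketch below contrasts the two behaviours using the same numbers as the example above.

```python
import numpy as np

list_1 = [3, 6, 7, 5]
list_2 = [4, 5, 1, 7]
array_1 = np.array(list_1)
array_2 = np.array(list_2)

# On a list, * repeats the list and + concatenates
list_times_two = list_1 * 2
list_plus_list = list_1 + list_2
print(list_times_two)
print(list_plus_list)

# On an array, the same operators work element by element
array_times_two = array_1 * 2
array_plus_array = array_1 + array_2
print(array_times_two)
print(array_plus_array)
```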
Compare Computation Times in NumPy and Standard Python Lists¶
We mentioned that the key advantages of numpy are convenience and speed of computation.
You’ll often work with extremely large datasets, so it is important to understand how much computation time (and memory) you can save by using NumPy instead of standard Python lists.
Let’s compare the computation times of arrays and lists for a simple task of calculating the element-wise product of numbers.
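list_1 = [i for i in range(10000000)]
# Note: for large i the products i * i**2 exceed the 64-bit integer range, so
# the NumPy result below wraps around silently; only the timings are compared
list_2 = [j**2 for j in range(10000000)]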
import time
# store start time, time after computation, and take the difference
t0 = time.time()
product_list = list(map(lambda x, y: x*y, list_1, list_2))
t1 = time.time()
list_time = t1 - t0
print("Time Taken:",t1-t0)
Using numpy array¶
array_1 = np.array(list_1)
array_2 = np.array(list_2)
t0 = time.time()
array_3 = array_1*array_2
t1 = time.time()
numpy_time = t1 - t0
print("Time Taken:",t1-t0)
print("The ratio of time taken is {}".format(list_time/numpy_time))
In this case, NumPy is about an order of magnitude faster than lists, though the exact ratio depends on the machine. This is with arrays of size in the millions, but you may work on much larger arrays, with sizes in the order of billions, where the difference in time is even larger.
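A single time.time() measurement can be noisy. As an alternative sketch, the standard timeit module runs each computation several times and lets you take the best result. The smaller size n, the repeat counts and the helper function names are assumptions chosen to keep the example quick; they are not part of the comparison above.

```python
import timeit

import numpy as np

n = 100_000  # small enough that all products fit in a 64-bit integer
list_1 = list(range(n))
list_2 = [j ** 2 for j in range(n)]
array_1 = np.array(list_1, dtype=np.int64)
array_2 = np.array(list_2, dtype=np.int64)


def multiply_lists():
    # Element-wise product with plain Python lists
    return [x * y for x, y in zip(list_1, list_2)]


def multiply_arrays():
    # Element-wise product with NumPy arrays
    return array_1 * array_2


# Each measurement runs the function 10 times; keep the best of 5 measurements
list_time = min(timeit.repeat(multiply_lists, number=10, repeat=5))
numpy_time = min(timeit.repeat(multiply_arrays, number=10, repeat=5))

print("List time:", list_time)
print("NumPy time:", numpy_time)
print("Ratio:", list_time / numpy_time)
```

The measured ratio will vary with the machine, the Python version and the array size.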
Some reasons for such difference in speed are:
- NumPy is written in C, so the computations are executed in compiled code behind the scenes
- NumPy arrays are more compact than lists, i.e. they take much less storage space than lists
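The point about compactness can be checked roughly with sys.getsizeof() and the nbytes attribute. This is a sketch under two assumptions: a 64-bit CPython interpreter, and a list estimate that simply adds the size of every integer object to the size of the list itself, which is only an approximation.

```python
import sys

import numpy as np

n = 1000
numbers_list = list(range(n))
numbers_array = np.arange(n, dtype=np.int64)

# A list stores references to separate Python int objects
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(x) for x in numbers_list)

# An array stores the raw values contiguously: 8 bytes per int64 element
array_bytes = numbers_array.nbytes

print("List (approx.):", list_bytes, "bytes")
print("Array data:", array_bytes, "bytes")
```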
The following discussions demonstrate the differences in speeds of NumPy and standard python: