4 NumPy

So far we have only used commands that come with the standard Python distribution. However, Python is a very popular language for scientific computing and there are many packages that can be used to extend its capabilities. One of the most fundamental packages for extending Python's capabilities in this domain is NumPy. NumPy is also the cornerstone for many packages which are important for doing data analysis with Python, including but not limited to Pandas, SciPy, and Matplotlib.

4.1 Installation and Loading

You can install Python packages using the pip command. Another popular package manager is conda. Either of these package managers will do the trick.

Tip

Installing NumPy using pip is rather simple. Just run the following command in your terminal or command prompt:

pip install numpy

Once you have installed the NumPy package, you can load it using the following command.

import numpy as np

We have imported the NumPy package and aliased it as np. This is a common convention when working with NumPy. We can now use the functions and classes provided by NumPy by prefixing them with np.. For example, generating an array of five random numbers using NumPy would look like this:

np.random.rand(5)

array([0.97880901, 0.04743252, 0.94166398, 0.76965014, 0.43116348])

4.2 What can NumPy do?

Among the key concepts in NumPy are the array and matrix data structures. Moreover, NumPy provides tools for working with these structures. The array is in principle quite similar to a list in Python, with a few key differences:

Arrays can be multidimensional
Arrays can only contain elements of the same type, whereas lists can contain elements of different types
Arrays are optimized for numerical operations, whereas lists are not. This makes arrays much faster for numerical operations than lists, and also more memory efficient.
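To make the single-type rule from the list above concrete, here is a minimal sketch. The variable names mixed and plain are made up for illustration. It shows NumPy converting a mix of integers and floats to one common type, while a list keeps each element as it is.

```python
import numpy as np

# Mixing integers and floats: NumPy converts everything to one common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# A plain list keeps each element's original type.
plain = [1, 2.5, 3]
print([type(x).__name__ for x in plain])  # ['int', 'float', 'int']
```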

4.3 NumPy Arrays

You can create a 1-D NumPy array from a list using the np.array() function. For example:

a = np.array([1, 2, 3, 4, 5])
a

array([1, 2, 3, 4, 5])

You might have noticed that we actually used a list to create the NumPy array. We can naturally create a NumPy array from a list which has been assigned to a variable. For example, below we will cast a list to an array using the variable my_list:

my_list = [0, 1, 2, 3, 4, 5]
a = np.array(my_list)
print(a)

[0 1 2 3 4 5]

Now accessing the elements of the array is similar to accessing the elements of a list. For example, to access the first element of the array, you can use the following code:

a[0]

This works just like it would for a list:

my_list[0]

You can also use slice notation to access a range of elements in the array. For example, the following code will access the second through fourth elements of the array, in other words index one to index three. The end index of a slice is not included, which can be confusing at first.

a[1:4]

array([1, 2, 3])
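A few more indexing patterns carry over from lists as well. The sketch below uses a separate array called values (a name chosen here for illustration) to show negative indices and a step in the slice.

```python
import numpy as np

values = np.arange(6)      # array([0, 1, 2, 3, 4, 5])

last = values[-1]          # negative indices count from the end -> 5
every_other = values[::2]  # start:stop:step -> array([0, 2, 4])
middle = values[1:4]       # stop index 4 is excluded -> array([1, 2, 3])
```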

With arrays you can also do something called broadcasting. In its simplest form, this means that an operation with a single number is applied to every element in the array. For example, the following code will multiply every element in the array by 2.

a * 2

array([ 0,  2,  4,  6,  8, 10])

This does not work the same way with lists. Multiplying a list by 2 repeats the list instead, as you will see by looking at the example below.

my_list * 2

[0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
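For comparison, doubling every element of a list needs an explicit loop or a list comprehension. The following sketch, with illustrative variable names, puts the two approaches side by side.

```python
import numpy as np

numbers = [0, 1, 2, 3, 4, 5]

# With a plain list, element-wise doubling needs an explicit loop.
doubled_list = [x * 2 for x in numbers]

# With an array, the same operation is a single expression.
doubled_array = np.array(numbers) * 2

print(doubled_list)            # [0, 2, 4, 6, 8, 10]
print(doubled_array.tolist())  # [0, 2, 4, 6, 8, 10]
```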

You can also apply mathematical functions to the array. For example, the following code will calculate the square of the first three elements in the array.

square_first_three = np.square(a[:3])
print(square_first_three)

[0 1 4]

You should be aware that the array is not changed by these operations, as we can see by displaying the array.

a

array([0, 1, 2, 3, 4, 5])

Warning

However, a slice of an array is a view of the original data, not a copy. When you modify a slice in place, the original array is changed as well. For example, the following code will change the first three elements of the array to their squares.

squared_slice = a[:3]
squared_slice **= 2

print(squared_slice)
print(a)

[0 1 4]
[0 1 4 3 4 5]

You can use the copy() method to create an independent copy of the array, so that changing the copy leaves the original untouched. For example, the following code creates a copy and changes its first element without changing the original array.

a_copy = a.copy()
a_copy[0] = 100
print(a_copy)
print(a)

[100   1   4   3   4   5]
[0 1 4 3 4 5]
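If you are unsure whether two arrays share the same underlying data, np.shares_memory() can tell you. The sketch below uses illustrative names (original, view, independent) to contrast a slice with a copy.

```python
import numpy as np

original = np.arange(6)
view = original[:3]                 # slicing returns a view on the same data
independent = original[:3].copy()   # copy() allocates new, separate data

print(np.shares_memory(original, view))         # True
print(np.shares_memory(original, independent))  # False

# Writing through the view changes the original, but not the copy.
view[0] = 99
print(original[0])     # 99
print(independent[0])  # 0
```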

4.3.1 Other Ways to Create 1-D Arrays

np.array() is not the only way to create a NumPy array. We can also conveniently create a one-dimensional NumPy array for a range of numbers using the np.arange() function.

a = np.arange(1, 6)
print(a)

[1 2 3 4 5]

You can also set a step size for the range of numbers. For example, the following code will create an array with numbers from 0 to 10 with a step size of 2. The end value is not included, which is why we pass 11 rather than 10.

a_steps = np.arange(0, 11, 2)
print(a_steps)

[ 0  2  4  6  8 10]
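np.arange() also accepts a single argument, a fractional step, or a negative step. The following is a small sketch of these variants. The results are converted to lists only to keep the printed output easy to read.

```python
import numpy as np

only_stop = np.arange(5)              # start defaults to 0
fractional = np.arange(0, 1, 0.25)    # the step can be a float
descending = np.arange(10, 0, -2)     # a negative step counts down

print(only_stop.tolist())   # [0, 1, 2, 3, 4]
print(fractional.tolist())  # [0.0, 0.25, 0.5, 0.75]
print(descending.tolist())  # [10, 8, 6, 4, 2]
```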

4.3.2 2-D Matrices

You can create a 2-D NumPy array from a list of lists by using the np.array() function.

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

You can access elements of a 2-D array using two indices. For example, the following code will access the element in the second row and third column of the array.

# index for the second row and third column
b[1, 2]

There is also a double bracket notation for accessing elements in a 2-D array. For example, the following code will access the element in the second row and third column of the array.

b[1][2]

Either of the two methods will work, but the first method should be more efficient, because b[1][2] first creates an intermediate array for the row b[1] and then indexes into it.
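The comma notation also combines with slices, which lets you pull out whole rows, columns, or sub-blocks. Here is a minimal sketch using a matrix named grid (an illustrative name) with the same values as above.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_row = grid[0]          # array([1, 2, 3])
second_column = grid[:, 1]   # all rows, column index 1 -> array([2, 5, 8])
top_right = grid[:2, 1:]     # first two rows, last two columns
print(top_right.tolist())    # [[2, 3], [5, 6]]
```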

4.4 Some convenient functions

Here are some convenient functions that you can use to create NumPy arrays:

np.zeros(): Creates an array of zeros
np.ones(): Creates an array of ones
np.linspace(): Creates an array of evenly spaced numbers over a specified range
np.eye(): Creates an identity matrix

Let’s see some examples.

4.4.1 Zeros and Ones

You can create an array of zeros using the np.zeros() function. For example, the following code will create an array of zeros with 5 elements.

np.zeros(5)

array([0., 0., 0., 0., 0.])

For a 2-D array, you can specify the shape of the array as a tuple. For example, the following code will create a 2-D array of zeros with 3 rows and 4 columns.

np.zeros((3, 4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

Using ones is similar to using zeros. Let’s create a 5 by 6 matrix of ones.

np.ones((5, 6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

4.4.2 Linspace

The np.linspace() function is used to create an array of evenly spaced numbers over a specified range. For example, the following code will create an array of 10 evenly spaced numbers from 0 to 5, with both endpoints included.

np.linspace(0, 5, 10)

array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])

How does it differ from np.arange(), you might ask? With np.linspace() you specify the number of points, and by default both the start and end values are included. With np.arange() you specify the step size, and the end value is not included.
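The relationship between the two functions can be sketched with two optional np.linspace() arguments that the text above does not use: retstep=True also returns the spacing, and endpoint=False leaves out the end value, which mirrors the behavior of np.arange().

```python
import numpy as np

# retstep=True additionally returns the spacing between the points.
points, step = np.linspace(0, 5, 10, retstep=True)
print(len(points))  # 10
print(step)         # roughly 0.5556, i.e. 5 / 9

# Without the endpoint, 10 points from 0 to 5 are spaced 0.5 apart,
# which matches np.arange with a step of 0.5.
no_endpoint = np.linspace(0, 5, 10, endpoint=False)
by_step = np.arange(0, 5, 0.5)
print(np.allclose(no_endpoint, by_step))  # True
```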

4.4.3 Eye

This function is used for creating an identity matrix. An identity matrix is a square matrix with ones on the diagonal and zeros elsewhere. We can create a 4 by 4 identity matrix with the following code.

np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

The identity matrix has many important uses in linear algebra and other areas of mathematics.
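One of those uses is easy to check: multiplying a matrix by the identity matrix returns the original matrix. The sketch below uses the @ operator for matrix multiplication, which has not been introduced in this chapter, and the name m is made up for illustration.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])
identity = np.eye(2)

# Matrix multiplication with the identity leaves the values unchanged.
product = m @ identity
print(np.array_equal(product, m))  # True
```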

4.5 NumPy for Random Number Generation

Random numbers are needed for a variety of purposes in data analysis and machine learning. NumPy provides a number of functions for generating random numbers. Here are some of the most commonly used functions:

np.random.rand(): Generates random numbers from a uniform distribution
np.random.randn(): Generates random numbers from a standard normal distribution
np.random.randint(): Generates random integers

These functions allow us to create NumPy arrays with random numbers taken from different distributions.

4.5.1 Uniform Distribution

A uniform distribution assigns equal probability to all values in a given interval. The np.random.rand() function generates random numbers from a uniform distribution over the interval from 0 (inclusive) to 1 (exclusive). For example, the following code will generate an array of 5 random numbers between 0 and 1.

np.random.rand(5)

array([0.92448441, 0.90400502, 0.92213011, 0.92029374, 0.74739645])

You can also generate a 2-D array of random numbers. For example, the following code will generate a 3 by 4 array of random numbers.

arr = np.random.rand(3, 4)
arr

array([[0.53633468, 0.80503591, 0.48212329, 0.49865715],
       [0.4211912 , 0.69433433, 0.87418408, 0.23662667],
       [0.14847428, 0.61232323, 0.6030776 , 0.458986  ]])

We can always check the shape of the array using the shape attribute.

arr.shape

(3, 4)

The shape attribute returns a tuple with the dimensions of the array. In this case, the array has 3 rows and 4 columns. The reshape() method allows us to change the shape of the array, as long as the total number of elements stays the same. For example, we can reshape the array to have 4 rows and 3 columns, or to be one-dimensional.

arr.reshape(4, 3)

array([[0.53633468, 0.80503591, 0.48212329],
       [0.49865715, 0.4211912 , 0.69433433],
       [0.87418408, 0.23662667, 0.14847428],
       [0.61232323, 0.6030776 , 0.458986  ]])

arr.reshape(12)

array([0.53633468, 0.80503591, 0.48212329, 0.49865715, 0.4211912 ,
       0.69433433, 0.87418408, 0.23662667, 0.14847428, 0.61232323,
       0.6030776 , 0.458986  ])

You might have noticed that the two-dimensional array has two square brackets on each outer edge, whereas the one-dimensional array has only one square bracket per side.
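When reshaping, you can pass -1 for one dimension and let NumPy work out its size from the number of elements. The following sketch uses a small array named data (an illustrative name) and also shows what happens when the requested shape does not fit.

```python
import numpy as np

data = np.arange(12)

two_rows = data.reshape(2, -1)  # NumPy infers 6 columns
flat = two_rows.reshape(-1)     # back to one dimension

print(two_rows.shape)  # (2, 6)
print(flat.shape)      # (12,)

# 12 elements cannot fill a 5 by 3 array, so NumPy raises an error.
try:
    data.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```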

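Finally, we can also check the data type of the array using the dtype attribute. In case we want to change the data type of the array, we can use the astype() method. Note that converting floats to integers discards the decimal part, so every value between 0 and 1 becomes 0.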

arr.dtype

dtype('float64')

arr.astype(int)

array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
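Converting to integers with astype(int) drops the fractional part rather than rounding. If rounding is what you want, one option is to call np.round() first. This is a small sketch with made-up values.

```python
import numpy as np

floats = np.array([0.2, 1.7, -1.7, 2.9])

truncated = floats.astype(int)          # fractional part is dropped
rounded = np.round(floats).astype(int)  # round first, then convert

print(truncated.tolist())  # [0, 1, -1, 2]
print(rounded.tolist())    # [0, 2, -2, 3]
```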

4.5.2 Normal Distribution

The normal distribution is quite possibly the most important distribution in statistics. The np.random.randn() function generates random numbers from a standard normal distribution. If we want to save ourselves some typing we can import the function directly.

from numpy.random import randn

This allows us to use the function without the np. prefix. Like this:

# create 5 random numbers from a standard normal distribution
randn(5)

array([ 1.16773318, -1.03230255,  1.18685319,  0.27572565, -1.89376463])

The standard normal distribution is centered around zero and has a standard deviation of one. So, if we want to change the mean and standard deviation of our normal distribution, we can multiply the random numbers by the standard deviation and shift the mean by addition. For example, the following code will generate 5 random numbers from a normal distribution with a mean of 10 and a standard deviation of 2.

# multiply by sd and add mean
randn(5) * 2 + 10

array([10.73838116,  8.23491577,  9.79708733, 10.38771634,  9.32476956])
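With only five numbers it is hard to see that the scaling and shifting worked. As a rough check, the sketch below draws a much larger sample and computes its mean and standard deviation. The sample size and the seed are arbitrary choices made for this illustration.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so the check is repeatable

# Scale by the standard deviation (2) and shift by the mean (10).
sample = np.random.randn(100_000) * 2 + 10

print(sample.mean())  # close to 10
print(sample.std())   # close to 2
```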

4.5.3 Random Integers

The np.random.randint() function generates random integers. For example, the following code will generate eight random integers from 0 to 9. The upper bound of 10 is not included.

from numpy.random import randint

randint(0, 10, 8)

array([7, 7, 9, 4, 0, 3, 3, 0])

If you want the results to be reproducible, you can set the seed using the np.random.seed() function. For example, the following code will generate the same random numbers every time you run it.

np.random.seed(42)
randint(0, 10, 8)

array([6, 3, 7, 4, 6, 9, 2, 6])

This code draws numbers with replacement from the integers 0 to 9. If you want to draw numbers without replacement, you can use the np.random.choice() function. For example, the following code will draw 5 random numbers without replacement from the integers 0 to 9.

np.random.choice(10, 5, replace=False)

array([0, 6, 9, 1, 8])
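Newer versions of NumPy also offer a generator-based interface, created with np.random.default_rng(), which this chapter does not otherwise cover. The sketch below shows how the seeding, integer, and choice examples might look with it. The drawn values are not shown because they differ from those produced by np.random.seed().

```python
import numpy as np

# A generator carries its own seed instead of relying on global state.
rng = np.random.default_rng(42)

draws = rng.integers(0, 10, size=8)                   # integers from 0 to 9
unique_draws = rng.choice(10, size=5, replace=False)  # no repeated values

print(draws)
print(unique_draws)
```

Creating a second generator with the same seed and making the same calls in the same order reproduces the same numbers.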

4.6 Array Operations

You can perform element-wise operations on NumPy arrays. For example, you can add two arrays together, subtract one array from another, multiply two arrays, and divide one array by another. Let’s see some examples.

4.6.1 Basic Operations

You can perform basic arithmetic operations on NumPy arrays. For example, the following code will add two arrays together.

a = np.array([1, 2, 3, 4, 5])

a + a

array([ 2,  4,  6,  8, 10])

The same goes for subtraction and multiplication.

# subtraction
a - a

array([0, 0, 0, 0, 0])

# multiplication
a * a

array([ 1,  4,  9, 16, 25])

You can also divide two arrays. Note that the result is an array of floats, even when both inputs contain integers:

a / a

array([1., 1., 1., 1., 1.])

Scalar operations are also possible. For example, the following code will multiply every element in the array by 2.

a * 2

array([ 2,  4,  6,  8, 10])

You can also add or subtract a scalar from an array.

# addition
a + 2

array([3, 4, 5, 6, 7])
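The same idea extends beyond scalars: a one-dimensional array can be combined with each row of a two-dimensional array, provided the lengths match. Here is a minimal sketch with illustrative names.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# The row is added to every row of the matrix.
result = matrix + row
print(result.tolist())  # [[11, 22, 33], [14, 25, 36]]
```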

4.6.2 Universal Functions

NumPy provides a number of universal functions that can be applied to arrays. For example, the np.sqrt() function calculates the square root of every element in the array.

np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798])

NumPy also provides functions that summarize an array, such as finding its maximum or minimum value.

# maximum value
np.max(a)

This is equivalent to using the max() method.

a.max()

You can also find the index of the maximum value in the array.

np.argmax(a)
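For two-dimensional arrays, these functions accept an axis argument that controls whether the result is computed over the whole array, per column, or per row. The sketch below uses a small made-up matrix.

```python
import numpy as np

grid = np.array([[3, 7, 1],
                 [9, 2, 5]])

overall = grid.max()           # largest value in the whole array -> 9
per_column = grid.max(axis=0)  # maximum of each column
per_row = grid.max(axis=1)     # maximum of each row
flat_index = np.argmax(grid)   # position in the flattened array -> 3

print(per_column.tolist())  # [9, 7, 5]
print(per_row.tolist())     # [7, 9]
```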

Trigonometric functions are also available, such as np.sin(), np.cos(), and np.tan(). They expect angles in radians.

np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219])
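If your angles are in degrees, you can convert them with np.deg2rad() before applying a trigonometric function. The following sketch rounds the result because floating-point arithmetic gives a tiny non-zero value for the cosine of 90 degrees.

```python
import numpy as np

angles_deg = np.array([0, 90, 180])
angles_rad = np.deg2rad(angles_deg)  # 0, pi/2, pi

cosines = np.cos(angles_rad)
print(np.allclose(cosines, [1, 0, -1]))  # True
```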

There are many more universal functions available in NumPy. You can find a list of them in the NumPy documentation. That’s it for NumPy. In the next section, we will look at Pandas, which will introduce us to DataFrames, a powerful data structure for data analysis in Python.