4 NumPy
So far we have only used commands that come with the standard Python distribution. However, Python is a very popular language for scientific computing, and there are many packages that extend its capabilities. One of the most fundamental packages in this domain is NumPy. NumPy is also the cornerstone of many packages that are important for data analysis with Python, including but not limited to Pandas, SciPy, and Matplotlib.
4.1 Installation and Loading
You can install Python packages using the pip command. Another popular package manager is conda. Either of these package managers will do the trick.
Installing NumPy using pip is rather simple. Just run the following command in your terminal or command prompt:
pip install numpy
Once you have installed the NumPy package, you can load it using the following command.
import numpy as np
We have imported the NumPy package and aliased it as np. This is a common convention when working with NumPy. We can now use the functions and classes provided by NumPy by prefixing them with np followed by a dot. For example, generating five random numbers using NumPy would look like this:
np.random.rand(5)
array([0.97880901, 0.04743252, 0.94166398, 0.76965014, 0.43116348])
4.2 What can NumPy do?
The key concepts in NumPy are the array and matrix data structures, and NumPy provides tools for working with them. An array is in principle quite similar to a Python list, with a few key differences:
- Arrays can be multidimensional
- Arrays can only contain elements of the same type, whereas lists can contain elements of different types
- Arrays are optimized for numerical operations, whereas lists are not. This makes arrays much faster for numerical operations than lists, and also more memory efficient.
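The first two differences can be illustrated with a short sketch. The values and variable names below are made up for illustration: a list keeps the type of each element, an array converts all elements to one common type, and a nested list becomes a two-dimensional array.

```python
import numpy as np

# A list keeps each element's own type.
mixed_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in mixed_list]
print(list_types)             # ['int', 'float', 'int']

# An array converts all elements to one common type (here float64).
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)      # float64

# A nested list becomes a two-dimensional array.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.ndim, grid.shape)  # 2 (2, 3)
```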
4.3 NumPy Arrays
You can create a 1-D NumPy array from a list using the np.array() function. For example:
a = np.array([1, 2, 3, 4, 5])
a
array([1, 2, 3, 4, 5])
You might have noticed that we actually used a list to create the NumPy array. Naturally, we can also create a NumPy array from a list that has been assigned to a variable. For example, below we convert the list stored in the variable my_list to an array:
my_list = [0, 1, 2, 3, 4, 5]
a = np.array(my_list)
print(a)
[0 1 2 3 4 5]
Accessing the elements of an array is similar to accessing the elements of a list. For example, to access the first element of the array, you can use the following code:
a[0]
0
This works just like it would for a list:
my_list[0]
0
You can also use slice notation to access a range of elements in the array. For example, the following code accesses the second through fourth elements of the array, in other words index 1 up to and including index 3. The stop index 4 itself is excluded, which can be confusing at first.
a[1:4]
array([1, 2, 3])
With arrays you can also do something called broadcasting. In its simplest form, this means that an operation between an array and a single number is applied to every element of the array. For example, the following code multiplies every element in the array by 2.
a * 2
array([ 0, 2, 4, 6, 8, 10])
This does not work the same way with lists: multiplying a list by 2 repeats the list instead of multiplying its elements, as the example below shows.
my_list * 2
[0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5]
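To multiply the elements of a plain list, you need a loop or a list comprehension. Broadcasting also goes beyond single numbers. As a sketch with illustrative values (the names matrix and row are not from the text above), a one-dimensional array can be combined with every row of a two-dimensional array when the trailing dimensions match.

```python
import numpy as np

my_list = [0, 1, 2, 3, 4, 5]

# Doubling list elements needs an explicit comprehension.
doubled_list = [x * 2 for x in my_list]
print(doubled_list)  # [0, 2, 4, 6, 8, 10]

# Broadcasting a 1-D array of shape (3,) across a 2-D array of shape (2, 3).
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]
```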
You can also apply mathematical functions to the array. For example, the following code calculates the squares of the first three elements in the array.
square_first_three = np.square(a[:3])
print(square_first_three)
[0 1 4]
You should be aware that the array is not changed by these operations, as we can see by printing the array.
a
array([0, 1, 2, 3, 4, 5])
However, a slice of an array is a view of the original data, not a copy. If you modify a slice in place, the original array changes as well. For example, the following code replaces the first three elements of the array with their squares.
squared_slice = a[:3]
squared_slice **= 2
print(squared_slice)
print(a)
print(a)
[0 1 4]
[0 1 4 3 4 5]
You can use the copy() method to create a copy of the array. This way you can change the copy without changing the original array, as the following code shows.
a_copy = a.copy()
a_copy[0] = 100
print(a_copy)
print(a)
print(a)
[100 1 4 3 4 5]
[0 1 4 3 4 5]
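If you are unsure whether two arrays refer to the same underlying data, np.shares_memory() can tell you. The following is a minimal sketch with a fresh array; the names view and copy are chosen only for illustration.

```python
import numpy as np

a = np.arange(6)

view = a[:3]          # basic slicing returns a view of the same data
copy = a[:3].copy()   # copy() allocates new memory

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

view[0] = 99          # writes through to a
copy[1] = -1          # does not affect a
print(a)              # [99  1  2  3  4  5]
```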
4.3.1 Other Ways to Create 1-D Arrays
np.array() is not the only way to create a NumPy array. We can also conveniently create a one-dimensional NumPy array for a range of numbers using the np.arange() function. The stop value, here 6, is not included.
a = np.arange(1, 6)
print(a)
[1 2 3 4 5]
You can also specify a step size for the range of numbers. For example, the following code creates an array with the numbers from 0 to 10 with a step size of 2. The stop value is 11 because np.arange() excludes the stop value.
a_steps = np.arange(0, 11, 2)
print(a_steps)
[ 0 2 4 6 8 10]
4.3.2 2-D Matrices
You can create a 2-D NumPy array from a list of lists by using the np.array() function.
b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
You can access elements of a 2-D array using two indices. For example, the following code will access the element in the second row and third column of the array.
# index for the second row and third column
b[1, 2]
6
There is also a double bracket notation for accessing elements in a 2-D array. For example, the following code will access the element in the second row and third column of the array.
b[1][2]
6
Either of the two methods will work, but the first should be more efficient, because b[1][2] first creates a temporary array for the second row and then indexes into it.
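The comma notation also combines with slices, which lets you pick out whole rows, columns, or blocks. A brief sketch using the same values as b; the variable names are illustrative.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

second_row = b[1]        # a single index selects a whole row
third_column = b[:, 2]   # ":" means "all rows"
top_right = b[:2, 1:]    # rows 0-1, columns 1-2

print(second_row)        # [4 5 6]
print(third_column)      # [3 6 9]
print(top_right)
# [[2 3]
#  [5 6]]
```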
4.4 Some convenient functions
Here are some convenient functions that you can use to create NumPy arrays:
- np.zeros(): Creates an array of zeros
- np.ones(): Creates an array of ones
- np.linspace(): Creates an array of evenly spaced numbers over a specified range
- np.eye(): Creates an identity matrix
Let’s see some examples.
4.4.1 Zeros and Ones
You can create an array of zeros using the np.zeros() function. For example, the following code will create an array of zeros with 5 elements.
np.zeros(5)
array([0., 0., 0., 0., 0.])
For a 2-D array, you can specify the shape of the array as a tuple. For example, the following code will create a 2-D array of zeros with 3 rows and 4 columns.
np.zeros((3, 4))
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
Using ones is similar to using zeros. Let’s create a 5 by 6 matrix of ones.
np.ones((5, 6))
array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])
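Both functions return floating-point values by default, which is why the outputs show 0. and 1. with a trailing dot. A minimal sketch, assuming you want integers instead: pass the dtype argument. Multiplying an array of ones by a number is one simple way to get an array filled with a constant.

```python
import numpy as np

default_zeros = np.zeros(3)
print(default_zeros.dtype)   # float64

# Request integers explicitly (the exact integer type is platform dependent).
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros)
# [[0 0 0]
#  [0 0 0]]

# An array filled with sevens, built from ones.
sevens = np.ones((2, 2)) * 7
print(sevens)
# [[7. 7.]
#  [7. 7.]]
```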
4.4.2 Linspace
The np.linspace() function is used to create an array of evenly spaced numbers over a specified range. For example, the following code creates an array of 10 evenly spaced numbers from 0 to 5.
np.linspace(0, 5, 10)
array([0.        , 0.55555556, 1.11111111, 1.66666667, 2.22222222,
       2.77777778, 3.33333333, 3.88888889, 4.44444444, 5.        ])
How does it differ from np.arange(), you might ask? By default, np.linspace() includes both the start and end values and takes the number of points as its third argument, whereas np.arange() excludes the end value and takes a step size.
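The difference is easiest to see side by side. The sketch below uses small illustrative ranges; the endpoint argument of np.linspace() is not used elsewhere in this chapter and is shown only for comparison.

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 5, 1)
print(by_step)      # [0 1 2 3 4]

# linspace: start, stop (included), number of points
by_count = np.linspace(0, 5, 6)
print(by_count)     # [0. 1. 2. 3. 4. 5.]

# endpoint=False makes linspace leave out the stop value as well
open_ended = np.linspace(0, 5, 5, endpoint=False)
print(open_ended)   # [0. 1. 2. 3. 4.]
```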
4.4.3 Eye
This function is used for creating an identity matrix. An identity matrix is a square matrix with ones on the diagonal and zeros elsewhere. We can create a 4 by 4 identity matrix with the following code.
np.eye(4)
array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
The identity matrix has many important uses in linear algebra and other areas of mathematics.
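One such use: multiplying a matrix by the identity matrix leaves the matrix unchanged. The sketch below checks this with the @ operator, Python's matrix multiplication operator, which is not otherwise covered in this chapter.

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
identity = np.eye(3)

# Matrix product of b with the 3 by 3 identity matrix.
product = b @ identity

# The values are unchanged (the result is float because np.eye() returns floats).
print(np.array_equal(product, b))  # True
```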
4.5 NumPy for Random Number Generation
Random numbers are needed for a variety of purposes in data analysis and machine learning. NumPy provides a number of functions for generating random numbers. Here are some of the most commonly used functions:
- np.random.rand(): Generates random numbers from a uniform distribution
- np.random.randn(): Generates random numbers from a standard normal distribution
- np.random.randint(): Generates random integers
These functions allow us to create NumPy arrays with random numbers taken from different distributions.
4.5.1 Uniform Distribution
A uniform distribution assigns equal probability to every value in a given range. The np.random.rand() function generates random numbers from a uniform distribution over the interval from 0 (inclusive) to 1 (exclusive). For example, the following code generates an array of 5 random numbers between 0 and 1.
np.random.rand(5)
array([0.92448441, 0.90400502, 0.92213011, 0.92029374, 0.74739645])
You can also generate a 2-D array of random numbers. For example, the following code will generate a 3 by 4 array of random numbers.
arr = np.random.rand(3, 4)
arr
array([[0.53633468, 0.80503591, 0.48212329, 0.49865715],
       [0.4211912 , 0.69433433, 0.87418408, 0.23662667],
       [0.14847428, 0.61232323, 0.6030776 , 0.458986  ]])
We can always check the shape of the array using the shape attribute.
arr.shape
(3, 4)
The shape attribute returns a tuple with the dimensions of the array. In this case, the array has 3 rows and 4 columns. The reshape() method allows us to change the shape of the array, as long as the total number of elements stays the same. It returns a reshaped array and leaves the shape of arr itself unchanged. For example, we can reshape the array to have 4 rows and 3 columns, or to be one-dimensional.
arr.reshape(4, 3)
array([[0.53633468, 0.80503591, 0.48212329],
       [0.49865715, 0.4211912 , 0.69433433],
       [0.87418408, 0.23662667, 0.14847428],
       [0.61232323, 0.6030776 , 0.458986  ]])
arr.reshape(12)
array([0.53633468, 0.80503591, 0.48212329, 0.49865715, 0.4211912 ,
       0.69433433, 0.87418408, 0.23662667, 0.14847428, 0.61232323,
       0.6030776 , 0.458986  ])
You might have noticed that the two-dimensional array is printed with two square brackets on each outer edge, whereas the one-dimensional array has only one square bracket per side.
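When reshaping, you can pass -1 for one dimension and let NumPy work out its size from the number of elements. A small sketch with an illustrative 12-element array; the flag reshape_failed exists only to record the outcome of the failing call.

```python
import numpy as np

arr = np.arange(12)               # 12 elements: 0, 1, ..., 11

print(arr.reshape(3, -1).shape)   # (3, 4): the -1 is inferred as 4
print(arr.reshape(-1, 6).shape)   # (2, 6)

# The total number of elements must stay the same: 5 * 3 != 12.
reshape_failed = False
try:
    arr.reshape(5, 3)
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```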
Finally, we can also check the data type of the array using the dtype attribute. In case we want to change the data type of the array, we can use the astype() method.
arr.dtype
dtype('float64')
arr.astype(int)
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
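Every value becomes 0 here because converting floats to integers with astype(int) drops the fractional part, and all values in arr lie between 0 and 1. If you want rounding instead, round first and convert afterwards. A sketch with illustrative values:

```python
import numpy as np

values = np.array([0.2, 1.7, -1.7, 2.9])

truncated = values.astype(int)           # drops the fractional part
rounded = np.round(values).astype(int)   # rounds first, then converts

print(truncated)  # [ 0  1 -1  2]
print(rounded)    # [ 0  2 -2  3]
```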
4.5.2 Normal Distribution
The normal distribution is quite possibly the most important distribution in statistics. The np.random.randn() function generates random numbers from a standard normal distribution. If we want to save ourselves some typing, we can import the function directly.
from numpy.random import randn
This allows us to use the function without the np.random prefix, like this:
# create 5 random numbers from a standard normal distribution
randn(5)
array([ 1.16773318, -1.03230255, 1.18685319, 0.27572565, -1.89376463])
The standard normal distribution is centered around zero and has a standard deviation of one. So, if we want to change the mean and standard deviation of our normal distribution, we can multiply the random numbers by the standard deviation and shift the mean by addition. For example, the following code will generate 5 random numbers from a normal distribution with a mean of 10 and a standard deviation of 2.
# multiply by sd and add mean
randn(5) * 2 + 10
array([10.73838116, 8.23491577, 9.79708733, 10.38771634, 9.32476956])
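With only five numbers it is hard to see that the scaling worked. As a rough check, assuming a much larger sample of 100,000 draws, the sample mean and standard deviation should land close to 10 and 2. The seed is fixed only so that the check is repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the check is repeatable

mean, sd = 10, 2
samples = np.random.randn(100_000) * sd + mean

print(samples.mean())  # close to 10
print(samples.std())   # close to 2
```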
4.5.3 Random Integers
The np.random.randint() function generates random integers. For example, the following code generates eight random integers from 0 to 9. The upper bound, here 10, is not included.
from numpy.random import randint
randint(0, 10, 8)
array([7, 7, 9, 4, 0, 3, 3, 0])
If you want the results to be reproducible, you can set the seed using the np.random.seed() function. For example, the following code generates the same random numbers every time you run it.
np.random.seed(42)
randint(0, 10, 8)
array([6, 3, 7, 4, 6, 9, 2, 6])
This code draws numbers with replacement from the integers 0 to 9. If you want to draw numbers without replacement, you can use the np.random.choice() function. For example, the following code draws 5 random numbers without replacement from the integers 0 to 9.
np.random.choice(10, 5, replace=False)
array([0, 6, 9, 1, 8])
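The functions above belong to NumPy's older random interface. The NumPy documentation recommends the newer Generator interface, created with np.random.default_rng(), for new code. The sketch below is an alternative to the calls above, not a replacement for them, and its variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator

ints = rng.integers(0, 10, size=8)            # integers from 0 to 9
draw = rng.choice(10, size=5, replace=False)  # 5 distinct values from 0 to 9

# A second generator with the same seed repeats the same sequence.
rng_again = np.random.default_rng(42)
ints_again = rng_again.integers(0, 10, size=8)

print(np.array_equal(ints, ints_again))  # True
print(len(set(draw.tolist())))           # 5
```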
4.6 Array Operations
You can perform element-wise operations on NumPy arrays. For example, you can add two arrays together, subtract one array from another, multiply two arrays, and divide one array by another. Let’s see some examples.
4.6.1 Basic Operations
You can perform basic arithmetic operations on NumPy arrays. For example, the following code will add two arrays together.
a = np.array([1, 2, 3, 4, 5])
a + a
array([ 2, 4, 6, 8, 10])
The same goes for subtraction and multiplication.
# subtraction
a - a
array([0, 0, 0, 0, 0])
# multiplication
a * a
array([ 1, 4, 9, 16, 25])
You can also divide two arrays:
a / a
array([1., 1., 1., 1., 1.])
Scalar operations are also possible. For example, the following code will multiply every element in the array by 2.
a * 2
array([ 2, 4, 6, 8, 10])
You can also add or subtract a scalar from an array.
# addition
a + 2
array([3, 4, 5, 6, 7])
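Two points are worth keeping in mind here. With plain lists, + concatenates rather than adds, and with arrays, the shapes have to be compatible. The sketch below uses illustrative values; shape_error exists only to capture the exception.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# With lists, + concatenates instead of adding element-wise.
joined = [1, 2] + [3, 4]
print(joined)   # [1, 2, 3, 4]

# Arrays with 5 and 3 elements have incompatible shapes.
shape_error = None
try:
    a + np.array([1, 2, 3])
except ValueError as err:
    shape_error = err
print(shape_error is not None)  # True
```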
4.6.2 Universal Functions
NumPy provides a number of universal functions that are applied to every element of an array. For example, the np.sqrt() function calculates the square root of every element in the array.
np.sqrt(a)
array([1. , 1.41421356, 1.73205081, 2. , 2.23606798])
You can also find the maximum or minimum value in an array.
# maximum value
np.max(a)
5
This is equivalent to using the max() method.
a.max()
5
You can also find the index of the maximum value in the array.
np.argmax(a)
4
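For two-dimensional arrays, these functions accept an axis argument that controls whether the result is computed per column or per row. A short sketch with an illustrative matrix m:

```python
import numpy as np

m = np.array([[1, 8, 3],
              [7, 2, 9]])

print(m.max())           # 9, over the whole array
print(m.max(axis=0))     # [7 8 9], maximum of each column
print(m.max(axis=1))     # [8 9], maximum of each row
print(m.argmax(axis=1))  # [1 2], column index of each row's maximum
```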
Trigonometric functions are also available, such as np.sin(), np.cos(), and np.tan(). They interpret their input as radians.
np.cos(a)
array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362, 0.28366219])
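The values above are the cosines of 1, 2, 3, 4, and 5 radians. If your angles are in degrees, convert them first, for example with np.deg2rad(). A minimal sketch with illustrative angles:

```python
import numpy as np

degrees = np.array([0, 60, 90, 180])
radians = np.deg2rad(degrees)   # convert degrees to radians

cosines = np.cos(radians)

# Up to floating-point error, the results are 1, 0.5, 0 and -1.
print(np.allclose(cosines, [1, 0.5, 0, -1]))  # True
```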
There are many more universal functions available in NumPy. You can find a list of them in the NumPy documentation.
That’s it for NumPy. In the next section, we will look at Pandas, which will introduce us to DataFrames, a powerful data structure for data analysis in Python.