8. Introduction to NumPy#

The content of this section is derived from the book “Python Data Science Handbook”. Details of this book can be found in the further reading section.

NumPy is a scientific computing library and is the de facto standard for numerical analysis and linear algebra in Python; the name is short for “Numerical Python”. It is a bit like Matlab for Python. One of its primary features is the provision of efficient array data structures, which are not particularly well supported in base Python.

8.1. Arrays using Python Lists#

An array-like structure can be created in Python using lists.

8.1.1. Example 1#

A = [[1,2,3],[4,5,6],[7,8,9],[10,11,12]]
print(A)
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
A[1][2]
6

In Python, arrays represented as lists or, in the multidimensional setting, lists of lists of lists …, have the advantage of being inhomogeneous: each element can have a different type. However, when compared to arrays in other languages (which are commonly homogeneous), they are not very efficient for numerical work.

The following diagram shows a one dimensional array and the equivalent linked list. Because an array is contiguous in memory (no gaps between successive data elements), it is possible to access the data very quickly using simple arithmetic. In a linked list, to get to value 17 you would have to start at 23 and march forward 2 places, visiting 3 on the way, and as the list grows this repeated traversal becomes very expensive. A CPython list is not a linked list: it is a contiguous array of references, so indexing takes constant time, but every element is a separately allocated Python object that has to be reached through its reference and type-checked before use, which makes large numerical workloads slow and memory hungry.

Python does come with an array type, but it is not particularly easy to use and is not well optimized for some of the more common array-like operations.

[Figure: arrays vs lists]
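
The limitations of Python's built-in array type mentioned above can be seen in a minimal sketch like the following. It uses only the standard library array module; the variable names are illustrative. The type is homogeneous, but operators such as * keep their list meaning (repetition) rather than acting element by element.

```python
from array import array

# "d" is the type code for C doubles: every element must be a float.
a = array("d", [1.0, 2.0, 3.0])

# Indexing works just as it does for a list.
second = a[1]

# Multiplication repeats the array, exactly as it would for a list;
# it does not multiply each element by 2.
repeated = a * 2

# Values of the wrong type are rejected.
try:
    a.append("four")
    rejected = False
except TypeError:
    rejected = True

print(second, len(repeated), rejected)
```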

8.2. NumPy arrays#

Installing numpy.

! python -m pip install numpy
Requirement already satisfied: numpy in /home/grosedj/work/work-env/env/lib/python3.9/site-packages (1.23.2)
WARNING: You are using pip version 21.2.4; however, version 23.3.2 is available.
You should consider upgrading via the '/home/grosedj/work/work-env/env/bin/python -m pip install --upgrade pip' command.

Importing the numpy package (note the use of the alias np).

import numpy as np

8.2.1. Exercise 1#

Install and import the numpy package.

8.3. Why NumPy Instead Of Python Lists?#

There are four main reasons to prefer NumPy arrays over lists in Python:

  • NumPy arrays are more compact than lists.

  • Reading and writing items in bulk (for example through slices or whole-array operations) is faster with NumPy.

  • NumPy can be more convenient to work with, because you get a lot of vector and matrix operations for free.

  • NumPy operations can be more efficient, because they are implemented in compiled code that works on homogeneous, contiguous data.
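
The first point can be checked with a rough sketch like the one below. It assumes 1000 integers stored as 64-bit values; the list figure is only an estimate (the list object plus each integer object it refers to) and the exact number depends on the Python build.

```python
import sys
import numpy as np

n = 1000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores references; each int is a separate Python object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw 8-byte integers side by side.
array_bytes = arr.nbytes

print(list_bytes, array_bytes)
```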

8.4. Creating and accessing NumPy arrays#

8.4.1. Example 2 - creating NumPy array from a list#

X = np.array([[1,2,3],[4,5,6]])
print(X)
[[1 2 3]
 [4 5 6]]
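
Unlike a list, a NumPy array is homogeneous: every element has the same type. The short sketch below illustrates this using the ndim and dtype attributes. The array M is an illustrative name, and the integer dtype that is printed depends on the platform.

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])

print(X.ndim)    # number of dimensions
print(X.dtype)   # element type (a platform-dependent integer type here)

# Mixing integers and floats promotes every element to a float.
M = np.array([1, 2.5, 3])
print(M.dtype)
```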

8.4.2. Exercise 2 - indexing into a NumPy array#

What do you think the output of the following code will be?

print(X[1,0])
print(X[1])
print(X[1][0])
X[0,2] = 10
print(X)
4
[4 5 6]
4
[[ 1  2 10]
 [ 4  5  6]]

8.4.3. Exercise 3#

Use the help function to find out about the following functions, which can be used for initialising NumPy arrays.

  • ones

  • zeros

  • empty

help(np.empty)
Help on built-in function empty in module numpy:

empty(...)
    empty(shape, dtype=float, order='C', *, like=None)
    
    Return a new array of given shape and type, without initializing entries.
    
    Parameters
    ----------
    shape : int or tuple of int
        Shape of the empty array, e.g., ``(2, 3)`` or ``2``.
    dtype : data-type, optional
        Desired output data-type for the array, e.g, `numpy.int8`. Default is
        `numpy.float64`.
    order : {'C', 'F'}, optional, default: 'C'
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in
        memory.
    like : array_like, optional
        Reference object to allow the creation of arrays which are not
        NumPy arrays. If an array-like passed in as ``like`` supports
        the ``__array_function__`` protocol, the result will be defined
        by it. In this case, it ensures the creation of an array object
        compatible with that passed in via this argument.
    
        .. versionadded:: 1.20.0
    
    Returns
    -------
    out : ndarray
        Array of uninitialized (arbitrary) data of the given shape, dtype, and
        order.  Object arrays will be initialized to None.
    
    See Also
    --------
    empty_like : Return an empty array with shape and type of input.
    ones : Return a new array setting values to one.
    zeros : Return a new array setting values to zero.
    full : Return a new array of given shape filled with value.
    
    
    Notes
    -----
    `empty`, unlike `zeros`, does not set the array values to zero,
    and may therefore be marginally faster.  On the other hand, it requires
    the user to manually set all the values in the array, and should be
    used with caution.
    
    Examples
    --------
    >>> np.empty([2, 2])
    array([[ -9.74499359e+001,   6.69583040e-309],
           [  2.13182611e-314,   3.06959433e-309]])         #uninitialized
    
    >>> np.empty([2, 2], dtype=int)
    array([[-1073741821, -1067949133],
           [  496041986,    19249760]])                     #uninitialized

8.4.4. Exercise 4 - creating a NumPy array from scratch#

Create a three dimensional NumPy array with shape (3, 5, 2) that contains floating point numbers. What do the contents look like? How does this compare with using zeros? How would you create the same array for use with integer values?

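# empty allocates memory without initialising it, so the contents are arbitrary
X = np.empty((3,5,2))
print(X[0])
# the dtype argument selects the element type
X = np.empty((3,5,2),dtype="int")
print(X[0])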
[[4.67592250e-310 0.00000000e+000]
 [4.67648006e-310 0.00000000e+000]
 [0.00000000e+000 0.00000000e+000]
 [4.94065646e-324 5.20521812e+223]
 [4.19338351e+228 0.00000000e+000]]
[[ 94641744298500               0]
 [140290495184816 140290496109296]
 [140290495589952 140290468620928]
 [140290468621008 140290468621088]
 [140290468607168 140290494606832]]
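
For comparison with empty, the sketch below creates arrays of the same shape with zeros, ones and full, all of which initialise every entry. The variable names and the fill value 7.5 are arbitrary choices for illustration.

```python
import numpy as np

Z = np.zeros((3, 5, 2))               # floats, all 0.0
O = np.ones((3, 5, 2), dtype=int)     # integers, all 1
F = np.full((3, 5, 2), 7.5)           # every entry set to 7.5

print(Z[0])
print(O.dtype, F[0, 0, 0])
```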

NumPy has many more basic data types than base Python. They typically map onto the C types that NumPy is built on.

| NumPy type | C type | Description |
|---|---|---|
| numpy.bool_ | bool | Boolean (True or False) stored as a byte |
| numpy.byte | signed char | Platform-defined |
| numpy.ubyte | unsigned char | Platform-defined |
| numpy.short | short | Platform-defined |
| numpy.ushort | unsigned short | Platform-defined |
| numpy.intc | int | Platform-defined |
| numpy.uintc | unsigned int | Platform-defined |
| numpy.int_ | long | Platform-defined |
| numpy.uint | unsigned long | Platform-defined |
| numpy.longlong | long long | Platform-defined |
| numpy.ulonglong | unsigned long long | Platform-defined |
| numpy.half / numpy.float16 | | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| numpy.single | float | Platform-defined single precision float: typically sign bit, 8 bits exponent, 23 bits mantissa |
| numpy.double | double | Platform-defined double precision float: typically sign bit, 11 bits exponent, 52 bits mantissa |
| numpy.longdouble | long double | Platform-defined extended-precision float |
| numpy.csingle | float complex | Complex number, represented by two single-precision floats (real and imaginary components) |
| numpy.cdouble | double complex | Complex number, represented by two double-precision floats (real and imaginary components) |
| numpy.clongdouble | long double complex | Complex number, represented by two extended-precision floats (real and imaginary components) |

8.5. Slicing and accessing subarrays#

Subarrays can be accessed using the slice syntax [start:stop:step].

8.5.1. Example 3 - slicing#

X = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
print(X[0:2:1])
[[1 2 3 4]
 [5 6 7 8]]

8.5.2. Exercise 5#

Predict the output of the following code.

print(X[0:3:2])
[[ 1  2  3  4]
 [ 9 10 11 12]]
print(X[0:3:1][:2:1])
[[1 2 3 4]
 [5 6 7 8]]
print(X[0:3:1][:2:2])
[[1 2 3 4]]

print(X[:3:][:2:2])
[[1 2 3 4]]
print(X[:3:,:2:2])
[[1]
 [5]
 [9]]
print(X[::2,::2])
[[ 1  3]
 [ 9 11]]
print(X[1:3:,1:3:])
[[ 6  7]
 [10 11]]
print(X[:,1])
[ 2  6 10 14]
print(X[1,:])
[5 6 7 8]
Z = X[1:3:,1:3:]
Z[1,1] = 20
print(X)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 20 12]
 [13 14 15 16]]

Note that in the last example, Z is NOT a copy of the data inside X; it is a “view” on the data contained in X, so modifying Z also modifies X. You can create an independent copy using the copy method.

8.5.3. Example 4 - copying#

Z = X[1:3:,1:3:].copy()
Z[1,1] = 50
print(X)
print(Z)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 20 12]
 [13 14 15 16]]
[[ 6  7]
 [10 50]]
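
One way to check whether two arrays refer to the same data is np.shares_memory. The following is a small self-contained sketch; the names view and copied are illustrative, and the 4 by 4 array is rebuilt here so that the block runs on its own.

```python
import numpy as np

X = np.arange(1, 17).reshape(4, 4)

view = X[1:3, 1:3]            # basic slicing returns a view
copied = X[1:3, 1:3].copy()   # copy() allocates new storage

print(np.shares_memory(X, view))     # True
print(np.shares_memory(X, copied))   # False

view[0, 0] = 99      # writes through to X
copied[0, 0] = -1    # leaves X untouched
print(X[1, 1])       # 99
```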

8.5.4. Exercise 6#

Use help to find out how to use the NumPy reshape, concatenate, vstack, and hstack functions.

help(np.concatenate)
Help on function concatenate in module numpy:

concatenate(...)
    concatenate((a1, a2, ...), axis=0, out=None, dtype=None, casting="same_kind")
    
    Join a sequence of arrays along an existing axis.
    
    Parameters
    ----------
    a1, a2, ... : sequence of array_like
        The arrays must have the same shape, except in the dimension
        corresponding to `axis` (the first, by default).
    axis : int, optional
        The axis along which the arrays will be joined.  If axis is None,
        arrays are flattened before use.  Default is 0.
    out : ndarray, optional
        If provided, the destination to place the result. The shape must be
        correct, matching that of what concatenate would have returned if no
        out argument were specified.
    dtype : str or dtype
        If provided, the destination array will have this dtype. Cannot be
        provided together with `out`.
    
        .. versionadded:: 1.20.0
    
    casting : {'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional
        Controls what kind of data casting may occur. Defaults to 'same_kind'.
    
        .. versionadded:: 1.20.0
    
    Returns
    -------
    res : ndarray
        The concatenated array.
    
    See Also
    --------
    ma.concatenate : Concatenate function that preserves input masks.
    array_split : Split an array into multiple sub-arrays of equal or
                  near-equal size.
    split : Split array into a list of multiple sub-arrays of equal size.
    hsplit : Split array into multiple sub-arrays horizontally (column wise).
    vsplit : Split array into multiple sub-arrays vertically (row wise).
    dsplit : Split array into multiple sub-arrays along the 3rd axis (depth).
    stack : Stack a sequence of arrays along a new axis.
    block : Assemble arrays from blocks.
    hstack : Stack arrays in sequence horizontally (column wise).
    vstack : Stack arrays in sequence vertically (row wise).
    dstack : Stack arrays in sequence depth wise (along third dimension).
    column_stack : Stack 1-D arrays as columns into a 2-D array.
    
    Notes
    -----
    When one or more of the arrays to be concatenated is a MaskedArray,
    this function will return a MaskedArray object instead of an ndarray,
    but the input masks are *not* preserved. In cases where a MaskedArray
    is expected as input, use the ma.concatenate function from the masked
    array module instead.
    
    Examples
    --------
    >>> a = np.array([[1, 2], [3, 4]])
    >>> b = np.array([[5, 6]])
    >>> np.concatenate((a, b), axis=0)
    array([[1, 2],
           [3, 4],
           [5, 6]])
    >>> np.concatenate((a, b.T), axis=1)
    array([[1, 2, 5],
           [3, 4, 6]])
    >>> np.concatenate((a, b), axis=None)
    array([1, 2, 3, 4, 5, 6])
    
    This function will not preserve masking of MaskedArray inputs.
    
    >>> a = np.ma.arange(3)
    >>> a[1] = np.ma.masked
    >>> b = np.arange(2, 5)
    >>> a
    masked_array(data=[0, --, 2],
                 mask=[False,  True, False],
           fill_value=999999)
    >>> b
    array([2, 3, 4])
    >>> np.concatenate([a, b])
    masked_array(data=[0, 1, 2, 2, 3, 4],
                 mask=False,
           fill_value=999999)
    >>> np.ma.concatenate([a, b])
    masked_array(data=[0, --, 2, 2, 3, 4],
                 mask=[False,  True, False, False, False, False],
           fill_value=999999)

The shape of a NumPy array can be found quite easily using its shape attribute.

print(X.shape)
(4, 4)
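
As a starting point for the functions named in exercise 6, here is a brief sketch of reshape, vstack and hstack on a small array. The array contents and variable names are arbitrary.

```python
import numpy as np

a = np.arange(6)          # [0 1 2 3 4 5]
b = a.reshape(2, 3)       # two rows, three columns

v = np.vstack((b, b))     # stack rows: shape (4, 3)
h = np.hstack((b, b))     # stack columns: shape (2, 6)
c = np.concatenate((b, b), axis=1)   # same result as hstack for 2-D input

print(b.shape, v.shape, h.shape)
print(np.array_equal(h, c))
```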

8.6. Random numbers#

The NumPy library is more than an array library. For example, it offers random number generation.

8.6.1. Example 5 - uniform random numbers#

np.random.rand(3,2)
array([[0.3150833 , 0.33404232],
       [0.44625496, 0.54454593],
       [0.56114458, 0.14973511]])

8.6.2. Exercise 7#

Use help(np.random) to find out about random number generation in NumPy. Generate large matrices of random numbers from some of these distributions.

help(np.random.rand)
Help on built-in function rand:

rand(...) method of numpy.random.mtrand.RandomState instance
    rand(d0, d1, ..., dn)
    
    Random values in a given shape.
    
    .. note::
        This is a convenience function for users porting code from Matlab,
        and wraps `random_sample`. That function takes a
        tuple to specify the size of the output, which is consistent with
        other NumPy functions like `numpy.zeros` and `numpy.ones`.
    
    Create an array of the given shape and populate it with
    random samples from a uniform distribution
    over ``[0, 1)``.
    
    Parameters
    ----------
    d0, d1, ..., dn : int, optional
        The dimensions of the returned array, must be non-negative.
        If no argument is given a single Python float is returned.
    
    Returns
    -------
    out : ndarray, shape ``(d0, d1, ..., dn)``
        Random values.
    
    See Also
    --------
    random
    
    Examples
    --------
    >>> np.random.rand(3,2)
    array([[ 0.14022471,  0.96360618],  #random
           [ 0.37601032,  0.25528411],  #random
           [ 0.49313049,  0.94909878]]) #random
X = np.random.rand(1000,1000)
print(X[1:10,1:10])
[[0.93725918 0.06203331 0.83580012 0.03160988 0.7798329  0.30640705
  0.7414987  0.99160143 0.14266476]
 [0.95206452 0.14178445 0.09118578 0.17457443 0.76104316 0.28949847
  0.29942323 0.07608373 0.05289887]
 [0.25530272 0.66924424 0.28340527 0.71735581 0.30488826 0.30623068
  0.47408782 0.70436219 0.58647558]
 [0.26812957 0.87237088 0.51070945 0.53435651 0.73995196 0.75998788
  0.58319684 0.65992788 0.70158033]
 [0.24103802 0.19956387 0.74662767 0.84646724 0.12328907 0.26932249
  0.38748918 0.72313705 0.40867608]
 [0.63366512 0.05276984 0.1132461  0.08810961 0.320329   0.31010316
  0.64176265 0.22798977 0.95000039]
 [0.45652143 0.00195218 0.2580788  0.87617122 0.41812466 0.96003646
  0.85651487 0.51288647 0.98163307]
 [0.7487736  0.90593057 0.34325887 0.79687955 0.57725936 0.76379036
  0.73959716 0.75815155 0.50633209]
 [0.60686795 0.98670865 0.01420011 0.23918258 0.77386063 0.12878113
  0.37959095 0.14720258 0.7509649 ]]
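
NumPy also provides a newer Generator interface, created with np.random.default_rng, as an alternative to the np.random.rand style used above. The sketch below is one way to draw from a few distributions with it; the seed value 42 and the array sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily

U = rng.random((3, 2))                                   # uniform on [0, 1)
N = rng.normal(loc=0.0, scale=1.0, size=(1000, 1000))    # standard normal
K = rng.integers(low=0, high=10, size=5)                 # integers in 0..9

print(U.shape, N.shape, K.shape)
```

Passing the same seed again reproduces the same numbers, which is useful when results need to be repeatable.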

8.6.3. Exercise 8#

Write a function that calculates the sine of each value in one of the large matrices you generated for exercise 7.

from math import sin

def matsin(X):
    # apply sin to each element of a two dimensional array, one at a time
    n, m = X.shape
    Y = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            Y[i, j] = sin(X[i, j])
    return Y

Y = matsin(X)
print(Y[1:10,1:10])
[[0.80593856 0.06199353 0.7418333  0.03160461 0.70316061 0.30163499
  0.6753939  0.8369036  0.14218131]
 [0.81461467 0.14130988 0.09105946 0.17368905 0.68967719 0.2854716
  0.29496915 0.07601035 0.0528742 ]
 [0.25253833 0.62039343 0.27962669 0.65739445 0.30018659 0.30146683
  0.45652706 0.64754793 0.55342899]
 [0.26492831 0.76585559 0.48879629 0.50928736 0.67425243 0.68891266
  0.55069517 0.61305988 0.64542558]
 [0.23871077 0.19824188 0.67916739 0.74894416 0.12297697 0.26607841
  0.37786494 0.66173988 0.39739478]
 [0.59210231 0.05274536 0.1130042  0.08799565 0.31487884 0.30515688
  0.59860833 0.22601977 0.81341573]
 [0.44082845 0.00195218 0.25522345 0.76829373 0.40604739 0.81921248
  0.75556414 0.49069436 0.83140592]
 [0.6807409  0.78699961 0.33655763 0.71517857 0.54572944 0.69166388
  0.67399037 0.68758045 0.48497282]
 [0.57029747 0.83421552 0.01419964 0.23690855 0.69890163 0.12842546
  0.37054057 0.14667154 0.68234445]]

It is possible to measure how long code takes to run using the IPython %timeit magic command.

8.6.4. Example 6 - %timeit#

%timeit matsin(X)
250 ms ± 6.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
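
%timeit is only available in IPython and Jupyter. In a plain Python script, the standard library timeit module can be used instead. The following is a minimal sketch; the matrix size and the number of runs are arbitrary, and the timing printed will vary from machine to machine.

```python
import timeit
import numpy as np

X_small = np.random.rand(200, 200)

# Run the statement n_runs times and report the mean time per call.
n_runs = 20
total = timeit.timeit(lambda: np.sin(X_small), number=n_runs)
print(f"{total / n_runs * 1e6:.1f} µs per call")
```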

8.7. NumPy UFuncs#

NumPy’s ufuncs (universal functions) can be used to make repeated calculations on array elements much more efficient. ufuncs use vectorization: the loop over the elements runs inside fast compiled C code, rather than in Python with a separate function call for each element. This can make things much faster.

8.7.1. Exercise 9#

Use %timeit to find out how long the following Python code takes to run. Compare it to the timing you obtained for the function from exercise 8.

%timeit Y = np.sin(X)
8.88 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
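
Before comparing timings it is worth confirming that the two approaches give the same answer. The sketch below does this on a small matrix, which is an assumption made to keep the loop quick; the explicit loop mirrors the element-by-element approach of exercise 8.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
X_small = rng.random((50, 40))

# Element-by-element version using math.sin in a Python loop.
loop_result = np.empty_like(X_small)
for i in range(X_small.shape[0]):
    for j in range(X_small.shape[1]):
        loop_result[i, j] = math.sin(X_small[i, j])

# Vectorised version using the np.sin ufunc.
vectorised_result = np.sin(X_small)

print(np.allclose(loop_result, vectorised_result))
```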

There are lots of ufuncs in NumPy. The arithmetic operators, for example, are implemented as ufuncs when applied to arrays (+ corresponds to np.add and * to np.multiply).

8.7.2. Example 7#

Y = X + X
Y = 2*X
Y = X*X

Note that * is NOT matrix multiplication; it multiplies the arrays element by element. For matrix multiplication you have to use the matmul function or the equivalent @ operator.
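
The difference between the two kinds of product is easiest to see on a small case. The 2 by 2 matrices A and B below are illustrative values chosen so that the results can be checked by hand.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B           # [[ 5 12] [21 32]]
product = np.matmul(A, B)     # [[19 22] [43 50]]

print(elementwise)
print(product)
print(np.array_equal(product, A @ B))   # the @ operator calls matmul
```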