Many data science workflows in Python use the NumPy library because it is a central building block of the Python data ecosystem. NumPy arrays are also used in EarthAI and can be used to generate Pandas DataFrames. In this article, we provide a quick primer on NumPy arrays. This primer can be downloaded as a Python Notebook from the link provided at the end of the article.
NumPy is a numerical computing library whose main data structure is the n-dimensional array (ndarray). Unlike a Python list, an ndarray holds elements of a single data type (for example, all integers or all floats).
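As a minimal sketch of the single-data-type rule, the snippet below builds an array from a list that mixes integers and a float. It uses the conventional `np` import alias, and the variable names are chosen only for illustration.

```python
import numpy as np

# A Python list can hold mixed types; an ndarray converts them to one dtype.
mixed_list = [1, 2.5, 3]
mixed_array = np.array(mixed_list)

print(mixed_array)        # all three values are stored as floats
print(mixed_array.dtype)  # float64

int_array = np.array([1, 2, 3])
print(int_array.dtype)    # an integer dtype; the exact width depends on the platform
```

Because one float is present, NumPy stores every element as a float rather than keeping the integers as integers.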
Before creating arrays, we need to import the NumPy library into our workspace. NumPy is included in the EarthAI environment, so there is no need to install it. By convention, numpy is aliased as np when importing, which simply saves a few keystrokes.
import numpy as np
First, we can create a NumPy array from a common Python list:
my_python_list = [1, 2, 3]
my_python_list
my_numpy_array = np.array(my_python_list)
my_numpy_array
type(my_python_list)
type(my_numpy_array)
While both objects are sequences of the same three integers, one is of type list and the other is of type numpy.ndarray. So why would you want to use a NumPy array over a simple list? There are two main reasons. First, operations on NumPy arrays are much faster than on lists because NumPy's core routines are implemented in compiled C code. Second, NumPy arrays use memory more efficiently. For a simple three-element data structure these differences are negligible, but with much larger datasets, such as imagery, they become apparent and important.
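The speed and memory claims can be checked with a rough sketch like the one below. The array size, the repeat count, and the variable names are assumptions chosen for illustration, and the timings will vary by machine, so no specific numbers are shown.

```python
import sys
import timeit

import numpy as np

n = 100_000  # hypothetical size chosen only for illustration
python_list = list(range(n))
numpy_array = np.arange(n)

# Same computation two ways: double every element.
doubled_list = [x * 2 for x in python_list]
doubled_array = numpy_array * 2

list_seconds = timeit.timeit(lambda: [x * 2 for x in python_list], number=20)
array_seconds = timeit.timeit(lambda: numpy_array * 2, number=20)
print(f"list comprehension: {list_seconds:.4f} s")
print(f"NumPy multiply:     {array_seconds:.4f} s")

# Rough memory comparison: the list stores pointers to separate int objects,
# while the array stores the raw values in one contiguous buffer.
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)
print(f"list (approx.): {list_bytes} bytes")
print(f"array data:     {numpy_array.nbytes} bytes")
```

The list figure is only an approximation, since Python may share some small integer objects, but it gives a sense of the per-element overhead that the array avoids.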
We can construct a matrix with a list of lists:
my_python_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] # simply a list of lists
my_new_array = np.array(my_python_matrix)
my_new_array
my_new_array.shape
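To make the shape attribute concrete, here is a small sketch using the same 3 x 3 values. The name `matrix` is illustrative, and `ndim`, `size`, and the indexing line are additions not covered in the text above.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(matrix.shape)  # (3, 3): three rows, three columns
print(matrix.ndim)   # 2: the number of dimensions
print(matrix.size)   # 9: the total number of elements

# Indexing is row first, then column, both starting from 0.
print(matrix[1, 2])  # 6: second row, third column
```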
NumPy also provides numerous ways to create arrays without needing to create Python lists and lists of lists first.
np.arange(0, 100) # constructs an array of sequential integers from 0 to 99 (the stop value, 100, is excluded).
np.arange(0, 100).reshape(10, 10) # this creates the array above but reshapes it to a 10 x 10 matrix.
np.linspace(0, 20, 50) # returns an array of 50 evenly spaced elements from 0 to 20, with both endpoints included.
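The two functions treat their end value differently, which is easier to see on small inputs. The sketch below uses illustrative values: `np.arange` takes a step and leaves out the stop value, while `np.linspace` takes a count and, by default, includes the end value.

```python
import numpy as np

# arange: start, stop (excluded), step
stepped = np.arange(0, 10, 2)
print(stepped)  # values 0, 2, 4, 6, 8

# linspace: start, stop (included by default), number of points
spaced = np.linspace(0, 1, 5)
print(spaced)   # values 0, 0.25, 0.5, 0.75, 1
```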
There are also a number of random number generators available in NumPy:
np.random.rand(5) # produces an array of 5 random numbers from a uniform distribution between 0 and 1.
np.random.randn(5) # produces an array of 5 random numbers from a standard normal distribution.
my_random_array = np.random.randint(0, 1000, 10) # produces 10 random integers from 0 to 999 (the upper bound, 1000, is excluded).
my_random_array
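Random arrays differ on every run, which can make a notebook hard to reproduce. One option, sketched below, is NumPy's newer Generator interface created with `np.random.default_rng`. This is an alternative to the `np.random.randint` call used in this article, not a description of how EarthAI does it, and the seed value is arbitrary.

```python
import numpy as np

seed = 42  # arbitrary value chosen for illustration
rng = np.random.default_rng(seed)
first_draw = rng.integers(0, 1000, size=10)  # upper bound 1000 is excluded

# A generator rebuilt from the same seed yields the same sequence.
second_draw = np.random.default_rng(seed).integers(0, 1000, size=10)

print(first_draw)
print(np.array_equal(first_draw, second_draw))  # True
```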
Finally, there are methods for finding the minimum, maximum and indices (array locations) of values in NumPy arrays. These demonstrate that once a NumPy object is created there are a multitude of methods that can be applied to the object.
my_random_array.max()
my_random_array.min()
my_random_array.argmax() # the index of the max value (starting from 0)
my_random_array.argmin() # the index of the min value (starting from 0)
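Since the random array changes on each run, a fixed array makes the results of these methods easier to follow. The values below are hypothetical and chosen only for illustration.

```python
import numpy as np

readings = np.array([12, 7, 42, 3])  # hypothetical values

print(readings.max())     # 42
print(readings.argmax())  # 2
print(readings.min())     # 3
print(readings.argmin())  # 3

# The index returned by argmax points back at the maximum value.
print(readings[readings.argmax()] == readings.max())  # True
```

Here the minimum value and its index both happen to be 3: the value 3 sits in the fourth position, which is index 3 when counting from 0.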