This article describes the steps for querying imagery data in an EarthAI Notebook using the EarthAI Catalog API.
First, log in to EarthAI Notebook and open a new notebook. Start by importing the necessary libraries:
from earthai.init import *
The cell prints output reporting that the EarthAI libraries are being imported and that a SparkSession is being created.
Running the next code cell shows what image collections are available through the Earth OnDemand API.
Executing this cell returns a DataFrame listing all of the collections available through the API. Your list may look slightly different, as collections and their names can change over time. The column "allowed" indicates whether the user has access to a collection. At this time the EarthAI Catalog includes only publicly available datasets, so all of the collections listed here are available. The other columns give each collection's description, name, id, links to the data, spatial extent, and temporal extent.
These collections may look familiar, as they are the same data sets available through Earth OnDemand. As we'll show below, querying through the EarthAI Catalog API is an alternative but largely equivalent way of loading data into the EarthAI Notebook environment: it uses code in notebook cells instead of the graphical user interface (GUI) of Earth OnDemand. In both cases, queries run against the same data sets and the results are made available for analysis in the notebook environment.
The code below queries the EarthAI Catalog with essentially the same parameters as the Yellowstone example from the Earth OnDemand GUI and assigns the result to the variable catalog. Remember that the collection name may have changed slightly. We look at a location within the Yellowstone region, using Landsat 8 with a maximum cloud cover of 10% over the month of August 2018. The primary difference is that the Earth OnDemand query returns the coordinates of a bounding box, whereas here we specify a single longitude-latitude point within the region of interest:
catalog = earth_ondemand.read_catalog(
By convention the query result is assigned to the variable name catalog, but the name is entirely at the user's discretion. The read_catalog() function returns a GeoDataFrame with the results of the query (you can confirm this by running type(catalog)). You can run ?earth_ondemand.read_catalog in a cell to get more information about the parameters that the read_catalog() function can take.
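To make the query criteria concrete, here is a minimal sketch of what a point-based query conceptually filters on: a footprint that contains the point, an acquisition date within the range, and cloud cover under the limit. This is not how the EarthAI Catalog is implemented. The Scene record, the matches() helper, the scene ids, the bounding boxes, and the point coordinates are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Scene:
    """Hypothetical scene record; not the EarthAI Catalog schema."""
    scene_id: str
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat)
    acquired: date
    cloud_cover: float  # percent


def matches(scene, lon, lat, start, end, max_cloud_cover):
    """Return True if the scene footprint contains the point and the scene
    falls inside the date range and under the cloud-cover limit."""
    min_lon, min_lat, max_lon, max_lat = scene.bbox
    contains_point = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    in_date_range = start <= scene.acquired <= end
    clear_enough = scene.cloud_cover <= max_cloud_cover
    return contains_point and in_date_range and clear_enough


# Made-up scenes for illustration only.
scenes = [
    Scene("scene-a", (-111.5, 43.5, -109.0, 45.5), date(2018, 8, 5), 3.2),
    Scene("scene-b", (-111.5, 43.5, -109.0, 45.5), date(2018, 8, 21), 42.0),
    Scene("scene-c", (-111.5, 43.5, -109.0, 45.5), date(2018, 9, 6), 1.0),
    Scene("scene-d", (-105.0, 39.0, -103.0, 41.0), date(2018, 8, 12), 0.5),
]

# A point roughly inside the Yellowstone region (assumed coordinates).
selected = [
    s.scene_id
    for s in scenes
    if matches(s, lon=-110.5, lat=44.6,
               start=date(2018, 8, 1), end=date(2018, 8, 31),
               max_cloud_cover=10)
]
print(selected)  # ['scene-a']
```

Only scene-a passes all three checks: scene-b is too cloudy, scene-c falls outside August 2018, and scene-d does not cover the point.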
Next we want to see exactly what the query returned in the catalog object. This is as simple as running catalog in a new cell.
The geometry attribute is a very long string, which makes the cell output difficult to read. To make it easier to read, we can drop the geometry column from the display by running catalog.drop(columns='geometry'). This shows the same query results in a more readable view, without the geometry column.
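Note that drop() returns a new object rather than modifying catalog, so the geometry column is still available for later analysis. The sketch below shows this with a plain pandas DataFrame standing in for the query result. A GeoDataFrame is a pandas DataFrame subclass, so drop() works the same way there. The column names and values are made up for illustration.

```python
import pandas as pd

# Made-up stand-in for a query result; the column names are assumptions.
frame = pd.DataFrame({
    "id": ["scene-a", "scene-b"],
    "cloud_cover": [3.2, 7.9],
    "geometry": ["POLYGON ((...))", "POLYGON ((...))"],
})

# drop() returns a new object by default; `frame` itself is unchanged.
display_view = frame.drop(columns="geometry")

print(list(display_view.columns))   # ['id', 'cloud_cover']
print("geometry" in frame.columns)  # True
```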
There are two rows that correspond to the Landsat 8 scenes that satisfy the query parameters. This is similar to the query results obtained using Earth OnDemand, which also returned two scenes.