Raster and vector data files can be loaded into Python environments (such as EarthAI Notebook) using several different geospatial libraries.
DataFrames are a common data structure in Python data science workflows and are central to working with Earth observation imagery data in EarthAI. A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. A spreadsheet can be thought of as a type of DataFrame (or vice versa).
While DataFrames are not a native data type in Python, they can be constructed and manipulated through other Python libraries. The three types of DataFrames that are the most commonly used in EarthAI workflows are Pandas, GeoPandas, and Spark DataFrames.
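As a minimal sketch of the labeled rows and columns described above, the snippet below builds a small Pandas DataFrame and filters it. The column names and values are hypothetical scene metadata made up for illustration; they do not come from any EarthAI catalog.

```python
import pandas as pd

# Hypothetical scene metadata; names and values are illustrative only.
scenes = pd.DataFrame(
    {
        "scene_id": ["scene_a", "scene_b", "scene_c"],
        "cloud_cover_pct": [12.5, 48.0, 3.1],
        "sensor": ["landsat8", "landsat8", "modis"],
    }
)

# Columns are selected by label, and rows are filtered with a boolean mask.
clear_scenes = scenes[scenes["cloud_cover_pct"] < 20.0]
print(clear_scenes)
```

Each column holds one attribute and each row one record, which is the same layout a spreadsheet would use.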
Pandas is perhaps the most straightforward way to use DataFrames in Python; however, Pandas does not directly handle spatial operations on geometric data types. GeoPandas DataFrames extend Pandas DataFrames with a geometry column and can be used to represent vector data in Python. We demonstrate the use of GeoPandas in more detail in this related article.
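To illustrate the geometry column, here is a minimal sketch that builds a GeoPandas GeoDataFrame from two points. The site names and coordinates are hypothetical, and the example assumes that GeoPandas and Shapely are installed in the environment.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical point locations given as (longitude, latitude) in WGS 84.
sites = gpd.GeoDataFrame(
    {"site": ["a", "b"]},
    geometry=[Point(-78.48, 38.03), Point(-77.04, 38.91)],
    crs="EPSG:4326",
)

# The geometry column supports spatial attributes that plain Pandas lacks,
# such as the bounding box of all features: [minx, miny, maxx, maxy].
bounds = sites.total_bounds
print(bounds)

# Ordinary Pandas-style filtering still works, here on the x coordinate.
western_sites = sites[sites.geometry.x < -78.0]
print(western_sites)
```

Because a GeoDataFrame is still a Pandas DataFrame underneath, the usual column selection and filtering carry over unchanged.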
Spark DataFrames allow for distributed computing across multiple machines in big data environments where DataFrames are too large to fit into the memory of a single machine, which is not uncommon in imagery data analysis. While Spark does not natively handle geospatial data, Astraea's RasterFrames project extends Spark DataFrames to enable raster data operations. Spark DataFrames and RasterFrames are described in more detail in another article.