This example demonstrates how to write the contents of a Spark DataFrame to a GeoJSON file. This is useful for viewing the geometry columns of your RasterFrame in external GIS software.
In this example, we run through the steps of 1) acquiring imagery scenes from the EarthAI Catalog, 2) using RasterFrames to read imagery, and 3) writing a RasterFrame to GeoJSON.
Import Libraries
We will start by importing all of the Python libraries used in this example.
from earthai.init import *
import earthai.chipping.strategy
from shapely.geometry import Polygon
import pyspark.sql.functions as F
import ipyleaflet
import geopandas
Query the EarthAI Catalog
We read in a GeoJSON file containing U.S. state boundaries and filter the GeoDataFrame to the continental U.S. The chip reader requires that the geometries passed to it are of type Polygon, so we convert the MultiPolygon representing Virginia to a Polygon.
We use the geometry column in the GeoDataFrame to query the EarthAI catalog for MODIS surface reflectance data from September 1, 2020 covering the continental United States.
states_url = 'https://raw.githubusercontent.com/datasets/geo-admin1-us/master/data/admin1-us.geojson'
states_gdf = geopandas.read_file(states_url)
states_gdf = states_gdf[~(states_gdf["name"].isin(['Hawaii', 'Alaska']))]

# convert MultiPolygon to Polygon in Virginia
# (.geoms lists the component polygons; a MultiPolygon is not directly iterable in Shapely 2.x)
va_polygons = list(states_gdf[states_gdf.name == 'Virginia'].iloc[0].geometry.geoms)
states_gdf.loc[states_gdf.name == 'Virginia', ['geometry']] = va_polygons[1]

cat = earth_ondemand.read_catalog(states_gdf.geometry,
                                  start_datetime='2020-09-01',
                                  end_datetime='2020-09-01',
                                  collections='mcd43a4')
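The index used in va_polygons[1] depends on the order of the parts in this particular file. As a minimal sketch of a more explicit alternative, the code below keeps the largest part of a MultiPolygon, ranked by planar area. It works on plain GeoJSON-style dictionaries rather than shapely objects, and the helper names ring_area, polygon_area, and largest_polygon are hypothetical, not part of EarthAI or GeoPandas.

```python
def ring_area(ring):
    """Return the unsigned planar area of a closed ring via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_area(polygon_coords):
    """Area of a GeoJSON Polygon coordinate array: exterior ring minus any holes."""
    exterior, *holes = polygon_coords
    return ring_area(exterior) - sum(ring_area(hole) for hole in holes)


def largest_polygon(multipolygon):
    """Convert a GeoJSON MultiPolygon dict to a Polygon dict by keeping its largest part."""
    if multipolygon["type"] != "MultiPolygon":
        raise ValueError("expected a MultiPolygon geometry")
    biggest = max(multipolygon["coordinates"], key=polygon_area)
    return {"type": "Polygon", "coordinates": biggest}


# Hypothetical two-part geometry: a unit square and a larger 3 x 3 square.
example = {
    "type": "MultiPolygon",
    "coordinates": [
        [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]],
        [[(10, 10), (13, 10), (13, 13), (10, 13), (10, 10)]],
    ],
}
mainland = largest_polygon(example)
```

The area here is computed in the units of the input coordinates (degrees for this dataset), so it is only meaningful for ranking the parts against each other. With shapely geometries, a comparable selection could be written as max(geom.geoms, key=lambda p: p.area).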
Read in MODIS Imagery
We join the catalog back to the GeoDataFrame containing state boundaries in order to match the state boundary to the intersecting image scene. This step is critical for use of the chip reader since the chipping strategy needs the state boundary polygon.
cat = geopandas.sjoin(cat, states_gdf, how='right').rename(columns={"geometry":"us_bounds"})
We use spark.read.chip to read only the imagery intersecting state boundaries. We read in the B01 (red) band. The feature-aligned grid strategy creates a grid across each state boundary polygon using the specified tile_dimensions, and returns the generated chips.
To view all of the available bands for the MODIS collection, you can run earth_ondemand.item_assets('mcd43a4').
rf = spark.read.chip(cat,
                     catalog_col_names=['B01'],
                     geometry_col_name='us_bounds',
                     chipping_strategy=earthai.chipping.strategy.FeatureAlignedGrid(256),
                     tile_dimensions=(256,256))
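To build intuition for a grid-based chipping strategy, here is a minimal sketch that covers a bounding box with fixed-size square windows. It is not EarthAI's implementation of FeatureAlignedGrid: the function grid_tiles and its arguments are hypothetical, it works in pixel units on a plain bounding box rather than on a state polygon, and clipping the edge windows is an assumption made for this illustration.

```python
import math


def grid_tiles(min_col, min_row, max_col, max_row, tile_size=256):
    """Return (col_off, row_off, width, height) windows covering a pixel bounding box.

    The grid origin is the box's upper-left corner, and edge windows are
    clipped so that no window extends past the box.
    """
    if max_col <= min_col or max_row <= min_row:
        raise ValueError("bounding box must have positive width and height")
    n_cols = math.ceil((max_col - min_col) / tile_size)
    n_rows = math.ceil((max_row - min_row) / tile_size)
    tiles = []
    for r in range(n_rows):
        for c in range(n_cols):
            col_off = min_col + c * tile_size
            row_off = min_row + r * tile_size
            width = min(tile_size, max_col - col_off)
            height = min(tile_size, max_row - row_off)
            tiles.append((col_off, row_off, width, height))
    return tiles


# Hypothetical 600 x 300 pixel extent: 3 columns x 2 rows of windows.
tiles = grid_tiles(0, 0, 600, 300)
```

Each window in this sketch plays the role of one chip. A real chipping strategy would additionally need to drop windows that do not intersect the feature geometry.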
Write a RasterFrame to GeoJSON
We use GeoPandas to write out our data because it has a built-in method for writing a GeoDataFrame to a GeoJSON file.
The expected coordinate reference system (CRS) for GeoJSON objects is "EPSG:4326", so as a first step, we reproject our chip outlines to this CRS.
rf = rf.select(F.col('B01').alias('red')) \
    .withColumn('chip_outline_4326', st_reproject(rf_geometry('red'), rf_crs('red'), F.lit('EPSG:4326')))
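As a rough illustration of what a reprojection to EPSG:4326 involves, the sketch below converts coordinates between a spherical sinusoidal projection and longitude/latitude. It assumes the source imagery is on the MODIS sinusoidal grid with a sphere radius of 6371007.181 m, which this example does not state; in the workflow above, st_reproject takes the actual source CRS from rf_crs('red'). The function names are hypothetical.

```python
import math

# Assumed sphere radius (metres) for the MODIS sinusoidal grid.
SPHERE_RADIUS_M = 6371007.181


def lonlat_to_sinusoidal(lon_deg, lat_deg, radius=SPHERE_RADIUS_M):
    """Forward spherical sinusoidal projection: degrees -> metres."""
    lat = math.radians(lat_deg)
    x = radius * math.radians(lon_deg) * math.cos(lat)
    y = radius * lat
    return x, y


def sinusoidal_to_lonlat(x, y, radius=SPHERE_RADIUS_M):
    """Inverse spherical sinusoidal projection: metres -> degrees (lon, lat)."""
    lat = y / radius
    cos_lat = math.cos(lat)
    if abs(cos_lat) < 1e-12:
        # At the poles every longitude maps to the same point; return 0 by convention.
        return 0.0, math.degrees(lat)
    lon = x / (radius * cos_lat)
    return math.degrees(lon), math.degrees(lat)


# Round trip for a point near Richmond, Virginia (lon, lat in degrees).
x, y = lonlat_to_sinusoidal(-77.43, 37.54)
lon, lat = sinusoidal_to_lonlat(x, y)
```

In practice the reprojection should be left to st_reproject or a projection library, which also handle ellipsoidal datums and other CRS definitions that this sketch ignores.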
Then, we select chip_outline_4326 and transform our RasterFrame to a Pandas DataFrame using toPandas().
chips_df = rf.select('chip_outline_4326').toPandas()
Next, we transform our Pandas DataFrame, chips_df, to a GeoPandas GeoDataFrame, chips_gdf, specifying the geometry column and the crs.
chips_gdf = geopandas.GeoDataFrame(chips_df, geometry='chip_outline_4326', crs='EPSG:4326')
Finally, we write our GeoDataFrame to a GeoJSON file using the built-in to_file method, specifying "GeoJSON" as the driver.
chips_gdf.to_file("chip_boundaries.json", driver="GeoJSON")
chip_boundaries.json will be written to the directory where your notebook resides. You can right-click the file in the left menu and select Download to save it to your local machine for viewing in external GIS software.
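Before opening the file in GIS software, it can help to sanity-check it. The following is a minimal sketch using only the Python standard library. It assumes the file is a GeoJSON FeatureCollection of Polygon features, and the helper summarize_geojson is hypothetical. To stay self-contained, it writes a small hand-made file to a temporary directory rather than reading chip_boundaries.json.

```python
import json
import os
import tempfile


def summarize_geojson(path):
    """Return (feature_count, all_within_lonlat_bounds) for a FeatureCollection of Polygons."""
    with open(path) as f:
        collection = json.load(f)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    in_bounds = True
    for feature in collection["features"]:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            raise ValueError("expected Polygon geometries")
        for ring in geometry["coordinates"]:
            for lon, lat, *_ in ring:
                # Valid longitude/latitude pairs fall inside these ranges.
                if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                    in_bounds = False
    return len(collection["features"]), in_bounds


# Hand-made stand-in for chip_boundaries.json with a single square outline.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [
                    [[-78.0, 37.0], [-77.0, 37.0], [-77.0, 38.0], [-78.0, 38.0], [-78.0, 37.0]]
                ],
            },
        }
    ],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_boundaries.json")
    with open(sample_path, "w") as f:
        json.dump(sample, f)
    feature_count, within_bounds = summarize_geojson(sample_path)
```

Coordinates far outside the longitude and latitude ranges usually indicate that the geometries were written in their native projection instead of being reprojected first.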