In a previous article, we introduced the spark.read.chip function for reading in subsets of scenes from Earth observation data. In this article, we demonstrate the different chipping strategies available with the spark.read.chip function. These strategies were designed to address some of the most common image processing and machine learning use patterns.
Note: if you would like to run through this example in EarthAI Notebook, you can download the companion notebook and vector data source from the attachments provided at the end of this article.
Import Libraries
from earthai.init import *
import geopandas
import earthai.chipping.strategy
import pyspark.sql.functions as F
import ipyleaflet
Query Imagery at STEP Sites
In a previous article, we introduced the System for Terrestrial Ecosystem Parameterization (STEP) data set, and used it to query the EarthAI Catalog to identify Landsat 8 scenes that intersect with urban and cropland sites around the world. The code in the cell block below replicates those steps for use in the following sections. Please refer to the previous article for more details on these operations.
# Read in the STEP data set
step_gdf = geopandas.read_file("data/step_september152014_70rndsel_igbpcl.geojson")

# Filter to include only the urban and cropland classes
step_subset_gdf = step_gdf[step_gdf.igbp.isin([12, 13])]

# Query Landsat 8 imagery at STEP sites
cat = earth_ondemand.read_catalog(
    step_subset_gdf.geometry,
    start_datetime='2014-06-01',
    end_datetime='2014-06-15',
    max_cloud_cover=10,
    collections='landsat8_c2l1t1'
)

# Join the imagery catalog back to the STEP data
step_cat = geopandas.sjoin(step_subset_gdf, cat)
step_cat can include multiple Landsat 8 scenes for each STEP site, taken at different dates/times. For simplicity in demonstrating chipping strategies below, we select just a single scene for each site. The code below selects the scene with the least cloud coverage.
step_cat['grp_col'] = step_cat['siteid']
step_cat = step_cat.sort_values('eo_cloud_cover').groupby(['grp_col']).first()
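To make the selection logic concrete, the following sketch reproduces it in plain Python on a few made-up catalog records. The site IDs, scene IDs, and cloud-cover values are hypothetical, and the helper name least_cloudy_per_site is ours for illustration, not part of EarthAI.

```python
def least_cloudy_per_site(records):
    """Return one record per siteid: the one with the lowest eo_cloud_cover."""
    best = {}
    for rec in records:
        site = rec["siteid"]
        # Keep the record only if it beats the current best for this site
        if site not in best or rec["eo_cloud_cover"] < best[site]["eo_cloud_cover"]:
            best[site] = rec
    return best


# Hypothetical catalog rows: site 1 is covered by two scenes, site 2 by one
records = [
    {"siteid": 1, "id": "scene_a", "eo_cloud_cover": 7.5},
    {"siteid": 1, "id": "scene_b", "eo_cloud_cover": 0.3},
    {"siteid": 2, "id": "scene_c", "eo_cloud_cover": 2.1},
]

selected = least_cloudy_per_site(records)
print({site: rec["id"] for site, rec in selected.items()})
```

Each site ends up with exactly one scene, which is the property the chipping examples below rely on.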
Chipping Strategies
There are four chipping strategies currently implemented in EarthAI:
- Intersecting Extent: earthai.chipping.strategy.IntersectingExtent
- Scene-aligned Grid: earthai.chipping.strategy.SceneAlignedGrid
- Feature-aligned Grid: earthai.chipping.strategy.FeatureAlignedGrid
- Centroid-centered: earthai.chipping.strategy.CentroidCentered
To see a list of these strategies and a description of their behavior, run earthai.chipping.chipping_strategies() in a code cell within a notebook:
earthai.chipping.chipping_strategies()

To specify which chipping strategy to use, we pass an instance of the strategy to the chipping_strategy parameter of spark.read.chip. The following subsections demonstrate how to apply each chipping method to the STEP sites and describe the strategies in more detail.
To better understand the returns from these different chipping strategies, we'll use ipyleaflet to create map layers showing the tile extents for a single STEP site. We start by initializing an ipyleaflet.Map object and adding a layer (gray) to show the boundary of the STEP site.
# STEP siteid = 103349714233
step_bounds_gdf = step_subset_gdf[step_subset_gdf.siteid == 103349714233][['siteid','geometry']]

# ipyleaflet.Map initialization
m = ipyleaflet.Map(center=(14.231918, 33.497561), zoom=14)

# Add layer to map
site_layer = ipyleaflet.GeoData(geo_dataframe=step_bounds_gdf,
                                style={'color': 'gray'},
                                name='STEP Site Boundary')
m.add_layer(site_layer)
m.add_control(ipyleaflet.LayersControl())
m

Intersecting Extent
The intersecting extent strategy returns a set of chips that intersect with the input geometries. It returns a RasterFrame where each row represents a single element, or polygon, in the input vector data set. Each tile is centered on the geometry, and the dimensions of the tiles vary and are determined by the extent of the polygons. This method is particularly helpful for analyses involving zonal statistics, and for supervised machine learning applications where each geometry maps to a target label.
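To illustrate why the chip dimensions follow the polygon extents, here is a minimal plain-Python sketch that converts a geometry's bounding box into a pixel window. It is not EarthAI's implementation: the function bbox_to_window, the scene parameters, and the two site bounding boxes are hypothetical, and it assumes a north-up raster with square pixels.

```python
import math


def bbox_to_window(bbox, origin_x, origin_y, pixel_size, scene_cols, scene_rows):
    """Pixel window (col_off, row_off, cols, rows) covering a bounding box.

    Assumes a north-up raster whose upper-left corner is (origin_x, origin_y).
    """
    minx, miny, maxx, maxy = bbox
    # Round outward so every pixel touched by the box is included,
    # then clamp to the scene boundaries
    col_min = max(0, math.floor((minx - origin_x) / pixel_size))
    col_max = min(scene_cols, math.ceil((maxx - origin_x) / pixel_size))
    row_min = max(0, math.floor((origin_y - maxy) / pixel_size))
    row_max = min(scene_rows, math.ceil((origin_y - miny) / pixel_size))
    return col_min, row_min, col_max - col_min, row_max - row_min


# Hypothetical 30 m scene of 1000 x 1000 pixels
scene = dict(origin_x=500_000, origin_y=1_600_000, pixel_size=30,
             scene_cols=1000, scene_rows=1000)

# Two hypothetical site bounding boxes (minx, miny, maxx, maxy) in map units
small_site = (503_000, 1_590_000, 503_900, 1_590_600)
large_site = (510_000, 1_580_000, 512_400, 1_581_500)

small_window = bbox_to_window(small_site, **scene)
large_window = bbox_to_window(large_site, **scene)
print(small_window[2:], large_window[2:])  # (cols, rows) differ per site
```

The larger site yields a larger window, which mirrors the varying chip dimensions this strategy returns.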
The code below demonstrates the use of earthai.chipping.strategy.IntersectingExtent.
chip_ie_rf = spark.read.chip(step_cat, ['B4', 'B3', 'B2'],
                             chipping_strategy=earthai.chipping.strategy.IntersectingExtent()) \
    .withColumnRenamed('B4', 'red') \
    .withColumnRenamed('B3', 'green') \
    .withColumnRenamed('B2', 'blue').cache()
Inspecting four of the STEP sites using the select statement below, we see that each row corresponds to a different site, and the chip dimensions vary according to the size of the geometries.
chip_ie_rf.select('siteid', 'igbp', 'red', 'green', 'blue', rf_dimensions('blue')) \
    .filter(F.col('siteid').isin([103878414119, 103117130970, 103349714233, 103916721582]))

For the intersecting extent chipping method (blue), we can see that a single tile is returned, clipped around the input STEP site boundary.
# Convert tile extents to EPSG:4326
chip_ie_df = chip_ie_rf.filter(F.col('siteid') == 103349714233) \
    .withColumn('extent_4326', st_reproject(st_geometry(rf_extent('red')),
                                            rf_crs('red'), lit('EPSG:4326'))) \
    .select('extent_4326').toPandas()
chip_ie_gdf = geopandas.GeoDataFrame(chip_ie_df, geometry="extent_4326", crs="EPSG:4326")

# Add layer to map
chip_ie_layer = ipyleaflet.GeoData(geo_dataframe=chip_ie_gdf,
                                   style={'color': 'blue', 'dashArray':'10'},
                                   name='Intersecting Extent')
m.add_layer(chip_ie_layer)
m

Centroid-centered
The centroid-centered chipping strategy takes chipCols and chipRows as input. If the input geometries are points, it creates chips of the specified dimensions centered on each of the points. If the input geometries are polygons, it creates chips of the specified dimensions centered at the centroids of each polygon. The returned RasterFrame will have chips of uniform dimensions, one for each input geometry, and is useful for deep learning applications.
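The idea can be sketched in plain Python: compute the polygon's centroid, find the pixel that contains it, and place a window of the requested size around that pixel. This is only an illustration under stated assumptions, not EarthAI's implementation. The helper names, the triangular site, and the scene parameters are hypothetical; the raster is assumed to be north-up with square pixels, and windows that would run off the scene edge are not handled.

```python
import math


def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    # Shoelace formula over consecutive vertex pairs (closing the ring)
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def centered_window(x, y, origin_x, origin_y, pixel_size, chip_cols, chip_rows):
    """Window (col_off, row_off, cols, rows) of fixed size centered on the pixel containing (x, y)."""
    center_col = math.floor((x - origin_x) / pixel_size)
    center_row = math.floor((origin_y - y) / pixel_size)
    return (center_col - chip_cols // 2, center_row - chip_rows // 2,
            chip_cols, chip_rows)


# Hypothetical triangular site, in metres relative to the scene's lower-left corner
site = [(3000, 9000), (3900, 9000), (3000, 9600)]
cx, cy = polygon_centroid(site)

# Hypothetical 30 m scene whose upper-left corner is at (0, 12000)
window = centered_window(cx, cy, origin_x=0, origin_y=12000, pixel_size=30,
                         chip_cols=50, chip_rows=50)
print(window)
```

Whatever the shape or size of the site, the window always has the requested 50 x 50 dimensions; only its offset changes.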
The code below demonstrates the use of earthai.chipping.strategy.CentroidCentered.
chip_cc_rf = spark.read.chip(step_cat, ['B4', 'B3', 'B2'],
                             chipping_strategy=earthai.chipping.strategy.CentroidCentered(50, 50)) \
    .withColumnRenamed('B4', 'red') \
    .withColumnRenamed('B3', 'green') \
    .withColumnRenamed('B2', 'blue').cache()
Inspecting the same four STEP sites that we looked at in the intersecting extent example, we see that with the centroid-centered method, a chip with dimensions of 50 x 50 is generated for each STEP site, regardless of the polygon extents.
chip_cc_rf.select('siteid', 'igbp', 'red', 'green', 'blue', rf_dimensions('blue')) \
    .filter(F.col('siteid').isin([103878414119, 103117130970, 103349714233, 103916721582]))

For the centroid-centered chipping method (red), we can see that a single tile with user-supplied dimensions of 50 x 50 is returned, centered at the centroid of the STEP polygon.
# Convert tile extents to EPSG:4326
chip_cc_df = chip_cc_rf.filter(F.col('siteid') == 103349714233) \
    .withColumn('extent_4326', st_reproject(st_geometry(rf_extent('red')),
                                            rf_crs('red'), lit('EPSG:4326'))) \
    .select('extent_4326').toPandas()
chip_cc_gdf = geopandas.GeoDataFrame(chip_cc_df, geometry="extent_4326", crs="EPSG:4326")

# Add layer to map
chip_cc_layer = ipyleaflet.GeoData(geo_dataframe=chip_cc_gdf,
                                   style={'color': 'red', 'dashArray':'10'},
                                   name='Centroid-centered')
m.add_layer(chip_cc_layer)
m

Scene-aligned Grid
The scene-aligned grid strategy takes chipCols and chipRows as input. It creates a grid of the entire scene(s) using these chip dimensions, then returns only the chips that intersect with one or more of the geometries. The returned chip dimensions will usually be equal to the input parameters, but may be smaller for tiles on the edge of scenes. This method is useful for analyses that involve local map algebra (pixel-to-pixel) transformations.
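A rough plain-Python sketch of this behavior follows. It is an illustration rather than EarthAI's implementation: the function scene_aligned_chips and the scene size are hypothetical, and for simplicity the geometry is represented by the pixel window (col_off, row_off, cols, rows) of its bounding box instead of its exact outline.

```python
def scene_aligned_chips(window, scene_cols, scene_rows, chip_cols, chip_rows):
    """Cells of a scene-wide grid, as (col_off, row_off, cols, rows), that overlap a pixel window."""
    col_off, row_off, cols, rows = window
    # Grid indices of the first and last cells touched by the window
    first_col, last_col = col_off // chip_cols, (col_off + cols - 1) // chip_cols
    first_row, last_row = row_off // chip_rows, (row_off + rows - 1) // chip_rows
    chips = []
    for grid_row in range(first_row, last_row + 1):
        for grid_col in range(first_col, last_col + 1):
            c0, r0 = grid_col * chip_cols, grid_row * chip_rows
            # Cells on the right or bottom edge of the scene may be truncated
            chips.append((c0, r0,
                          min(chip_cols, scene_cols - c0),
                          min(chip_rows, scene_rows - r0)))
    return chips


# Hypothetical 1020 x 1020 pixel scene with a 50 x 50 grid
interior = scene_aligned_chips((90, 340, 30, 21), 1020, 1020, 50, 50)
edge = scene_aligned_chips((1000, 1000, 15, 15), 1020, 1020, 50, 50)
print(interior)
print(edge)
```

The interior window straddles a grid corner and so picks up four full-size cells, while the window near the scene corner falls in a single truncated cell.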
The code below demonstrates the use of earthai.chipping.strategy.SceneAlignedGrid.
chip_sa_rf = spark.read.chip(step_cat, ['B4', 'B3', 'B2'],
                             chipping_strategy=earthai.chipping.strategy.SceneAlignedGrid(50, 50)) \
    .withColumnRenamed('B4', 'red') \
    .withColumnRenamed('B3', 'green') \
    .withColumnRenamed('B2', 'blue').cache()
Inspecting a single STEP site in the select statement below, we see that a single site can intersect with multiple chips, and the chip dimensions are uniform.
chip_sa_rf.select('siteid','igbp', 'red', 'green', 'blue', rf_dimensions('blue')) \
    .filter(F.col('siteid').isin([103349714233]))

We add in a layer (green) to show the tiles returned using the scene-aligned grid chipping strategy. As expected, we get four tiles of uniform dimensions (50 x 50) that intersect with, and extend beyond, the boundaries of the STEP site.
# Convert tile extents to EPSG:4326
chip_sa_df = chip_sa_rf.filter(F.col('siteid') == 103349714233) \
    .withColumn('extent_4326', st_reproject(st_geometry(rf_extent('red')),
                                            rf_crs('red'), lit('EPSG:4326'))) \
    .select('extent_4326').toPandas()
chip_sa_gdf = geopandas.GeoDataFrame(chip_sa_df, geometry="extent_4326", crs="EPSG:4326")

# Add layer to map
chip_sa_layer = ipyleaflet.GeoData(geo_dataframe=chip_sa_gdf,
                                   style={'color': 'green'},
                                   name='Scene-aligned Grid')
m.add_layer(chip_sa_layer)
m

Feature-aligned Grid
The feature-aligned grid strategy takes chipCols and chipRows as input. It extracts subregions of the scene(s) that intersect with each of the input polygons, creates a separate grid for each polygon using the specified chip dimensions, and returns the generated tiles. Like the scene-aligned grid strategy, this method is useful for analyses that involve local map algebra (pixel-to-pixel) transformations. However, the returned chip dimensions may be less than or equal to the input grid size, as the tiles are clipped to include only the pixels that intersect with the geometries.
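The difference from a scene-wide grid can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not EarthAI's implementation: the function feature_aligned_chips is hypothetical, and the feature is represented by the pixel window (col_off, row_off, cols, rows) of its bounding box.

```python
def feature_aligned_chips(window, chip_cols, chip_rows):
    """Tile a feature's pixel window with a grid anchored at the window's upper-left corner."""
    col_off, row_off, cols, rows = window
    chips = []
    for r0 in range(0, rows, chip_rows):
        for c0 in range(0, cols, chip_cols):
            # Tiles in the last row or column are clipped to the window
            chips.append((col_off + c0, row_off + r0,
                          min(chip_cols, cols - c0),
                          min(chip_rows, rows - r0)))
    return chips


# Hypothetical feature window of 70 x 60 pixels, tiled with a 50 x 50 grid
tiles = feature_aligned_chips((90, 340, 70, 60), 50, 50)
for tile in tiles:
    print(tile)
```

Because the grid starts at the feature rather than at the scene origin, the tiles never extend past the feature's window, and only the first tile reaches the full 50 x 50 size in this example.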
The code below demonstrates the use of earthai.chipping.strategy.FeatureAlignedGrid.
chip_fa_rf = spark.read.chip(step_cat, ['B4', 'B3', 'B2'],
                             chipping_strategy=earthai.chipping.strategy.FeatureAlignedGrid(50, 50)) \
    .withColumnRenamed('B4', 'red') \
    .withColumnRenamed('B3', 'green') \
    .withColumnRenamed('B2', 'blue').cache()
Inspecting the same STEP site that we looked at in the scene-aligned grid example, we see that a single site can again intersect with multiple chips. However, with the feature-aligned grid method, the output chip dimensions are less than or equal to the input chip dimensions.
chip_fa_rf.select('siteid', 'igbp', 'red', 'green', 'blue', rf_dimensions('blue')) \
    .filter(F.col('siteid').isin([103349714233]))

Finally, we add in a layer (yellow) showing the tile extents returned from the feature-aligned grid chipping strategy. In this case we also get four tiles that intersect the site boundary, but they have different dimensions since the tiles are clipped tightly around the input geometry.
# Convert tile extents to EPSG:4326
chip_fa_df = chip_fa_rf.filter(F.col('siteid') == 103349714233) \
    .withColumn('extent_4326', st_reproject(st_geometry(rf_extent('red')),
                                            rf_crs('red'), lit('EPSG:4326'))) \
    .select('extent_4326').toPandas()
chip_fa_gdf = geopandas.GeoDataFrame(chip_fa_df, geometry="extent_4326", crs="EPSG:4326")

# Add layer to map
chip_fa_layer = ipyleaflet.GeoData(geo_dataframe=chip_fa_gdf,
                                   style={'color': 'yellow'},
                                   name='Feature-aligned Grid')
m.add_layer(chip_fa_layer)
m
