This example demonstrates how to create an RGB mosaic from Landsat 8 imagery covering the Chobe National Park in Northern Botswana. It runs through the steps of 1) acquiring imagery scenes from the EarthAI Catalog, 2) using RasterFrames to read in and clip the imagery to the park boundary, and 3) using rasterio to stitch together the scenes and write out the mosaic as a GeoTIFF.
A mosaic is any combination of two or more images, and "mosaicking" refers to the process of creating a mosaic. In mosaicking with Earth observation data, these are some of the common operations involved:
- acquire scenes that cover a specified area of interest;
- clip the scenes to the area of interest;
- mask the imagery to remove low quality data such as those affected by cloud coverage or shadows (optional);
- normalize the color contrast among different scenes (optional);
- stitch together multiple scenes into a single image; and
- write the mosaic to a new file.
In this basic example, we do not cover masking and color normalization, which are discussed in other tutorials.
Note: If you are following along by running the attached companion notebook in EarthAI Notebook, this example takes about 6-8 minutes to run end-to-end on a Dedicated Instance type.
Import Libraries
We will start by importing all of the Python libraries used in this example.
from earthai.init import * import earthai.chipping.strategy import pyspark.sql.functions as F import osmnx import pandas import geopandas import folium import glob import rasterio from rasterio.merge import merge from rasterio.plot import show
Query the EarthAI Catalog
We get boundary information for the Chobe National Park from OpenStreetMap using the osmnx library.
chobe_gdf = osmnx.geocode_to_gdf('Chobe National Park')
We will use the EarthAI Catalog API to locate Landsat 8 scenes that cover the park's boundary. To use the above GeoDataFrame in our query, we must verify that the projection is "EPSG: 4326":
print(chobe_gdf.crs)
We can now input this geometry into earth_ondemand.read_catalog
. We use Landsat 8 Collection 1 Tier 1, and select imagery taken during the summer season of 2019-2020. We set max_cloud_cover = 15
to select scenes with low cloud coverage.
l8_cat = earth_ondemand.read_catalog(chobe_gdf, start_datetime = '2019-12-21', end_datetime = '2020-03-21', max_cloud_cover = 15, collections = 'landsat8_l1tp')
The code block below uses folium to display the park boundary and the extents of the intersecting Landsat 8 scenes.
m = folium.Map(location=[chobe_gdf.geometry.centroid.y.mean(), chobe_gdf.geometry.centroid.x.mean()], zoom_start=7) style_function = lambda x: {'fillColor': '#f003fc', 'color': '#f003fc'} chobe_poly = folium.features.GeoJson(chobe_gdf.to_json(), style_function=style_function) style_function = lambda x: {'fillColor': '#32a852', 'color': '#32a852'} scenes_poly = folium.features.GeoJson((l8_cat[['id','geometry']]).to_json(), style_function=style_function) m.add_children(chobe_poly) m.add_children(scenes_poly) m

Read in Landsat 8 Imagery
Since the Landsat 8 scenes cover a much larger geographic area than the Chobe National Park, we will make use of EarthAI's chip reader to load in only the data that intersects with the park boundary. First, we perform a spatial join to attach the park boundary geometry to the raster catalog:
l8_cat = geopandas.sjoin(l8_cat, chobe_gdf, how='right').rename(columns={"geometry":"chobe_bounds"})
Next, we use spark.read.chip
with geometry_col_name = 'chobe_bounds'
to specify the region of the scenes to load in. The data will be read in as a RasterFrame with uniform tiles with dimensions of 512 x 512. We use the FeatureAlignedGrid
chipping strategy since we want the mosaic to be gridded in reference to the park geometry. We read in bands B4 (red), B3 (green), and B2 (blue) in order to create a natural color image.
chobe_rf = spark.read.chip(l8_cat, catalog_col_names=['B4', 'B3', 'B2'], geometry_col_name='chobe_bounds', chipping_strategy=earthai.chipping.strategy.FeatureAlignedGrid(512), tile_dimensions=(512, 512))
The code below pulls out the columns of interest and the tile CRS and geometry, and we remove tiles that have only zeros. We also reproject the tile geometries to WGS84 for use in spatial join operations and visualization.
chobe_rf = chobe_rf.select('id', 'datetime', F.col('B4').alias('red'), F.col('B3').alias('green'), F.col('B2').alias('blue'), rf_crs('B4').alias('tile_crs'), rf_geometry('B4').alias('tile_geom')) \ .filter(rf_tile_max('red') > 0) \ .withColumn('tile_geom_wgs84', st_reproject('tile_geom', 'tile_crs', st_wgs84_crs()))
Adding the tile boundaries to our map, we can see that we've successfully loaded just the subset of imagery that intersects with the park. Note that adjacent Landsat 8 scenes can overlap, and scenes covering the same regions on different dates may not have the exact same extents. To create a mosaic, these tiles must be transformed to lie on the same grid and merged to produce a single image, which is handled by rasterio's merge
function later in this tutorial.
chips_gdf = geopandas.GeoDataFrame(chobe_rf.select('tile_geom_wgs84').toPandas(), geometry='tile_geom_wgs84', crs='OGC:CRS84') style_function = lambda x: {'fillColor': '#3498eb', 'color': '#3498eb'} chip_poly = folium.features.GeoJson(chips_gdf.to_json(), style_function=style_function) m.add_children(chip_poly) m

Clip the Tiles by the Park Boundary
We use RasterFrames operations to clip the tiles to the Chobe National Park boundary. These steps set all cells that lie outside the park boundary to NoData.
First, we take the park boundary GeoDataFrame and convert it to a Spark DataFrame so that we can use it in conjunction with the imagery chips.
chobe_sdf = spark.createDataFrame(pandas.DataFrame(chobe_gdf)) \ .withColumnRenamed('geometry', 'chobe_geom') \ .withColumn('in_bounds', lit(1))
Next, we spatially join the above park geometry to the tile geometries, making sure to use the tile_geom_wgs84 column for the latter so that the CRS matches. We also add a column that reprojects the park's boundary into to WGS 84 / UTM zone 35S - the CRS for the Landsat 8 imagery - for the next step.
Hint: to view the CRS of the image tiles run chobe_rf.select('tile_crs').limit(1)
in a notebook cell.
chobe_rf = chobe_rf.join(chobe_sdf, st_intersects(chobe_sdf.chobe_geom, chobe_rf.tile_geom_wgs84)) \ .withColumn('chobe_geom_utm35', st_reproject('chobe_geom', rf_mk_crs(str(chobe_gdf.crs)), 'tile_crs'))
The next lines create a new tile column called clip. We use the rf_rasterize
function to "burn-in" the park boundary into a tile of the same extent and CRS of the imagery tiles. We then use the clip column and rf_mask
to set the cell values in the Landsat 8 red, green, and blue bands to NoData if they are outside the park geometry.
Note: The Landsat 8 band native data types are uint16raw, which means they do not have a NoData value. In the code below we first convert them to uint16 in order to clip the imagery.
chobe_rf = chobe_rf.withColumn('clip', rf_rasterize('chobe_geom_utm35', 'tile_geom', 'in_bounds', rf_dimensions('red').cols, rf_dimensions('red').rows)) \ .withColumn('red_uint16', rf_convert_cell_type('red', 'uint16')) \ .withColumn('red_clipped', rf_mask('red_uint16', 'clip')) \ .withColumn('green_uint16', rf_convert_cell_type('green', 'uint16')) \ .withColumn('green_clipped', rf_mask('green_uint16', 'clip')) \ .withColumn('blue_uint16', rf_convert_cell_type('blue', 'uint16')) \ .withColumn('blue_clipped', rf_mask('blue_uint16', 'clip')).cache()
We can view some sample tiles to see the results of the above operations.
chobe_rf.select('red', 'clip', 'red_clipped')
Note: If you are following along by running the companion notebook in EarthAI Notebook, the select
job could take a few minutes to run on a Dedicated Instance type. This is because it is the first action invoked. The .cache()
at the end of the code chain before this step ensures that after completing the select
command above, the chobe_rf RasterFrame will be stored in memory, which will speed up all subsequent actions below.
Write the Image Chips to GeoTIFFs
rasterio's merge
function operates on a list of GeoTIFF files, so we need to write out the image tiles to GeoTIFF before stitching. We do this below using EarthAI's chip writer.
We start by defining an output folder to write the chips to. We then select only the columns in the RasterFrame that we want to output: the clipped imagery tiles (red_clipped, green_clipped, and blue_clipped), plus the Landsat 8 scene id and datetime information to store as image metadata.
We also create a unique row_index, and use it to define a file name for each chip.
output_path = r'Chobe_Landsat8_RGB_chips' chobe_chips_write = chobe_rf.select('id', 'datetime', 'red_clipped', 'green_clipped', 'blue_clipped') \ .withColumn('row_index', F.monotonically_increasing_id()) \ .withColumn('file_path_name', F.concat_ws('_', lit('chobe_l8_chip'), 'row_index')) \ .cache()
The code below writes each row to a 3-band GeoTIFF.
chobe_chips_write.write.chip(output_path, filenameCol='file_path_name', catalog=True, metadata=['id', 'datetime'])
Note: If you are following along by running the companion notebook in EarthAI Notebook, the job above will take about 6-7 minutes to run on a Dedicated Instance type.
Stitch Image Chips into a Single Mosaic
We use tools in the rasterio library to open the image chip GeoTIFFs created in the previous step and stitch them together into a single image.
We first get a list of all the image chips:
chobe_tifs = glob.glob(output_path+'/*.tif') chobe_tifs
We next open these files in read mode and populate a source list:
chobe_src_files = [] for tif in chobe_tifs: src = rasterio.open(tif) chobe_src_files.append(src)
Now we can merge these together and create a mosaic using rasterio's merge
function. Note that method='min'
combines overlapping cells by selecting the minimum value; this will usually result in picking the cell with the lowest haze or cloud coverage.
chobe_mosaic, out_trans = merge(chobe_src_files, method='min')
chobe_mosaic is the single mosaic array, and out_trans is the transformation needed to map cell coordinates into another coordinate system.
Write Mosaic Image to a GeoTIFF
In order to write the mosaic to a GeoTIFF, we need to update the image metadata. We start by grabbing the metadata from one of the source files:
out_meta = src.meta.copy() out_meta
The driver, dtype, nodata, and count field will be unchanged. We need to update the width, height, crs, and transform fields. The height and the width can be determined from the shape of the output mosaic array. The transform field should be set to the transformation output from the merge
function. We will set the crs to WGS 84 / UTM zone 35S to retain the CRS in its native projection.
Hint: to get the projected crs from the original image tiles, run chobe_rf.select('tile_crs').limit(1)
in a Notebook cell.
out_meta.update({"height": chobe_mosaic.shape[1], "width": chobe_mosaic.shape[2], "transform": out_trans, "crs": "+proj=utm +zone=35 +datum=WGS84 +units=m +no_defs " } )
We can now write out the final mosaic to a GeoTIFF:
chobe_mosaic_tif = r'Chobe_Landsat8_RGB_Mosaic_Summer2020.tif' with rasterio.open(chobe_mosaic_tif, "w", **out_meta) as dest: dest.write(chobe_mosaic)
This mosaic GeoTIFF can be used within EarthAI Notebook for additional analysis, or loaded into several different GIS applications (such as QGIS, as shown below).
Comments
0 comments
Please sign in to leave a comment.