This example demonstrates how to mask Sentinel-2 Level-1C imagery. It runs through the steps of 1) acquiring imagery scenes from the EarthAI Catalog, 2) reading and converting the cloud mask vector file, and 3) using RasterFrames to read and mask imagery.
Masking is a very common operation in remote sensing data processing to mark certain pixels as null values. One of the most common reasons for masking out pixels is excluding cloudy pixels from your analysis.
Import Libraries
We will start by importing all of the Python libraries used in this example.
from earthai.init import * import pyspark.sql.functions as F from shapely.geometry import Polygon, MultiPolygon import xml.etree.ElementTree as ET import boto3 from tqdm.auto import tqdm prog = tqdm.pandas(unit='row')
Query EarthAI Catalog
This example will use the publicly accessible Sentinel-2 Level-1C collection. The Sentinel-2 Level-1C collection differs from the Sentinel-2 Level-2A collection in that it's not atmostpherically corrected.
We pass a lat-long point near Dar es Salaam, Tanzania to the EarthAI catalog and query it for Sentinel-2 L1C image scenes from September 2019 with a maximum cloud cover of 10%.
cat = earth_ondemand.read_catalog(geo='Point(39.20 -6.80)', start_datetime='2019-09-01', end_datetime='2019-10-01', collections='sentinel2_l1c', max_cloud_cover=10 )
Read and Convert Cloud Mask Layer
The Sentinel-2 Level-2A collection contains a separate raster band, which delineates areas of missing data and probable clouds. Unfortunately, quality assessment in the Sentinel-2 Level-1C collection is not quite as simple.
The Level-1C collection contains a GML file, called cloud_mask, that delineates cloudy areas. GML files are essentially XML files with geographical features and they are not a supported format in RasterFrames.
We have created functions in the code below that read the GML files from the Amazon S3 links provided in the catalog and convert them to Shapely MultiPolygons. We run these functions on the cloud_mask column in cat to convert the the cloud mask into a format we can work with for masking.
session = boto3.Session() s3_client = session.client("s3") # convert GML to list of Shapely Polygons def gml_to_poly(root): poly_list = [] for poly in root.iter('{http://www.opengis.net/gml/3.2}posList'): polystr = poly.text l = list(map(int, polystr.split(' '))) n = 2 poly = Polygon([l[i:i + n] for i in range(0, len(l), n)]).buffer(0) if poly.is_valid: poly_list += [poly] return poly_list # read in GML, call gml_to_poly function, and convert list of Polygons to MultiPolygons def cloud_mask_file_to_mp(cloud_mask_uri): from shapely.wkt import loads s3_object = s3_client.get_object(Bucket='sentinel-s2-l1c', Key=cloud_mask_uri.split('s3://sentinel-s2-l1c/')[1], RequestPayer='requester') body = s3_object['Body'] polys = gml_to_poly(ET.fromstring(body.read())) if polys: return MultiPolygon(polys) else: return loads('MULTIPOLYGON EMPTY') cat['cloud_mask_geom'] = cat.cloud_mask.progress_apply(cloud_mask_file_to_mp)
Read Image Bands
We pass cat to spark.read.raster
to read in bands B04 (red), B03 (green), and B02 (blue). The spark.read.raster
function reads the specified bands into a RasterFrame. The RasterFrame will include all columns from cat.
To view all of the available bands for the Sentinel-2 L1C collection, you can run earth_ondemand.item_assets('sentinel2_l1c')
.
rf = spark.read.raster(cat, catalog_col_names=['B04', 'B03', 'B02'])
Mask Imagery
The first step to mask Sentinel-2 L1C imagery is to convert the cell type. A NoData value is required to perform the masking. By default, Sentinel-2 measurement bands have a cell type of uint16raw
. The "raw" qualifier indicates that there is no NoData value defined. By setting the cell types to uint16
, we are setting the NoData value to 0.
rf = rf.select( rf_convert_cell_type('B04', 'uint16').alias('red'), rf_convert_cell_type('B03', 'uint16').alias('grn'), rf_convert_cell_type('B02', 'uint16').alias('blu'), 'cloud_mask_geom' )
The second step to mask Sentinel-2 L1C imagery is to rasterize the cloud mask geometries into a tile column. We pass our cloud_mask_geom to the rf_rasterize
function as well as the tile bounds, the number of columns, and the number of rows that the resulting tile should have. We also pass it a value that will be "burned-in" to the returned tile where the cloud_mask_geom intersects the tile bounds. Any values outside the cloud_mask_geom will be assigned a NoData value.
In this example, we set the value to 1.0 and pass the tile bounds and dimensions of the red tile column.
rf = rf.withColumn('mask', rf_rasterize('cloud_mask_geom', rf_geometry('red'), F.lit(1.0), rf_dimensions('red').cols, rf_dimensions('red').rows))
The final step in masking Sentinel-2 L1C imagery is to clip each band using the new mask tile column. Since the mask tile column contains 1 where there are clouds and NoData elsewhere, we need to use the rf_inverse_mask
function. This function replaces values in a tile column with NoData where the mask does not contain NoData. We pass each band and the mask tile column to the rf_inverse_mask
function in the code below.
rf = rf.withColumn('red_masked', rf_inverse_mask('red', 'mask')) \ .withColumn('grn_masked', rf_inverse_mask('grn', 'mask')) \ .withColumn('blu_masked', rf_inverse_mask('blu', 'mask'))
We look at a sample of red tiles before and after masking in the code below.
rf.filter(rf_tile_sum('mask') > 0).select('red', 'mask', 'red_masked')
Comments
0 comments
Please sign in to leave a comment.