This example demonstrates how to normalize the color contrast between two Landsat 8 scenes covering Queenstown, New Zealand. It walks through the steps of 1) acquiring imagery scenes from the EarthAI Catalog, 2) using RasterFrames to read and normalize the imagery, 3) viewing the imagery on a slippy map, and 4) plotting histograms of cell values.
Image normalization is the process of scaling a range of pixel values. Images of the same area acquired at different times of day or taken from different satellites may have high color contrast, which makes them hard to compare. Normalizing the color contrast between images is important for a variety of image applications, such as mosaicking and machine learning.
In this example, we will:
- acquire two Landsat 8 scenes that cover a specified area of interest;
- read in the imagery using the EarthAI chip reader;
- display the imagery covering our area of interest on a slippy map;
- display histograms of pixel values;
- normalize the imagery using z-score normalization; and
- normalize the imagery using min-max normalization.
Note: If you are following along by running the attached companion notebook in EarthAI Notebook, this example takes about 4 minutes to run end-to-end on a Dedicated Instance type.
Import Libraries
We will start by importing all of the Python libraries used in this example.
from earthai.init import *
import earthai.chipping.strategy
import pyspark.sql.functions as F
from shapely.geometry import Polygon
import geopandas
import ipyleaflet
import matplotlib.pyplot as plt
import math
Query the EarthAI Catalog
The polygon below is a bounding box surrounding Queenstown, New Zealand in the standard "EPSG:4326" projection. This projection is required for querying the EarthAI Catalog API.
poly = Polygon([(168.59011352062228, -45.05876119993573),
                (168.71783092617991, -45.05876119993573),
                (168.71783092617991, -44.982437864876836),
                (168.59011352062228, -44.982437864876836),
                (168.59011352062228, -45.05876119993573)])
We can now input this geometry into earth_ondemand.read_catalog. We use Landsat 8 Collection 1 Tier 1, and set max_cloud_cover = 2 to select scenes with very low cloud coverage.
It is possible to perform image normalization with cloudy scenes, but an additional step of masking out clouds would be necessary, since clouds will skew the statistics used to calculate the normalization parameters.
cat = earth_ondemand.read_catalog(poly, max_cloud_cover=2, collections='landsat8_l1tp')
cat['queenstown_geometry'] = poly
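Before filtering, it can be useful to take a quick look at what the query returned. A minimal sketch (only the id and datetime columns, both used later in this example, are shown here):

# Peek at the scenes returned by the catalog query
cat[['id', 'datetime']].head()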
We filter the catalog down to just two Landsat 8 scenes to speed up calculation in this example.
cat = cat[cat['id'].isin(['LO08_L1TP_075091_20150210_20170413_01_T1_L1TP',
                          'LC08_L1TP_076091_20170222_20170301_01_T1_L1TP'])]
The code block below uses ipyleaflet to display the Queenstown boundary and the extents of the intersecting Landsat 8 scenes. If we wanted to create a mosaic of these Landsat 8 scenes, we would first want to normalize the bands in each scene to prevent the mosaic from looking choppy in the overlapping area.
In this example, we will just look at imagery located within the Queenstown boundary to illustrate two different normalization techniques: z-score normalization and min-max normalization.
m = ipyleaflet.Map(center=(poly.centroid.y, poly.centroid.x), zoom=8)
queenstown_layer = ipyleaflet.GeoData(geo_dataframe=geopandas.GeoDataFrame([{'geometry': poly}]),
                                      name='Queenstown',
                                      style={'fillColor': '#f003fc', 'color': '#f003fc'})
scenes_layer = ipyleaflet.GeoData(geo_dataframe=cat[['geometry']],
                                  name='Landsat 8 Scenes',
                                  style={'fillColor': '#32a852', 'color': '#32a852'})
m.add_layer(queenstown_layer)
m.add_layer(scenes_layer)
m.add_control(ipyleaflet.LayersControl())
m

Read in Landsat 8 Imagery
Since the Landsat 8 scenes cover a much larger geographic area than Queenstown, we will make use of EarthAI's chip reader to load in only the data that intersects with the Queenstown bounding box. We specify geometry_col_name = 'queenstown_geometry'
to specify the region of the scenes to load in. The data will be read in as a RasterFrame with uniform tiles with dimensions of 256 x 256. We use the FeatureAlignedGrid
chipping strategy since we want the mosaic to be gridded in reference to the Queenstown geometry.
We read in bands B4 (red), B3 (green), and B2 (blue) and convert the cell type of each band to uint16, whose NoData value is 0. This causes any zero-valued cells in the bands to be treated as NoData. In Landsat 8, these areas correspond to the band quality assurance fill areas, and we don't want any zeros to skew the normalization statistics.
rf = spark.read.chip(cat.reset_index(),
                     catalog_col_names=["B4", "B3", "B2"],
                     geometry_col_name='queenstown_geometry',
                     chipping_strategy=earthai.chipping.strategy.FeatureAlignedGrid(256, 256)) \
          .select('datetime',
                  rf_convert_cell_type('B4', 'uint16').alias('red'),
                  rf_convert_cell_type('B3', 'uint16').alias('grn'),
                  rf_convert_cell_type('B2', 'uint16').alias('blu'))
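As noted earlier, cloudier scenes would require masking out cloud pixels before computing any normalization statistics. Below is a minimal, hypothetical sketch of how that could look. It assumes a 'BQA' quality band had also been included in catalog_col_names and kept in the select above, and that bit 4 of the BQA band flags clouds per the Landsat 8 Collection 1 QA conventions; verify both against the product documentation before using.

# Hypothetical sketch: set cells flagged as cloud (bit 4 of the assumed 'BQA' tile) to NoData
rf_masked = rf.withColumn('red', rf_mask_by_bit('red', 'BQA', 4, 1)) \
              .withColumn('grn', rf_mask_by_bit('grn', 'BQA', 4, 1)) \
              .withColumn('blu', rf_mask_by_bit('blu', 'BQA', 4, 1))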
We split the Landsat 8 scenes into two separate RasterFrames, one for each scene.
rf_scene1 = rf.filter(F.year('datetime') == 2015)
rf_scene2 = rf.filter(F.year('datetime') == 2017)
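As a quick sanity check, we can confirm that each RasterFrame now holds a single acquisition date:

# Each of these should return exactly one distinct datetime
rf_scene1.select('datetime').distinct().show()
rf_scene2.select('datetime').distinct().show()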
To view the imagery, we use ipyleaflet and the RasterFrames slippy map writer to overlay it on a basemap. Using the layer control on the map, you can toggle between the scene from 2015 and the scene from 2017 to see the color contrast between the images.
m1 = ipyleaflet.Map(center=(poly.centroid.y, poly.centroid.x), zoom=12)
rf_scene1_map = rf_scene1.select('red', 'grn', 'blu').to_map('Scene_2015', rendering_mode=earthai.RenderingMode.UNIFORM)
rf_scene2_map = rf_scene2.select('red', 'grn', 'blu').to_map('Scene_2017', rendering_mode=earthai.RenderingMode.UNIFORM)
m1.add_layer(rf_scene1_map.layers[-1])
m1.add_layer(rf_scene2_map.layers[-1])
m1.add_control(ipyleaflet.LayersControl())
m1

The rf_agg_approx_histogram function computes a count of cell values across all of the tile rows in a RasterFrame. We use this function to compute histograms for each band and Landsat 8 scene. A histogram shows us whether the distribution and range of cell values for each band are similar across scenes. If the histograms don't look similar, then image normalization may be required to scale the values to a similar data distribution. This is important for speeding up algorithm convergence and making sure one feature doesn't dominate another in machine learning applications. It can also be used to remove color contrast when creating a mosaic of images.
As you can see in the histograms below, the cell values are shifted to the right for the scene from 2015.
bins = 50
fig, axs = plt.subplots(1, 3, figsize=(20, 4))
fig.suptitle('Band Histograms')
for i, band in enumerate(['red', 'grn', 'blu']):
    # Aggregate an approximate histogram of cell values for each scene
    bins_list1 = rf_scene1.agg(rf_agg_approx_histogram(band)['bins'].alias('bins')).collect()
    values1 = [row['value'] for row in bins_list1[0].bins]
    counts1 = [row['count'] for row in bins_list1[0].bins]
    bins_list2 = rf_scene2.agg(rf_agg_approx_histogram(band)['bins'].alias('bins')).collect()
    values2 = [row['value'] for row in bins_list2[0].bins]
    counts2 = [row['count'] for row in bins_list2[0].bins]
    # Plot the binned cell values, weighted by their counts
    axs[i].hist(values1[:bins], weights=counts1[:bins], bins=bins, alpha=0.5, label='Scene_2015')
    axs[i].hist(values2[:bins], weights=counts2[:bins], bins=bins, alpha=0.5, label='Scene_2017')
    axs[i].title.set_text(band)
    axs[i].legend()
plt.show()
If we look at the average cell value for each band and compare across scenes, we can see that the average is higher in every band for the scene from 2015. The mean and distribution of cell values can be transformed using z-score normalization, which we talk about in the next section.
rf_scene1.agg(rf_agg_stats('red').alias('red_stats'),
              rf_agg_stats('grn').alias('grn_stats'),
              rf_agg_stats('blu').alias('blu_stats')) \
         .select(F.round('red_stats.mean', 0).alias('scene_2015_red_mean'),
                 F.round('grn_stats.mean', 0).alias('scene_2015_grn_mean'),
                 F.round('blu_stats.mean', 0).alias('scene_2015_blu_mean'))
rf_scene2.agg(rf_agg_stats('red').alias('red_stats'),
              rf_agg_stats('grn').alias('grn_stats'),
              rf_agg_stats('blu').alias('blu_stats')) \
         .select(F.round('red_stats.mean', 0).alias('scene_2017_red_mean'),
                 F.round('grn_stats.mean', 0).alias('scene_2017_grn_mean'),
                 F.round('blu_stats.mean', 0).alias('scene_2017_blu_mean'))
You can also see that the range of cell values for each band is wider for the scene from 2017. The range of cell values can be transformed using min-max normalization, which we talk about in the last section.
rf_scene1.agg(rf_agg_stats('red').alias('red_stats'),
              rf_agg_stats('grn').alias('grn_stats'),
              rf_agg_stats('blu').alias('blu_stats')) \
         .select(F.round('red_stats.min', 0).alias('scene_2015_red_min'),
                 F.round('red_stats.max', 0).alias('scene_2015_red_max'),
                 F.round('grn_stats.min', 0).alias('scene_2015_grn_min'),
                 F.round('grn_stats.max', 0).alias('scene_2015_grn_max'),
                 F.round('blu_stats.min', 0).alias('scene_2015_blu_min'),
                 F.round('blu_stats.max', 0).alias('scene_2015_blu_max'))
rf_scene2.agg(rf_agg_stats('red').alias('red_stats'),
              rf_agg_stats('grn').alias('grn_stats'),
              rf_agg_stats('blu').alias('blu_stats')) \
         .select(F.round('red_stats.min', 0).alias('scene_2017_red_min'),
                 F.round('red_stats.max', 0).alias('scene_2017_red_max'),
                 F.round('grn_stats.min', 0).alias('scene_2017_grn_min'),
                 F.round('grn_stats.max', 0).alias('scene_2017_grn_max'),
                 F.round('blu_stats.min', 0).alias('scene_2017_blu_min'),
                 F.round('blu_stats.max', 0).alias('scene_2017_blu_max'))
Z-Score Normalization
Z-score normalization is a normalization technique that accounts for the mean offset and variance of cell values. EarthAI has a built-in function to perform z-score normalization called rf_standardize. It calculates the mean and variance of cell values, then subtracts the mean from each cell value and divides by the standard deviation. This technique centers the cell values at 0 and puts the range of cell values on roughly the same scale for each scene. It is good for transforming image scenes to similar distributions.
rf_scene1 = rf_scene1.withColumn('red_z_normalized', rf_standardize('red')) \
                     .withColumn('grn_z_normalized', rf_standardize('grn')) \
                     .withColumn('blu_z_normalized', rf_standardize('blu'))
rf_scene2 = rf_scene2.withColumn('red_z_normalized', rf_standardize('red')) \
                     .withColumn('grn_z_normalized', rf_standardize('grn')) \
                     .withColumn('blu_z_normalized', rf_standardize('blu'))
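For intuition, the sketch below applies the same transformation by hand to a single band, using rf_agg_stats to compute the mean and variance and the local map algebra functions to evaluate (x - mean) / stddev. This is purely illustrative; rf_standardize above is the recommended approach.

# Manual z-score normalization of the red band (illustrative only)
red_stats = rf_scene1.agg(rf_agg_stats('red').alias('stats')).select('stats.mean', 'stats.variance').first()
red_mean = red_stats.mean
red_stddev = math.sqrt(red_stats.variance)
rf_scene1_manual = rf_scene1.withColumn('red_z_manual',
                                        rf_local_divide(rf_local_subtract('red', red_mean), red_stddev))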
After performing z-score normalization, we see that the distributions of cell values in the band histograms are similar.
bins = 50
fig, axs = plt.subplots(1, 3, figsize=(20, 4))
fig.suptitle('Z-Score Normalized Band Histograms')
for i, band in enumerate(['red', 'grn', 'blu']):
    bins_list1 = rf_scene1.agg(rf_agg_approx_histogram(band + '_z_normalized')['bins'].alias('bins')).collect()
    values1 = [row['value'] for row in bins_list1[0].bins]
    counts1 = [row['count'] for row in bins_list1[0].bins]
    bins_list2 = rf_scene2.agg(rf_agg_approx_histogram(band + '_z_normalized')['bins'].alias('bins')).collect()
    values2 = [row['value'] for row in bins_list2[0].bins]
    counts2 = [row['count'] for row in bins_list2[0].bins]
    axs[i].hist(values1[:bins], weights=counts1[:bins], bins=bins, alpha=0.5, label='Scene_2015')
    axs[i].hist(values2[:bins], weights=counts2[:bins], bins=bins, alpha=0.5, label='Scene_2017')
    axs[i].title.set_text(band)
    axs[i].legend()
plt.show()
The mean cell value of each band is zero for both scenes.
rf_scene1.agg(rf_agg_stats('red_z_normalized').alias('red_stats'),
              rf_agg_stats('grn_z_normalized').alias('grn_stats'),
              rf_agg_stats('blu_z_normalized').alias('blu_stats')) \
         .select(F.round('red_stats.mean', 0).alias('scene_2015_red_mean'),
                 F.round('grn_stats.mean', 0).alias('scene_2015_grn_mean'),
                 F.round('blu_stats.mean', 0).alias('scene_2015_blu_mean'))
rf_scene2.agg(rf_agg_stats('red_z_normalized').alias('red_stats'),
              rf_agg_stats('grn_z_normalized').alias('grn_stats'),
              rf_agg_stats('blu_z_normalized').alias('blu_stats')) \
         .select(F.round('red_stats.mean', 0).alias('scene_2017_red_mean'),
                 F.round('grn_stats.mean', 0).alias('scene_2017_grn_mean'),
                 F.round('blu_stats.mean', 0).alias('scene_2017_blu_mean'))
Min-Max Normalization
Min-max normalization is a normalization technique that puts all cell values on the same scale by subtracting the minimum cell value from each value and dividing the result by the difference between the maximum and minimum cell values. The minimum value of each band is transformed to 0 and the maximum value of each band is transformed to 1. Unfortunately, min-max normalization does not handle outliers well, since the calculation is based on the minimum and maximum cell values, but it does guarantee that all normalized data will be on the exact same scale.
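For example, with hypothetical numbers: a cell value of 9000 in a band whose minimum is 6000 and maximum is 12000 maps to (9000 - 6000) / (12000 - 6000) = 0.5.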
After performing min-max normalization, you can multiply all cell values by the same number to change the range of values. For example, you can multiply the cell values by 5, so that the minimum cell value is 0 and the maximum cell value is 5.
In this example, we perform min-max normalization and then multiply all cell values by 100 because it's easier to view the distribution of cell values in a histogram with a wider range.
def min_max_normalization(dataframe, col, factor=100):
    # Compute the minimum and maximum cell values for the band
    stats = dataframe.agg(rf_agg_stats(col).alias('stats')).select('stats.min', 'stats.max').first()
    mn = stats.min
    mx = stats.max
    # Scale cell values to [0, 1], then multiply by the scaling factor
    dataframe = dataframe.withColumn(col + '_min_max_normalized',
                                     rf_local_multiply(rf_local_divide(rf_local_subtract(col, mn), (mx - mn)), factor))
    return dataframe

rf_scene1 = min_max_normalization(rf_scene1, 'red')
rf_scene1 = min_max_normalization(rf_scene1, 'grn')
rf_scene1 = min_max_normalization(rf_scene1, 'blu')
rf_scene2 = min_max_normalization(rf_scene2, 'red')
rf_scene2 = min_max_normalization(rf_scene2, 'grn')
rf_scene2 = min_max_normalization(rf_scene2, 'blu')
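The factor argument simply widens the output range; for example, passing factor=1 keeps the normalized values in the standard [0, 1] range.

# Example: keep the normalized red band in [0, 1] instead of [0, 100]
rf_scene1_unit = min_max_normalization(rf_scene1, 'red', factor=1)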
After performing min-max normalization, you can see that the range of cell values in the band histograms is the same for both scenes.
fig, axs = plt.subplots(1, 3, figsize=(20, 4))
fig.suptitle('Min-Max Normalized Band Histograms')
for i, band in enumerate(['red', 'grn', 'blu']):
    bins_list1 = rf_scene1.agg(rf_agg_approx_histogram(band + '_min_max_normalized')['bins'].alias('bins')).collect()
    values1 = [row['value'] for row in bins_list1[0].bins]
    counts1 = [row['count'] for row in bins_list1[0].bins]
    bins_list2 = rf_scene2.agg(rf_agg_approx_histogram(band + '_min_max_normalized')['bins'].alias('bins')).collect()
    values2 = [row['value'] for row in bins_list2[0].bins]
    counts2 = [row['count'] for row in bins_list2[0].bins]
    axs[i].hist(values1, weights=counts1, bins=bins, alpha=0.5, label='Scene_2015')
    axs[i].hist(values2, weights=counts2, bins=bins, alpha=0.5, label='Scene_2017')
    axs[i].title.set_text(band)
    axs[i].legend()
plt.show()
The minimum cell value is 0 and the maximum cell value is 100 for each band and scene.
rf_scene1.agg(rf_agg_stats('red_min_max_normalized').alias('red_stats'),
              rf_agg_stats('grn_min_max_normalized').alias('grn_stats'),
              rf_agg_stats('blu_min_max_normalized').alias('blu_stats')) \
         .select(F.round('red_stats.min', 0).alias('scene_2015_red_min'),
                 F.round('red_stats.max', 0).alias('scene_2015_red_max'),
                 F.round('grn_stats.min', 0).alias('scene_2015_grn_min'),
                 F.round('grn_stats.max', 0).alias('scene_2015_grn_max'),
                 F.round('blu_stats.min', 0).alias('scene_2015_blu_min'),
                 F.round('blu_stats.max', 0).alias('scene_2015_blu_max'))
rf_scene2.agg(rf_agg_stats('red_min_max_normalized').alias('red_stats'),
              rf_agg_stats('grn_min_max_normalized').alias('grn_stats'),
              rf_agg_stats('blu_min_max_normalized').alias('blu_stats')) \
         .select(F.round('red_stats.min', 0).alias('scene_2017_red_min'),
                 F.round('red_stats.max', 0).alias('scene_2017_red_max'),
                 F.round('grn_stats.min', 0).alias('scene_2017_grn_min'),
                 F.round('grn_stats.max', 0).alias('scene_2017_grn_max'),
                 F.round('blu_stats.min', 0).alias('scene_2017_blu_min'),
                 F.round('blu_stats.max', 0).alias('scene_2017_blu_max'))