This example continues from Part 1, adding a simple way to expand the time horizon considered in the analysis.
Import Libraries and Create a SparkSession
The imports are the same as Part 1, but the SparkSession has some options to help with performance of larger jobs.
from earthai.all import *
from earthai.utils import create_earthai_spark_session
import pyspark.sql.functions as F
import matplotlib.pyplot as plt

# Create a session with higher parallelism to handle the larger job.
spark = create_earthai_spark_session(**{
    'spark.default.parallelism': 1000,
    'spark.sql.shuffle.partitions': 1000,
})
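If you want to verify that the options were applied, you can read them back from the live session. This is a quick check using standard PySpark calls, nothing EarthAI-specific:

# Confirm the tuning options took effect on the running session.
print(spark.conf.get('spark.sql.shuffle.partitions'))  # expect '1000'
print(spark.sparkContext.defaultParallelism)           # reflects spark.default.parallelism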
Query MODIS Imagery
We query imagery much as before, but with a few more variables controlling the time frame. This makes it easier to try shorter or longer time horizons. If you set the big variable to 0 or False, the query will be the same as in Part 1.
big = 1
start = '2015-10-01' if big else '2019-08-01'
end = '2019-09-30'

def weeks(start: str, end: str):
    from datetime import datetime
    st = datetime.fromisoformat(start)
    en = datetime.fromisoformat(end)
    return (en - st).days // 7

print(start, end, weeks(start, end))
catalog = earth_ondemand.read_catalog(
    geo="POINT(-50.8 -10.5)",
    start_datetime=start,
    end_datetime=end,
    collections='mcd43a4',
)
print(f"`catalog` has {len(catalog.id.unique())} distinct scenes ids starting from {catalog.datetime.min()}")
Load Imagery from the Catalog
We next use the spark.read.raster function as before.
df = spark.read.raster(catalog, catalog_col_names=['B01', 'B02']) \
    .withColumnRenamed('B01', 'red') \
    .withColumnRenamed('B02', 'nir')
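If you want to confirm what was loaded before going further, printing the schema is cheap; the renamed red and nir tile columns should appear alongside the catalog metadata:

# Inspect the loaded DataFrame's structure; this does not trigger a Spark job.
df.printSchema()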
Calculate NDVI
We next use the rf_normalized_difference
function in RasterFrames to calculate the NDVI.
df = df.withColumn('ndvi', rf_normalized_difference('nir', 'red'))
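rf_normalized_difference is shorthand for the usual band math (nir - red) / (nir + red). If you prefer to see the arithmetic spelled out, the same result can be built from RasterFrames' local algebra functions. This is just a sketch; the ndvi_manual and df_check names are for illustration, and note that converting to a floating-point cell type first avoids integer division:

# NDVI spelled out with cell-wise (local) algebra: (nir - red) / (nir + red).
# Convert to a floating-point cell type so the division is not integer math.
nir_f = rf_convert_cell_type('nir', 'float32')
red_f = rf_convert_cell_type('red', 'float32')
df_check = df.withColumn(
    'ndvi_manual',
    rf_local_divide(rf_local_subtract(nir_f, red_f),
                    rf_local_add(nir_f, red_f)))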
We can take a look at the NDVI calculations. Even if you have chosen a much larger set of data, this kind of preview is still limited to a few records.
df.select('ndvi', 'datetime', 'id',
          rf_extent('ndvi').alias('extent'),
          rf_crs('ndvi').alias('crs')) \
  .filter(rf_no_data_cells(rf_with_no_data('red', 0)) < 800)  # show tiles that have lots of valid data
As in Part 1, we group by year and week and compute the mean NDVI over each week.

time_series = df.groupBy(F.year('datetime').alias('year'),
                         F.weekofyear('datetime').alias('week')) \
    .agg(rf_agg_mean('ndvi').alias('mean_ndvi'))
The next cell evaluates the job, reading the raster data and computing the NDVI and its mean per week. If the "Extra Large" launch option is active, we expect the roughly 200-week job to take about 10 to 15 minutes to complete.
ts_pd = time_series.toPandas()
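Since the job can take several minutes, consider saving the collected results so that tweaking the plot later does not require recomputation. This is a plain pandas round trip, and the filename is just an example:

# Save the aggregated time series so later sessions can skip the Spark job.
ts_pd.to_csv('ndvi_weekly_means.csv', index=False)
# To reload later:
# import pandas as pd
# ts_pd = pd.read_csv('ndvi_weekly_means.csv')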
Finally, create the same plot as in Part 1.
ts_pd.sort_values(['year', 'week'], inplace=True)

# Create a compact label of year and week number: yyyy_ww
ts_pd['year_week'] = ts_pd.apply(lambda r: '{0:g}_{1:02g}'.format(r.year, r.week), axis=1)

plt.figure(figsize=(10, 8))
plt.plot(ts_pd.year_week, ts_pd.mean_ndvi, 'g*-')
xt = plt.xticks(rotation=-45)
plt.ylabel('NDVI')
plt.xlabel('Year and week')
plt.title('Palmas Brazil NDVI')