{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quick Example Part 2: NDVI near Palmas, Brazil" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example continues from [Part 1](https://astraeahelp.zendesk.com/hc/en-us/articles/360043896551), but provides a simple way to expand the time horizon considered in the analysis. \n", "\n", "## Import Libraries and Create a SparkSession\n", "\n", "The imports are the same as [Part 1](https://astraeahelp.zendesk.com/hc/en-us/articles/360043896551), but the [SparkSession](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.SparkSession) has some options to help with performance of larger jobs.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": false, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "echo=False", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "from earthai.utils import create_earthai_spark_session\n", "# create with defaults and the below will get the already created spark s3ession\n", "# and thus hopefully avoid setting this too-high parallelism for docs build\n", "spark = create_earthai_spark_session()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": "init", "option_string": "name = \"init\"", "results": "verbatim", "term": false, "wrap": "output" 
} }, "outputs": [], "source": [ "from earthai.all import *\n", "\n", "spark = earthai.create_earthai_spark_session(**{\n", " 'spark.default.parallelism': 1000,\n", " 'spark.sql.shuffle.partitions': 1000,\n", "})\n", "\n", "import pyspark.sql.functions as F\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Query MODIS Imagery\n", "\n", "We query imagery much the same, but with a few more variables around the time frame. This makes it easier to try shorter or longer time horizons. If you set the `big` variable to 0 or False, it will be the same as [Part 1](https://astraeahelp.zendesk.com/hc/en-us/articles/360043896551).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "big = 1\n", "start = '2015-10-01' if big else '2019-08-01'\n", "end = '2019-09-30'\n", "\n", "def weeks(start:str, end:str):\n", " from datetime import datetime\n", " st = datetime.fromisoformat(start)\n", " en = datetime.fromisoformat(end)\n", "\n", " return (en - st).days // 7\n", "\n", "print(start, end, weeks(start, end))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": 
"read_catalog", "option_string": "name = \"read_catalog\"", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "catalog = earth_ondemand.read_catalog(\n", " geo=\"POINT(-50.8 -10.5)\",\n", " start_datetime=start,\n", " end_datetime=end,\n", " collections='mcd43a4',\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "print(f\"`catalog` has {len(catalog.id.unique())} distinct scenes ids starting from {catalog.datetime.min()}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Load Imagery from the Catalog\n", "\n", "We next use the [`spark.read.raster`](https://rasterframes.io///raster-read.html) as before.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "df = spark.read.raster(catalog,\n", " catalog_col_names=['B01', 'B02'],\n", " ) \\\n", " .withColumnRenamed('B01', 'red') \\\n", " .withColumnRenamed('B02', 'nir')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Calculate NDVI\n", "\n", "We next use the 
[`rf_normalized_difference`](https://rasterframes.io///reference.html#rf-normalized-difference) function in RasterFrames to calculate the NDVI, the normalized difference of the near-infrared and red bands: (nir - red) / (nir + red).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "df = df.withColumn('ndvi', rf_normalized_difference('nir', 'red'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "We can take a look at the NDVI calculations. Even if you have chosen a much larger set of data, this kind of preview is still limited to a few records.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": "view_ndvi", "option_string": "name = \"view_ndvi\"", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "# Show tiles that have lots of valid data: fewer than 800 no-data cells in the red band\n", "df.select('ndvi',\n", " 'datetime',\n", " 'id',\n", " rf_extent('ndvi').alias('extent'),\n", " rf_crs('ndvi').alias('crs')) \\\n", " .filter(rf_no_data_cells(rf_with_no_data('red', 0)) < 800)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Time Series\n", "\n", "By using `groupBy` in our analysis, we don't have to explicitly worry about how long the time series is. 
The expression of the analysis stays the same.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": true, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": null, "option_string": "", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "# Average NDVI over all tiles for each year-week\n", "time_series = df.groupBy(F.year('datetime').alias('year'),\n", " F.weekofyear('datetime').alias('week')) \\\n", " .agg(rf_agg_mean('ndvi').alias('mean_ndvi'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The next cell will evaluate the job, reading the raster data and computing the NDVI and its weekly mean. If the \"Extra Large\" launch option is active, we expect the 200-week job to take about 10 to 15 minutes to compute.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "autoscroll": "auto", "collapsed": false, "jupyter": { "outputs_hidden": false }, "options": { "caption": false, "complete": true, "display_data": true, "display_stream": true, "dpi": 200, "echo": true, "evaluate": false, "f_env": null, "f_pos": "htpb", "f_size": [ 6, 4 ], "f_spines": true, "fig": true, "include": true, "name": "to_pandas", "option_string": "name = \"to_pandas\", evaluate=False", "results": "verbatim", "term": false, "wrap": "output" } }, "outputs": [], "source": [ "ts_pd = time_series.toPandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Finally, create the same plot as in Part 1.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ts_pd.sort_values(['year', 'week'], inplace=True)\n", "# Create a compact label of year and week number, yyyy_ww\n", "ts_pd['year_week'] = ts_pd.apply(lambda r: 
'{0:g}_{1:02g}'.format(r.year, r.week), axis=1)\n", "\n", "fig, ax = plt.subplots(figsize=(10, 8))\n", "plt.plot(ts_pd.year_week, ts_pd.mean_ndvi, 'g*-')\n", "xt = plt.xticks(rotation=-45)\n", "ax.xaxis.set_major_locator(plt.MaxNLocator(10))\n", "plt.ylabel('NDVI')\n", "plt.xlabel('Year and week')\n", "plt.title('Palmas, Brazil NDVI')" ] } ], "metadata": { "kernel_info": { "name": "echo" }, "kernelspec": { "display_name": "EarthAI Python", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" }, "zendesk": { "id": 360043452552, "position": 150 } }, "nbformat": 4, "nbformat_minor": 4 }