{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In [a previous article](https://astraeahelp.zendesk.com/knowledge/articles/360043452652/en-us?brand_id=360003221551), we introduced the `spark.read.chip` function for reading in subsets of scenes from Earth observation data, and in [another article](https://astraeahelp.zendesk.com/knowledge/articles/360050789172/en-us?brand_id=360003221551), we demonstrated the different chipping strategies available with the `spark.read.chip` function. In this article, we will show how to write out your chips in GeoTIFF format.\n", "\n", "*Note: if you would like to run through this example in EarthAI Notebook, you can download the companion notebook and vector data source from the attachments provided at the end of this article.*\n", "\n", "# Import Libraries\n", "\n", "We will start by importing all of the Python libraries used in this example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from earthai.init import *\n", "import earthai.chipping.strategy\n", "import pyspark.sql.functions as F\n", "\n", "import os\n", "import geopandas\n", "import rasterio\n", "import ipyleaflet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Query Imagery at STEP Sites\n", "\n", "In [a previous article](https://astraeahelp.zendesk.com/knowledge/articles/360043452652/en-us?brand_id=360003221551), we introduced the [System for Terrestrial Ecosystem Parameterization](http://www.gofcgold.wur.nl/sites/gofcgold_refdataportal-step.php#:~:text=The%20System%20for%20Terrestrial%20Ecosystem,%2C%20ecosystems%2C%20and%20vegetation%20types.) (STEP) data set, and used it to query the EarthAI Catalog to identify Landsat 8 scenes that intersect with cropland and urban sites around the world. The code in the cell block below replicates those steps for use in the following sections. Please refer to the previous article for more details on these operations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read in the STEP data set\n", "step_gdf = geopandas.read_file(\"data/step_september152014_70rndsel_igbpcl.geojson\")\n", "\n", "# Filter to include only the cropland and urban classes\n", "step_subset_gdf = step_gdf[step_gdf.igbp.isin([12, 13])]\n", "\n", "# Query Landsat 8 imagery at STEP sites\n", "cat = earth_ondemand.read_catalog(\n", " step_subset_gdf.geometry,\n", " start_datetime='2014-06-01', \n", " end_datetime='2014-06-15',\n", " max_cloud_cover=10,\n", " collections='landsat8_l1tp'\n", ")\n", "\n", "# Join the imagery catalog back to the STEP data\n", "step_cat = geopandas.sjoin(step_subset_gdf, cat)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**step_cat** can include multiple Landsat 8 scenes for each STEP site, taken at different dates/times. For simplicity in demonstrating chip writing, we select just a single scene for each site. The code below selects the scene with the least cloud coverage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "step_cat['grp_col'] = step_cat['siteid']\n", "step_cat = step_cat.sort_values('eo_cloud_cover').groupby(['grp_col']).first()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read Chips\n", "\n", "We use the centroid-centered chipping strategy, which creates chips of the specified dimensions centered at a point, or at the centroids of each polygon, depending on what input geometry is passed. The returned RasterFrame will have chips of uniform dimensions - one for each input geometry. This chipping strategy is useful for deep learning applications.\n", "\n", "We pass the chipping strategy, `earthai.chipping.strategy.CentroidCentered`, to the `spark.read.chip` function. We specify the chip dimensions as 50 by 50 pixels.\n", "\n", "_To see a list of all chipping strategies and a description of their behavior, run `earthai.chipping.chipping_strategies()`._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rf = spark.read.chip(step_cat, ['B4', 'B3', 'B2'], \n", " chipping_strategy=earthai.chipping.strategy.CentroidCentered(50, 50)) \\\n", " .withColumnRenamed('B4', 'red') \\\n", " .withColumnRenamed('B3', 'green') \\\n", " .withColumnRenamed('B2', 'blue') \\\n", " .filter(rf_tile_max('red') > 0).cache() # filter out chips with all NoData values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Write Chips\n", "\n", "To write chips in GeoTIFF format, we use the `rf.write.chip` function. This function requires a file path and file name column as input. The file path points to the directory that will store the chips when they are written out. The file name column provides the file name to use for each chip. The file name column can also include a subdirectory structure if desired. \n", "\n", "In the cell below, we create the __file_path_name__ column that concatenates the __igbp__ label with the unique __siteid__ value to create a subdirectory structure that organizes the chips by label. The cropland chips will be written out the __\"12\"__ folder and the urban chips will be written out to the __\"13\"__ folder within the main directory." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rf = rf.withColumn('file_path_name', \n", " F.concat_ws('/', F.col('igbp'), F.col('siteid')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As specified below, the main folder containing the chips will be called __\"chips\"__. It will be created in the same directory where your notebook resides. \n", "\n", "The remaining parameters in `rf.write.chip` are optional. We pass __True__ to the `catalog` parameter, which tells the chip writer to write a CSV file directory for all of the chips written out. This CSV file includes the metadata columns specified in the `metadata` parameter as well as CRS and bounding box information for each chip.\n", "\n", "A single GeoTIFF will be written out for each row of your DataFrame. If there are multiple tile columns in your RasterFrame, each GeoTIFF will be multi-band. Run the cell below to start writing chips.\n", "\n", "_It takes 3-4 minutes to write out the 149 chips in this RasterFrame on a Dedicated Instance type._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rf.write.chip('chips', filenameCol='file_path_name', \n", " catalog=True, \n", " metadata=['siteid', 'igbp', 'geometry', 'datetime', ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the chips are written out, you can navigate through the chip directory in the left menu, right click on any of the files, and select ___Download___ to save the file to your local machine.\n", "\n", "Each chip contains a lot of metadata, including the __metadata__ columns we passed to the chip writer. We open a single chip using Rasterio to view some of the available metadata." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_chip = 'chips/12/100283546777.tif'\n", "\n", "with rasterio.open(sample_chip) as src:\n", " for k, v in src.meta.items():\n", " print(k, '\\t', v)\n", " \n", " print('\\n')\n", " print('T A G S :')\n", " for k, v in src.tags().items():\n", " print(k, '\\t', v)\n", " \n", " print('\\n')\n", " print('B A N D S :')\n", " for b in range(1, src.count + 1):\n", " print(\"Band\", b, '\\t', src.colorinterp[b-1])" ] } ], "metadata": { "kernelspec": { "display_name": "EarthAI Environment", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "zendesk": { "draft": true, "id": 360051206972, "section_id": 360008732711, "title": "How to Write Image Chips to GeoTIFF Files" } }, "nbformat": 4, "nbformat_minor": 4 }