{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This example demonstrates how to write the contents of a Spark DataFrame to a GeoJSON file. This is useful for viewing the geometry-typed columns of a RasterFrame in external GIS software.\n", "\n", "In this example, we 1) acquire imagery scenes from the EarthAI Catalog, 2) read the imagery with RasterFrames, and 3) write a RasterFrame to GeoJSON.\n", "\n", "# Import Libraries\n", "\n", "We start by importing the Python libraries used in this example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from earthai.init import *\n", "import earthai.chipping.strategy\n", "\n", "import pyspark.sql.functions as F\n", "import geopandas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Query the EarthAI Catalog\n", "\n", "We read in a GeoJSON file of U.S. state boundaries and filter the GeoDataFrame to the continental U.S. The chip reader requires the geometries passed to it to be of type Polygon, so we convert the MultiPolygon representing Virginia to a single Polygon.\n", "\n", "We use the __geometry__ column of the GeoDataFrame to query the EarthAI Catalog for MODIS surface reflectance data (collection `mcd43a4`) from September 1, 2020, covering the continental United States. 
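The Virginia conversion in the next cell keeps only one part of the MultiPolygon. As a rough sketch of how that part can be chosen without hard-coding an index, the stdlib-only code below keeps the part whose exterior ring has the largest shoelace area. It assumes GeoJSON-style nested coordinate lists, and the helper names `ring_area` and `largest_part` are hypothetical (not part of EarthAI, Shapely, or GeoPandas). Planar area computed on raw longitude/latitude values is only meaningful for comparing parts of the same feature.

```python
# Hypothetical helpers (not part of EarthAI, Shapely, or GeoPandas): pick the
# largest part of a GeoJSON-style MultiPolygon using the shoelace formula.

def ring_area(ring):
    # Planar shoelace area of a closed ring of (x, y) pairs.
    # On lon/lat degrees this is only useful for comparing parts.
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def largest_part(multipolygon_coords):
    # Each part is [exterior_ring, hole_1, ...]; compare exterior rings only.
    return max(multipolygon_coords, key=lambda part: ring_area(part[0]))


# Toy MultiPolygon made of a 1 x 1 square and a 3 x 3 square
toy = [
    [[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]],
    [[(5, 5), (8, 5), (8, 8), (5, 8), (5, 5)]],
]
print(ring_area(largest_part(toy)[0]))  # 9.0
```

With Shapely geometries, the same choice can be made by comparing the `area` of each polygon in a MultiPolygon's `geoms`.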
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "states_url = 'https://raw.githubusercontent.com/datasets/geo-admin1-us/master/data/admin1-us.geojson'\n", "states_gdf = geopandas.read_file(states_url)\n", "\n", "# keep only the continental U.S.\n", "states_gdf = states_gdf[~(states_gdf[\"name\"].isin(['Hawaii', 'Alaska']))]\n", "\n", "# convert Virginia's MultiPolygon to a Polygon by keeping its largest part\n", "# (iterate parts via .geoms; iterating a MultiPolygon directly fails in Shapely 2.x)\n", "va_multipolygon = states_gdf[states_gdf.name == 'Virginia'].iloc[0].geometry\n", "va_polygon = max(va_multipolygon.geoms, key=lambda p: p.area)\n", "states_gdf.loc[states_gdf.name == 'Virginia', ['geometry']] = va_polygon\n", "\n", "cat = earth_ondemand.read_catalog(states_gdf.geometry,\n", " start_datetime='2020-09-01',\n", " end_datetime='2020-09-01',\n", " collections='mcd43a4')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read in MODIS Imagery\n", "\n", "We spatially join the catalog back to the GeoDataFrame of state boundaries so that each image scene is matched with the state boundary it intersects. This step is critical for the chip reader, because the chipping strategy needs the state boundary polygon for each scene." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# how='right' keeps the state boundary geometry, which we rename to us_bounds\n", "cat = geopandas.sjoin(cat, states_gdf, how='right').rename(columns={\"geometry\":\"us_bounds\"})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use `spark.read.chip` to read only the imagery that intersects the state boundaries, reading in the __B01__ (red) band. The feature-aligned grid strategy creates a grid across each state boundary polygon using the specified `tile_dimensions` and returns the generated chips. 
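To make the grid idea concrete, the simplified sketch below lays square cells over a feature's bounding box, starting from its lower-left corner, so that the grid follows the feature rather than the image scene. This is a hypothetical illustration, not the EarthAI `FeatureAlignedGrid` implementation: the `feature_aligned_cells` helper is made up, the cell size is treated as plain map units, and a real chipping strategy would also have to drop cells that do not intersect the polygon itself.

```python
import math

# Hypothetical illustration (not EarthAI's FeatureAlignedGrid): lay square
# cells of a fixed size over a feature's bounding box, anchored at its
# lower-left corner, so the grid is aligned to the feature, not the scene.

def feature_aligned_cells(bounds, cell_size):
    minx, miny, maxx, maxy = bounds
    n_cols = math.ceil((maxx - minx) / cell_size)
    n_rows = math.ceil((maxy - miny) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = minx + col * cell_size
            y0 = miny + row * cell_size
            cells.append((x0, y0, x0 + cell_size, y0 + cell_size))
    return cells


# A 600 x 300 unit bounding box with 256-unit cells needs 3 columns and 2 rows
cells = feature_aligned_cells((1000, 2000, 1600, 2300), 256)
print(len(cells))  # 6
print(cells[0])    # (1000, 2000, 1256, 2256)
```

Because the last row and column overhang the bounding box, the outer cells extend past the feature's extent.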
\n", "\n", "_To view all of the available bands for the MODIS collection, run `earth_ondemand.item_assets('mcd43a4')`._" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rf = spark.read.chip(cat,\n", " catalog_col_names=['B01'],\n", " geometry_col_name='us_bounds',\n", " chipping_strategy=earthai.chipping.strategy.FeatureAlignedGrid(256),\n", " tile_dimensions=(256,256))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Write a RasterFrame to GeoJSON\n", "\n", "We use GeoPandas to write out the data because a GeoDataFrame has a built-in method for writing GeoJSON files.\n", "\n", "GeoJSON (RFC 7946) expects coordinates in WGS 84 longitude/latitude, i.e. the \"EPSG:4326\" coordinate reference system (CRS), so as a first step we reproject the chip outlines to this CRS." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# rename B01 to red, then reproject each chip's outline from the tile's CRS to EPSG:4326\n", "rf = rf.select(F.col('B01').alias('red')) \\\n", " .withColumn('chip_outline_4326', st_reproject(rf_geometry('red'), rf_crs('red'), F.lit('EPSG:4326')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we select __chip_outline_4326__ and collect it into a pandas DataFrame using `toPandas()`. Because `toPandas()` brings every row to the driver, we select only the column we need first." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chips_df = rf.select('chip_outline_4326').toPandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we convert the pandas DataFrame __chips_df__ to a GeoPandas GeoDataFrame, __chips_gdf__, using the `geometry` argument to name the geometry column and the `crs` argument to set its coordinate reference system. 
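For reference, a GeoJSON file of chip outlines is a FeatureCollection whose features hold Polygon geometries with positions written as `[longitude, latitude]`. The stdlib-only sketch below builds such a document by hand for one made-up rectangle; the `chip_feature` helper and the coordinates are hypothetical, and the properties and formatting that GeoPandas actually writes may differ.

```python
import json

# Hypothetical helper: wrap a lon/lat bounding box as a GeoJSON Polygon feature.
# RFC 7946 orders positions as [longitude, latitude], closes each ring by
# repeating the first position, and winds exterior rings counterclockwise.
def chip_feature(west, south, east, north):
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        'type': 'Feature',
        'properties': {},
        'geometry': {'type': 'Polygon', 'coordinates': [ring]},
    }


collection = {
    'type': 'FeatureCollection',
    'features': [chip_feature(-78.5, 37.0, -78.0, 37.5)],
}
text = json.dumps(collection)
print(json.loads(text)['features'][0]['geometry']['type'])  # Polygon
```

Opening the written file in a text editor is a quick way to confirm that the coordinates are longitude/latitude values rather than projected meters.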
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chips_gdf = geopandas.GeoDataFrame(chips_df,\n", " geometry='chip_outline_4326', crs='EPSG:4326')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we write the GeoDataFrame to a GeoJSON file with its `to_file` method, passing \"GeoJSON\" as the `driver`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "chips_gdf.to_file(\"chip_boundaries.json\", driver=\"GeoJSON\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__chip_boundaries.json__ is written to the same directory as your notebook. To save it to your local machine for viewing in external GIS software, right-click the file in the file browser on the left and select ___Download___." ] } ], "metadata": { "kernelspec": { "display_name": "EarthAI Environment", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "zendesk": { "draft": true, "id": 360051613051, "section_id": 360010299511, "title": "Writing from a Spark DataFrame to a GeoJSON File" } }, "nbformat": 4, "nbformat_minor": 4 }