This article describes importing shapefiles into an EarthAI Notebook. For this example we'll use The Nature Conservancy's Terrestrial Ecoregions spatial data layer.
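There are a few different ways to make a shapefile available in the Notebook environment. If the .shp file is stored locally in your EarthAI folder, together with its companion files (at minimum the .shx and .dbf files with the same base name), you can read it into a Spark DataFrame by running: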
from earthai.init import *
df = spark.read.shapefile('Terrestrial_Ecoregions.shp')
The first line is the import statement you need if a SparkSession is not already active. The second line reads the shapefile into a Spark DataFrame. If the file is in the current working directory, you can start typing its name and press Tab to autocomplete it. If it does not autocomplete, the file is probably in another directory, and you will need to pass the full path to the shapefile. To confirm that the read was successful, check that the result is a Spark DataFrame:
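If autocomplete does not find the file, a small helper can locate it and check that its companion files sit next to it before you call the reader. The sketch below is illustrative only: locate_shapefile and missing_companions are hypothetical helpers, not part of EarthAI, and they assume lowercase .shx/.dbf extensions.

```python
from pathlib import Path

# Companion files a shapefile needs next to the .shp (lowercase extensions assumed).
# The optional .prj projection file is not checked here.
REQUIRED_COMPANIONS = (".shx", ".dbf")


def locate_shapefile(name, root="."):
    """Return the absolute path of the first file called `name` under `root`.

    Hypothetical helper: checks `root` itself first, then every subdirectory.
    Raises FileNotFoundError if nothing matches.
    """
    root_path = Path(root)
    direct = root_path / name
    if direct.is_file():
        return str(direct.resolve())
    for candidate in sorted(root_path.rglob(name)):
        if candidate.is_file():
            return str(candidate.resolve())
    raise FileNotFoundError(f"{name} not found under {root_path.resolve()}")


def missing_companions(shp_path):
    """List the required companion extensions missing beside `shp_path`."""
    shp = Path(shp_path)
    return [ext for ext in REQUIRED_COMPANIONS if not shp.with_suffix(ext).is_file()]
```

You would then pass the returned path to spark.read.shapefile in place of the bare file name, for example spark.read.shapefile(locate_shapefile('Terrestrial_Ecoregions.shp')), after confirming that missing_companions returns an empty list.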
type(df)
Alternatively, if the shapefile is not stored locally, you can retrieve it by passing the URL of a zipped shapefile in place of a file path. The archive is unzipped and the shapefile inside it is read into a Spark DataFrame.
df2 = spark.read.shapefile('https://astraea.box.com/v/TerrestrialEcosystems.zip')
As with the local file, you can confirm the result by running type(df2).
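If you have downloaded a zipped shapefile yourself and want to see what it contains before reading it, Python's standard zipfile module can list the .shp members and flag any that lack companion files. This is a hypothetical inspection helper (shapefiles_in_zip is not an EarthAI function) and does not describe how spark.read.shapefile handles the URL internally.

```python
import zipfile
from pathlib import PurePosixPath


def shapefiles_in_zip(archive):
    """Map each .shp member of a zip archive to its missing companion files.

    `archive` may be a file path or a binary file-like object. The result looks
    like {"dir/name.shp": []}, where each list holds any of ".shx"/".dbf" that
    are absent from the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        members = set(zf.namelist())
    result = {}
    for member in sorted(members):
        path = PurePosixPath(member)  # zip member names always use "/"
        if path.suffix.lower() != ".shp":
            continue
        missing = [ext for ext in (".shx", ".dbf")
                   if str(path.with_suffix(ext)) not in members]
        result[member] = missing
    return result
```

An entry with an empty list is a complete shapefile that either reader should be able to open once the archive is extracted.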
You can also read shapefiles into GeoPandas GeoDataFrames. The code below imports GeoPandas and then reads the ecoregions shapefile into a GeoDataFrame.
import geopandas as gpd
gdf = gpd.read_file('https://astraea.box.com/v/TerrestrialEcosystems.zip')  # named gdf so the Spark DataFrame df is not overwritten
If the shapefile is stored locally, pass its relative or absolute path in place of the URL, just as when reading it into a Spark DataFrame.
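Because both readers accept either a URL or a path, one way to keep notebook code consistent is to normalize the source first: leave http(s) URLs untouched and turn relative paths into absolute ones. The sketch below assumes this convention; to_reader_source is a hypothetical helper, not part of EarthAI or GeoPandas.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_reader_source(source, base_dir="."):
    """Return `source` unchanged if it is an http(s) URL, else an absolute path.

    Hypothetical helper: relative paths are resolved against `base_dir`
    (the current working directory by default). The file does not need to
    exist yet; resolution is non-strict.
    """
    if urlparse(source).scheme in ("http", "https"):
        return source
    path = Path(source)
    if not path.is_absolute():
        path = Path(base_dir) / path
    return str(path.resolve())
```

The same string can then go to either reader, for example gpd.read_file(to_reader_source('Terrestrial_Ecoregions.shp')) or spark.read.shapefile(to_reader_source('Terrestrial_Ecoregions.shp')).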