A spatial join operation is analogous to a database table join or DataFrame merge operation, but considers the geographic relationships between records. In the context of EarthAI Notebooks, a spatial join is an operation that merges two DataFrames, each having a geometric object column, by some spatial relationship of their geometries.
The spatial join is important because it allows a variety of geographic data sources to be combined and reasoned over. We can use spatial joins to combine domain-specific information with raster catalogs.
This page discusses the case where both of the DataFrames are GeoPandas GeoDataFrames. The same example is presented for PySpark DataFrames on a separate page.
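To make the analogy with a table join concrete, here is a minimal pure-Python sketch of an inner spatial join. It is only an illustration of the idea, not how GeoPandas implements it: the `Box` class, the `spatial_join` function, and the sample records are all hypothetical, and axis-aligned boxes stand in for real polygons.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box standing in for a real polygon (assumption)."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def spatial_join(points, regions):
    """Inner join: one merged record per (point, region) pair where the
    region contains the point. Points matching no region are dropped."""
    joined = []
    for p in points:
        for r in regions:
            if r['shape'].contains(p['x'], p['y']):
                joined.append({'point_name': p['name'], 'region_name': r['name']})
    return joined


# Hypothetical sample data
regions = [
    {'name': 'region_a', 'shape': Box(0, 0, 10, 10)},
    {'name': 'region_b', 'shape': Box(20, 0, 30, 10)},
]
points = [
    {'name': 'p1', 'x': 5, 'y': 5},
    {'name': 'p2', 'x': 25, 'y': 5},
    {'name': 'p3', 'x': 50, 'y': 50},  # falls outside every region
]

result = spatial_join(points, regions)
```

The join key here is a geometric test rather than an equality test on a column, which is the essential difference from an ordinary table join. A real implementation would typically use a spatial index instead of comparing every pair.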
Let's get started with some basic imports.
from earthai.all import *
import geopandas
from shapely.geometry import Point
%matplotlib inline
# Read US state (admin-1) boundaries and drop the id column
geo_admin_url = 'https://raw.githubusercontent.com/datasets/geo-admin1-us/master/data/admin1-us.geojson'
adm1 = geopandas.read_file(geo_admin_url).drop(columns='id')
# Keep only Alabama and Arkansas
adm1_alar = adm1[adm1.state_code.isin(['AL', 'AR'])]
adm1_alar.plot()
Now we will construct a small DataFrame containing city locations as point geometries. Note the inclusion of Charlotte, North Carolina, which lies outside both Alabama and Arkansas.
city_df = geopandas.GeoDataFrame([
    {'city_name': 'Hot Springs', 'geom': Point(-93.055278, 34.497222)},
    {'city_name': 'Tuscaloosa', 'geom': Point(-87.534607, 33.20654)},
    {'city_name': 'Mobile', 'geom': Point(-88.043056, 30.694444)},
    {'city_name': 'Little Rock', 'geom': Point(-92.331111, 34.736111)},
    {'city_name': 'Charlotte', 'geom': Point(-80.843056, 35.227222)}
], geometry='geom', crs='epsg:4326')
city_df
Spatial Join
We use the GeoPandas implementation of [spatial join, called sjoin][gpd_sjoin]. Let's join with the cities as the left-hand side. The resulting DataFrame has a single geometry column, taken from the left-hand side. We plot the city locations, colored by the state found in the joined data.
sjoin_city_state = geopandas.sjoin(city_df, adm1_alar)
sjoin_city_state.plot(column='state_code', cmap='Paired')
If we invert the join, we see the state polygon is the resulting geometry column.
sjoin_state_city = geopandas.sjoin(adm1_alar, city_df)
sjoin_state_city.plot(column='state_code', cmap='Paired')
But observe that the contents of the DataFrame are otherwise equivalent. Apart from the geometry column, the attribute columns are the same.
sorted(sjoin_state_city.columns), sorted(sjoin_city_state.columns)
And the number of records is the same. That means that each state geometry is repeated for each intersecting city.
len(sjoin_state_city), len(sjoin_city_state)
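The same behavior can be reproduced without downloading any data. The sketch below uses made-up geometries (a single square region and two points inside it, all hypothetical names) to show that swapping the sides of the join keeps the row count but changes which geometry is carried, so the polygon is repeated once per matching point.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical data: one square region containing two points
regions = geopandas.GeoDataFrame(
    {'region': ['square']},
    geometry=[box(0, 0, 1, 1)],
    crs='epsg:4326',
)
points = geopandas.GeoDataFrame(
    {'label': ['a', 'b']},
    geometry=[Point(0.25, 0.25), Point(0.75, 0.75)],
    crs='epsg:4326',
)

# Points on the left: the result carries point geometries
by_point = geopandas.sjoin(points, regions)

# Regions on the left: the result carries the polygon, once per matching point
by_region = geopandas.sjoin(regions, points)

point_types = set(by_point.geometry.geom_type)
region_types = set(by_region.geometry.geom_type)
```

Both results have two rows. In `by_region` the two rows hold the same square polygon, paired with a different `label` each time.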
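By default, sjoin performs an inner join, so Charlotte, which intersects neither state, does not appear in the results above. Passing how='left' keeps every row of the left-hand DataFrame, including Charlotte, and leaves its state columns empty.
geopandas.sjoin(city_df, adm1_alar, how='left')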
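The effect of the `how` argument can be checked on a small synthetic example. This is a sketch with hypothetical regions and points, not the data used on this page: the default `how='inner'` drops a point that intersects no region, while `how='left'` keeps it with missing values in the columns that come from the right-hand side.

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical data: two square regions and three points, one outside both
regions = geopandas.GeoDataFrame(
    {'region': ['west', 'east']},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
    crs='epsg:4326',
)
points = geopandas.GeoDataFrame(
    {'label': ['a', 'b', 'c']},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5.0, 5.0)],
    crs='epsg:4326',
)

# Default inner join: only points that intersect a region survive
inner = geopandas.sjoin(points, regions)

# Left join: every point is kept; unmatched rows get missing region values
left = geopandas.sjoin(points, regions, how='left')
unmatched = left[left['region'].isna()]
```

Filtering on missing values, as in `unmatched`, is one simple way to list the left-hand records that found no spatial match.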