This article introduces the GeoPandas library, which is a popular open-source project that makes working with geospatial vector data in Python easier. As the name suggests, GeoPandas is an extension of Pandas.
Definition of a GeoDataFrame
The GeoDataFrame is very similar to a standard Pandas DataFrame with much of the same structure and functionality. The primary difference is that a GeoDataFrame always contains a designated "geometry" column that is of type GeoSeries. When a spatial method is applied to the GeoDataFrame, it will act on this geometry column.
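To make the distinction concrete, here is a minimal sketch that builds a tiny GeoDataFrame by hand and checks the type of each column. The city names and coordinates are made-up sample data, not part of the dataset used in this article.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical sample data: three cities with approximate lon/lat coordinates.
cities = gpd.GeoDataFrame(
    {"name": ["Paris", "Nairobi", "Lima"]},
    geometry=[Point(2.35, 48.86), Point(36.82, -1.29), Point(-77.04, -12.05)],
    crs="EPSG:4326",
)

geometry_type = type(cities.geometry).__name__  # class of the geometry column
name_type = type(cities["name"]).__name__       # class of an ordinary column
print(geometry_type, name_type)
```

The geometry column is a GeoSeries, while the name column remains an ordinary Pandas Series.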
GeoPandas can read in virtually any vector-based data format like GeoJSON or ESRI shapefiles. GeoPandas is not used for raster data (or discrete gridded data) such as Earth observation imagery.
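By way of example, we'll start by importing the GeoPandas library with the alias gpd and loading a vector dataset. You can pass almost any vector-based data file to the gpd.read_file function. In this case, we'll use a dataset of the world's countries from Natural Earth that is bundled with the GeoPandas library. Note that the bundled gpd.datasets module was deprecated in GeoPandas 0.14 and removed in 1.0, so on newer releases you need to download the file from Natural Earth and pass its path to gpd.read_file instead. We read it into the variable world: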
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
We can take a quick look at the data with the head method. It looks very similar to a standard Pandas DataFrame, but note the geometry column. This represents the spatial features - in this case, countries of the world. These country features are either polygons or multipolygons. Multipolygons are countries that cannot be constructed spatially as one discrete shape, such as the United States with Alaska, Hawaii, and its territories.
world.head()
type(world)
The variable world is of type GeoDataFrame as expected.
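The difference between polygons and multipolygons can be checked with the geom_type attribute. The sketch below uses two made-up shapes (a single square, and a square plus a separate "island") rather than real country outlines.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical shapes: one contiguous area and one made of two separate parts.
mainland = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
island = Polygon([(3, 0), (4, 0), (4, 1), (3, 1)])

shapes = gpd.GeoDataFrame(
    {"name": ["contiguous", "split"]},
    geometry=[mainland, MultiPolygon([mainland, island])],
)

# geom_type reports the Shapely geometry type of each row.
geom_types = shapes.geom_type.tolist()
print(geom_types)
```

The same attribute could be used on world to see which countries are stored as multipolygons.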
Methods on GeoDataFrames
GeoDataFrames have a number of useful spatial methods that act on the specified geometry column. For example, we use the plot method to draw the geometries as a map of the world:
world.plot()
Any of the attribute calls or methods described for a GeoSeries will work on a GeoDataFrame, and will be applied to the geometry column. GeoDataFrames also have a few extra methods for input and output and for geocoding.
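As a small illustration of this delegation, the sketch below computes area both on a GeoDataFrame and on its geometry column directly. The two rectangles are made-up data in an arbitrary planar coordinate system, so the areas are in unitless square units.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical data: two rectangles in an arbitrary planar coordinate system.
parcels = gpd.GeoDataFrame(
    {"parcel_id": [1, 2]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 1), (0, 1)]),  # 2 x 1 rectangle
        Polygon([(0, 0), (3, 0), (3, 3), (0, 3)]),  # 3 x 3 square
    ],
)

areas_from_frame = parcels.area            # called on the GeoDataFrame
areas_from_series = parcels.geometry.area  # called on the GeoSeries directly
print(areas_from_frame.tolist())
```

Both calls give the same result because the GeoDataFrame forwards the attribute to its active geometry column.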
The GeoDataFrame also allows for the use of standard methods available in Pandas for non-geometry columns. In the code cell below we find the mean of the pop_est column by using the mean method.
world['pop_est'].mean()
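Other Pandas operations, such as boolean filtering, work the same way. The following sketch uses a made-up three-row table (the names and population figures are illustrative only) to show that a filtered result is still a GeoDataFrame.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: three towns with invented population estimates.
towns = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "pop_est": [1000, 2500, 4000]},
    geometry=[Point(0, 0), Point(1, 1), Point(2, 2)],
)

mean_pop = towns["pop_est"].mean()      # plain Pandas aggregation
large = towns[towns["pop_est"] > 2000]  # plain Pandas boolean filter
print(mean_pop, type(large).__name__)
```

Because the filtered result keeps its geometry column, spatial methods such as plot remain available on it.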
The Geometry Column
A GeoDataFrame may contain any number of columns with geometrical (Shapely) objects, but only one column can be the active geometry at a time.
To demonstrate this, we create a new geometrical column called centroid in our world GeoDataFrame below:
world['centroid'] = world.centroid
world.head()
As expected, the centroids are point geometries. However, the cell below shows that the geometry column is still the active geometry used in method calls on world.
world.geometry.name
The active geometry can always be accessed through the geometry attribute:
world.geometry
We can change the active geometry to point to a new column using the set_geometry method. Below, we set the active geometry to the centroid column and replot:
world = world.set_geometry('centroid')
world.plot()
Now when we plot the GeoDataFrame, it draws the centroid of each country feature. Note that switching the active geometry to another column does not rename any columns. However, the geometry.name attribute and the geometry attribute on world now return the column name centroid and its corresponding data, respectively.
world.head()
world.geometry.name
world.geometry
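The same sequence can be reproduced without the Natural Earth dataset. This is a minimal sketch on two made-up squares, recording the active geometry name before and after the switch.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical data: two squares standing in for country polygons.
squares = gpd.GeoDataFrame(
    {"name": ["small", "large"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(0, 0), (4, 0), (4, 4), (0, 4)]),
    ],
)

squares["centroid"] = squares.centroid      # second geometry column
active_before = squares.geometry.name       # still the original column

squares = squares.set_geometry("centroid")  # switch the active geometry
active_after = squares.geometry.name
active_types = squares.geom_type.tolist()   # now reports the centroid points
print(active_before, active_after, active_types)
```

The original polygon column is still present in the table after the switch; it is simply no longer the one that spatial methods act on.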
Writing to File
Finally, you can write GeoDataFrames to other file formats with the to_file method. GeoPandas infers the file type from the filename extension you provide and, in some cases, from the driver argument you pass.
Note: we drop the original geometry column because GeoPandas cannot save GeoDataFrames with more than one geometry column.
world_centroids = world.drop(columns=['geometry'])
world_centroids.to_file("countries.shp") # This writes our world GeoDataFrame to a shapefile
world_centroids.to_file("countries.geojson", driver='GeoJSON') # This writes our world GeoDataFrame to a GeoJSON file