{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This article introduces the [GeoPandas](https://geopandas.org/) library, which is a popular open-source project that makes working with geospatial vector data in Python easier. As the name suggests, GeoPandas is an extension of Pandas. \n", "\n", "# Definition of a GeoDataFrame\n", "\n", "The [GeoDataFrame](https://geopandas.org/reference/geopandas.GeoDataFrame.html) is very similar to a standard Pandas DataFrame with much of the same structure and functionality. The primary difference is that a GeoDataFrame always contains a designated \"geometry\" column that is of type [GeoSeries](https://geopandas.org/reference/geopandas.GeoSeries.html). When a spatial method is applied to the GeoDataFrame, it will act on this geometry column.\n", "\n", "GeoPandas can read virtually any vector-based data format, such as [GeoJSON](https://geojson.org/) or [ESRI shapefiles](https://doc.arcgis.com/en/arcgis-online/reference/shapefiles.htm). GeoPandas is not used for raster data (or discrete gridded data) such as Earth observation imagery.\n", "\n", "By way of example, we'll start by importing the GeoPandas library with the alias `gpd` and loading a vector dataset. You can pass almost any vector-based data file to the `gpd.read_file` function. In this case, we'll use a dataset of the world's countries available from [Natural Earth](https://www.naturalearthdata.com/) that is built into the GeoPandas library. We read it into the variable **world**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import geopandas as gpd\n", "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take a quick look at the data with the `head` method. It looks very similar to a standard Pandas DataFrame, but note the **geometry** column. 
This represents the spatial features - in this case, countries of the world. These country features are either polygons or multipolygons. Multipolygons are countries that cannot be constructed spatially as one discrete shape, such as the United States with Alaska, Hawaii, and its territories." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(world)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The variable **world** is of type GeoDataFrame as expected." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Methods on GeoDataFrames\n", "\n", "GeoDataFrames have a number of useful spatial methods that act on the specified geometry column. For example, we use the `plot` method to display the geometries on a projected map of the world:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Any of the attribute calls or methods described for a [GeoSeries](https://geopandas.org/reference/geopandas.GeoSeries.html) will work on a GeoDataFrame, and will be applied to the geometry column. GeoDataFrames also have a few extra methods for [input and output](https://geopandas.org/io.html) and for [geocoding](https://geopandas.org/geocoding.html).\n", "\n", "The GeoDataFrame also allows for the use of standard methods available in Pandas for non-geometry columns. In the code cell below we find the mean of the **pop_est** column by using the `mean` method." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world['pop_est'].mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Geometry Column\n", "\n", "A GeoDataFrame may contain any number of columns with geometrical (Shapely) objects, but only one column can be the active geometry at a time.\n", "\n", "To demonstrate this, we create a new geometrical column called **centroid** in our **world** GeoDataFrame below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world['centroid'] = world.centroid\n", "world.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the centroids have a point geometry. However, the cell below shows that **geometry** is still the active geometry column used in method calls on **world**." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.geometry.name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The active geometry can always be accessed through the `geometry` attribute:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.geometry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can change the active geometry to point to a new column using the `set_geometry` method. Below, we set the active geometry to the **centroid** column and replot:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world = world.set_geometry('centroid')\n", "world.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now when we plot the geometry of the DataFrame it plots the centroids of each country feature. Note that resetting the geometry to another column does not change the column names. 
However, the `geometry.name` and `geometry` attributes of **world** now return the column name **centroid** and its corresponding data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.geometry.name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world.geometry" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Writing to File" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can write GeoDataFrames to other file formats with the `to_file` method. GeoPandas infers the file type from the filename extension you provide, and in some cases from the `driver` argument you pass.\n", "\n", "*Note: we drop the original geometry column because GeoPandas cannot save GeoDataFrames with more than one geometry column.*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "world_centroids = world.drop(columns=['geometry'])\n", "\n", "world_centroids.to_file(\"countries.shp\") # Write the centroid GeoDataFrame to a shapefile\n", "world_centroids.to_file(\"countries.geojson\", driver='GeoJSON') # Write the centroid GeoDataFrame to a GeoJSON file" ] } ], "metadata": { "kernelspec": { "display_name": "EarthAI Environment", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" }, "zendesk": { "draft": true, "id": 360042369192, "section_id": 360008608152, "title": "GeoPandas" } }, "nbformat": 4, "nbformat_minor": 4 }