Learn geopandas by plotting a United States shapefile

Learn geopandas by plotting a United States shapefile

In this tutorial we will take a look at the powerful geopandas library and use it to plot a map of the United States. You can run all of the python code examples in the tutorial by cloning the companion github repository.

I have used other GIS related libraries in python and let me say geopandas is a real joy to use!

Jonathan Cutrer

A quick note before we start
I assume you know some basic python and how to install jupyter to run the companion notebook. To start, clone my git repository with the following commands.

git clone https://github.com/joncutrer/geopandas-tutorial.git
cd geopandas-tutorial
jupyter notebook

Here are the commands you will need to run if have not already installed geopandas.

conda install geopandas
conda install -c conda-forge descartes

Geo Data Files

The data we will be working with comes from the US Census and is in a common shapefile format. A shapefile actually consists of 3 separate files with the same file name.

  • filename.shp – Shapefile shape format, contains the actual geometry data.
  • filename.dbf – Shapefile attribute format, this file stores the attributes for each shape. it uses the dBase IV format.
  • filename.shx – Shapefile index, this file makes working with larger shapefiles faster. It contains no unique data, only an index of record offsets.

The shapefile format is a geospatial vector data format for geographic information system (GIS) software. It is developed and regulated by Esri as a mostly open specification for data interoperability among Esri and other GIS software products.

Wikipedia
ls -al data
total 364
drwxrwxr-x 2 sysadmin sysadmin   4096 May  1 20:37 ./
drwxrwxr-x 4 sysadmin sysadmin   4096 May  2 03:20 ../
-rw-rw-r-- 1 sysadmin sysadmin      5 May  1 20:37 usa-states-census-2014.cpg
-rw-rw-r-- 1 sysadmin sysadmin  15201 May  1 20:37 usa-states-census-2014.dbf
-rw-rw-r-- 1 sysadmin sysadmin    143 May  1 20:37 usa-states-census-2014.prj
-rw-rw-r-- 1 sysadmin sysadmin    257 May  1 20:37 usa-states-census-2014.qpj
-rw-rw-r-- 1 sysadmin sysadmin 309672 May  1 20:37 usa-states-census-2014.shp
-rw-rw-r-- 1 sysadmin sysadmin  18517 May  1 20:37 usa-states-census-2014.shp.xml
-rw-rw-r-- 1 sysadmin sysadmin    564 May  1 20:37 usa-states-census-2014.shx

Reading shapefiles

Import the geopandas library and matplotlib for later use.

import matplotlib.pyplot as plt
import geopandas

Use the geopandas.read_file() function to read the shapefile from disk. Geopandas will return a GeoDataFrame object which is similar to a pandas DataFrame.

states = geopandas.read_file('data/usa-states-census-2014.shp')
type(states)
geopandas.geodataframe.GeoDataFrame
states.head()
STATEFPSTATENSAFFGEOIDGEOIDSTUSPSNAMELSADALANDAWATERregiongeometry
006017797780400000US0606CACalifornia0040348382318120483271881WestMULTIPOLYGON Z (((-118.59397 33.46720 0.00000,…
111017023820400000US1111DCDistrict of Columbia0015835057818633500NortheastPOLYGON Z ((-77.11976 38.93434 0.00000, -77.04…
212002944780400000US1212FLFlorida0013890320085531407883551SoutheastMULTIPOLYGON Z (((-81.81169 24.56874 0.00000, …
313017053170400000US1313GAGeorgia001489635033994947080103SoutheastPOLYGON Z ((-85.60516 34.98468 0.00000, -85.47…
416017797830400000US1616IDIdaho002140454255492397728105WestPOLYGON Z ((-117.24303 44.39097 0.00000, -117….

Understanding Coordinate reference systems (CRS)

By default this shapefile contains very commons coordinates called WGS 84. While WGS 84 is very common in GIS mapping, Mercator projection is the de facto standard for Web mapping applications. If you want to learn more about coordinate systems, have a look at this excellent post EPSG 4326 vs EPSG 3857 by Lyzi Diamond.

states.crs

Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Geopandas requires we know the geospatial reference system identifier so here is a list of common ones.

  • “EPSG:4326” WGS84 Latitude/Longitude, used in GPS
  • “EPSG:3395” Spherical Mercator. Google Maps, OpenStreetMap, Bing Maps
  • “EPSG:32633” UTM Zones (North) – (Universal Transverse Mercator)
  • “EPSG:32733” UTM Zones (South) – (Universal Transverse Mercator)

To make the map look a little more familiar lets reproject it’s coordinates to Mercator.

states = states.to_crs("EPSG:3395")

Plotting Shapefiles

Now lets plot our GeoDataFrame and see what we get. Just like pandas, geopandas provides a .plot() method on GeoDataFrames.

states.plot()
GeoPandas plot of the United States

We can also plot the state polygons with no fill color by using GeoDataFrame.boundary.plot().

states.boundary.plot()
GeoPandas Outline Plot of the United States

Add some color to the map plot

Our map is bit small and only one solid color. Lets enlarge it and add a colormap

Here are some cmap codes you can play around with.

viridis, plasma, inferno, magma, cividis
Greys, Purples, Blues, Greens, Oranges, Reds
YlOrBr, OrRd, PuRd, RdPu, BuPu, GnBu, PuBu, YlGnBu, PuBuGn, BuGn, YlGn
PiYg, PRGn, BrBG, PuOr, RdGy, RdBu, RdYlBu, Spectral, coolwarm, bwr, seismic
twilight, twilight_shifted, hsv
Pastel1, Pastel2, PAired, Accent, Dark2, Set1, Set2, Set3, tab10, tab20, tab20b, tab20c

More info on colormaps can be found here https://matplotlib.org/tutorials/colors/colormaps.html

states.plot(cmap='magma', figsize=(12, 12))
GeoPandas plot of the United States using magma Colormap

Query the DataFrame for a specific state shape, I will plot Texas.

states[states['NAME'] == 'Texas']
STATEFPSTATENSAFFGEOIDGEOIDSTUSPSNAMELSADALANDAWATERregiongeometry
1648017798010400000US4848TXTexas0067660188707019059877230SouthwestPOLYGON Z ((-11869267.604 3729445.479 0.000, -…
states[states['NAME'] == 'Texas'].plot(figsize=(12, 12))
GeoPandas plot of Texas

Plotting multiple shapes

Plot multiple states together, here are the states that makeup the South East region of the United States.

southeast = states[states['STUSPS'].isin(['FL','GA','AL','SC','NC', 'TN', 'AR', 'LA', 'MS'])]
southeast
STATEFPSTATENSAFFGEOIDGEOIDSTUSPSNAMELSADALANDAWATERregiongeometry
212002944780400000US1212FLFlorida0013890320085531407883551SoutheastMULTIPOLYGON Z (((-9107236.006 2805107.013 0.0…
313017053170400000US1313GAGeorgia001489635033994947080103SoutheastPOLYGON Z ((-9529523.377 4137300.133 0.000, -9…
822016295430400000US2222LALouisiana0011190104397723750204105SoutheastPOLYGON Z ((-10468824.609 3831551.686 0.000, -…
1547013258730400000US4747TNTennessee001068001307942352882756SoutheastPOLYGON Z ((-10052227.608 4143268.692 0.000, -…
2005000680850400000US0505ARArkansas001347716034342960200961SoutheastPOLYGON Z ((-10532818.563 4344142.083 0.000, -…
2637010276160400000US3737NCNorth Carolina0012591799595513472722504SoutheastPOLYGON Z ((-9382741.386 4167242.730 0.000, -9…
3028017797900400000US2828MSMississippi001215318999173928587545SoutheastPOLYGON Z ((-10199242.918 3645403.465 0.000, -…
3345017797990400000US4545SCSouth Carolina00778579139315074749305SoutheastPOLYGON Z ((-9278840.010 4102724.263 0.000, -9…
4101017797750400000US0101ALAlabama001311724031114594951242SoutheastPOLYGON Z ((-9848286.459 3726812.322 0.000, -9…
southeast.plot(cmap='tab10', figsize=(14, 12))
SouthWest Region Plot

Instead of supplying a list of states you may have noticed there is a region column in our GeoDataFrame. Lets query by region and plot the West Region.

west = states[states['region'] == 'West']

west.plot(cmap='Pastel2', figsize=(12, 12))
Plot of NorthEast United States

Add outlines and labels to each State

Here is another plot of the U.S NorthEast but this time we are going to use a lambda function to plot the state name over each state. We also plotting the state shapes with a black outline.

As a bonus code snippet, I have added a vertical watermark to the left side of the image.

fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()

west.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha='center', fontsize=14),axis=1);

west.boundary.plot(ax=ax, color='Black', linewidth=.4)

west.plot(ax=ax, cmap='Pastel2', figsize=(12, 12))

ax.text(-0.05, 0.5, 'https://jcutrer.com', transform=ax.transAxes,
        fontsize=20, color='gray', alpha=0.5,
        ha='center', va='center', rotation='90')
U.S. NorthEast States plotted with outlines and Name labels

Multiline Labels

Building on the above example, we can also use a newline (\n) character to create multiline labels. This is useful if you want to display some other data column from the GeoDataFrame. Here I am plotting the state name as well as the land area of the state in square miles.

import math
fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()

west.apply(lambda x: ax.annotate(
    s=x.NAME + "\n" + str(math.floor(x.ALAND / 2589988.1103)) + " sq mi", 
    xy=x.geometry.centroid.coords[0],
    ha='center', 
    fontsize=14
),axis=1);

west.boundary.plot(ax=ax, color='Black', linewidth=.4)

west.plot(ax=ax, cmap='Pastel2', figsize=(12, 12))
NorthEast States with Multiline Labels over shapes

Multiline Labels using a longitudinal offset

If we want to change the font size of our second data row or place the label somewhere other than directly below we will need to use a lat,long offset to draw a second label in another call to annotate()

To understand how we do this, lets first look at the data in x.geometry.centroid.coords[0]

The annotate() method requires we provide a tuple for attribute xy. In the previous apply lambda function we are passing x.geometry.centroid.coords[0].

# Get the first state
tmpdf = states.iloc[0]
# Which state
print(tmpdf.NAME)
# Get the centroid coordinates
tmpdf.geometry.centroid.coords[0]
California
(-13322855.654888818, 4465905.292663821)

This is the xy coordinates of the first state. This is where our label is plotted. We can pass an offset to xy like this.

( tmpdf.geometry.centroid.coords[0][0], tmpdf.geometry.centroid.coords[0][1] - 55000 )
(-13322855.654888818, 4410905.292663821)


Here is the complete example where we print 3 labels. State name, land area, and FIPS code. I will also use a different font size and color for the second two labels.

import math
fig = plt.figure(1, figsize=(25,15)) 
ax = fig.add_subplot()

# Label 1 State Name
west.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha='center', fontsize=14),axis=1);

# Label 2 Area using longitudinal offset
west.apply(
    lambda x: ax.annotate(
        s="Area(Sq/Mi): " + str(math.floor(x.ALAND / 2589988.1103)), 
        xy= (x.geometry.centroid.coords[0][0], x.geometry.centroid.coords[0][1] - 55000 ),
        ha='center', 
        color='#000077', # blue
        fontsize=10),axis=1);

# Label 3 FIPS Code using longitudinal Offset 
west.apply(
    lambda x: ax.annotate(
        s="FIPS Code: " + x.STATEFP, 
        xy= (x.geometry.centroid.coords[0][0] , x.geometry.centroid.coords[0][1] + 60000 ),
        ha='center',
        color='#770000', #red
        fontsize=10),axis=1);


west.boundary.plot(ax=ax, color='Black', linewidth=.4)

west.plot(ax=ax, cmap='Pastel2', figsize=(12, 12))
Multiple Labels plotted with xy offsets

Combining Overlay Maps

Here is an example where we create a larger boundary map and then overlay in a second map. It’s important to note that figsize must be specified in the first plot.

us_boundary_map = states.boundary.plot(figsize=(18, 12), color="Gray")
west.plot(ax=us_boundary_map,  color="DarkGray")
United State with NorthEast states highlighted

We can continue to chain our plots together to a plot of all othe United States Regions.

The first plot us_boundary_map serves as our base map.

Then we reference it with ax=us_boundary_map when we call plot() on each of our region GeoDataFrame.

west = states[states['region'] == 'West']
southwest = states[states['region'] == 'Southwest']
southeast = states[states['region'] == 'Southeast']
midwest = states[states['region'] == 'Midwest']
northeast = states[states['region'] == 'Northeast']

us_boundary_map = states.boundary.plot(figsize=(18, 12), color='Black', linewidth=.5)

west.plot(ax=us_boundary_map,  color="MistyRose")

southwest.plot(ax=us_boundary_map, color="PaleGoldenRod")

southeast.plot(ax=us_boundary_map, color="Plum")

midwest.plot(ax=us_boundary_map, color="PaleTurquoise")

final_map = northeast.plot(ax=us_boundary_map, color="LightPink")
United States map plotted with GeoPandas

Tuning map attributes based on plot size

If you intend to plot a small map the default linewidth of 1 is probably too large. You can use decimal numbers to plot smaller lines.

tiny_map = states.boundary.plot(figsize=(5, 5),  color="Black")
Small US Map plot with thick outline
tiny_map = states.boundary.plot(figsize=(5, 5),  color="Black", linewidth=.25)
Small US Map plot with thin outline

If you intend to generate a large map you should consider increasing the linewidth.

hires_map = states.boundary.plot(figsize=(50, 28),  color="Gray", linewidth=4)
United States Map Black Outline
us_map = states.boundary.plot(figsize=(25, 14),  color="#555555", linewidth=1)
us_map.axis('off')
United States Map axis off

Saving the map plot to disk

What if we want to save our map? We need to get the figure, trim whitespace and call savefig()

fig = us_map.get_figure()
fig.tight_layout()
fig.savefig('usa.png', dpi=96)
ls -al usa.png
-rw-rw-r-- 1 sysadmin sysadmin 236624 May  2 03:22 usa.png
from IPython.display import Image
Image(filename='usa.png') 
High Resolution Map Plot of the United States created with GeoPandas

Congratulations, you are well on your way to becoming a GIS expert with geopandas. I hope this tutorial has helped you learn a little more about GeoPandas.

If this GeoPandas tutorial inspires you to create something awesome, please come back here and share it with the rest of us in the comments section below.

If you enjoyed this article you might also likeĀ Learn geopandas by plotting tornados on a map


References

You can also download an offline version of this article here


5 Replies to “Learn geopandas by plotting a United States shapefile”

  1. I have one query. I want to annoate additional information for each State under its label.

    I see in your example you have:
    west.apply(lambda x: ax.annotate(s=x.NAME, xy=x.geometry.centroid.coords[0], ha=’center’, fontsize=14),axis=1);

    If I want to add some label with additional info below the actual State name like weather degress how can I offset it like 5pixel below the centroid ? do I have to use geojson and bokey ?

    1. Thanks for the great question!

      You can accomplish this in two ways.

      1. Use a newline \n
      2. Plot a second label passing an offset to xy.

      Additional examples have been added to the article above as well as the Jupyter notebook in the github repo.

      1. Using newline works but the second option I am unable to make it work. I want to use it so that I can show mix and max temperature in two different colors etc. If I just use \n + temperature its ignoring \n using second option.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.