Learn geopandas by plotting tornados on a map

Learn geopandas by plotting tornados on a map

In this tutorial we will take a look at the powerful geopandas library and use it to plot historical tornado data on a map of the United States.

We will use two different shapefiles from NOAA, the first dataset includes the origination point for each tornado. The second dataset includes a line path of each tornado. The data comes from the NOAA Storm Prediction Center an can be downloaded from https://www.spc.noaa.gov/gis/svrgis/

A quick note before we start

I assume you know some basic python and how to install jupyter to run the companion notebook. To get started, clone my git repository with the following commands.

git clone https://github.com/joncutrer/geopandas-tutorial
cd geopandas-tutorial
jupyter notebook

Here are the commands you will need to run if have not already install geopandas.

conda install geopandas
conda install -c conda-forge descartes

The data we will be working with comes from the US Census and is in a common shapefile format.

ls -al data
total 11044
drwxrwxr-x 2 sysadmin sysadmin    4096 May  3 19:45 ./
drwxrwxr-x 5 sysadmin sysadmin    4096 May  6 23:11 ../
-rw-rw-r-- 1 sysadmin sysadmin 3929143 Oct  1  2019 1950-2018-torn-aspath.zip
-rw-rw-r-- 1 sysadmin sysadmin 3191635 Oct  1  2019 1950-2018-torn-initpoint.zip
-rw-rw-r-- 1 sysadmin sysadmin 3808548 May  2  2019 cb_2018_48_tract_500k.zip
-rw-rw-r-- 1 sysadmin sysadmin       5 May  1 20:37 usa-states-census-2014.cpg
-rw-rw-r-- 1 sysadmin sysadmin   15201 May  1 20:37 usa-states-census-2014.dbf
-rw-rw-r-- 1 sysadmin sysadmin     143 May  1 20:37 usa-states-census-2014.prj
-rw-rw-r-- 1 sysadmin sysadmin     257 May  1 20:37 usa-states-census-2014.qpj
-rw-rw-r-- 1 sysadmin sysadmin  309672 May  1 20:37 usa-states-census-2014.shp
-rw-rw-r-- 1 sysadmin sysadmin   18517 May  1 20:37 usa-states-census-2014.shp.xml
-rw-rw-r-- 1 sysadmin sysadmin     564 May  1 20:37 usa-states-census-2014.shx

Import geopandas and load the U.S. map shapefile

import matplotlib.pyplot as plt
import geopandas
states = geopandas.read_file('data/usa-states-census-2014.shp')
states.head()
STATEFPSTATENSAFFGEOIDGEOIDSTUSPSNAMELSADALANDAWATERregiongeometry
006017797780400000US0606CACalifornia0040348382318120483271881WestMULTIPOLYGON Z (((-118.59397 33.46720 0.00000,…
111017023820400000US1111DCDistrict of Columbia0015835057818633500NortheastPOLYGON Z ((-77.11976 38.93434 0.00000, -77.04…
212002944780400000US1212FLFlorida0013890320085531407883551SoutheastMULTIPOLYGON Z (((-81.81169 24.56874 0.00000, …
313017053170400000US1313GAGeorgia001489635033994947080103SoutheastPOLYGON Z ((-85.60516 34.98468 0.00000, -85.47…
416017797830400000US1616IDIdaho002140454255492397728105WestPOLYGON Z ((-117.24303 44.39097 0.00000, -117….

Reproject coordinates

By default this shp file is in WGS 84.

states.crs

Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

To make the map look a little more familiar lets reproject it’s coordinates to Mercator.

states = states.to_crs("EPSG:3395")

Now lets plot the U.S. map and see what we get

states.plot()
US map

We can also plot the state polygons with no fill color by using boundary.

states.boundary.plot()
US states outline map

Load the NOAA US tornado dataset

Source: https://www.spc.noaa.gov/gis/svrgis/

One of the cool features of geopandas is that you can load a shapefile directly from a zip archive without first unzipping it. The file path to open a zipped shapefile is prefixed with zip://

However, there is a caveat. If the zip archive contains a subfolder you must specify it by appending !foldername to the path.

ls -al  data
total 11044
drwxrwxr-x 2 sysadmin sysadmin    4096 May  3 19:45 ./
drwxrwxr-x 5 sysadmin sysadmin    4096 May  6 23:11 ../
-rw-rw-r-- 1 sysadmin sysadmin 3929143 Oct  1  2019 1950-2018-torn-aspath.zip
-rw-rw-r-- 1 sysadmin sysadmin 3191635 Oct  1  2019 1950-2018-torn-initpoint.zip
-rw-rw-r-- 1 sysadmin sysadmin 3808548 May  2  2019 cb_2018_48_tract_500k.zip
-rw-rw-r-- 1 sysadmin sysadmin       5 May  1 20:37 usa-states-census-2014.cpg
-rw-rw-r-- 1 sysadmin sysadmin   15201 May  1 20:37 usa-states-census-2014.dbf
-rw-rw-r-- 1 sysadmin sysadmin     143 May  1 20:37 usa-states-census-2014.prj
-rw-rw-r-- 1 sysadmin sysadmin     257 May  1 20:37 usa-states-census-2014.qpj
-rw-rw-r-- 1 sysadmin sysadmin  309672 May  1 20:37 usa-states-census-2014.shp
-rw-rw-r-- 1 sysadmin sysadmin   18517 May  1 20:37 usa-states-census-2014.shp.xml
-rw-rw-r-- 1 sysadmin sysadmin     564 May  1 20:37 usa-states-census-2014.shx

Load the shapefile from zip archive

We will use the above mentioned technique to load the 1950-2018-torn-initpoint.shp shapefile from the zip archive.

tornados = geopandas.read_file(
    'zip://data/1950-2018-torn-initpoint.zip!1950-2018-torn-initpoint'
)

Note: If loading from inside a zip file confuses you don’t let is slow you down, just unzip the archive.

Lets have a look at the DataFrame

tornados.shape
(63645, 23)
tornados.info()

RangeIndex: 63645 entries, 0 to 63644
Data columns (total 23 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   om        63645 non-null  int64   
 1   yr        63645 non-null  int64   
 2   mo        63645 non-null  int64   
 3   dy        63645 non-null  int64   
 4   date      63645 non-null  object  
 5   time      63645 non-null  object  
 6   tz        63645 non-null  int64   
 7   st        63645 non-null  object  
 8   stf       63645 non-null  int64   
 9   stn       63645 non-null  int64   
 10  mag       63645 non-null  int64   
 11  inj       63645 non-null  int64   
 12  fat       63645 non-null  int64   
 13  loss      63645 non-null  float64 
 14  closs     63645 non-null  float64 
 15  slat      63645 non-null  float64 
 16  slon      63645 non-null  float64 
 17  elat      63645 non-null  float64 
 18  elon      63645 non-null  float64 
 19  len       63645 non-null  float64 
 20  wid       63645 non-null  int64   
 21  fc        63645 non-null  int64   
 22  geometry  63645 non-null  geometry
dtypes: float64(7), geometry(1), int64(12), object(3)
memory usage: 11.2+ MB
tornados
omyrmodydatetimetzststfstnlossclossslatslonelatelonlenwidfcgeometry
011950131950-01-0311:00:003MO2916.00.038.7700-90.220038.8300-90.03009.501500POINT (-90.22000 38.77000)
121950131950-01-0311:55:003IL1725.00.039.1000-89.300039.1200-89.23003.601300POINT (-89.30000 39.10000)
231950131950-01-0316:00:003OH3914.00.040.8800-84.580040.8801-84.57990.10100POINT (-84.58000 40.88000)
3419501131950-01-1305:25:003AR513.00.034.4000-94.370034.4001-94.36990.60170POINT (-94.37000 34.40000)
4519501251950-01-2519:30:003MO2925.00.037.6000-90.680037.6300-90.65002.303000POINT (-90.68000 37.60000)
63640617020201812272018-12-2710:15:003LA2207000.00.030.1302-92.364530.1321-92.35470.60250POINT (-92.36450 30.13020)
63641617021201812272018-12-2710:29:003MS28015000.00.032.6431-90.450932.6427-90.42881.291000POINT (-90.45090 32.64310)
63642617022201812312018-12-3112:35:003KY21055000.00.036.8900-87.987036.8915-87.97340.761250POINT (-87.98700 36.89000)
63643617023201812312018-12-3113:43:003IN18050000.00.038.1813-86.886338.2006-86.85852.01500POINT (-86.88630 38.18130)
63644617024201812312018-12-3114:38:003IN18020000.00.038.0935-86.086938.1000-86.04702.201400POINT (-86.08690 38.09350)

63645 rows × 23 columns

Reproject coordinates

We will also need to reproject it’s coordinates to Mercator so that our maps line up.

tornados = tornados.to_crs("EPSG:3395")

Take note how reprojecting our coordinates to Mercator moves the decimal point in our lat, long data. This will come into play later.

tornados[:1].geometry
0    POINT (-10043244.459 4662018.860)
Name: geometry, dtype: geometry

Our first plot of U.S. tornado data

tornados.plot( figsize=(12,9), color='red', marker='x', markersize=1)
red tornadoes plot

Now lets plot the tornados data on top of the U.S. basemap

import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(12,9)) 
ax = fig.add_subplot()
states.boundary.plot(ax=ax, color='black', linewidth=.8)

tornados.plot(ax=ax, color='red', marker='.', markersize=1)
red tornadoes plot

Wow, that’s great!

We even chose an ominous red v-shaped marker to represent the tornados. The two maps line up perfectly but it looks like we have some outliers data for Hawaii, Alaska, and Puerto Rico that are stretching out our map canvas.

We want to focus on the lower 48 states so now we need to narrow down our view. We can accomplish this in a couple of ways. One way is to remove any tornados from our dataframe that are not in the lower 48. Another approach is to simply limit the viewport area of the map with ax.set_xlim() and ax.set_ylim().

Our first thought might be to pass in typical lat,long data such as ax.set_xlim(-101.12345, 72.12345). After we reprojected our maps to Mercator our coordinates data now looks like this ax.set_xlim(-10112345, 7212345).

import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(12,9)) 
ax = fig.add_subplot()

# US Lower 48 Bounding Box
# -141.00000, 26.00000, -65.50000, 72.00000

ax.set_xlim(-14100244, -7200000)
ax.set_ylim(2600000, 6550000)

fig.suptitle('United States Tornado Map (1950-2018)', fontsize=16)

states.boundary.plot(ax=ax, color='black', linewidth=.8)

tornados.plot(ax=ax, color='red', marker='v', markersize=8)
US Tornados Plot

Hey! That was a success, now let’s press on.

Get total tornados by State

I already know the answer to this question (because its my home) but… Which state do you think has had the most tornados?

Hint: In Texas we call them twisters

# Create a copy of the original DataFrame
twisters_by_state = tornados.copy()
# Add a new column and set the value to 1
twisters_by_state['tornados'] = 1

# use groupby() and count() to total up all the tornadoes by state
twisters_by_state = twisters_by_state[['st','tornados']].groupby('st').count()

# sort by most tornadoes first
twisters_by_state.sort_values('tornados', ascending=False).head(10)

Yep, Texas has the most tornados!

We can also easily plot this data by calling plot()

twisters_by_state.plot.bar(figsize=(12,6), title='Tornados by State (1950-2018)')
Tornados by State Graph

Notice that our sorting did not stick, this is because we did not reassign the DataFrame only output it to the screen.

We can chain in a call to .sort_values() to get our ordering to work in the plot.

I will also select the top 10 be adding [:10] which means “slice first 10 rows”.

Now we can see our state labels a little better.

twisters_by_state.sort_values('tornados', ascending=False)[:10].plot.bar(figsize=(12,6), title='Tornados by State (Top 10)')
Top 10 States with the most tornados

If we just want a list of the top 10 states we can do something like this.

top10 = twisters_by_state.sort_values('tornados', ascending=False)[:10].index
top10
Index(['TX', 'KS', 'OK', 'FL', 'NE', 'IA', 'IL', 'MO', 'MS', 'CO'], dtype='object', name='st')

Plot tornados by state

Now that we know that Texas has had the most tornados lets subset our tornado dataset to only show Texas Twisters.

texas_map = states[states['NAME'] == 'Texas']
texas_map.plot()
texas_twisters = tornados[tornados['st'] == 'TX']
texas_twisters
omyrmodydatetimetzststfstnlossclossslatslonelatelonlenwidfcgeometry
6719501261950-01-2618:00:003TX4810.00.026.8800-98.120026.8800-98.05004.701330POINT (-10922668.437 3089173.889)
7819502111950-02-1113:10:003TX4824.00.029.4200-95.250029.5200-95.13009.904000POINT (-10603181.498 3408227.270)
63637617017201812262018-12-2615:40:003TX48030000.00.033.7772-100.779233.7819-100.77800.34300POINT (-11218689.227 3975169.719)
63638617018201812262018-12-2616:23:003TX4800.00.034.9310-100.894134.9342-100.88490.571000POINT (-11231479.836 4130042.118)

8804 rows × 23 columns

texas_twisters.plot(markersize=1)

Now that we have both of our DataFrames texas_map and texas_twisters lets overlay combine and plot them.

import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(10,10)) 
ax = fig.add_subplot()

# Adjust bounding box to only Texas
# -119.00000, 29.30000, -103.80000, 44.00000

ax.set_xlim(-11910000, -10380000)
ax.set_ylim(2930000, 4400000)

fig.suptitle('Texas Tornados (1950-2018)', fontsize=16)

texas_map.boundary.plot(ax=ax, color='black', linewidth=.8)

texas_twisters.plot(ax=ax, color='red', marker='v', markersize=20)

Now lets move over to the path shapefile and see what we have to work with.

Load the NOAA US tornado paths dataset

tornado_paths = geopandas.read_file('zip://data/1950-2018-torn-aspath.zip!1950-2018-torn-aspath')
tornado_paths.head()
omyrmodydatetimetzststfstnlossclossslatslonelatelonlenwidfcgeometry
011950131950-01-0311:00:003MO2916.00.038.77-90.2238.8300-90.03009.51500LINESTRING (-90.22000 38.77000, -90.03000 38.8…
121950131950-01-0311:55:003IL1725.00.039.10-89.3039.1200-89.23003.61300LINESTRING (-89.30000 39.10000, -89.23000 39.1…
231950131950-01-0316:00:003OH3914.00.040.88-84.5840.8801-84.57990.1100LINESTRING (-84.58000 40.88000, -84.57990 40.8…
3419501131950-01-1305:25:003AR513.00.034.40-94.3734.4001-94.36990.6170LINESTRING (-94.37000 34.40000, -94.36990 34.4…
4519501251950-01-2519:30:003MO2925.00.037.60-90.6837.6300-90.65002.33000LINESTRING (-90.68000 37.60000, -90.65000 37.6…

Reproject to Mercator

tornado_paths = tornado_paths.to_crs("EPSG:3395")
tornado_paths.plot(figsize=(14,8))
us tornado paths
texas_twister_paths = tornado_paths[tornado_paths['st'] == 'TX']
texas_twister_paths.head()
omyrmodydatetimetzststfstnlossclossslatslonelatelonlenwidfcgeometry
6719501261950-01-2618:00:003TX4810.00.026.88-98.1226.88-98.054.71330LINESTRING (-10922668.437 3089173.889, -109148…
7819502111950-02-1113:10:003TX4824.00.029.42-95.2529.52-95.139.94000LINESTRING (-10603181.498 3408227.270, -105898…
8919502111950-02-1113:50:003TX4834.00.029.67-95.0529.83-95.0012.010000LINESTRING (-10580917.600 3440054.481, -105753…
91019502111950-02-1121:00:003TX4845.00.032.35-95.2032.42-95.204.61000LINESTRING (-10597615.524 3786480.059, -105976…
101119502111950-02-1123:55:003TX4855.00.032.98-94.6333.00-94.704.5670LINESTRING (-10534163.414 3869391.894, -105419…

5 rows × 23 columns

Plot tornado points and paths for Texas

Now that we have our tornado paths DataFrame narrowed down to Texas lets plot the paths.

For this next map I will plot the start point of each Tornado as pink and the path data as Red.

As you can see, path data does not exist for all recorded tornados.

import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(10,10)) 
ax = fig.add_subplot()

# Adjust bounding box to only Texas
# -119.00000, 29.30000, -103.80000, 44.00000

ax.set_xlim(-11910000, -10380000)
ax.set_ylim(2930000, 4400000)

fig.suptitle('Texas Tornado Paths (1950-2018)', fontsize=16)

texas_map.boundary.plot(ax=ax, color='black', linewidth=.8)
texas_twisters.plot(ax=ax, color='pink', marker='v', markersize=10)
texas_twister_paths.plot(ax=ax, color='red')
Texas Tornado Paths (1950-2018)

Color coded tornados by year

In this variation of the Texas tornado map we are color coding the tornado markers and paths by year with a colormap file. To do this we add the parameter column='yr' to the call to plot().

The colormap used cmap='coolwarm' allows us to see older tornado records as cool blue colors and more recent tornados as warm red colors. Note that we also removed the color parameter as cmap and color cannot be used simultaneously.

import matplotlib.pyplot as plt
fig = plt.figure(1, figsize=(10,10)) 
ax = fig.add_subplot()

# Adjust bounding box to only Texas
# -119.00000, 29.30000, -103.80000, 44.00000

ax.set_xlim(-11910000, -10380000)
ax.set_ylim(2930000, 4400000)

fig.suptitle('Texas Tornado Paths (1950-2018)', fontsize=16)

texas_map.boundary.plot(ax=ax, color='black', linewidth=.8)
texas_twisters.plot(ax=ax, column='yr', cmap="coolwarm", marker='v', markersize=10)
texas_twister_paths.plot(ax=ax, column='yr', cmap="coolwarm")
Texas Tornado Paths (1950-2018)

Congratulations, you are well on your way to becoming a GIS expert with geopandas. I hope you have enjoyed this Python Tutorial, let me know by leaving a comment below.

If you enjoyed this article you might also like Learn geopandas by plotting a United States shapefile


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.