Installation

conda install -c ioos geopandas=0.2.0.dev0

Using SD to understand the SD Fever

In this script, we will use georeferenced data at a national level to simulate a multiregional infectious disease model. We’ll then present the advantages to project spatial data produced by our simulation back on a map.

%matplotlib inline
import pandas as pd
import pysd
/Users/houghton/anaconda/lib/python2.7/site-packages/pandas/computation/__init__.py:19: UserWarning: The installed version of numexpr 2.4.4 is not supported in pandas and will be not be used

  UserWarning)

The model we’ll use to represent the dynamics of the disease is a simple SIR model, with individuals aggregated into stocks by their health condition S-Susceptible, I-Infectious, R-Recovered. We assume that the complete population is susceptible, therefore the initial value of susceptible stock is equal to the total population. In addition, we built on the hypothesis that from now all infected person are reported.

from IPython.display import Image
Image(filename='../../models/SD_Fever/SIR_model.png')
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_4_0.png

In Vensim our model was parameterized with 1000 suceptible, 5 infectious and 0 recovered individuals, a recovery period of 5 days and a contact infectivity of 70%.

When we do not specificy anything else, the parameters and setting (e.g. timestep, simulation time) from the Vensim model are used.

model = pysd.read_vensim('../../models/SD_Fever/SIR_Simple.mdl')
result = model.run()
result.plot();
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_6_0.png

Modify parameter values

As we have seen before, we can specify changes to the parameters of the model in the call to the run function. Here we set the contact infectivity to 30% before running the simulation again. If you like, try what happens when you change some of the other parameters.

result = model.run(params={ 'total_population':1000,
                            'contact_infectivity':.3,
                            'recovery_period': 5
                          })
result.plot();
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_9_0.png

Change Model settings

We can also change in a very simpe manner the simulation time and timestep of the model. An easy way to do it is to use numpy linspace which returns evenly spaced numbers over a specified interval.

np.linspace(Start, Stop, Number of timestamps)

import numpy as np
sim_time = 10
np.linspace(0, sim_time, num=sim_time*4+1)
array([  0.  ,   0.25,   0.5 ,   0.75,   1.  ,   1.25,   1.5 ,   1.75,
         2.  ,   2.25,   2.5 ,   2.75,   3.  ,   3.25,   3.5 ,   3.75,
         4.  ,   4.25,   4.5 ,   4.75,   5.  ,   5.25,   5.5 ,   5.75,
         6.  ,   6.25,   6.5 ,   6.75,   7.  ,   7.25,   7.5 ,   7.75,
         8.  ,   8.25,   8.5 ,   8.75,   9.  ,   9.25,   9.5 ,   9.75,  10.  ])

We can use the return_timestamps keyword argument in PySD. This argument expects a list of timestamps, and will return simulation results at those timestamps.

model.run(return_timestamps=np.linspace(0, sim_time, num=sim_time*2+1))
cumulative cases infectious recovered susceptible
0.0 0.000000 5.000000 0.000000 1000.000000
0.5 0.750000 5.250000 0.500000 999.250000
1.0 1.536909 5.511909 1.025000 998.463091
1.5 2.362425 5.786234 1.576191 997.637575
2.0 3.228310 6.073495 2.154814 996.771690
2.5 4.136393 6.374229 2.762164 995.863607
3.0 5.088572 6.688986 3.399587 994.911428
3.5 6.086815 7.018329 4.068485 993.913185
4.0 7.133156 7.362838 4.770318 992.866844
4.5 8.229704 7.723102 5.506602 991.770296
5.0 9.378635 8.099723 6.278912 990.621365
5.5 10.582199 8.493314 7.088885 989.417801
6.0 11.842715 8.904499 7.938216 988.157285
6.5 13.162571 9.333905 8.828666 986.837429
7.0 14.544228 9.782172 9.762056 985.455772
7.5 15.990213 10.249939 10.740274 984.009787
8.0 17.503119 10.737852 11.765268 982.496881
8.5 19.085605 11.246552 12.839053 980.914395
9.0 20.740391 11.776683 13.963708 979.259609
9.5 22.470255 12.328879 15.141376 977.529745
10.0 24.278032 12.903768 16.374264 975.721968

Geographical Information

Geospatial information as area on a map linked to several properties are typically stored into shapefiles.

For this script, we will use geopandas library to manage the shapefiles, and utilize its inherent plotting functionality.

import geopandas as gp

shapefile = '../../data/SD_Fever/geo_df_EU.shp'
geo_data = gp.GeoDataFrame.from_file(shapefile)
geo_data.head(5)
---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-7-17f6f8f07da0> in <module>()
----> 1 import geopandas as gp
      2
      3 shapefile = '../../data/SD_Fever/geo_df_EU.shp'
      4 geo_data = gp.GeoDataFrame.from_file(shapefile)
      5 geo_data.head(5)


/Users/houghton/anaconda/lib/python2.7/site-packages/geopandas/__init__.pyc in <module>()
      2 from geopandas.geodataframe import GeoDataFrame
      3
----> 4 from geopandas.io.file import read_file
      5 from geopandas.io.sql import read_postgis
      6 from geopandas.tools import sjoin


/Users/houghton/anaconda/lib/python2.7/site-packages/geopandas/io/file.py in <module>()
      1 import os
      2
----> 3 import fiona
      4 import numpy as np
      5 from shapely.geometry import mapping


/Users/houghton/anaconda/lib/python2.7/site-packages/fiona/__init__.py in <module>()
     70 from six import string_types
     71
---> 72 from fiona.collection import Collection, BytesCollection, vsi_path
     73 from fiona._drivers import driver_count, GDALEnv, supported_drivers
     74 from fiona.odict import OrderedDict


/Users/houghton/anaconda/lib/python2.7/site-packages/fiona/collection.py in <module>()
      5 import sys
      6
----> 7 from fiona.ogrext import Iterator, ItemsIterator, KeysIterator
      8 from fiona.ogrext import Session, WritingSession
      9 from fiona.ogrext import (


ImportError: dlopen(/Users/houghton/anaconda/lib/python2.7/site-packages/fiona/ogrext.so, 2): Library not loaded: @rpath/libtiff.5.dylib
  Referenced from: /Users/houghton/anaconda/lib/libgdal.1.dylib
  Reason: Incompatible library version: libgdal.1.dylib requires version 8.0.0 or later, but libtiff.5.dylib provides version 7.0.0

Then we can project the geographic shape of the elements on a map.

import matplotlib.pyplot as plt

geo_data.plot()
plt.xlim([-28, 50])
plt.ylim([30, 75])
plt.show()
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_17_0.png

And plot on of the georeferenced property (e.g. population)

geo_data.plot(column='population', scheme='fisher_jenks', alpha=0.9, k=9, linewidth=0.1,
             cmap=plt.cm.YlOrRd, legend=False)
plt.xlim([-28, 50])
plt.ylim([30, 75])
plt.show()
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-9-c9605dc38e75> in <module>()
      1 geo_data.plot(column='population', scheme='fisher_jenks', alpha=0.9, k=9, linewidth=0.1,
----> 2              cmap=plt.cm.YlOrRd, legend=False)
      3 plt.xlim([-28, 50])
      4 plt.ylim([30, 75])
      5 plt.show()


/Users/houghton/anaconda/lib/python2.7/site-packages/geopandas/geodataframe.pyc in plot(self, *args, **kwargs)
    379
    380     def plot(self, *args, **kwargs):
--> 381         return plot_dataframe(self, *args, **kwargs)
    382
    383 def _dataframe_set_geometry(self, col, drop=False, inplace=False, crs=None):


TypeError: plot_dataframe() got an unexpected keyword argument 'linewidth'

Run the model for each country

We want to run the core SD model for each country, with country specific paramterization.

Thus, we formulate a function that based on each row parameterizes the model with the value from geodata, performs the simulation and finally returns the number of infectious individuals over time.

def runner(row):
    sim_time = 200
    params= {'total_population':row['population'],
             'contact_infectivity' : row['inf_rate']}
    res = model.run(params=params,
                    return_timestamps=np.linspace(0, sim_time, num=sim_time*2+1))
    return res['infectious']

Apply function along rows of the Dataframe.

We want to apply the function row-wise (by country) therefore we set axis to 1 (row) instead of default 0 (column). The result is a new dataframe with the produced simulation for each country.

res = geo_data.apply(runner, axis=1)
res.head()
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 ... 195.5 196.0 196.5 197.0 197.5 198.0 198.5 199.0 199.5 200.0
0 5.0 5.407815 5.848892 6.325945 6.841907 7.399952 8.003512 8.656299 9.362329 10.125943 ... 1.235898e+01 1.174243e+01 1.115664e+01 1.060006e+01 1.007126e+01 9.568833e+00 9.091472e+00 8.637925e+00 8.207004e+00 7.797580e+00
1 5.0 5.127795 5.258857 5.393268 5.531115 5.672485 5.817468 5.966157 6.118646 6.275032 ... 7.391140e+04 7.532471e+04 7.675456e+04 7.820069e+04 7.966278e+04 8.114054e+04 8.263360e+04 8.414161e+04 8.566416e+04 8.720083e+04
2 5.0 6.939664 9.631786 13.368267 18.554246 25.752025 35.742032 49.607441 68.851599 95.561003 ... 8.710440e-08 7.927009e-08 7.216718e-08 6.579565e-08 6.015551e-08 5.514593e-08 5.050006e-08 4.620362e-08 4.225663e-08 3.865907e-08
3 5.0 4.958609 4.917560 4.876852 4.836480 4.796443 4.756736 4.717359 4.678307 4.639579 ... 1.937845e-01 1.921802e-01 1.905892e-01 1.890113e-01 1.874465e-01 1.858947e-01 1.843557e-01 1.828294e-01 1.813158e-01 1.798147e-01
4 5.0 5.130475 5.264354 5.401727 5.542685 5.687321 5.835731 5.988013 6.144269 6.304603 ... 7.030861e+04 7.130697e+04 7.230348e+04 7.329749e+04 7.428830e+04 7.527523e+04 7.625757e+04 7.723462e+04 7.820564e+04 7.916990e+04

5 rows × 401 columns

Transpose simulation results for plotting

The pandas line plot assumes that rows represent the timeseries and columns the different objects. Since our data is not yet in this form, we have to transpose the data. In pandas all we have to do is add an .T at the end.

import pandas as pd
df = pd.DataFrame(res).T
df.head(2)
0 1 2 3 4 5 6 7 8 9 ... 28 29 30 31 32 33 34 35 36 37
0.0 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 ... 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000 5.000000
0.5 5.407815 5.127795 6.939664 4.958609 5.130475 4.763649 5.630614 5.293347 5.993628 5.197358 ... 7.459123 4.550577 5.240652 5.364989 5.125969 5.169376 5.245112 5.205252 4.595693 4.976995

2 rows × 38 columns

df.plot(legend=False);
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_27_0.png

Comparative Analysis

Next lets try to compare how severe a country is hit by the SD fever.

Rather than looking at the number of infectious persons over time, a better indicator for comparative analysis are the cumulative cases as percentage of population in each country.

We can reuse our code from before but instead of returning the number of infecious we return the cumulative cases.

def runner(row):
    sim_time = 200
    params= {'total_population':row['population'],
             'contact_infectivity' : row['inf_rate']}
    res = model.run(params=params,
                    return_timestamps=range(0,sim_time))
    return res['cumulative_cases']

#TIP: Ensure you are using lower case letters and the character _  not space
res = geo_data.apply(runner, axis=1)
res.head()
0 1 2 3 4 5 6 7 8 9 ... 190 191 192 193 194 195 196 197 198 199
0 0.0 1.931564 4.191064 6.834176 9.926028 13.542802 17.773617 22.722719 28.512051 35.284257 ... 2.643414e+06 2.643416e+06 2.643418e+06 2.643420e+06 2.643421e+06 2.643423e+06 2.643424e+06 2.643425e+06 2.643426e+06 2.643427e+06
1 0.0 1.284525 2.635551 4.056522 5.551057 7.122967 8.776255 10.515136 12.344040 14.267628 ... 3.207868e+05 3.353914e+05 3.505670e+05 3.663273e+05 3.826858e+05 3.996553e+05 4.172479e+05 4.354754e+05 4.543483e+05 4.738766e+05
2 0.0 6.044711 17.688969 40.119902 83.329609 166.565835 326.904336 635.757752 1230.660685 2376.436306 ... 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07 1.026049e+07
3 0.0 0.909294 1.803595 2.683151 3.548204 4.398995 5.235758 6.058724 6.868121 7.664172 ... 5.280475e+01 5.284336e+01 5.288134e+01 5.291870e+01 5.295543e+01 5.299156e+01 5.302710e+01 5.306205e+01 5.309642e+01 5.313023e+01
4 0.0 1.290563 2.649358 4.079994 5.586268 7.172179 8.841937 10.599976 12.450961 14.399808 ... 3.414903e+05 3.555252e+05 3.699699e+05 3.848232e+05 4.000827e+05 4.157447e+05 4.318042e+05 4.482554e+05 4.650906e+05 4.823014e+05

5 rows × 200 columns

The answer is a simple matrix operation: divide row-wise the elements of our computed values by the column of the original geo data set where we had the population in each country.

Let’s try to perform this type of operation on a minimal example.

# Create arbitrary column
column = pd.Series([10, 0])
column
0    10
1     0
dtype: int64
# Create arbitrary pandas dataframe
df = pd.DataFrame(np.random.randint(1,5,size=(2, 3)), columns=list('ABC'))
df
A B C
0 1 3 3
1 1 2 4
column*df
0 1 A B C
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN

Now we can translate this operation on our actual problem.

res = pd.DataFrame(res.T/geo_data["population"])
res.plot(legend=False);
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_37_0.png

Analysis of results

For example, we could study the impact of contact infectivity on the cumulative cases at the end of the simulation

geo_data['totalcases%pop'] = res.loc[199] # Slice the final value at the end of the simulation
df_scatter = pd.DataFrame(geo_data) # Geopandas dataframe to pandas Dataframe (geopandas tries to perform spatial analysis)
df_scatter.plot.scatter(x='inf_rate', y='totalcases%pop'); # Plot infectivity versus cumulative cases at the end of the simulation
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_39_0.png

How Spatial Analysis Leads to Insight

Finally, we present slighltly advanced Python scripts to get our simulation results projected on the map.

We merge the complete simulation results with our original georeferenced information just as we did in the step before.

geo_data.head(2)
country geometry inf_rate population totalcases%pop
0 Albania POLYGON ((20.59024743010491 41.85540416113361,... 0.356814 3639453.0 0.726325
1 Austria POLYGON ((16.97966678230404 48.12349701597631,... 0.250476 8210281.0 0.057717
res.head(2)
0 1 2 3 4 5 6 7 8 9 ... 28 29 30 31 32 33 34 35 36 37
0 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 ... 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
1 5.307292e-07 1.564532e-07 5.804221e-07 1.262086e-07 2.797414e-07 5.094781e-08 3.247481e-07 1.628636e-07 4.117605e-08 2.622006e-07 ... 4.582214e-07 1.133618e-08 4.005424e-08 1.709489e-07 5.763613e-08 1.868129e-07 2.840291e-07 7.280583e-07 1.592863e-08 2.077661e-08

2 rows × 38 columns

geo_data_merged = geo_data.merge(res.T, left_index=True, right_index=True)
geo_data_merged.head()
country geometry inf_rate population totalcases%pop 0 1 2 3 4 ... 190 191 192 193 194 195 196 197 198 199
0 Albania POLYGON ((20.59024743010491 41.85540416113361,... 0.356814 3639453.0 0.726325 0.0 5.307292e-07 1.151564e-06 1.877803e-06 2.727341e-06 ... 0.726322 0.726322 0.726323 0.726323 0.726324 0.726324 0.726324 0.726325 0.726325 0.726325
1 Austria POLYGON ((16.97966678230404 48.12349701597631,... 0.250476 8210281.0 0.057717 0.0 1.564532e-07 3.210062e-07 4.940783e-07 6.761105e-07 ... 0.039071 0.040850 0.042699 0.044618 0.046611 0.048677 0.050820 0.053040 0.055339 0.057717
2 Belgium POLYGON ((3.314971144228537 51.34578095153609,... 0.855631 10414336.0 0.985227 0.0 5.804221e-07 1.698521e-06 3.852373e-06 8.001433e-06 ... 0.985227 0.985227 0.985227 0.985227 0.985227 0.985227 0.985227 0.985227 0.985227 0.985227
3 Bulgaria POLYGON ((22.65714969248299 44.23492300066128,... 0.183375 7204687.0 0.000007 0.0 1.262086e-07 2.503363e-07 3.724174e-07 4.924856e-07 ... 0.000007 0.000007 0.000007 0.000007 0.000007 0.000007 0.000007 0.000007 0.000007 0.000007
4 Bosnia and Herzegovina POLYGON ((19.00548628101012 44.86023366960916,... 0.251521 4613414.0 0.104543 0.0 2.797414e-07 5.742728e-07 8.843763e-07 1.210875e-06 ... 0.074021 0.077063 0.080194 0.083414 0.086722 0.090116 0.093598 0.097163 0.100813 0.104543

5 rows × 205 columns

Plotting simulation results on map with Ipywidgets

Ipywidgets are interactive HTML widgets for IPython notebooks. Users gain control of their data and can visualize changes in the data.

import matplotlib as mpl
from ipywidgets import interact, FloatSlider, IntSlider,RadioButtons, Dropdown
sim_time = 200
slider_time = IntSlider(description = 'Time Select',
                        min=0, max=sim_time-1, value=1)
@interact( time = slider_time) # Scenario = select_scenario,
def update_map(time): # Scenario
    ax = geo_data_merged.plot(column=time, scheme='fisher-jenks', alpha=0.9, k=9, linewidth=0.1,
             cmap=plt.cm.Reds, legend=True,  figsize=(20, 30))
    plt.xlim(-28, 50)
    plt.ylim(30, 75)
    plt.xticks([])
    plt.yticks([])
C:UsersSchwarzAppDataLocalEnthoughtCanopy32Userlibsite-packagesgeopandasgeodataframe.py:447: UserWarning: Unrecognized scheme "fisher-jenks". Using "Quantiles" instead
  return plot_dataframe(self, *args, **kwargs)
../../_images/Doing_more_with_spatialdata_multi_regional_SIR_Model_46_1.png

...