WHO, Global Health Observatory - Data access and geospatial plotting
Steven Fowler (see more projects here)
The Global Health Observatory (GHO) is the World Health Organisation's (WHO) gateway for sharing data on global health. It is structured into categories of indicators (3043 at time of writing) that describe health statistics. Most indicators have dimensions of time (year) and country; some have additional dimensions, for example 'Age Group', 'Sex' or 'Severity'.
This report will look at how to access this data and give examples of geospatial charts that can be generated.
What I did
The following shows how Python can be used to access and navigate the WHO's database through its OData API. It finishes with some example geospatial plots of the WHO data. There is also an interactive plot written in D3 / JavaScript.
Why I did it
The WHO provides an interesting and robust set of data that can be used for investigating global health trends. This investigation was done to demonstrate how this information can be accessed and to show examples of how the data can be illustrated. This opens the possibility for focused studies.
What I learnt
The WHO's OData API facilitates easy access to its databases, and the returned data requires minimal processing.
import requests as req
import pandas as pd
# from itables import show
import geopandas as gpd
import matplotlib.pyplot as plt
from IPython.display import Javascript
def get_df_from_GHO_URL(URL, record_path=['value']):
    """Return a dataframe based on a URL from the GHO OData API.

    Args:
        URL (string): URL of a GHO OData API endpoint
        record_path (list, optional): Returned data is mostly nested JSON, record_path = ['value'] will return data. Defaults to ['value'].
    Returns:
        dataframe: pandas DF of data from URL, based on nested JSON. None if the request fails.
    """
    response = req.get(URL)  # Request the data from the API
    if response.status_code == 200:
        data = response.json()  # Parse the JSON body into a dictionary
    else:
        print(f"Error: {response.status_code}")
        return None
    # Return dataframe
    return pd.json_normalize(data, record_path=record_path)
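To make the role of record_path concrete, here is a small offline sketch. The payload is made up for illustration (it is not real WHO data, and the exact fields the API returns are an assumption); it only mimics a response whose records sit in a list under the 'value' key, which is what the helper above expects.

```python
import pandas as pd

# Hypothetical payload: a dictionary whose records sit under the "value" key.
# All field names and values here are invented for illustration.
sample_payload = {
    "metadata": "placeholder-metadata-url",
    "value": [
        {"IndicatorCode": "EXAMPLE_1", "IndicatorName": "First example indicator"},
        {"IndicatorCode": "EXAMPLE_2", "IndicatorName": "Second example indicator"},
    ],
}

# record_path=["value"] tells pandas to build one row per item in that list
sample_df = pd.json_normalize(sample_payload, record_path=["value"])
print(sample_df)
```

Keys outside the 'value' list (here the placeholder 'metadata' entry) are ignored, so the resulting dataframe holds only the records.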
Structure of available data
The data sets provided by the GHO are divided into indicators (3043 at the time of writing), for example:
- Number of people dying from HIV-related causes (ID: HIV_0000000006)
- Number of laboratory scientists (ID: HRH_13)
- Penalties for drink driving (ID: SA_0000001726)
Each indicator has a primary value that measures what the indicator describes. In general, this primary value has dimensions of country and year, and some indicators have additional dimensions, for example:
- Patient type
- Antibiotic
- Pregnancy status
Accessing a list of indicators
A list of indicators can be found here: https://ghoapi.azureedge.net/api/Indicator.
For example:
INDICATOR_LIST_URL = "https://ghoapi.azureedge.net/api/Indicator"
indicatorList = get_df_from_GHO_URL(INDICATOR_LIST_URL)
print(f"\nTotal number of indicators: {len(indicatorList)}.\n")
# print(indicatorList.head(n = 10))
indicatorList[0:10]
# show(indicatorList)
Accessing list of dimensions
A list of dimensions can be found here: https://ghoapi.azureedge.net/api/Dimension.
An index of the values for a given dimension (here COUNTRY) can be found here: https://ghoapi.azureedge.net/api/DIMENSION/COUNTRY/DimensionValues.
For example:
DIMENSION_AVAILABLE_URL = "https://ghoapi.azureedge.net/api/Dimension"
dimensionsAvailable = get_df_from_GHO_URL(DIMENSION_AVAILABLE_URL)
print(f"\nTotal number of dimensions: {len(dimensionsAvailable)}.\n")
# print(dimensionsAvailable.head(n = 10))
dimensionsAvailable[0:10]
# show(dimensionsAvailable)
DIMENSION_VALUES_URL = "https://ghoapi.azureedge.net/api/DIMENSION/COUNTRY/DimensionValues"
dimensionValues = get_df_from_GHO_URL(DIMENSION_VALUES_URL)
# print(dimensionValues.head(n = 5))
dimensionValues[0:5]
# show(dimensionValues)
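The dimension values can serve as a lookup from country code to country name. The sketch below uses made-up stand-in data, and it assumes the dimension values table has 'Code' and 'Title' columns; check the column names in the dataframe printed above before relying on them.

```python
import pandas as pd

# Hypothetical stand-in for dimensionValues; the 'Code' and 'Title'
# column names are an assumption, and the rows are invented.
sample_dimension_values = pd.DataFrame({
    "Code": ["AAA", "BBB"],
    "Title": ["Country A", "Country B"],
})

# Hypothetical indicator rows keyed by country code
sample_data = pd.DataFrame({
    "SpatialDim": ["AAA", "BBB", "CCC"],
    "NumericValue": [62.5, 57.5, 70.0],
})

# Build a code -> name Series and map it onto the country code column
code_to_title = sample_dimension_values.set_index("Code")["Title"]
sample_data["CountryName"] = sample_data["SpatialDim"].map(code_to_title)
print(sample_data)
```

Codes with no entry in the lookup (here 'CCC') come back as missing values, which is a quick way to spot rows that are not countries.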
Example data - "Healthy life expectancy at birth (years)"
As an example indicator, Healthy life expectancy at birth (years) is the 'Average number of years that a person can expect to live in “full health” from birth'. To access the data we need the associated indicator code. The data can then be requested in the same way as in the examples above.
# Find indicator code
indicatorList[indicatorList.IndicatorName.str.contains("life expectancy", case=False)]
# Download data and cast to DF
INDICATOR_DATA = "https://ghoapi.azureedge.net/api/"
indicator_code = "WHOSIS_000002"
whosis_000002_data = get_df_from_GHO_URL(INDICATOR_DATA + indicator_code)
whosis_000002_data.dtypes
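The request above downloads every row for the indicator. OData services commonly accept a $filter query option to narrow the result on the server side; the sketch below only builds such a URL and does not call the API. Whether this endpoint honours the option, and the exact filter syntax it accepts, are assumptions that should be checked against the GHO OData API documentation. The helper name build_filtered_url is hypothetical.

```python
from urllib.parse import quote

BASE_URL = "https://ghoapi.azureedge.net/api/"

def build_filtered_url(indicator_code, filter_expression):
    """Return an indicator URL with an OData $filter query option appended.

    Hypothetical helper: assumes the endpoint supports standard OData $filter.
    """
    # Percent-encode spaces and quotes in the expression; '$filter' stays literal
    return f"{BASE_URL}{indicator_code}?$filter={quote(filter_expression)}"

filtered_url = build_filtered_url("WHOSIS_000002", "Dim1 eq 'SEX_BTSX'")
print(filtered_url)
```

If the option is supported, the resulting URL could be passed to get_df_from_GHO_URL in place of the unfiltered one.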
From the above table it can be seen that there are 4 numeric dimensions:
- TimeDim - The year associated with the value
- NumericValue - The age (number of years) of healthy life expectancy
- Low, High - Confidence interval associated with NumericValue
There are also geographic dimensions, primarily SpatialDim which is the country code that is defined in the dimension values listed above. An additional dimension in this dataset is Sex, which is captured under Dim1.
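If an indicator returns several years and several Dim1 values per country, it can help to reduce the data to one row per country before charting. The following is a minimal sketch on made-up rows that reuse the column names described above; the values and the 'AAA' / 'BBB' codes are invented, and only 'SEX_BTSX' is taken from this report.

```python
import pandas as pd

# Made-up rows using the column names described above; values are illustrative only
sample = pd.DataFrame({
    "SpatialDim": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "TimeDim": [2015, 2019, 2019, 2010, 2015],
    "Dim1": ["SEX_BTSX", "SEX_BTSX", "SEX_OTHER", "SEX_BTSX", "SEX_BTSX"],
    "NumericValue": [60.0, 62.5, 64.0, 55.0, 57.5],
})

# Keep a single Dim1 category
both_sexes = sample[sample["Dim1"] == "SEX_BTSX"]

# idxmax gives the row label of the latest year within each country
latest_rows = both_sexes.loc[both_sexes.groupby("SpatialDim")["TimeDim"].idxmax()]
print(latest_rows)
```

The same two steps could be applied to whosis_000002_data, assuming its columns match those listed above.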
Example geospatial chart - "Healthy life expectancy at birth (years)"
The following example uses WHO indicator WHOSIS_000002 - "Healthy life expectancy at birth (years)". Plotted on a geospatial chart, the data illustrates the lower healthy life expectancy in mid to southern Africa, while Japan stands out with a clearly high life expectancy.
# Get world map geospatial data and 'join it' to WHO DF
# Convert DF to geoDF with defined 'geometry' column
url = "https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_countries.zip"
worldMap = gpd.read_file(url)
worldMap['SpatialDim'] = worldMap['ADM0_A3']
whosis_000002_GeoData = pd.merge(whosis_000002_data, worldMap, on='SpatialDim', how='left')
whosis_000002_GeoData = gpd.GeoDataFrame(whosis_000002_GeoData)
whosis_000002_GeoData = whosis_000002_GeoData.set_geometry('geometry')
# Plot data
fig, ax = plt.subplots(figsize=(15, 7.5))
# Plot the both-sexes rows, coloured by NumericValue
whosis_000002_GeoData[whosis_000002_GeoData.Dim1 == 'SEX_BTSX'].plot(
    ax=ax,
    column='NumericValue',
    cmap='Spectral',
    edgecolor='black',
    linewidth=0.1,
    alpha=0.5,
    legend=True,
    legend_kwds={
        'label': 'Age from birth (years)',
        'orientation': 'horizontal',
        'shrink': 0.5,
    },
)
ax.set_axis_off()
plt.title("Healthy life expectancy at birth")
plt.show()
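Because the merge above is a left join on SpatialDim, WHO rows whose code has no counterpart in the map data are kept but have no geometry, so they will not appear on the chart. A quick way to list such codes is the indicator option of pd.merge. The sketch below uses small made-up frames (plain strings stand in for geometries) rather than the real data.

```python
import pandas as pd

# Made-up stand-ins: WHO rows and map rows keyed by the same code column
who_rows = pd.DataFrame({
    "SpatialDim": ["AAA", "BBB", "ZZZ"],
    "NumericValue": [62.5, 57.5, 60.0],
})
map_rows = pd.DataFrame({
    "SpatialDim": ["AAA", "BBB"],
    "geometry": ["shape A", "shape B"],  # placeholder strings, not real geometries
})

# indicator=True adds a '_merge' column saying where each row was found
merged = pd.merge(who_rows, map_rows, on="SpatialDim", how="left", indicator=True)
unmatched_codes = merged.loc[merged["_merge"] == "left_only", "SpatialDim"].tolist()
print(unmatched_codes)
```

Running the same check on the real merge would show which WHO codes are missing from the map layer, for example regional aggregates or codes that differ between the two sources.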
Interactive plot
An example of an interactive geospatial plot can be found here: Interactive geospatial chart.