COVID-19 and its R-Squared Relationship to GDP Change
The highly-respected data visualization site, Our World in Data, published the blog post Which countries have protected both health and the economy in the pandemic? that concludes:
Comparing the COVID-19 death rate with the latest GDP data, we in fact see the opposite: countries that have managed to protect their population’s health in the pandemic have generally also protected their economy too
But the post never actually quantifies this, relying instead on a chart that kinda-sorta seems to justify the conclusion. Can we do better?
Since Our World in Data freely opens all their source data, our first step is to read it into R:
data_raw <- read_csv(file.path("q2-gdp-growth-vs-confirmed-deaths-due-to-covid-19-per-million-people.csv"),
col_types = "f_cdid_i") %>%
transmute(country=Entity, date=mdy(Date),
deaths = `Total confirmed deaths due to COVID-19 per million people (deaths per million)`,
# population = `Population, total`,
gdp_growth = `GDP growth from previous year, 2020 Q2 (%)` )
data_raw %>% head() %>% knitr::kable(caption = "Raw data (first ten rows)" )
country | date | deaths | gdp_growth |
---|---|---|---|
Afghanistan | 2019-12-31 | 0 | NA |
Afghanistan | 2020-01-01 | 0 | NA |
Afghanistan | 2020-01-02 | 0 | NA |
Afghanistan | 2020-01-03 | 0 | NA |
Afghanistan | 2020-01-04 | 0 | NA |
Afghanistan | 2020-01-05 | 0 | NA |
To manipulate data, we first need to convert the table to something more Tidy, and just include data from 2020:
library(kableExtra)
options(kableExtra.html.bsTable = TRUE)
data <- data_raw %>% dplyr::filter(date > "2020-01-01" & !is.na(gdp_growth))
data_df <- data %>% group_by(country) %>% summarise(deaths = sum(deaths), gdp = first(gdp_growth)) %>%
dplyr::filter(deaths>0) %>% ungroup()
data_df %>% arrange(gdp) %>% knitr::kable(col.names = c("Country", "Deaths Per 1M", "GDP Change"), digits = 1) %>%
kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Country | Deaths Per 1M | GDP Change |
---|---|---|
United Kingdom | 77002.9 | -21.7 |
France | 62977.7 | -19.0 |
Italy | 81760.4 | -17.3 |
Belgium | 114558.5 | -14.5 |
Canada | 26685.4 | -13.5 |
Austria | 10738.1 | -13.3 |
Singapore | 579.1 | -13.2 |
Germany | 14154.6 | -11.7 |
Czech Republic | 4416.2 | -10.7 |
Japan | 964.6 | -10.0 |
United States | 50593.8 | -9.5 |
Netherlands | 48798.8 | -9.0 |
Denmark | 14195.3 | -8.5 |
Sweden | 66939.9 | -8.3 |
Norway | 6306.8 | -5.3 |
South Korea | 867.9 | -3.0 |
Taiwan | 45.7 | -0.6 |
China | 583.3 | 3.2 |
And finally, we plot the result, which happily resembles the one at Our World in Data:
data_df %>% ggplot(aes(y = deaths, x = gdp)) + geom_point() +
geom_text(aes(label = country), hjust = 0, vjust = 1) +
labs(title = "COVID 19 Deaths and GDP Performance",
subtitle = paste("R^2 =",format(cor(data_df$deaths, data_df$gdp) ^2, digits = 3)))
In the subtitle of the graph, I’ve computed the \(r^2\) value 0.366, a result that is not especially high. With values that vary between 0 (no association) and 1 (perfect association), I’d want the relationship to be quite a bit stronger before using this as the basis for any policy decisions.