Richard Sprague

My personal website

COVID-19 and its R-Squared Relationship to GDP Change

Created: 2020-09-04 ; Updated: 2020-09-04

The highly-respected data visualization site, Our World in Data, published the blog post Which countries have protected both health and the economy in the pandemic? that concludes:

Comparing the COVID-19 death rate with the latest GDP data, we in fact see the opposite: countries that have managed to protect their population’s health in the pandemic have generally also protected their economy too

But the post never actually quantifies this, relying instead on a chart that kinda-sorta seems to justify the conclusion. Can we do better?

Since Our World in Data freely opens all their source data, our first step is to read it into R:

data_raw <- read_csv(file.path("q2-gdp-growth-vs-confirmed-deaths-due-to-covid-19-per-million-people.csv"),
                     col_types = "f_cdid_i")  %>%
  transmute(country=Entity, date=mdy(Date),
            deaths = `Total confirmed deaths due to COVID-19 per million people (deaths per million)`,
           # population = `Population, total`, 
            gdp_growth = `GDP growth from previous year, 2020 Q2 (%)` )

data_raw %>% head() %>% knitr::kable(caption = "Raw data (first ten rows)" )
Table 1: Raw data (first ten rows)
country date deaths gdp_growth
Afghanistan 2019-12-31 0 NA
Afghanistan 2020-01-01 0 NA
Afghanistan 2020-01-02 0 NA
Afghanistan 2020-01-03 0 NA
Afghanistan 2020-01-04 0 NA
Afghanistan 2020-01-05 0 NA

To manipulate data, we first need to convert the table to something more Tidy, and just include data from 2020:

library(kableExtra)
options(kableExtra.html.bsTable = TRUE)

data <- data_raw %>% dplyr::filter(date > "2020-01-01" & !is.na(gdp_growth)) 
data_df <- data %>% group_by(country) %>% summarise(deaths = sum(deaths), gdp = first(gdp_growth)) %>% 
  dplyr::filter(deaths>0) %>% ungroup()
data_df %>% arrange(gdp) %>%  knitr::kable(col.names = c("Country", "Deaths Per 1M", "GDP Change"), digits = 1) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Country Deaths Per 1M GDP Change
United Kingdom 77002.9 -21.7
France 62977.7 -19.0
Italy 81760.4 -17.3
Belgium 114558.5 -14.5
Canada 26685.4 -13.5
Austria 10738.1 -13.3
Singapore 579.1 -13.2
Germany 14154.6 -11.7
Czech Republic 4416.2 -10.7
Japan 964.6 -10.0
United States 50593.8 -9.5
Netherlands 48798.8 -9.0
Denmark 14195.3 -8.5
Sweden 66939.9 -8.3
Norway 6306.8 -5.3
South Korea 867.9 -3.0
Taiwan 45.7 -0.6
China 583.3 3.2

And finally, we plot the result, which happily resembles the one at Our World in Data:

data_df %>% ggplot(aes(y = deaths, x = gdp)) + geom_point() + 
    geom_text(aes(label = country), hjust = 0, vjust = 1)  +
    labs(title = "COVID 19 Deaths and GDP Performance",
         subtitle = paste("R^2 =",format(cor(data_df$deaths, data_df$gdp) ^2, digits = 3)))

In the subtitle of the graph, I’ve computed the \(r^2\) value 0.366, a result that is not especially high. With values that vary between 0 (no association) and 1 (perfect association), I’d want the relationship to be quite a bit stronger before using this as the basis for any policy decisions.