Gideon’s Wisdom of Crowds Experiment

science
Published

November 7, 2019

My grad school friend conducted a fun experiment back in 2012 which, I’m embarrassed to say, I only just learned about. He asked his zillions of Google+ followers to guess the number of Cheerios in a jar, hoping to test the idea of the “wisdom of crowds”. He has released the raw data, so I used it as an excuse for another R-tude.

After placing his original spreadsheet into an R dataframe called gideon_woc, I generated this quick overall summary:

Code
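library(tidyverse)  # provides %>%, group_by(), summarise() and ggplot()

# Mean and median guess for each group
gideon_woc %>%
  group_by(type) %>%
  summarise(mean = mean(value), median = median(value)) %>%
  knitr::kable(digits = 0, caption = "Final summary of all data collected.")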

Gideon received 2,238 valid guesses made in multiple rounds, here organized by type. Some of the guessers could see other people’s guesses (“Shared”), while others were blind to them (“GR”). Since the actual number of Cheerios in the jar is 467, you can see that blinding appears to have made a substantial difference in the final guesses.

The good news is that so far our math agrees.
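To put a number on how close each crowd came, one option is to subtract the true count from each group's mean and median. The sketch below is only an illustration: `toy_woc` is a hypothetical stand-in with the same `type` and `value` columns as `gideon_woc`, and its guesses are made up rather than taken from Gideon's spreadsheet.

```r
library(dplyr)

true_count <- 467  # actual number of Cheerios, as stated in the post

# Hypothetical stand-in for gideon_woc: same columns, made-up guesses
toy_woc <- data.frame(
  type  = c("Shared", "Shared", "Shared", "GR", "GR", "GR"),
  value = c(400, 450, 500, 300, 600, 1500)
)

crowd_error <- toy_woc %>%
  group_by(type) %>%
  summarise(
    mean_guess   = mean(value),
    median_guess = median(value),
    # signed error: positive means the crowd overshot the true count
    mean_error   = mean_guess - true_count,
    median_error = median_guess - true_count
  )

print(crowd_error)
```

Swapping `toy_woc` for `gideon_woc` should give the same columns for the real guesses, assuming the data frame is laid out as described above.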

Here’s what we get when we graph each version of the guess. The box spans the middle 50% of the guesses, and the red dots are outliers, i.e. guesses that fall more than 1.5 times the interquartile range beyond the edges of the box.

Code
gideon_woc %>%
  dplyr::filter(type != "Combined: All Groups") %>%
  ggplot(aes(x = type, y = value)) +
  geom_boxplot(outlier.color = "red") +
  theme(axis.text.x = element_text(angle = 90))
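For anyone curious how a boxplot decides which points count as outliers, here is a minimal sketch of the 1.5 × IQR rule in base R. The `guesses` vector is made up for illustration. Note that `boxplot.stats()` measures from Tukey's hinges, while `geom_boxplot()` uses the 25th and 75th percentiles, so the two can disagree slightly on borderline points.

```r
# Hypothetical guesses, not taken from Gideon's data
guesses <- c(250, 300, 400, 450, 470, 500, 550, 600, 700, 5000)

# Tukey's hinges (roughly the first and third quartiles)
hinges <- fivenum(guesses)[c(2, 4)]
iqr    <- diff(hinges)

# Fences sit 1.5 * IQR beyond the hinges (1.5 is the default coef in boxplot.stats())
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
manual_outliers <- guesses[guesses < lower_fence | guesses > upper_fence]

# boxplot.stats() applies the same rule, so the two results should match
print(manual_outliers)
print(identical(manual_outliers, boxplot.stats(guesses)$out))
```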

Outliers

One thing that astounds me is how many such outliers there are, and how extreme they get, as the minimum and maximum in this table show. A lot of people thought the jar contained many times more Cheerios than it actually did.

Code
# Range, mean and median of the raw guesses (outliers still included)
gideon_woc %>%
  group_by(type) %>%
  summarise(min = min(value), max = max(value), mean = mean(value), median = median(value)) %>%
  knitr::kable(caption = "Summary before removing outliers")

Now let’s remove those outliers and see what we get.

Code
# Values that boxplot.stats() flags as outliers
outlier <- function(x) boxplot.stats(x)$out

# Outliers are computed once over the pooled guesses from the individual groups;
# pull() ignores grouping, so a group_by(type) here would have no effect
outliers <- gideon_woc %>%
  dplyr::filter(type != "Combined: All Groups") %>%
  pull(value) %>%
  outlier()

# Drop every guess whose value matches one of those outliers
trimmed_woc <- gideon_woc %>% dplyr::filter(!(value %in% outliers))
trimmed_woc %>%
  group_by(type) %>%
  summarise(min = min(value), max = max(value), mean = mean(value), median = median(value)) %>%
  knitr::kable()
trimmed_woc %>%
  ggplot(aes(x = type, y = value)) +
  geom_boxplot(outlier.color = "red") +
  theme(axis.text.x = element_text(angle = 90))
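The code above finds outliers across all the guesses pooled together. An alternative, sketched here as a variation rather than what this post actually ran, is to flag outliers separately within each `type`, so that each group is judged against its own spread. Again `toy_woc` is a hypothetical stand-in with made-up guesses.

```r
library(dplyr)

# Hypothetical stand-in for gideon_woc (columns type, value); the guesses are made up
toy_woc <- data.frame(
  type  = rep(c("Shared", "GR"), each = 6),
  value = c(420, 440, 460, 480, 500, 3000,
            200, 350, 500, 650, 800, 9000)
)

trimmed <- toy_woc %>%
  group_by(type) %>%
  # inside a grouped filter(), value holds only that type's guesses,
  # so boxplot.stats() flags outliers relative to that group alone
  filter(!(value %in% boxplot.stats(value)$out)) %>%
  ungroup()

# Summarise what is left in each group
trimmed_summary <- trimmed %>%
  group_by(type) %>%
  summarise(n = n(), mean = mean(value), median = median(value))

print(trimmed_summary)
```

Whether pooled or per-group trimming is more appropriate depends on the question being asked; per-group trimming keeps a tight group from being judged by the spread of a noisier one.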

Gideon’s post asks a bunch of questions for statistics experts, hoping to understand just how significant the results were. Unfortunately that’s all the time I have for today. Hopefully I can revisit this post to learn some other interesting insights.