Tails I win
Here’s a problem from Alexander Bogomolny on Twitter:
a counterintuitive property of coin-tossing: If Alice tosses a coin until she sees a head followed by a tail, and Bob tosses a coin until he sees two heads in a row, then on average, Alice will require four tosses while Bob will require six tosses
Can I demonstrate this in R?
Let’s make Alice and Bob flip coins:
# returns the number of flips needed till the start of the pattern
flips_till_pattern <- function(pattern) {
a <- rbinom(n=50,size=1,prob = 0.5) # simulate tossing a coin 50 times.
# make a string of the form "Heads-Tails..." showing the sequence of random results
str <- paste(stringr::str_replace_all(stringr::str_replace_all(a,"1","Heads"),"0","Tails"), collapse = '-')
# a list of all the indices in the string where pattern holds
inds <- c(1, unlist(gregexpr(pattern, str)))
m <- substring(str, head(inds, -1), tail(inds, -1))
ln <- lengths(strsplit(m, '-'))
cumsum(c(ln[1], ln[-1] - 1))[1]+1
}
Playing this game 1000 times, here’s the average number of flips needed to get “Heads-Tails”
mean(replicate(1000,flips_till_pattern("Heads-Tails")))
## [1] 4.004
and the average number needed for “Heads-Heads”
mean(replicate(1000,flips_till_pattern("Heads-Heads")))
## [1] 5.899
More general solution
Now, you could say the above solution is terribly inelegant. There’s gotta be a way to implement this in a single line. But one advantage of my crude, string-based approach is that I can easily test other versions of the game.
For example, here’s the number of flips needed to get the pattern “Heads-Heads-Heads”
mean(replicate(1000,flips_till_pattern("Heads-Heads-Heads")))
## [1] 11.683
Or how about “Heads-Tails-Heads”
mean(replicate(1000,flips_till_pattern("Heads-Tails-Heads")))
## [1] 8.641
Interesting that a sequence where you throw the same result multiple times in a row is rarer than one in which you throw a pattern.
Simpler solutions
In this solution, 1 = Heads, 0 = Tails, and I can compute all the locations in a vector of 50 tosses where I find the pattern 1,1:
v = rbinom(n=50, size = 1, prob = 0.5)
x = c(1,1)
idx <- which(v == x[1]) # two heads in a row
idx[sapply(idx, function(i) all(v[i:(i+(length(x)-1))] == x))]
## [1] 14 24 25 26 27 28 29 40 41 48
That’s a little shorter than the string-based solution above, but it’s not the one-line (or close) that I’d like. Any hints?
Update Yes, there are much better solutions.
from David Robinson tweet:
library(tidyverse)
crossing(trial = 1:10000,
flips = 1:100) %>%
mutate(heads = rbinom(n(), 1, 0.5)) %>%
group_by(trial) %>%
mutate(next_flip = lead(heads),
hh = heads & next_flip,
ht = heads & !next_flip) %>%
summarize(first_hh = which(hh)[1] + 1,
first_ht = which(ht)[1] + 1) %>%
summarize(first_hh = mean(first_hh),
first_ht = mean(first_ht))
## # A tibble: 1 x 2
## first_hh first_ht
## <dbl> <dbl>
## 1 5.94 3.99
and, if you’re so inclined, math.stackexchange.com has a mathematical solution (with fancy equations)
Thanks to R4ds and Twitter for the excellent solutions!
edit Now Correlaid has posted another great solution, with Markov diagrams.