Reevaluating my sleep experiments
Way back in 2015, I was just learning R and produced a simple analysis of my sleep patterns when I experimented with taking potato starch.
What happens when I re-evaluate the data using better R skills? This post will show how I write the code. Check the Personal Science Guide to the Microbiome for a summary of the conclusions.
To start, I load the original data into the variable track_data
, which
is derived from an Excel spreadsheet with rows representing my
self-tracking results from each day of the year. The sheet has columns
like Date
, Vitamin D
, Fish Oil
and many other variables
representing what I did that day and, if relevant, the quantity. For
purposes below, I’ll especially use the columns related to sleep:
Z
: Total amount of sleep that night (hours)REM
: Time spent in REM sleep (hours)Deep
: Time spent in deep sleep (hours)Potato Starch
: Quantity of potato starch consumed the previous day (tablespoons)
Let’s do a super-dumb t-test to compare the days with and without potato
starch. A t-test is a quick way to compare two lists of numbers to see
the likelihood that any differences between them are due to chance. R
makes this super-easy with the function t.test()
: you simply give it
two vectors of numbers and it will calculate the means, the deviation
from those means, and some summary information that includes a
p-value
. You shouldn’t rely on p-values for hard-and-fast confirmation
that something is statistically significant, but for an initial
guesstimate, statisticians use the convention that p < 0.05
is worthy
of further investigation, while values greater than that are generally
simply due to chance.
R Note: Please see the handy
cut()
function, which lets me chop a vector into an arbitrary number of factors. Because I have so many different amounts of potato starch, the analysis is easier if I can divide the amounts into more qualitative buckets (None/Low/High).
experiment_days <- track_data %>% dplyr::filter(Date < as.Date("2015-04-10")) %>%
dplyr::filter(!is.na(`Potato Starch`) & !is.na(Z)) %>%
select(Date,PS = `Potato Starch`, Z, REM, Alcohol, Deep, `Vitamin D`) %>%
group_by(Week = cut(Date, "1 week")) %>%
mutate(MeanZ = mean(Z),
Date = as.Date(Date),
`Potato Starch` = cut(PS,breaks = c(0,.2,2,8), include.lowest = TRUE, labels =c("None", "Low", "High")),
REM = as.numeric(REM) * 24,
Deep = as.numeric(Deep) * 24)
t.test(experiment_days %>% dplyr::filter(PS>0 ) %>% pull(Z),
experiment_days %>% dplyr::filter(PS == 0) %>% pull(Z))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(PS > 0) %>% pull(Z) and experiment_days %>% dplyr::filter(PS == 0) %>% pull(Z)
## t = 1.04, df = 61.824, p-value = 0.3024
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1419544 0.4498331
## sample estimates:
## mean of x mean of y
## 6.482676 6.328737
with(experiment_days,table(cut(PS,breaks = c(0,.2,2,8), include.lowest = TRUE, labels =c("None", "Low", "High")))) %>% knitr::kable(col.names = c("PS","Days"))
PS | Days |
---|---|
None | 124 |
Low | 24 |
High | 12 |
Here’s a more detailed breakdown of the number of days involved:
with(experiment_days, table(PS, cut(Z,4))) %>% knitr::kable()# %>% as.matrix() %>% prop.table() * 100
(2.99,4.38] | (4.38,5.75] | (5.75,7.12] | (7.12,8.51] | |
---|---|---|---|---|
0 | 2 | 25 | 79 | 18 |
1 | 0 | 1 | 11 | 4 |
1.5 | 0 | 0 | 1 | 0 |
2 | 0 | 1 | 6 | 0 |
2.5 | 0 | 1 | 0 | 0 |
3 | 0 | 0 | 4 | 1 |
4 | 0 | 1 | 3 | 1 |
8 | 0 | 0 | 1 | 0 |
Let’s look at Bifidobacterium levels during this period
bifido <- mhg_taxa(subset_samples(sprague.genus.norm,Site == "gut" &
Date > as.Date("2014-5-15") &
Date < as.Date("2015-05-01")),
"Bifidobacterium") %>% mutate(Date = date, bifido = abundance/10000) %>%
select(Date,bifido)
# insert one new row (not in my current uBiome Explorer results for some reason, but for which I have JSON data)
bifido <- rbind(bifido,
c("2015-01-19",bifido = as.numeric(6.6516)))
bifido$bifido <- as.numeric(bifido$bifido)
j <- full_join(bifido, experiment_days) %>% as_tibble()
j[is.na(j$`Potato Starch`),]$`Potato Starch` <- "None"
ggplot(data = j ) +
geom_point(aes(x=as.Date(Date),y=Z, color = `Potato Starch`)) +
geom_smooth(inherit.aes = FALSE, data = j %>% dplyr::filter(`Potato Starch` %in% c("Low","None")),
aes(x=as.Date(Date),y=Z, color = `Potato Starch`),
method = "lm", se = FALSE) +
geom_bar( data = j, stat = "identity", aes(x = as.Date(Date), y = bifido),
color = "black") +
scale_y_continuous(sec.axis = sec_axis(~.,
name = "Bifidobacterium (%)")) +
scale_x_date(date_labels = "%b", date_breaks = "1 month", date_minor_breaks = "2 weeks") +
xlim(as.Date("2014-10-04"),as.Date("2015-04-01")) +
labs(title = "Total Sleep Per Night", x = "Date")
During the period when I was rigorously testing this, I only sampled myself three times, so the microbiome data isn’t as useful as for my other experiments. Still, note that my long-term typical levels of Bifidobacterium are around 1 % – about what you see in this chart for the February sample, which was taken after several weeks of no potato starch.
Of course, Bifidobacterium levels are not the main point of this experiment. We want to know if potato starch affected my sleep. So how does the t-test look when comparing days with or without potato starch?
t.test(experiment_days %>% dplyr::filter(`Potato Starch` == "None") %>% pull(Z),
experiment_days %>% dplyr::filter(`Potato Starch` != "None") %>% pull(Z))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(`Potato Starch` == "None") %>% and experiment_days %>% dplyr::filter(`Potato Starch` != "None") %>% pull(Z) and pull(Z)
## t = -1.04, df = 61.824, p-value = 0.3024
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4498331 0.1419544
## sample estimates:
## mean of x mean of y
## 6.328737 6.482676
Sadly, that p-value is much greater than 0.05, which tells me that, at least in this experiment, there’s not much reason to think the potato starch really helped one way or another.
Does alcohol affect my sleep?
Incidentally, I have data for several other variables. It turns out that none of the other things I tried had much impact on sleep either. Here’s my sleep on nights after drinking alcohol, compared to nights when I had none:
t.test(experiment_days %>% dplyr::filter(Alcohol == 0 ) %>% pull(Z),
experiment_days %>% dplyr::filter(Alcohol !=0 & !is.na(Alcohol)) %>% pull(Z))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(Alcohol == 0) %>% pull(Z) and experiment_days %>% dplyr::filter(Alcohol != 0 & !is.na(Alcohol)) %>% experiment_days %>% dplyr::filter(Alcohol == 0) %>% pull(Z) and pull(Z)
## t = -0.022903, df = 157.77, p-value = 0.9818
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2613462 0.2553548
## sample estimates:
## mean of x mean of y
## 6.361838 6.364833
Here’s the total number of days I drank, and how much:
with(experiment_days,table(cut(Alcohol,breaks = c(0,.2,1,8), include.lowest = TRUE, labels =c("None", "One", "More than One")))) %>% knitr::kable(col.names = c("Drinks","Days"))
Drinks | Days |
---|---|
None | 78 |
One | 67 |
More than One | 15 |
In this case I can test my numbers against a much longer data set where I recorded alcohol consumption and sleep. Over the course of nearly two years of data, I found that my nighly sleep average is slightly higher when I’ve had no alcohol, but the difference isn’t statistically significant. I tried to find correlations between sleep and the amount of alcohol, comparing the nights when I’d had several drinks, but I found nothing. Alcohol seems not to affect the number of hours I sleep.
# grab all dates where I had some alcohol the day/night before
ad <- rikStats %>% dplyr::filter(Alcohol !=0 & !is.na(Alcohol)) %>% pull(Date) + lubridate::days(1)
# and dates following no-alcohol days
nad <- rikStats %>% dplyr::filter(Alcohol == 0 ) %>% pull(Date) + lubridate::days(1)
t.test(rikStats %>% dplyr::filter(Date %in% ad) %>% pull(Z),
rikStats %>% dplyr::filter(Date %in% nad) %>% pull(Z))
##
## Welch Two Sample t-test
##
## data: rikStats %>% dplyr::filter(Date %in% ad) %>% pull(Z) and rikStats %>% dplyr::filter(Date %in% nad) %>% pull(Z)
## t = -3.3721, df = 1181.4, p-value = 0.0007702
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.27870206 -0.07367709
## sample estimates:
## mean of x mean of y
## 6.537934 6.714124
with(rikStats,table(cut(Alcohol,breaks = c(0,.2,1,8), include.lowest = TRUE, labels =c("None", "One", "More than One")))) %>% knitr::kable(col.names = c("Drinks","Days"))
Drinks | Days |
---|---|
None | 775 |
One | 450 |
More than One | 136 |
But maybe alcohol affects my REM or deep sleep? The answer appears to be no.
# grab all dates where I had some alcohol the day/night before
ad <- experiment_days %>% dplyr::filter(Alcohol !=0 & !is.na(Alcohol)) %>% pull(Date) #+ lubridate::days(1)
# and dates following no-alcohol days
nad <- experiment_days %>% dplyr::filter(Alcohol == 0 ) %>% pull(Date) #+ lubridate::days(1)
t.test(experiment_days %>% dplyr::filter(Date %in% ad & !is.na(REM)) %>% pull(REM),
experiment_days %>% dplyr::filter(Date %in% nad & !is.na(REM)) %>% pull(REM))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(Date %in% ad & !is.na(REM)) %>% and experiment_days %>% dplyr::filter(Date %in% nad & !is.na(REM)) %>% pull(REM) and pull(REM)
## t = -0.48131, df = 118.25, p-value = 0.6312
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1760789 0.1072209
## sample estimates:
## mean of x mean of y
## 1.811616 1.846045
t.test(experiment_days %>% dplyr::filter(Date %in% ad & !is.na(Deep)) %>% pull(Deep),
experiment_days %>% dplyr::filter(Date %in% nad & !is.na(Deep)) %>% pull(Deep))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(Date %in% ad & !is.na(Deep)) %>% and experiment_days %>% dplyr::filter(Date %in% nad & !is.na(Deep)) %>% pull(Deep) and pull(Deep)
## t = -0.2844, df = 122.95, p-value = 0.7766
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.07853084 0.05879962
## sample estimates:
## mean of x mean of y
## 1.059343 1.069209
Vitamin D supplementation and sleep
There did seem to be one effect worth noting. I seem to sleep better on days after I’ve taken a Vitamin D supplement:
# grab all dates where I had some Vitamin D the day/night before
ad <- experiment_days %>% dplyr::filter(`Vitamin D` > 0 & !is.na(`Vitamin D`)) %>% pull(Date)# + lubridate::days(1)
# and dates following no `Vitamin D` days
nad <- experiment_days %>% dplyr::filter(`Vitamin D` == 0 ) %>% pull(Date)# + lubridate::days(1)
t.test(experiment_days %>% dplyr::filter(Date %in% ad & !is.na(`Vitamin D`)) %>% pull(REM),
experiment_days %>% dplyr::filter(Date %in% nad & !is.na(`Vitamin D`)) %>% pull(REM))
##
## Welch Two Sample t-test
##
## data: experiment_days %>% dplyr::filter(Date %in% ad & !is.na(`Vitamin D`)) %>% and experiment_days %>% dplyr::filter(Date %in% nad & !is.na(`Vitamin D`)) %>% pull(REM) and pull(REM)
## t = 1.995, df = 71.758, p-value = 0.04984
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0001122644 0.3138701523
## sample estimates:
## mean of x mean of y
## 1.883128 1.726136
with(experiment_days,table(`Vitamin D`)) %>% knitr::kable(col.names = c("3K IUs", "Days"))
3K IUs | Days |
---|---|
0 | 75 |
1 | 53 |
2 | 32 |
But when I looked closer, the effect disappeared if I had more than 3000 IUs. Either Vitamin D helps sleep only in a narrow band of quantity, or it has no real effect. My money is on the “no effect”, though it would be interesting to do a more rigorous study.
Let’s look at my longer data set.
vd <- rikStats %>% dplyr::filter(`Vitamin D` !=0 & !is.na(`Vitamin D`)) %>% pull(Date) + lubridate::days(1)
nvd <- rikStats %>% dplyr::filter(`Vitamin D` == 0 & !is.na(`Vitamin D`)) %>% pull(Date) + lubridate::days(1)
with(rikStats, table(`Vitamin D`)) %>% knitr::kable(col.names = c("3K IUs", "Days"))
3K IUs | Days |
---|---|
0 | 435 |
0.5 | 3 |
1 | 147 |
2 | 182 |
3 | 9 |
4 | 5 |
5 | 16 |
6 | 2 |
13 | 1 |
t.test(rikStats %>% dplyr::filter(Date %in% vd) %>% pull(Z),
rikStats %>% dplyr::filter(Date %in% nvd) %>% pull(Z))
##
## Welch Two Sample t-test
##
## data: rikStats %>% dplyr::filter(Date %in% vd) %>% pull(Z) and rikStats %>% dplyr::filter(Date %in% nvd) %>% pull(Z)
## t = 0.92771, df = 710.22, p-value = 0.3539
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.06598338 0.18420195
## sample estimates:
## mean of x mean of y
## 6.581159 6.522050
Here, indeed, I slept slightly better on nights after taking Vitamin D (6.6 hrs vs 6.5) and this is with a lot of observations, but that p-value indicates that this is likely just a coincidence. Still, it’s intriguing that two separate data sets gave results in the same direction.
More generally, can I make a table of all possible P-values?
experiment_days %>% ungroup() %>% dplyr::filter(Date >= "2014-10-05" & Date <= "2015-03-18") %>% select(Z,REM, Deep) %>%
apply(2,function (x) c(mean(x, na.rm = TRUE),
median(x, na.rm = TRUE),
sd(x, na.rm = TRUE)) ) %>% knitr::kable(digits = 3)
Z | REM | Deep |
---|---|---|
6.377 | 1.856 | 1.086 |
6.500 | 1.883 | 1.067 |
0.859 | 0.372 | 0.193 |
# t.test(experiment_days %>% dplyr::filter(`Potato Starch` == "None") %>% pull(Z),
# experiment_days %>% dplyr::filter(`Potato Starch` != "None") %>% pull(Z))$p.value
#
# t.test(experiment_days %>% dplyr::filter(`Potato Starch` == "None") %>% pull(REM),
# experiment_days %>% dplyr::filter(`Potato Starch` != "None") %>% pull(REM))$p.value
#
# t.test(experiment_days %>% dplyr::filter(`Potato Starch` == "None") %>% pull(Deep),
# experiment_days %>% dplyr::filter(`Potato Starch` != "None") %>% pull(Deep))$p.value
# compare the null case with the non-null case. Returns the T-Test showing X=NULL, y= hypothesis
compare_treatment<- function (data=experiment_days,treatment=experiment_days$PS,wrt="Deep"){
x = t.test(data[[wrt]][treatment>0], data[[wrt]][treatment==0])
x$p.value
}
# P values while taking Potato Starch
# sapply(c("REM","Z","Deep"),function(x,y) {compare_treatment(treatment = experiment_days[[y]], wrt = x)},
# c("PS"))
#
# sapply(c("REM","Z","Deep"),function(x,y) {compare_treatment(treatment = experiment_days[[y]], wrt = x)},
# c("Alcohol"))
#
# sapply(c("REM","Z","Deep"),function(x,y) {compare_treatment(treatment = experiment_days[[y]], wrt = x)},
# c("Vitamin D"))
sapply(c("PS","Alcohol","Vitamin D"),
function (m) {sapply(c("REM","Z","Deep"),
function(x,y) {compare_treatment(treatment = experiment_days[[y]], wrt = x)},m)}) %>% knitr::kable(digits = 3)
PS | Alcohol | Vitamin D | |
---|---|---|---|
REM | 0.635 | 0.631 | 0.050 |
Z | 0.302 | 0.982 | 0.508 |
Deep | 0.785 | 0.777 | 0.064 |
Like I said, I’m still exploring my data and will write this up in a prettier format. For now, consider this a tutorial on how to study the data in R.