Comparing Amazon Halo to Apple Watch
The Amazon Halo is a new fitness band with a few interesting features, notably a body scan option that claims to be more accurate than a household scale at calculating body fat. I’ll look in depth at that feature later, but meanwhile I wanted a quick-and-dirty test of the overall sensor accuracy.
To help with the analysis, I created a new R package called amazonhalor that can read the raw data you can download from the Halo.
devtools::install_github("richardsprague/amazonhalor")
library(tidyverse)
The Halo data files are a little quirky, so the package offers a number of convenience functions that convert the raw data into R data frames that are a bit easier to handle.
My Apple Watch data is stored in the variable watch_data_full, so I'll use the package to read the data from the Halo so I can compare.
If you set the variable halo_directory to a pathname pointing to the Amazon Health Data directory, you can read the Halo heart rate, daily activity, and sleep data like this:
halo_heartrate_df <- amazonhalor::halo_heartrate_df(halo_directory)
halo_activity_daily_df <- amazonhalor::halo_daily_df(halo_directory)
halo_sleep_df <- amazonhalor::halo_sleep_sessions_df(halo_directory)
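Before going further, a quick sanity check on what came back. The columns this post relies on are datetime, sourceName, and value for heart rate; Date, sourceName, and Resting Heart Rate (bpm) for the daily summary; and Date Of Sleep plus the Z/REM/Deep/Light durations for sleep (your download may include more):
# Peek at the structure of each imported data frame
dplyr::glimpse(halo_heartrate_df)
dplyr::glimpse(halo_activity_daily_df)
dplyr::glimpse(halo_sleep_df)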
Sleep
Here’s how sleep looks on the Halo:
halo_sleep_df %>%
  pivot_longer(cols = c("Z", "REM", "Deep", "Light"),
               names_to = "Phase", values_to = "Duration") %>%
  transmute(date = `Date Of Sleep`, Phase = factor(Phase), Duration) %>%
  dplyr::filter(Phase != "Z") %>%
  ggplot(aes(x = date, y = Duration / 3600, fill = Phase)) +
  geom_col() +
  labs(title = "Sleep Duration", subtitle = "Amazon Halo", y = "Hours", x = "")
Resting Heart Rate
Here’s a chart showing how the Apple Watch and the Halo compare for resting heart rate:
watch_data_full %>%
  dplyr::filter(endDate >= "2020-11-23" & type == "RestingHeartRate") %>%
  transmute(Date = lubridate::as_date(endDate, tz = Sys.timezone()),
            sourceName = sourceName,
            `Resting Heart Rate (bpm)` = value) %>%
  full_join(select(halo_activity_daily_df, Date, sourceName, `Resting Heart Rate (bpm)`)) %>%
  mutate(source = factor(sourceName, labels = c("Amazon Halo", "Apple Watch"))) %>%
  ggplot(aes(x = Date, y = `Resting Heart Rate (bpm)`, color = source)) +
  geom_line(size = 2) +
  labs(title = "Apple Watch vs. Halo", color = "Source", x = "")
It’s not encouraging to see two wildly different values for my daily resting heart rates.
That chart was made using the daily summary, which comes from an algorithm each manufacturer uses to compute what they think is your resting heart rate for the day. I'm not sure exactly how either company computes it, which may explain why the numbers differ so much.
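Since both devices also export their raw readings, a crude cross-check is to compute my own resting-rate proxy directly from them, for example the 5th percentile of each night's midnight-to-6am readings. This is only a sketch (my own arbitrary definition, not either company's algorithm), but it puts both devices on the same footing:
# Crude resting-HR proxy: 5th percentile of each device's overnight (midnight-6am) readings
resting_proxy <- bind_rows(
  watch_data_full %>%
    dplyr::filter(endDate >= "2020-11-23" & type == "HeartRate") %>%
    transmute(datetime = lubridate::as_datetime(endDate, tz = Sys.timezone()),
              sourceName = sourceName,
              value = value),
  halo_heartrate_df %>% select(datetime, sourceName, value)
) %>%
  dplyr::filter(lubridate::hour(datetime) < 6) %>%
  group_by(Date = lubridate::as_date(datetime), sourceName) %>%
  summarise(`Resting proxy (bpm)` = quantile(value, 0.05), .groups = "drop")

resting_proxy %>%
  ggplot(aes(x = Date, y = `Resting proxy (bpm)`, color = sourceName)) +
  geom_line() +
  labs(title = "Resting heart rate proxy",
       subtitle = "5th percentile of midnight-6am readings",
       x = "", color = "Source")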
Heart Rates
How about the actual heart rates throughout the day? Since this is the raw stream of readings, it should be much closer to the truth, provided each device accurately measures and stores my minute-by-minute heart rate.
library(tidyverse)
library(lubridate)
hr_compare <- watch_data_full %>%
  dplyr::filter(endDate >= "2020-11-23" & type == "HeartRate") %>%
  transmute(datetime = lubridate::as_datetime(endDate, tz = Sys.timezone()),
            sourceName = sourceName,
            value = value) %>%
  # combine with a 10% sample of the Halo heart rate readings
  full_join(select(halo_heartrate_df %>% sample_frac(0.1), datetime, sourceName, value)) %>%
  mutate(source = factor(sourceName, labels = c("Amazon Halo", "Apple Watch"))) %>%
  dplyr::filter(datetime > "2020-11-29 12:00:00")
hr_compare %>%
ggplot(aes(x=datetime, y = value, color = source)) + geom_point() + geom_smooth(method = "loess") +
labs(title = "Apple Watch vs. Halo", color = "Source",
x = "",
y = "Heart Rate (bpm)")
This looks a little, well, random: there's a lot of disagreement between the two devices. How much, exactly?
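One way to put a number on it: round both series to 5-minute bins, average within each bin, and look at the difference wherever both devices have a reading. This is a rough sketch; it assumes the two devices' clocks line up well enough for 5-minute bins, and remember the Halo series here is only a 10% sample, so bins where just one device reports get dropped.
# Align the two devices on 5-minute bins and summarize the per-bin difference
hr_diff <- hr_compare %>%
  mutate(bin = lubridate::round_date(datetime, "5 minutes")) %>%
  group_by(bin, source) %>%
  summarise(value = mean(value), .groups = "drop") %>%
  pivot_wider(names_from = source, values_from = value) %>%
  dplyr::filter(!is.na(`Amazon Halo`) & !is.na(`Apple Watch`)) %>%
  mutate(difference = `Apple Watch` - `Amazon Halo`)

summary(hr_diff$difference)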
Tone
One feature that gets too much attention is the conversation “tone” measurement, which uses the Halo’s built-in microphone to analyze your speech throughout the day in an attempt to find patterns. With the per-utterance scores in halo_tone_utterances_df, here’s the most recent day of measurements:
p <- halo_tone_utterances_df %>% dplyr::filter(StartTime >= today() - days(1)) %>%
pivot_longer(cols = c("Positivity","Energy"),
names_to = "Measure",
values_to = "Intensity") %>%
ggplot(aes(x=StartTime, y = Intensity, color = Measure)) + geom_point() +
labs(title = "Amazon Halo Tone Measurements", x = "")
library(gganimate)
anim <- p + geom_point(aes(color = Measure, group = 1L)) + transition_states(Measure, transition_length = 2, state_length = 1)
anim + ease_aes('cubic-in-out')
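If you want to keep a copy rather than just render it inline, gganimate's anim_save() can write the animation out as a gif (halo_tone.gif is just my chosen filename):
# Render the animation and save it to a gif in the working directory
anim_save("halo_tone.gif", animation = animate(anim + ease_aes('cubic-in-out')))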
I’m not sure how to interpret this, but it sure is cool to make the plots!
Statistics
Here are a few summary statistics to start comparing the two devices.
with(hr_compare, table(source))
## source
## Amazon Halo Apple Watch
##        4381       11668
hr_compare %>%
  group_by(source) %>%
  summarise(q25 = quantile(value, 0.25),
            q75 = quantile(value, 0.75),
            .groups = "drop")
## # A tibble: 2 x 3
##   source        q25   q75
##   <fct>       <dbl> <dbl>
## 1 Amazon Halo    65    82
## 2 Apple Watch    72   130
hr_compare %>% split(.$source) %>% purrr::map(summary)
## $`Amazon Halo`
## datetime sourceName value
## Min. :2020-11-29 12:02:58 Length:4381 Min. : 52.00
## 1st Qu.:2020-12-03 09:30:24 Class :character 1st Qu.: 65.00
## Median :2020-12-07 04:08:41 Mode :character Median : 71.00
## Mean :2020-12-07 06:12:49 Mean : 74.96
## 3rd Qu.:2020-12-11 05:33:56 3rd Qu.: 82.00
## Max. :2020-12-15 05:49:12 Max. :147.00
## source
## Amazon Halo:4381
## Apple Watch: 0
##
##
##
##
##
## $`Apple Watch`
## datetime sourceName value
## Min. :2020-11-29 12:01:59 Length:11668 Min. : 42.0
## 1st Qu.:2020-12-04 23:26:35 Class :character 1st Qu.: 72.0
## Median :2020-12-08 16:07:42 Mode :character Median :109.0
## Mean :2020-12-08 06:24:19 Mean :106.1
## 3rd Qu.:2020-12-12 10:57:43 3rd Qu.:130.0
## Max. :2020-12-15 06:01:44 Max. :171.0
## source
## Amazon Halo: 0
## Apple Watch:11668
##
##
##
##
t.test(hr_compare %>% dplyr::filter(source=="Apple Watch") %>% pull(value),
hr_compare %>% dplyr::filter(source=="Amazon Halo") %>% pull(value))
##
## Welch Two Sample t-test
##
## data: hr_compare %>% dplyr::filter(source == "Apple Watch") %>% pull(value) and hr_compare %>% dplyr::filter(source == "Amazon Halo") %>% pull(value)
## t = 83.95, df = 15769, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 30.44326 31.89886
## sample estimates:
## mean of x mean of y
## 106.13408 74.96302
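That t-test treats the two devices as independent samples, so it is also affected by when each device happens to record. A paired comparison on the time-aligned 5-minute bins from earlier (the hr_diff data frame sketched above) is probably a fairer test:
# Correlation and paired t-test on the bins where both devices reported a value
with(hr_diff, cor(`Apple Watch`, `Amazon Halo`))
t.test(hr_diff$`Apple Watch`, hr_diff$`Amazon Halo`, paired = TRUE)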
Conclusion: the two devices don’t agree very well, though the reason is unclear. Part of the gap may simply be sampling: the Apple Watch recorded almost three times as many readings as the Halo, and its much higher median (109 vs. 71 bpm) suggests it logs more densely during exercise, when heart rate is naturally elevated. I’ll continue to explore and update this post when I get more answers.