Richard Sprague


Comparing Amazon Halo to Apple Watch

Created: 2020-12-02 ; Updated: 2020-12-14

The Amazon Halo is a new fitness band with a few interesting features, notably a body scan option that claims to be more accurate than a household scale at calculating body fat. I’ll look in depth at that feature later, but meanwhile I wanted a quick-and-dirty test of the overall sensor accuracy.

To help with the analysis, I created a new R package called amazonhalor that can read the raw data you can download from the Halo.

devtools::install_github("richardsprague/amazonhalor")  # install amazonhalor from GitHub
library(tidyverse)                                      # dplyr, tidyr and ggplot2 are used below

The Halo data files are a little quirky, so this package offers a number of convenience functions to automatically convert the raw data into an R dataframe that’s a bit easier to handle.

My Apple Watch data is already loaded in the variable watch_data_full, so I'll use the package to read the Halo data for comparison.

To read the Halo data, set the variable halo_directory to the path of your Amazon Health Data directory and call the package's reader functions for heart rate, daily summaries, and sleep:

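halo_directory <- "path/to/Amazon Health Data"  # placeholder: point this at your own export
halo_heartrate_df <- amazonhalor::halo_heartrate_df(halo_directory)    # heart rate readings
halo_activity_daily_df <- amazonhalor::halo_daily_df(halo_directory)   # daily summaries
halo_sleep_df <- amazonhalor::halo_sleep_sessions_df(halo_directory)   # sleep sessions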

Sleep

Here’s how sleep looks on the Halo:

halo_sleep_df %>%
  # One row per night and phase; "Z" is filtered out so only REM/Deep/Light are stacked
  pivot_longer(cols = c("Z", "REM", "Deep", "Light"), names_to = "Phase", values_to = "Duration") %>%
  transmute(date = `Date Of Sleep`, Phase = factor(Phase), Duration) %>%
  dplyr::filter(Phase != "Z") %>%
  ggplot(aes(x = date, y = Duration / 3600, fill = Phase)) +  # Duration / 3600 gives hours
  geom_col() +
  labs(title = "Sleep Duration", subtitle = "Amazon Halo", y = "Hours", x = "")

Resting Heart Rate

Here’s a chart showing how the Apple Watch and the Halo compare for resting heart rate:

watch_data_full %>%
  dplyr::filter(endDate >= "2020-11-23", type == "RestingHeartRate") %>%
  transmute(Date = lubridate::as_date(endDate, tz = Sys.timezone()),
            sourceName = sourceName,
            `Resting Heart Rate (bpm)` = value) %>%
  # Stack the Halo daily summaries under the Apple Watch rows
  bind_rows(select(halo_activity_daily_df, Date, sourceName, `Resting Heart Rate (bpm)`)) %>%
  # labels are matched to the sorted sourceName values, so check this order on your own data
  mutate(source = factor(sourceName, labels = c("Amazon Halo", "Apple Watch"))) %>%
  ggplot(aes(x = Date, y = `Resting Heart Rate (bpm)`, color = source)) +
  geom_line(size = 2) +
  labs(title = "Apple Watch vs. Halo", color = "Source", x = "")

It’s not encouraging to see two wildly different values for my daily resting heart rates.

That chart uses the daily summaries, where each manufacturer applies its own algorithm to estimate your resting heart rate for the day. I don't know exactly how either algorithm works, and differences between them may explain the gap.
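Because neither vendor documents its daily resting heart rate algorithm, one way to compare like with like is to apply a single, simple rule to both raw heart rate streams. Here is a minimal base-R sketch of that idea, under stated assumptions: it treats the 5th percentile of each device's readings on each calendar day as a rough resting-rate proxy. The percentile, the function name daily_resting_proxy, and the toy input are illustrative choices, not Apple's or Amazon's method. The input is assumed to be a data frame with datetime (POSIXct), source, and value columns, the same shape as hr_compare in the next section.

```r
# Hypothetical helper (not part of amazonhalor): a crude "resting heart rate"
# proxy computed identically for every device, so the comparison does not
# depend on either vendor's undocumented daily algorithm.
daily_resting_proxy <- function(df, prob = 0.05, tz = Sys.timezone()) {
  # Calendar day of each reading in the chosen time zone
  df$date <- as.Date(df$datetime, tz = tz)
  # Low percentile of each device's readings on each day
  out <- aggregate(value ~ source + date, data = df,
                   FUN = function(v) unname(quantile(v, prob)))
  names(out)[names(out) == "value"] <- "resting_proxy"
  out[order(out$source, out$date), ]
}

# Toy input for illustration only (made-up values, not device data)
toy <- data.frame(
  datetime = as.POSIXct(c("2020-12-01 03:00", "2020-12-01 15:00",
                          "2020-12-01 04:00", "2020-12-01 16:00"), tz = "UTC"),
  source = c("Amazon Halo", "Amazon Halo", "Apple Watch", "Apple Watch"),
  value = c(55, 95, 58, 120)
)
daily_resting_proxy(toy, tz = "UTC")
```

On the real data, daily_resting_proxy(hr_compare) would return one value per device per day. A low percentile is used rather than the minimum so that a single spurious reading does not set the day's value.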

Heart Rates

How about the actual heart rates throughout the day? Because these are raw readings rather than a computed daily summary, I'd expect them to be much closer to the truth, provided each device accurately measures and stores my minute-by-minute heart rate.

library(tidyverse)
library(lubridate)
hr_compare <- watch_data_full %>%
  dplyr::filter(endDate >= "2020-11-23", type == "HeartRate") %>%
  transmute(datetime = lubridate::as_datetime(endDate, tz = Sys.timezone()),
            sourceName = sourceName,
            value = value) %>%
  # Add a random 10% of the Halo readings (re-running draws a different sample)
  bind_rows(select(halo_heartrate_df %>% sample_frac(0.1), datetime, sourceName, value)) %>%
  # labels are matched to the sorted sourceName values
  mutate(source = factor(sourceName, labels = c("Amazon Halo", "Apple Watch"))) %>%
  dplyr::filter(datetime > "2020-11-29 12:00:00")



hr_compare %>%
  ggplot(aes(x = datetime, y = value, color = source)) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Apple Watch vs. Halo", color = "Source",
       x = "",
       y = "Heart Rate (bpm)")

This looks a little, well, random: there's a lot of variance between the two devices. How much?
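Overlaying the two point clouds can't show how far apart the devices are at the same moment, since they record at different times and rates. A minimal base-R sketch of one way to put a number on it, assuming a data frame shaped like hr_compare (datetime, source, value): average each device's readings within fixed 5-minute bins, keep only the bins where both devices reported, and summarize the paired differences as a Bland-Altman style bias with 95% limits of agreement (mean difference plus or minus 1.96 standard deviations). The bin width, the function name paired_hr_agreement, and the toy data are illustrative assumptions, not part of amazonhalor.

```r
# Hypothetical helper: pair readings from two devices by time bin and
# summarise how far apart they are (Bland-Altman style bias and limits).
paired_hr_agreement <- function(df, a = "Apple Watch", b = "Amazon Halo",
                                bin_secs = 300) {
  # Assign every reading to a fixed-width time bin (seconds since the epoch)
  df$bin <- floor(as.numeric(df$datetime) / bin_secs) * bin_secs
  # Mean heart rate per device per bin
  per_bin <- aggregate(value ~ source + bin, data = df, FUN = mean)
  rows_a <- per_bin[per_bin$source == a, c("bin", "value")]
  rows_b <- per_bin[per_bin$source == b, c("bin", "value")]
  # Inner merge keeps only bins in which both devices have a reading
  paired <- merge(rows_a, rows_b, by = "bin", suffixes = c("_a", "_b"))
  d <- paired$value_a - paired$value_b
  bias <- mean(d)
  half_width <- 1.96 * sd(d)
  list(n_pairs = nrow(paired),
       bias = bias,
       lower = bias - half_width,
       upper = bias + half_width)
}

# Toy input for illustration only (made-up values, not device data)
toy_hr <- data.frame(
  datetime = as.POSIXct("2020-12-01 12:00:00", tz = "UTC") +
    c(10, 20, 30, 310, 320, 610, 900),
  source = c("Apple Watch", "Amazon Halo", "Apple Watch",
             "Apple Watch", "Amazon Halo", "Amazon Halo", "Apple Watch"),
  value = c(80, 70, 84, 100, 90, 65, 75)
)
paired_hr_agreement(toy_hr)
```

With the real data this would be called as paired_hr_agreement(hr_compare); a positive bias means the Apple Watch reads higher on average. Because hr_compare keeps only a random 10% of the Halo readings, building the input without sample_frac() would yield more matched bins.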

Tone

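One feature that gets too much attention is the conversation "tone" measurement, which uses the Halo's built-in microphone to analyze your speech throughout the day in an attempt to find patterns. The plot below assumes the Tone utterance data has already been read into a data frame called halo_tone_utterances_df with StartTime, Positivity, and Energy columns.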

p <- halo_tone_utterances_df %>%
  # Keep utterances from yesterday onward; as_date() makes this a Date-to-Date comparison
  dplyr::filter(as_date(StartTime) >= today() - days(1)) %>%
  pivot_longer(cols = c("Positivity", "Energy"),
               names_to = "Measure",
               values_to = "Intensity") %>%
  ggplot(aes(x = StartTime, y = Intensity, color = Measure)) + geom_point() +
  labs(title = "Amazon Halo Tone Measurements", x = "")

library(gganimate)
anim <- p + geom_point(aes(color = Measure, group = 1L)) + transition_states(Measure, transition_length = 2, state_length = 1)

anim + ease_aes('cubic-in-out')

I’m not sure how to interpret this, but it sure is cool to make the plots!

Statistics

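Here are a few statistics from a first attempt at a quantitative comparison. Keep in mind that hr_compare holds only a random 10% of the Halo readings, so the Halo count below covers roughly a tenth of that device's readings for the period.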

with(hr_compare, table(source))
## source
## Amazon Halo Apple Watch 
##        4381       11668
hr_compare %>% group_by(source) %>% summarise(q1 = quantile(value, 0.25),
                                              q3 = quantile(value, 0.75),
                                              .groups = "drop")
## # A tibble: 2 x 3
##   source         q1    q3
##   <fct>       <dbl> <dbl>
## 1 Amazon Halo    65    82
## 2 Apple Watch    72   130
hr_compare %>% split(.$source) %>% purrr::map(summary)
## $`Amazon Halo`
##     datetime                    sourceName            value       
##  Min.   :2020-11-29 12:02:58   Length:4381        Min.   : 52.00  
##  1st Qu.:2020-12-03 09:30:24   Class :character   1st Qu.: 65.00  
##  Median :2020-12-07 04:08:41   Mode  :character   Median : 71.00  
##  Mean   :2020-12-07 06:12:49                      Mean   : 74.96  
##  3rd Qu.:2020-12-11 05:33:56                      3rd Qu.: 82.00  
##  Max.   :2020-12-15 05:49:12                      Max.   :147.00  
##          source    
##  Amazon Halo:4381  
##  Apple Watch:   0  
##                    
##                    
##                    
##                    
## 
## $`Apple Watch`
##     datetime                    sourceName            value      
##  Min.   :2020-11-29 12:01:59   Length:11668       Min.   : 42.0  
##  1st Qu.:2020-12-04 23:26:35   Class :character   1st Qu.: 72.0  
##  Median :2020-12-08 16:07:42   Mode  :character   Median :109.0  
##  Mean   :2020-12-08 06:24:19                      Mean   :106.1  
##  3rd Qu.:2020-12-12 10:57:43                      3rd Qu.:130.0  
##  Max.   :2020-12-15 06:01:44                      Max.   :171.0  
##          source     
##  Amazon Halo:    0  
##  Apple Watch:11668  
##                     
##                     
##                     
## 
t.test(hr_compare %>% dplyr::filter(source=="Apple Watch") %>% pull(value),
       hr_compare %>% dplyr::filter(source=="Amazon Halo") %>% pull(value))
## 
##  Welch Two Sample t-test
## 
## data:  hr_compare %>% dplyr::filter(source == "Apple Watch") %>% pull(value) and hr_compare %>% dplyr::filter(source == "Amazon Halo") %>% pull(value)
## t = 83.95, df = 15769, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  30.44326 31.89886
## sample estimates:
## mean of x mean of y 
## 106.13408  74.96302
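For reference, Welch's test compares the two pooled means, dividing their difference by a standard error that allows the devices to have different variances and sample sizes. With A for Apple Watch and H for Amazon Halo, \bar{x} the sample mean, s the sample standard deviation, and n the number of readings:

```latex
t = \frac{\bar{x}_A - \bar{x}_H}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_H^2}{n_H}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_H^2}{n_H}\right)^2}
{\dfrac{\left(s_A^2/n_A\right)^2}{n_A - 1} + \dfrac{\left(s_H^2/n_H\right)^2}{n_H - 1}}
```

Plugging in the printed values, the mean difference is 106.13 - 74.96 = 31.17 bpm, and t = 83.95 implies a standard error of about 31.17 / 83.95 = 0.37 bpm, consistent with the reported interval (31.17 plus or minus 1.96 x 0.37, roughly 30.44 to 31.90). The test treats every reading as an independent draw and does not pair readings taken at the same time, so it shows that the pooled averages differ rather than how closely the two devices agree minute by minute.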

Conclusion: the two devices don't agree very well, though the reason is unclear. I'll continue to explore and update this post when I get more answers.