Richard Sprague

My personal website

Who moves to Seattle?

2018-09-07


GeekWire’s Monica Nickelsburg wrote “Where do Seattle’s newcomers move from? Drivers license numbers reveal some surprises,” which includes a pretty Excel chart showing the top states from which people move into King County.

But her chart doesn’t correct for the population of each state. Can we do better in R?

First, I downloaded the raw data from the Washington State Department of Licensing, which appears to be the source for her article.

Then I converted all the data to Tidy format:

library(tidyverse)

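# dol_path and census_pop_path hold the local paths to the downloaded files
dol_king <- readxl::read_excel(dol_path, sheet = "King", skip = 5)
data(state)  # load R's built-in state datasets (state.name, state.abb, ...)

# load state populations
census_pop <- read.csv(census_pop_path)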

# convert to Tidy dataset

dol_king$From[24] <- "Mississippi"  # fix an error in the DOL spelling
dol_king <- dol_king %>% setNames(stringr::str_replace(names(dol_king),"CY ","")) 
dol_king <- dol_king %>% gather(Year, Change, -From)

# dol_king is now a tidy data frame (tibble): one row per state and year, with the number of people who moved to King County from that state, from 2006 to 2017

# Now do the same with census_pop

census_pop_no <- census_pop %>% select(starts_with("POPESTIMATE")) %>% as_tibble()  # just the population columns, not state names

# strip the "POPESTIMATE" prefix so each column is named by its year
census_pop_no <- census_pop_no %>% 
  setNames(stringr::str_replace_all(names(census_pop_no), "[:alpha:]+", ""))
census_pop <- cbind(select(census_pop, "NAME"), census_pop_no) %>% as_tibble()
census_pop <- census_pop %>% gather(Year,Population,-NAME)
census_pop$NAME <- as.character(census_pop$NAME)

dol <- dplyr::left_join(census_pop,dol_king, by = c("NAME" = "From", "Year" = "Year"))
dol$NAME <- factor(dol$NAME)
names(dol)[1] <- "From"

This gives me one handy data frame, dol, with one row per state and year, holding the state’s population and the number of people who moved from it to King County that year.

dol
## # A tibble: 416 x 4
##    From                 Year  Population Change
##    <fct>                <chr>      <int>  <dbl>
##  1 Alabama              2010     4785579    180
##  2 Alaska               2010      714015    679
##  3 Arizona              2010     6407002   2045
##  4 Arkansas             2010     2921737    210
##  5 California           2010    37327690  10373
##  6 Colorado             2010     5048029   1423
##  7 Connecticut          2010     3580171    344
##  8 Delaware             2010      899712     80
##  9 District of Columbia 2010      605040     NA
## 10 Florida              2010    18846461   2780
## # ... with 406 more rows
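
Note the NA for the District of Columbia: a left join keeps every row from the census table, even when the DOL table has no matching name. A quick way to list such rows is sketched below. The two small data frames are stand-ins I made up for illustration, using values copied from the output above; they are not the real census_pop and dol_king.

```r
library(dplyr)

# Stand-ins for census_pop and dol_king (values copied from the output above)
census_toy <- data.frame(
  NAME = c("Alabama", "Alaska", "District of Columbia"),
  Year = "2010",
  Population = c(4785579, 714015, 605040),
  stringsAsFactors = FALSE
)
dol_toy <- data.frame(
  From = c("Alabama", "Alaska"),
  Year = "2010",
  Change = c(180, 679),
  stringsAsFactors = FALSE
)

# Same join as above: every census row survives, matched or not
joined <- left_join(census_toy, dol_toy, by = c("NAME" = "From", "Year" = "Year"))

# Rows with no DOL count end up with Change = NA
missing_rows <- filter(joined, is.na(Change))
missing_rows$NAME
```

Running the same filter on dol itself would show which names (or spellings) did not line up between the two sources.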

Now it’s just a matter of normalizing the data: divide the number of people who moved from each state by that state’s population.
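
As a minimal sketch of that calculation, here are three rows copied from the 2010 output above, with a per-100,000 rate added. The column name per_100k is my own choice for illustration, not something from the DOL or census data.

```r
library(dplyr)

# Three rows copied from the dol output above (2010)
dol_sample <- data.frame(
  From = c("Alaska", "California", "Florida"),
  Year = "2010",
  Population = c(714015, 37327690, 18846461),
  Change = c(679, 10373, 2780),
  stringsAsFactors = FALSE
)

# Movers per 100,000 residents of the origin state, highest rate first
dol_rates <- dol_sample %>%
  mutate(per_100k = Change / Population * 1e5) %>%
  arrange(desc(per_100k))

dol_rates
```

California sends far more people in absolute terms, but Alaska’s rate comes out several times higher once population is taken into account.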

Let’s draw this as a heatmap, with darker colors representing smaller shares of a state’s population, and lighter colors representing larger shares.

ggplot(data = dol, mapping = aes(x = Year, y = From, fill = Change/Population )) +
  geom_tile() + 
  scale_y_discrete(limits = rev(levels(dol$From)))

The lighter the color, the higher the percentage of people from that state (and year) who are moving to King County. Represented this way, a few states stand out: Alaska and Oregon, for example. Although their overall populations are relatively small, lots of people move here from there. By comparison, relatively few residents of large states like California or Texas move here.
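
The default legend shows the raw fraction, which is a very small number. One possible variation, sketched here, expresses the fill per 100,000 residents and labels the legend. The helper name plot_rate_heatmap and the choice of the viridis scale are my own assumptions; the viridis scale also runs from dark (low) to light (high), so the reading above still holds.

```r
library(ggplot2)

# Hypothetical helper: the same heatmap, with fill expressed per 100,000 residents.
# Expects columns From, Year, Population, Change, as in dol.
plot_rate_heatmap <- function(df) {
  df$From <- factor(df$From)
  ggplot(df, aes(x = Year, y = From, fill = Change / Population * 1e5)) +
    geom_tile() +
    scale_y_discrete(limits = rev(levels(df$From))) +  # alphabetical, top to bottom
    scale_fill_viridis_c(name = "Movers per 100,000", na.value = "grey80") +
    labs(x = "Year", y = "State of origin")
}

# Usage with the data frame built above:
# plot_rate_heatmap(dol)
```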

Interestingly, a non-obvious standout is Hawaii. I don’t normally think of Hawaiians as likely to move to Seattle, but percentage-wise they’re pretty high. In fact, for the last few years the average Hawaiian is more likely to move here than the average Idahoan. Go figure.
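
A comparison like Hawaii versus Idaho is easier to check in a small table than by squinting at tile colors. Here is a sketch of a helper that puts two states’ yearly rates side by side. The function name compare_states and the per_100k column are hypothetical, introduced only for this example.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: yearly rates (per 100,000 residents) for two states, side by side.
# Expects columns From, Year, Population, Change, as in dol.
compare_states <- function(df, state_a, state_b) {
  df %>%
    mutate(From = as.character(From)) %>%
    filter(From %in% c(state_a, state_b)) %>%
    mutate(per_100k = Change / Population * 1e5) %>%
    select(From, Year, per_100k) %>%
    pivot_wider(names_from = From, values_from = per_100k) %>%  # one column per state
    arrange(Year)
}

# Usage with the data frame built above:
# compare_states(dol, "Hawaii", "Idaho")
```

The result has one row per year and one column per state, so the years in which one state outpaces the other can be read off directly.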

You can also see a few trends over time. For example, although both Montana and Idaho have sent a fair share of people here since the early 2010s, their enthusiasm seems to have waned in the past few years. Similarly, Nevadans, I guess, decided to slow down too.

It’s a big country, so I wouldn’t read too much into this information – it’s not as though there’s a stampede in one direction or the other. Just normal people doing normal things.