# Richard Sprague

My personal website

# Which microbes go with each other?

### 2018-04-08

Microbes in the body are always part of a community, an ecology of interdependent organisms, some rising, some falling as the environment changes. By watching the shifts over time, I would expect to see some patterns: microbes that consume the same kinds of nutrients may go up and down together as the amounts of those nutrients change. Similarly, a microbe that depends on a waste product secreted by another organism should go up or down with the abundance of that other organism.

To find patterns, I’ll line up the abundances of every sample I’ve taken, then run a simple correlation analysis to see which microbe levels are most highly correlated.

Prerequisites: The following example is written in R. I’ve written an R package, `actino`, that converts between uBiome raw data files and Phyloseq, an excellent microbiome analysis tool developed at Stanford and distributed through Bioconductor.

I start with a Phyloseq object called `gut.best` that contains the normalized abundances of all my gut microbes. If you follow the `actino` directions, you should be able to create a similar object for your own data.

Then simply turn that Phyloseq object into a single matrix:

``gut.mat <- as.matrix(otu_table(gut.best))``

But some taxa are quite rare, occurring in just a few out of hundreds of samples. Let’s ignore any sample where a taxon has fewer than 10 reads; then, of those that remain, let’s arbitrarily eliminate any taxa that occur fewer than five times. Here we show the resulting number of taxa (the rows of the matrix).

``````g <- apply(gut.mat,1,function(x) ncol(gut.mat)-sum(x<10)) >= 5
gut.mat <- gut.mat[g,]
nrow(gut.mat)``````
``## [1] 174``
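To make the filtering rule concrete, here is a minimal sketch on a made-up matrix. The names `toy.mat`, `min.reads`, and `min.samples` are illustrative assumptions, not part of `actino` or Phyloseq. It counts, for each taxon, the samples with at least 10 reads, which should be equivalent to the `ncol(...) - sum(x < 10)` expression above when there are no missing values.

```r
# Toy abundance matrix: rows are taxa, columns are samples (made-up read counts).
toy.mat <- matrix(
  c(120,  80, 200,  50,  90, 300,   # TaxonA: >= 10 reads in all 6 samples
      0,  15,   0,   0,  12,   0,   # TaxonB: >= 10 reads in only 2 samples
     11,  10,   9,  40,  25,  10),  # TaxonC: >= 10 reads in 5 samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxonA", "TaxonB", "TaxonC"), paste0("s", 1:6))
)

min.reads   <- 10  # a sample counts only if the taxon has at least this many reads
min.samples <- 5   # keep a taxon only if it qualifies in at least this many samples

# For each taxon (row), count the samples that meet the read threshold.
keep <- rowSums(toy.mat >= min.reads) >= min.samples

# drop = FALSE keeps the result a matrix even if only one taxon survives.
toy.filtered <- toy.mat[keep, , drop = FALSE]
rownames(toy.filtered)
```

Here `TaxonB` is dropped because it clears the read threshold in only two samples, while `TaxonA` and `TaxonC` survive.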

Now run the correlations, showing the top 10…

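``````library(dplyr) # provides %>% and arrange()

matCorrs<-cor(t(gut.mat)) # matrix of all correlation coefficients (taxa are rows, so transpose)
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

# row/column positions of every cell above the diagonal, so each pair appears once
ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

# long table: one row per pair of taxa, with its correlation
mCorr<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )

mCorr %>% arrange(desc(val)) %>% head(10) %>% knitr::kable()``````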
| col | row | val |
|:---|:---|---:|
| Aquabacterium | Methylobacterium | 0.9995392 |
| Methylobacterium | Phyllobacterium | 0.9991739 |
| Aquabacterium | Phyllobacterium | 0.9986580 |
| Weissella | Veillonella | 0.9963727 |
| Veillonella | Peptostreptococcus | 0.9942442 |
| Weissella | Peptostreptococcus | 0.9919192 |
| Veillonella | Campylobacter | 0.9838159 |
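The upper-triangle trick is easier to follow on a tiny matrix. The sketch below wraps the same steps in a helper called `cor_pairs`, a hypothetical name introduced here for illustration (it is not part of `actino`, Phyloseq, or dplyr), and uses only base R. The toy taxa are made up so that the expected correlations are obvious.

```r
# Hypothetical helper: turn a taxa-by-sample matrix into a long table of
# pairwise correlations, one row per unordered pair of taxa.
cor_pairs <- function(mat) {
  corrs <- cor(t(mat))                            # taxa are rows, so transpose first
  ind <- which(upper.tri(corrs), arr.ind = TRUE)  # cells above the diagonal only
  pairs <- data.frame(
    col = colnames(corrs)[ind[, 2]],
    row = rownames(corrs)[ind[, 1]],
    val = corrs[ind],
    stringsAsFactors = FALSE
  )
  pairs[order(pairs$val, decreasing = TRUE), ]    # most correlated pairs first
}

# Made-up abundances: three taxa measured in five samples.
toy.mat <- rbind(
  TaxonA = c(1, 2, 3, 4, 5),
  TaxonB = c(2, 4, 6, 8, 10),  # exactly twice TaxonA, so the correlation is 1
  TaxonC = c(5, 4, 3, 2, 1)    # mirror image of TaxonA, so the correlation is -1
)

toy.pairs <- cor_pairs(toy.mat)
print(toy.pairs, row.names = FALSE)
```

Three taxa give three pairs, because the upper triangle skips both the diagonal and the mirrored lower half. Assuming the same layout (taxa as rows), a helper like this could be applied to any taxa-by-sample matrix instead of repeating the block.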

…and the bottom:

``tail(mCorr[order(mCorr$val, decreasing = TRUE),],10) %>% knitr::kable()``

| | col | row | val |
|:---|:---|:---|---:|
| 1354 | Butyricimonas | Collinsella | -0.4078009 |
| 12098 | Fusicatenibacter | Acidaminococcus | -0.4111277 |
| 1315 | Blautia | Akkermansia | -0.4127982 |
| 6814 | Papillibacter | Collinsella | -0.4146698 |
| 776 | Akkermansia | Dorea | -0.4168521 |
| 755 | Akkermansia | Clostridium | -0.4388539 |
| 8413 | Anaerovorax | Collinsella | -0.4436428 |
| 769 | Akkermansia | Collinsella | -0.4462354 |
| 1880 | Terrisporobacter | Oscillibacter | -0.5169693 |
| 747 | Akkermansia | Roseburia | -0.5625351 |

For fun, let’s run the same correlation on other people. I have a private collection of a few hundred samples that others have sent me, stored in the Phyloseq object `people.norm`. Let’s run the above calculations on those samples to see if the results are similar.

``````people.gut <- subset_samples(people.norm, Site == "gut" & Reads > 10000 & Condition == "Healthy")
people.best <- prune_taxa(taxa_sums(people.gut)>42,people.gut)
people.mat <- as.matrix(otu_table(people.best))

g <- apply(people.mat,1,function(x) ncol(people.mat)-sum(x<10)) >= 5
people.mat <- people.mat[g,]

nrow(people.mat)``````
``## [1] 145``
``````matCorrs<-cor(t(people.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr.people<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )

# tail(mCorr.people[order(mCorr.people$val, decreasing = TRUE),],10)``````

Here are the most and least-correlated taxa for all people:

``mCorr.people %>% arrange(desc(val)) %>% head() %>% knitr::kable()``

| col | row | val |
|:---|:---|---:|
| Aerococcus | Actinobaculum | 0.9991356 |
| Aerococcus | Solobacterium | 0.9971294 |
| Ochrobactrum | Delftia | 0.9967131 |
| Solobacterium | Actinobaculum | 0.9962884 |
| Pseudomonas | Actinobaculum | 0.9962027 |
| Pyramidobacter | Actinobaculum | 0.9959956 |

``mCorr.people %>% arrange(desc(val)) %>% tail() %>% knitr::kable()``

| | col | row | val |
|:---|:---|:---|---:|
| 10435 | Lactobacillus | Bacteroides | -0.3214633 |
| 10436 | Subdoligranulum | Bacteroides | -0.3242899 |
| 10437 | Sarcina | Bacteroides | -0.3244339 |
| 10438 | Faecalibacterium | Bacteroides | -0.3246193 |
| 10439 | Bilophila | Roseburia | -0.3367604 |
| 10440 | Blautia | Peptococcus | -0.4040759 |

I expected more unique taxa in the sample of people than in me, since you’d expect more diversity among lots of people. After filtering, though, the counts above go the other way: 174 taxa for me versus 145 for the group. Here are the taxa that are in me but not in `people.mat`:

``setdiff(rownames(gut.mat),rownames(people.mat))``
``````##  [1] "Oligella"           "Achromobacter"      "Stenotrophomonas"
##  [4] "Ralstonia"          "Shinella"           "Neisseria"
##  [7] "Actinobacillus"     "Pasteurella"        "Rothia"
## [10] "Johnsonella"        "Aggregatibacter"    "Phyllobacterium"
## [13] "Acinetobacter"      "Planomicrobium"     "Pediococcus"
## [16] "Anaerovorax"        "Tissierella"        "Pantoea"
## [22] "Aquabacterium"      "Pelomonas"          "Christensenella"
## [25] "Anaerobacter"       "Azospira"           "Trueperella"
## [28] "Parasporobacterium" "Raoultella"         "Hafnia"
## [31] "Rahnella"           "Sedimentibacter"    "Tessaracoccus"
## [34] "Fretibacterium"     "Caldicoprobacter"   "Geobacillus"
## [37] "Cronobacter"        "Anaerobacterium"    "Coprobacillus"
## [40] "Desulfitibacter"    "Proteiniphilum"     "Enorma"
## [43] "Clostridioides"``````
``setdiff(rownames(people.mat),rownames(gut.mat))``
``````##  [1] "Parvibacter"      "Actinobaculum"    "Eremococcus"
##  [4] "Alloscardovia"    "Senegalimassilia" "Olsenella"
##  [7] "Megamonas"        "Negativicoccus"   "Dermabacter"
## [10] "Butyricicoccus"   "Syntrophococcus"  "Howardella"
## [13] "Anaeroglobus"     "Aerococcus"``````

Let’s see if a few common taxa are similarly correlated:

Here are the microbes that are most and least correlated with Blautia in all people:

``mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()``

| col | row | val |
|:---|:---|---:|
| Blautia | Dorea | 0.3099527 |
| Blautia | Gemella | 0.2347608 |
| Blautia | Pseudobutyrivibrio | 0.1989548 |
| Blautia | Subdoligranulum | 0.1983484 |
| Blautia | Anaerostipes | 0.1877759 |
| Blautia | Marvinbryantia | 0.1850781 |

``mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()``

| col | row | val |
|:---|:---|---:|
| Blautia | Peptococcus | -0.4040759 |
| Blautia | Turicibacter | -0.2948222 |
| Blautia | Sarcina | -0.2787066 |
| Blautia | Oscillospira | -0.2783613 |
| Blautia | Oscillibacter | -0.2756989 |
| Blautia | Odoribacter | -0.2591768 |

and just in me:

``mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()``

| col | row | val |
|:---|:---|---:|
| Blautia | Dorea | 0.6246366 |
| Blautia | Collinsella | 0.5462091 |
| Blautia | Anaerostipes | 0.5194107 |
| Blautia | Pseudobutyrivibrio | 0.4484533 |
| Blautia | Hespellia | 0.4293270 |
| Blautia | Roseburia | 0.3599543 |

``mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()``

| col | row | val |
|:---|:---|---:|
| Blautia | Akkermansia | -0.4127982 |
| Blautia | Thalassospira | -0.3094241 |
| Blautia | Barnesiella | -0.3026999 |
| Blautia | Alistipes | -0.2799459 |
| Blautia | Bacteroides | -0.2226105 |
| Blautia | Bilophila | -0.2035641 |

Interestingly, at least among the top microbes, there does seem to be some agreement: Blautia-Dorea, Blautia-Anaerostipes, and Blautia-Pseudobutyrivibrio show up near the top of both lists. Just a coincidence? Hmm...
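One way to check agreement beyond eyeballing the top few rows would be to line up every pair that appears in both tables. The sketch below is an assumption-laden illustration, not the analysis from this post: `pair_key` is a hypothetical helper, and the two small data frames are stand-ins for `mCorr` and `mCorr.people`, with values rounded from the tables above and one pair deliberately flipped to show why an order-independent key is needed.

```r
# Hypothetical helper: build a key that is the same whichever taxon is listed
# first, so "Blautia / Dorea" matches "Dorea / Blautia".
pair_key <- function(a, b) {
  paste(pmin(a, b), pmax(a, b), sep = "|")
}

# Stand-ins for mCorr (me) and mCorr.people (everyone else); rounded values.
me <- data.frame(col = c("Blautia", "Blautia", "Akkermansia"),
                 row = c("Dorea", "Anaerostipes", "Roseburia"),
                 val = c(0.62, 0.52, -0.56),
                 stringsAsFactors = FALSE)
people <- data.frame(col = c("Dorea", "Blautia", "Blautia"),   # first pair flipped on purpose
                     row = c("Blautia", "Anaerostipes", "Peptococcus"),
                     val = c(0.31, 0.19, -0.40),
                     stringsAsFactors = FALSE)

me$key <- pair_key(me$col, me$row)
people$key <- pair_key(people$col, people$row)

# Inner join: keep only pairs present in both tables, one column of values each.
shared <- merge(me[, c("key", "val")], people[, c("key", "val")],
                by = "key", suffixes = c(".me", ".people"))
shared
```

With the real tables, `shared` would hold one row per pair of taxa found in both data sets. A scatter plot of `val.me` against `val.people`, or `cor(shared$val.me, shared$val.people)`, would then give a rough sense of whether the agreement extends beyond the handful of top pairs.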

If I can think of a better way to present this information – or if you have any suggestions for me – I’ll update this post.