# Richard Sprague

My personal website

# Which microbes go with each other?

### 2018-04-08

Microbes in the body are always part of a community, an ecology of interdependent organisms, some rising, some falling as the environment changes. By watching the shifts over time, I would expect to see some patterns: microbes that consume that same kinds of nutrients may go up and down together as the amounts of those nutrients change. Similarly, a microbe that depends on some waste product secreted by another organism should go up or down depending on the abundance levels of the other organism.

To find patterns, I’ll line up the abundances of every sample I’ve taken, then run a simple correlation analysis to see which microbe levels are most highly-correlated.

Prerequisites: The following example is written in R and I’ve written an R package, actino that converts between uBiome raw data files and Phyloseq, an excellent microbiome analysis tool developed at Stanford’s Bioconductor program.

I start with a Phyloseq object called `gut.best` that contains the normalized abundances of all my gut microbes. If you follow the `actino` directions, you should be able to create a similar object for your own data.

Then simply turn that Phyloseq object into a single matrix:

``gut.mat <- as.matrix(otu_table(gut.best))``

But some taxa are quite rare, occuring in just a few out of hundreds of samples. Let’s ignore any sample where a taxa has fewer than 10 reads; finally, of those that remain, let’s arbitrarily eliminate any taxa that occur fewer than five times. Here we show the resulting total number of samples.

``````g <- apply(gut.mat,1,function(x) ncol(gut.mat)-sum(x<10)) >= 5
gut.mat <- gut.mat[g,]
nrow(gut.mat)``````
``## [1] 174``

Now run the correlations, showing the top 10…

``````matCorrs<-cor(t(gut.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )

mCorr %>% arrange(desc(val)) %>% head(10) %>% knitr::kable()``````
col row val
Tessaracoccus Brevibacterium 0.9988865
Facklamia Brevibacterium 0.9982044
Anaerococcus Corynebacterium 0.9981376
Tessaracoccus Facklamia 0.9976986
Facklamia Pseudoclavibacter 0.9959390
Pseudoclavibacter Brevibacterium 0.9955023
Gallicola Facklamia 0.9939966
Tessaracoccus Pseudoclavibacter 0.9934610
Gallicola Brevibacterium 0.9925372
Tessaracoccus Gallicola 0.9923258

…and the bottom:

``tail(mCorr[order(mCorr\$val, decreasing = TRUE),],10) %>% knitr::kable()``
col row val
1354 Butyricimonas Collinsella -0.3817954
1315 Blautia Akkermansia -0.3962517
767 Akkermansia Pseudobutyrivibrio -0.3991998
6814 Papillibacter Collinsella -0.4039415
755 Akkermansia Clostridium -0.4158221
776 Akkermansia Dorea -0.4200674
8413 Anaerovorax Collinsella -0.4205054
769 Akkermansia Collinsella -0.4244149
1880 Terrisporobacter Oscillibacter -0.5144061
747 Akkermansia Roseburia -0.5510311

For fun, let’s run the same correlation on other people. I have a private collection of a few hundred samples that others have sent me, stored in the Phyloseq object `people.norm`. Let’s run the above calculations on those samples to see if the results are similar

``````people.gut <- subset_samples(people.norm, Site == "gut" & Reads > 10000 & Condition == "Healthy")
people.best <- prune_taxa(taxa_sums(people.gut)>42,people.gut)
people.mat <- as.matrix(otu_table(people.best))

g <- apply(people.mat,1,function(x) ncol(people.mat)-sum(x<10)) >= 5
people.mat <- people.mat[g,]

nrow(people.mat)``````
``## [1] 140``
``````matCorrs<-cor(t(people.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr.people<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )

# tail(mCorr.people[order(mCorr\$val, decreasing = TRUE),],10)``````

Here are the most and least-correlated taxa for all people:

``mCorr.people %>% arrange(desc(val)) %>% head() %>% knitr::kable()``
col row val
Aerococcus Actinobaculum 0.9992074
Aerococcus Solobacterium 0.9971287
Ochrobactrum Delftia 0.9967128
Actinobaculum Solobacterium 0.9963677
Actinobaculum Pseudomonas 0.9962911
Actinobaculum Pyramidobacter 0.9960798
``mCorr.people %>% arrange(desc(val)) %>% tail() %>% knitr::kable()``
col row val
9725 Subdoligranulum Bacteroides -0.3237482
9726 Faecalibacterium Bacteroides -0.3260428
9727 Sarcina Bacteroides -0.3323116
9728 Bilophila Roseburia -0.3430269
9729 Faecalibacterium Collinsella -0.3446945
9730 Peptococcus Blautia -0.4095067

There are more unique taxa in the sample of people than there are in me. That makes sense, since you’d expect more diversity amount lots of people. Here are the taxa that are in me but not in `people.best`:

``setdiff(rownames(gut.mat),rownames(people.mat))``
``````##  [1] "Acidaminococcus"     "Arthrobacter"        "Oligella"
##  [4] "Achromobacter"       "Stenotrophomonas"    "Ralstonia"
##  [7] "Shinella"            "Neisseria"           "Actinobacillus"
## [10] "Pasteurella"         "Rothia"              "Johnsonella"
## [13] "Aggregatibacter"     "Alloprevotella"      "Phyllobacterium"
## [16] "Acinetobacter"       "Acetanaerobacterium" "Planomicrobium"
## [19] "Pediococcus"         "Anaerovorax"         "Tissierella"
## [25] "Sphingomonas"        "Aquabacterium"       "Pelomonas"
## [28] "Christensenella"     "Lysinibacillus"      "Anaerobacter"
## [31] "Azospira"            "Trueperella"         "Brachybacterium"
## [34] "Parasporobacterium"  "Raoultella"          "Hafnia"
## [37] "Rahnella"            "Carnobacterium"      "Sedimentibacter"
## [40] "Caldicoprobacter"    "Geobacillus"         "Cronobacter"
## [43] "Tessaracoccus"       "Anaerobacterium"     "Coprobacillus"
## [46] "Desulfitibacter"     "Proteiniphilum"``````
``setdiff(rownames(people.mat),rownames(gut.mat))``
``````##  [1] "Olsenella"        "Megamonas"        "Negativicoccus"
##  [4] "Senegalimassilia" "Dermabacter"      "Eremococcus"
##  [7] "Parvibacter"      "Butyricicoccus"   "Syntrophococcus"
## [10] "Howardella"       "Anaeroglobus"     "Actinobaculum"
## [13] "Aerococcus"``````

Let’s see if a few common taxa are similarly correlated:

Here are the microbes that are most and least correlated with Blautia

in all people:

``mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()``
col row val
Blautia Dorea 0.3072171
Blautia Pseudobutyrivibrio 0.2017246
Blautia Subdoligranulum 0.2017201
Blautia Anaerostipes 0.1857530
Blautia Marvinbryantia 0.1815135
Blautia Anaerotruncus 0.1440489
``mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()``
col row val
Blautia Oscillibacter -0.2873210
Blautia Oscillospira -0.2739586
Blautia Sarcina -0.2736859
Blautia Odoribacter -0.2587790
Blautia Butyrivibrio -0.2456773
Blautia Porphyromonas -0.2342377

and just in me:

``mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()``
col row val
Blautia Dorea 0.6233786
Blautia Collinsella 0.5515443
Blautia Anaerostipes 0.5195198
Blautia Pseudobutyrivibrio 0.4415812
Blautia Hespellia 0.4167464
Blautia Roseburia 0.3516662
``mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()``
col row val
Blautia Akkermansia -0.3962517
Blautia Barnesiella -0.3002634
Blautia Thalassospira -0.2998090
Blautia Alistipes -0.2789667
Blautia Bacteroides -0.2346611
Blautia Odoribacter -0.2053716

Interestingly, at least among the top microbes, there does seem to be some agreement (Blautia - Dorea, Blautia Anaerostipes). Just a coincidence? Hmm..

If I can think of a better way to present this information – or if you have any suggestions for me – I’ll update this post.