Richard Sprague

My personal website

Which microbes go with each other?

Created: 2018-04-08 ; Updated: 2018-04-08

Microbes in the body are always part of a community, an ecology of interdependent organisms, some rising, some falling as the environment changes. By watching the shifts over time, I would expect to see some patterns: microbes that consume that same kinds of nutrients may go up and down together as the amounts of those nutrients change. Similarly, a microbe that depends on some waste product secreted by another organism should go up or down depending on the abundance levels of the other organism.

To find patterns, I’ll line up the abundances of every sample I’ve taken, then run a simple correlation analysis to see which microbe levels are most highly-correlated.

Prerequisites: The following example is written in R and I’ve written an R package, actino that converts between uBiome raw data files and Phyloseq, an excellent microbiome analysis tool developed at Stanford’s Bioconductor program.

I start with a Phyloseq object called gut.best that contains the normalized abundances of all my gut microbes. If you follow the actino directions, you should be able to create a similar object for your own data.

Then simply turn that Phyloseq object into a single matrix:

gut.mat <- as.matrix(otu_table(gut.best))

But some taxa are quite rare, occuring in just a few out of hundreds of samples. Let’s ignore any sample where a taxa has fewer than 10 reads; finally, of those that remain, let’s arbitrarily eliminate any taxa that occur fewer than five times. Here we show the resulting total number of samples.

g <- apply(gut.mat,1,function(x) ncol(gut.mat)-sum(x<10)) >= 5
gut.mat <- gut.mat[g,]
nrow(gut.mat)
## [1] 175

Now run the correlations, showing the top 10…

matCorrs<-cor(t(gut.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
                   row = dimnames(matCorrs)[[1]][ind[,1]] ,
                   val = matCorrs[ ind ] )

#head(mCorr[order(mCorr$val, decreasing = TRUE),],10)

mCorr %>% arrange(desc(val)) %>% head(10) %>% knitr::kable()
col row val
Aquabacterium Methylobacterium 0.9995393
Methylobacterium Bradyrhizobium 0.9993668
Aquabacterium Bradyrhizobium 0.9992332
Methylobacterium Phyllobacterium 0.9991739
Aquabacterium Phyllobacterium 0.9986580
Bradyrhizobium Phyllobacterium 0.9985602
Weissella Veillonella 0.9951919
Veillonella Peptostreptococcus 0.9942447
Weissella Peptostreptococcus 0.9907536
Veillonella Campylobacter 0.9838081

…and the bottom:

tail(mCorr[order(mCorr$val, decreasing = TRUE),],10) %>% knitr::kable()
col row val
1354 Butyricimonas Collinsella -0.4024848
767 Akkermansia Pseudobutyrivibrio -0.4107916
1315 Blautia Akkermansia -0.4118125
12254 Fusicatenibacter Acidaminococcus -0.4129678
776 Akkermansia Dorea -0.4195457
6814 Papillibacter Collinsella -0.4369405
769 Akkermansia Collinsella -0.4537434
8413 Anaerovorax Collinsella -0.4682722
1880 Terrisporobacter Oscillibacter -0.5092576
747 Akkermansia Roseburia -0.5715307

For fun, let’s run the same correlation on other people. I have a private collection of a few hundred samples that others have sent me, stored in the Phyloseq object people.norm. Let’s run the above calculations on those samples to see if the results are similar

people.gut <- subset_samples(people.norm, Site == "gut" & Reads > 10000 & Condition == "Healthy")
people.best <- prune_taxa(taxa_sums(people.gut)>42,people.gut) 
people.mat <- as.matrix(otu_table(people.best))

g <- apply(people.mat,1,function(x) ncol(people.mat)-sum(x<10)) >= 5
people.mat <- people.mat[g,]

nrow(people.mat)
## [1] 153
matCorrs<-cor(t(people.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle

ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )

mCorr.people<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
                   row = dimnames(matCorrs)[[1]][ind[,1]] ,
                   val = matCorrs[ ind ] )

# head(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)
# tail(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)

Here are the most and least-correlated taxa for all people:

mCorr.people %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col row val
Aerococcus Actinobaculum 0.9991361
Aerococcus Solobacterium 0.9971312
Ochrobactrum Delftia 0.9967136
Solobacterium Actinobaculum 0.9962899
Pseudomonas Actinobaculum 0.9962008
Pyramidobacter Actinobaculum 0.9959956
mCorr.people %>% arrange(desc(val)) %>% tail() %>% knitr::kable()
col row val
11623 Erysipelatoclostridium Sarcina -0.3246730
11624 Faecalibacterium Bacteroides -0.3272404
11625 Subdoligranulum Bacteroides -0.3287083
11626 Faecalibacterium Collinsella -0.3307656
11627 Bilophila Roseburia -0.3382679
11628 Peptococcus Blautia -0.3744271

There are more unique taxa in the sample of people than there are in me. That makes sense, since you’d expect more diversity amount lots of people. Here are the taxa that are in me but not in people.best:

setdiff(rownames(gut.mat),rownames(people.mat))
##  [1] "Oligella"           "Achromobacter"      "Stenotrophomonas"  
##  [4] "Ralstonia"          "Shinella"           "Neisseria"         
##  [7] "Actinobacillus"     "Pasteurella"        "Rothia"            
## [10] "Johnsonella"        "Phyllobacterium"    "Acinetobacter"     
## [13] "Planomicrobium"     "Pediococcus"        "Tissierella"       
## [16] "Bradyrhizobium"     "Methylobacterium"   "Methanosphaera"    
## [19] "Sphingomonas"       "Aquabacterium"      "Pelomonas"         
## [22] "Anaerobacter"       "Azospira"           "Trueperella"       
## [25] "Parasporobacterium" "Raoultella"         "Hafnia"            
## [28] "Rahnella"           "Sedimentibacter"    "Tessaracoccus"     
## [31] "Fretibacterium"     "Caldicoprobacter"   "Geobacillus"       
## [34] "Anaerobacterium"    "Coprobacillus"      "Desulfitibacter"   
## [37] "Enorma"             "Proteiniphilum"
setdiff(rownames(people.mat),rownames(gut.mat))
##  [1] "Negativicoccus"   "Howardella"       "Acetivibrio"     
##  [4] "Dermabacter"      "Syntrophococcus"  "Megamonas"       
##  [7] "Parvibacter"      "Senegalimassilia" "Cellulosilyticum"
## [10] "Actinobaculum"    "Eremococcus"      "Alloscardovia"   
## [13] "Olsenella"        "Butyricicoccus"   "Anaeroglobus"    
## [16] "Aerococcus"

Let’s see if a few common taxa are similarly correlated:

Here are the microbes that are most and least correlated with Blautia

in all people:

mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col row val
Blautia Anaerofustis 0.2091862
Blautia Anaerostipes 0.1966629
Blautia Dorea 0.1727297
Blautia Pseudobutyrivibrio 0.1708692
Blautia Marvinbryantia 0.1660357
Blautia Anaerotruncus 0.1615309
mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()
col row val
Blautia Sarcina -0.3054882
Blautia Oscillibacter -0.3004970
Blautia Fibrobacter -0.2865571
Blautia Turicibacter -0.2814700
Blautia Thalassospira -0.2292539
Blautia Mogibacterium -0.2134808

and just in me:

mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col row val
Blautia Dorea 0.6234238
Blautia Collinsella 0.5422636
Blautia Anaerostipes 0.5143553
Blautia Pseudobutyrivibrio 0.4521050
Blautia Hespellia 0.4254004
Blautia Roseburia 0.3665500
mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()
col row val
Blautia Akkermansia -0.4118125
Blautia Barnesiella -0.3000064
Blautia Thalassospira -0.2954108
Blautia Alistipes -0.2680040
Blautia Bacteroides -0.2163156
Blautia Odoribacter -0.2126371

Interestingly, at least among the top microbes, there does seem to be some agreement (Blautia - Dorea, Blautia Anaerostipes). Just a coincidence? Hmm..

If I can think of a better way to present this information – or if you have any suggestions for me – I’ll update this post.