Which microbes go with each other?
Microbes in the body are always part of a community, an ecology of interdependent organisms, some rising, some falling as the environment changes. By watching the shifts over time, I would expect to see some patterns: microbes that consume that same kinds of nutrients may go up and down together as the amounts of those nutrients change. Similarly, a microbe that depends on some waste product secreted by another organism should go up or down depending on the abundance levels of the other organism.
To find patterns, I’ll line up the abundances of every sample I’ve taken, then run a simple correlation analysis to see which microbe levels are most highly-correlated.
Prerequisites: The following example is written in R and I’ve written an R package, actino that converts between uBiome raw data files and Phyloseq, an excellent microbiome analysis tool developed at Stanford’s Bioconductor program.
I start with a Phyloseq object called gut.best
that contains the normalized abundances of all my gut microbes. If you follow the actino
directions, you should be able to create a similar object for your own data.
Then simply turn that Phyloseq object into a single matrix:
gut.mat <- as.matrix(otu_table(gut.best))
But some taxa are quite rare, occuring in just a few out of hundreds of samples. Let’s ignore any sample where a taxa has fewer than 10 reads; finally, of those that remain, let’s arbitrarily eliminate any taxa that occur fewer than five times. Here we show the resulting total number of samples.
g <- apply(gut.mat,1,function(x) ncol(gut.mat)-sum(x<10)) >= 5
gut.mat <- gut.mat[g,]
nrow(gut.mat)
## [1] 175
Now run the correlations, showing the top 10…
matCorrs<-cor(t(gut.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle
ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )
mCorr<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )
#head(mCorr[order(mCorr$val, decreasing = TRUE),],10)
mCorr %>% arrange(desc(val)) %>% head(10) %>% knitr::kable()
col | row | val |
---|---|---|
Aquabacterium | Methylobacterium | 0.9995393 |
Methylobacterium | Bradyrhizobium | 0.9993668 |
Aquabacterium | Bradyrhizobium | 0.9992332 |
Methylobacterium | Phyllobacterium | 0.9991739 |
Aquabacterium | Phyllobacterium | 0.9986580 |
Bradyrhizobium | Phyllobacterium | 0.9985602 |
Weissella | Veillonella | 0.9951920 |
Veillonella | Peptostreptococcus | 0.9942449 |
Weissella | Peptostreptococcus | 0.9907539 |
Veillonella | Campylobacter | 0.9838069 |
…and the bottom:
tail(mCorr[order(mCorr$val, decreasing = TRUE),],10) %>% knitr::kable()
col | row | val | |
---|---|---|---|
1354 | Butyricimonas | Collinsella | -0.4043263 |
1315 | Blautia | Akkermansia | -0.4108318 |
767 | Akkermansia | Pseudobutyrivibrio | -0.4135541 |
12254 | Fusicatenibacter | Acidaminococcus | -0.4140983 |
776 | Akkermansia | Dorea | -0.4193126 |
6814 | Papillibacter | Collinsella | -0.4442487 |
769 | Akkermansia | Collinsella | -0.4580287 |
8413 | Anaerovorax | Collinsella | -0.4769236 |
1880 | Terrisporobacter | Oscillibacter | -0.5099724 |
747 | Akkermansia | Roseburia | -0.5741032 |
For fun, let’s run the same correlation on other people. I have a private collection of a few hundred samples that others have sent me, stored in the Phyloseq object people.norm
. Let’s run the above calculations on those samples to see if the results are similar
people.gut <- subset_samples(people.norm, Site == "gut" & Reads > 10000 & Condition == "Healthy")
people.best <- prune_taxa(taxa_sums(people.gut)>42,people.gut)
people.mat <- as.matrix(otu_table(people.best))
g <- apply(people.mat,1,function(x) ncol(people.mat)-sum(x<10)) >= 5
people.mat <- people.mat[g,]
nrow(people.mat)
## [1] 154
matCorrs<-cor(t(people.mat)) # matrix of all correlation coefficients
mc<-matCorrs[upper.tri(matCorrs)] # just the upper triangle
ind <- which( upper.tri(matCorrs,diag=F) , arr.ind = TRUE )
mCorr.people<-data.frame( col = dimnames(matCorrs)[[2]][ind[,2]] ,
row = dimnames(matCorrs)[[1]][ind[,1]] ,
val = matCorrs[ ind ] )
# head(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)
# tail(mCorr.people[order(mCorr$val, decreasing = TRUE),],10)
Here are the most and least-correlated taxa for all people:
mCorr.people %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col | row | val |
---|---|---|
Aerococcus | Actinobaculum | 0.9991366 |
Aerococcus | Solobacterium | 0.9970899 |
Ochrobactrum | Delftia | 0.9966713 |
Actinobaculum | Solobacterium | 0.9962508 |
Pyramidobacter | Actinobaculum | 0.9959956 |
Aerococcus | Pyramidobacter | 0.9955293 |
mCorr.people %>% arrange(desc(val)) %>% tail() %>% knitr::kable()
col | row | val | |
---|---|---|---|
11776 | Erysipelatoclostridium | Sarcina | -0.3354350 |
11777 | Blautia | Sarcina | -0.3383840 |
11778 | Sarcina | Bacteroides | -0.3400012 |
11779 | Terrisporobacter | Blautia | -0.3431375 |
11780 | Peptococcus | Blautia | -0.3572107 |
11781 | Subdoligranulum | Bacteroides | -0.3743589 |
There are more unique taxa in the sample of people than there are in me. That makes sense, since you’d expect more diversity amount lots of people. Here are the taxa that are in me but not in people.best
:
setdiff(rownames(gut.mat),rownames(people.mat))
## [1] "Oligella" "Achromobacter" "Stenotrophomonas"
## [4] "Ralstonia" "Shinella" "Neisseria"
## [7] "Pasteurella" "Rothia" "Johnsonella"
## [10] "Phyllobacterium" "Acinetobacter" "Planomicrobium"
## [13] "Pediococcus" "Tissierella" "Bradyrhizobium"
## [16] "Methylobacterium" "Methanosphaera" "Sphingomonas"
## [19] "Aquabacterium" "Pelomonas" "Anaerobacter"
## [22] "Azospira" "Trueperella" "Parasporobacterium"
## [25] "Raoultella" "Hafnia" "Rahnella"
## [28] "Sedimentibacter" "Tessaracoccus" "Fretibacterium"
## [31] "Caldicoprobacter" "Geobacillus" "Anaerobacterium"
## [34] "Coprobacillus" "Desulfitibacter" "Enorma"
## [37] "Proteiniphilum"
setdiff(rownames(people.mat),rownames(gut.mat))
## [1] "Senegalimassilia" "Eremococcus" "Negativicoccus" "Dermabacter"
## [5] "Butyricicoccus" "Olsenella" "Howardella" "Acetivibrio"
## [9] "Syntrophococcus" "Megamonas" "Parvibacter" "Cellulosilyticum"
## [13] "Actinobaculum" "Alloscardovia" "Anaeroglobus" "Aerococcus"
Let’s see if a few common taxa are similarly correlated:
Here are the microbes that are most and least correlated with Blautia
in all people:
mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col | row | val |
---|---|---|
Blautia | Anaerostipes | 0.2092081 |
Blautia | Pseudobutyrivibrio | 0.1837263 |
Blautia | Dorea | 0.1725876 |
Blautia | Anaerotruncus | 0.1682866 |
Blautia | Klebsiella | 0.1620616 |
Blautia | Marvinbryantia | 0.1250385 |
mCorr.people %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()
col | row | val |
---|---|---|
Blautia | Sarcina | -0.3383840 |
Blautia | Oscillibacter | -0.3260838 |
Blautia | Odoribacter | -0.2945652 |
Blautia | Fibrobacter | -0.2704612 |
Blautia | Victivallis | -0.2434587 |
Blautia | Oscillospira | -0.2403851 |
and just in me:
mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(desc(val)) %>% head() %>% knitr::kable()
col | row | val |
---|---|---|
Blautia | Dorea | 0.6234608 |
Blautia | Collinsella | 0.5392529 |
Blautia | Anaerostipes | 0.5102585 |
Blautia | Pseudobutyrivibrio | 0.4512732 |
Blautia | Hespellia | 0.4249490 |
Blautia | Roseburia | 0.3655861 |
mCorr %>% dplyr::filter(col=="Blautia") %>% arrange(val) %>% head() %>% knitr::kable()
col | row | val |
---|---|---|
Blautia | Akkermansia | -0.4108318 |
Blautia | Barnesiella | -0.2996382 |
Blautia | Thalassospira | -0.2952619 |
Blautia | Alistipes | -0.2675867 |
Blautia | Bacteroides | -0.2160841 |
Blautia | Odoribacter | -0.2121982 |
Interestingly, at least among the top microbes, there does seem to be some agreement (Blautia - Dorea, Blautia Anaerostipes). Just a coincidence? Hmm..
If I can think of a better way to present this information – or if you have any suggestions for me – I’ll update this post.