We take samples from environments and investigate the microbial community present in each sample. To assess whether the amendments affect the flux, we would fit a regression-type model (such as ANOVA) to flux, with amendment as an explanatory variable. No use, distribution or reproduction is permitted which does not comply with these terms.
3:652. doi: 10.1038/s41564-018-0156-0, Weiss, S., Xu, Z. doi: 10.1111/biom.12332, Willis, A. D., Bunge, J., and Whitman, T. (2016).
We would adjust for the measurement error by adding 5 units to each measurement before comparing them.
These estimates are then used for modeling and hypothesis testing (see, for example, Arora et al., 2017). Modeling parameters observed with estimation error is not a new suggestion: this approach comes from the field of statistical meta-analysis, where the results of multiple studies estimating the same effect size are compared (Demidenko, 2004; Willis et al., 2016; Washburne et al., 2018).
Nonparametric estimation of Shannon's index of diversity when there are unseen species in sample. doi: 10.1002/0471728438, Fisher, R. A., Corbet, A. S., and Williams, C. B.
doi: 10.1146/annurev-statistics-022513-115654, Chao, A., and Bunge, J.
Hoboken, NJ: Wiley-Interscience. DivNet: estimating diversity in networked communities. (2016). doi: 10.1034/j.1600-0706.2000.890320.x, Makipaa, R., Rajala, T., Schigel, D., Rinne, K. T., Pennanen, T., Abrego, N., et al.
Library sizes can dominate the biology in determining the result of the diversity analysis (Lande, 1996). For example, Figure 1E shows two environments with different abundance structures but equal richness; rarefying gives the false impression of unequal richness (see also Lande et al., 2000).
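The point about abundance structure and library size can be checked with a short calculation. This sketch assumes that each read is drawn independently from a community with relative abundances p_k, which is a multinomial-sampling assumption made here for illustration. Under that assumption, the expected observed richness at depth n is the sum over taxa of 1 - (1 - p_k)^n. The function name and both abundance profiles are hypothetical, and both communities have a true richness of exactly 100 taxa.

```python
def expected_observed_richness(proportions, n_reads):
    """Expected number of distinct taxa seen among n_reads reads, assuming each read
    is drawn independently from a community with relative abundances `proportions`.
    Taxon k is missed entirely with probability (1 - p_k) ** n_reads."""
    return sum(1.0 - (1.0 - p) ** n_reads for p in proportions)

# Two hypothetical environments with the SAME true richness of 100 taxa.
even = [1 / 100] * 100                     # all taxa equally abundant
uneven = [0.505] + [0.495 / 99] * 99       # one dominant taxon, 99 rare taxa

for n_reads in (100, 1000, 10000):
    print(n_reads,
          round(expected_observed_richness(even, n_reads), 1),
          round(expected_observed_richness(uneven, n_reads), 1))
```

At any shared depth, such as the depth produced by rarefying, the uneven community shows fewer taxa on average, even though the true richness is identical. In both communities the observed richness also rises with the number of reads.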
Understanding the drivers of diversity is a fundamental question in ecology. It has recently been argued that studying microbial diversity without context is distracting us from gaining insight into ecological mechanisms (Shade, 2016). Here I advocate for a third strategy: adjust the sample richness of each ecosystem by adding to it an estimate of the number of unobserved species, estimate the variance in the total richness estimate, and compare the diversities relative to these errors (Figure 1D). While the example discussed here is richness, this approach to estimating and comparing alpha diversity using a bias correction (incorporating unobserved taxa) and a variance adjustment (measurement error model) could apply to any alpha diversity metric. Furthermore, this discussion applies equally to diversity analyses performed at the strain, species, or other taxonomic level. I encourage microbial ecologists to use estimates of alpha diversity that account for unobserved species, and to use the variance of the estimates in measurement error models to compare diversity across ecosystems.
Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos 76, 5-13. doi: 10.1023/A:1026096204727, Demidenko, E. (2004).
1, 427-445. doi: 10.1111/rssc.12206, Willis, A. D., and Martin, B. D. (2018). doi: 10.1101/231878, Willis, A., and Bunge, J.
We currently do not account for measurement error in microbial diversity studies. Adjusting for unobserved taxa and accounting for uncertainty in the estimate correctly detects the true difference in richness (D) and correctly finds no difference where none exists (H). Plug-in estimates of many alpha diversity indices (including richness and Shannon diversity) are negatively biased for the environment's alpha diversity parameter; that is, they underestimate the true alpha diversity (Lande, 1996).
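The negative bias of a plug-in estimate can be seen in a small simulation. This is an illustrative sketch, not code from this article. It computes the plug-in Shannon index from simulated samples of a hypothetical community of 1,000 equally abundant taxa, whose true Shannon index is log(1000). Reads are assumed to be drawn independently with replacement, and the function and variable names are hypothetical. A sample of n reads can contain at most n distinct taxa, so the plug-in value from 50 reads cannot exceed log(50), far below the true value.

```python
import math
import random
from collections import Counter

def shannon_plugin(counts):
    """Plug-in Shannon index: -sum(p * log(p)) over the observed proportions p."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)

# Hypothetical community: 1000 equally abundant taxa, so the true Shannon index is log(1000).
TRUE_RICHNESS = 1000
true_shannon = math.log(TRUE_RICHNESS)

rng = random.Random(42)  # fixed seed so the illustration is reproducible
results = {}
for n_reads in (50, 500, 5000, 50000):
    # Draw reads independently (with replacement) from the uniform community.
    sample = Counter(rng.randrange(TRUE_RICHNESS) for _ in range(n_reads))
    results[n_reads] = shannon_plugin(sample)
    print(f"{n_reads:>6} reads: plug-in {results[n_reads]:.3f} vs true {true_shannon:.3f}")
```

In this example the gap shrinks as the number of reads grows, but the plug-in value stays below the true value at every depth shown.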
The rationale is that diversity metrics calculated on these equal-sized subsamples can contrast ecosystems fairly, independent of differences in sample sizes (Weiss et al., 2017).
Microbiome 5:27. doi: 10.1186/s40168-017-0237-y, Willis, A. Normalization and microbial differential abundance strategies depend upon data characteristics.
Copyright 2019 Willis. Waste not, want not: why rarefying microbiome data is inadmissible. doi: 10.1101/305045, Zhang, Z., and Grabchak, M. (2016). bioRxiv 18.
I introduce a statistical perspective on the estimation of alpha diversity, and argue that a common view of diversity indices is causing fundamental issues in comparing samples. However, there are two incorrect practices surrounding alpha diversity that are preventing the uptake of statistically motivated methodologies. As we sample more and more of the environment using larger samples, we get closer to understanding the true and total microbial community of interest. However, since estimates of alpha diversity metrics are heavily biased when taxa are unobserved, alpha diversity should not be compared using either raw or rarefied data. Alpha diversity estimates that account for unobserved taxa and provide variance estimates are vastly preferable to both plug-in and rarefied estimates, which neither account for unobserved taxa nor provide variance estimates. This option has the advantages of leveraging all observed reads, comparing estimates of the actual parameter of interest (taxonomic richness), and accounting for experimental noise.
First proposed by Sanders (1968), rarefaction involves choosing a read depth equal to or less than the number of reads in the smallest sample, and then randomly discarding reads from the larger samples until each has exactly this many reads (see Hurlbert, 1971 for a deterministic version). The resulting rarefied richness levels are then cA1, cA2, cB1, and cB2. In the setting of Figure 1A, this leads to the erroneous conclusion that Environment A has lower richness than Environment B.
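The rarefaction procedure described here can be sketched in a few lines of Python. The `rarefy` and `observed_richness` helpers and the read counts are hypothetical and for illustration only. The sketch subsamples reads without replacement down to the library size of the smallest sample.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=None):
    """Randomly subsample one sample's reads, without replacement, down to `depth` reads.

    counts: dict mapping taxon label -> read count.
    Returns a dict of taxon -> rarefied read count (taxa left with 0 reads are absent).
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the library size of this sample")
    rng = random.Random(seed)
    # One list entry per read, so sampling entries is the same as sampling reads.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    kept = rng.sample(reads, depth)
    return dict(Counter(kept))

def observed_richness(counts):
    """Number of taxa with at least one read (the plug-in richness)."""
    return sum(1 for n in counts.values() if n > 0)

# Hypothetical read counts for two samples.
sample_a1 = {"t1": 5, "t2": 3, "t3": 2}                          # smallest library: 10 reads
sample_b1 = {"t1": 40, "t2": 30, "t3": 20, "t4": 5, "t5": 5}     # 100 reads
depth = sum(sample_a1.values())        # rarefy to the size of the smallest sample
rarefied_b1 = rarefy(sample_b1, depth, seed=1)
print(observed_richness(sample_b1), observed_richness(rarefied_b1))
```

Different seeds can yield different rarefied richness values for the same sample. This shows that rarefying both discards reads and adds random noise of its own.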
To illustrate, consider the following example, in which the alpha diversity metric of interest is the strain-level richness of a microbial community (the total number of strain variants present in the environment). But what happens when we have random measurement error? To account for the additional experimental noise, we would use a model that accounts for measurement error in assessing differences between amendments. The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. AW wrote the manuscript and performed the data analysis.
Methods for phylogenetic analysis of microbiome data. A mathematical theory of communication.
Suppose I conduct an experiment in which I take a sample from Environment A and count the number of different microbial taxa present in my sample.
Now suppose we knew that our flux-measuring machine consistently underestimated flux by exactly 5 units.
In order to draw meaningful conclusions regarding comparisons of microbial communities, it is necessary to use measurement error models to adjust for the uncertainty in the estimation of alpha diversity.
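One generic way to write such a measurement error model for richness is sketched below. It is an illustrative form under simplifying (normal) assumptions, not necessarily the exact model in the cited work, and all symbols are hypothetical notation introduced only here. The model separates the error in estimating each replicate's richness from genuine biological variation between replicates:

$$
\hat{C}_{ij} = C_{ij} + e_{ij}, \qquad e_{ij} \sim \mathcal{N}\!\left(0, \hat{\sigma}^2_{ij}\right),
$$
$$
C_{ij} = \beta_0 + \beta_1 x_i + u_{ij}, \qquad u_{ij} \sim \mathcal{N}\!\left(0, \sigma^2_u\right).
$$

The symbols are defined as follows:

- $\hat{C}_{ij}$ is an estimate of the total richness (observed plus unobserved taxa) of replicate $j$ from environment $i$.
- $\hat{\sigma}^2_{ij}$ is the estimated variance of that estimate, treated as known.
- $C_{ij}$ is the unknown true richness.
- $x_i$ indicates Environment B.
- $u_{ij}$ is biological variation between replicates.
- $\beta_1$ is the difference in mean richness between the environments.

Replicates with small $\hat{\sigma}^2_{ij}$ receive more weight when $\beta_1$ is estimated, much as larger studies receive more weight in a meta-analysis.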
However, richness estimation has a well-studied statistical literature, and richness estimators that are adapted to microbiome data exist (see Bunge et al., 2014 for a review).
Rarefying samples to the same number of reads can also lead to incorrect conclusions (C,G). This article is based on course notes presented by the author at the STAMPS course at the Marine Biological Laboratory in 2013, 2014, 2015, 2016, 2017, and 2018. This manuscript has been released as a preprint via bioRxiv (Willis, 2017).
(1943).
We would measure the flux of equally sized soil sites treated with the different amendments, performing biological replicates using multiple sites for each amendment.
27, 379-423.
The author is grateful to Berry Brosi, the MBL, the STAMPS course directors, and the STAMPS participants for countless discussions on this topic.
The nonconcept of species diversity: a critique and alternative parameters.
The unique property of microbiome experiments and alpha diversity analysis is that samples do not faithfully represent the entire microbial community under study. We use our findings about the sample to draw inferences about the environment that we are truly interested in. Suppose we are interested in modeling the CO2 flux of soil treated with different amendments. If the measurement error on the machine were random (e.g., with mean 0 and a variance of 1 unit for all amendments), it would not systematically affect any particular amendment.
Citation: Willis AD (2019) Rarefaction, Alpha Diversity, and Statistics. Front. Microbiol. 10:2407. doi: 10.3389/fmicb.2019.02407
Keywords: bioinformatics, computational biology, ecological data analysis, latent variable model, reproducibility, measurement error
Biometrics 71, 1042-1049. doi: 10.1038/ismej.2017.70, Bunge, J., Willis, A., and Walsh, F. (2014). 11, 2035-2046. doi: 10.1080/10485252.2016.1190357, 12:42. doi: 10.2307/1411, Hurlbert, S. H. (1971). Marine benthic diversity: a comparative study.
In microbial ecology, analyzing the alpha diversity of amplicon sequencing data is a common first approach to assessing differences between environments. The second practice is treating alpha diversity estimates as precisely observed quantities that do not have measurement error. The first method, shown in Figure 1B, is to use the estimates cA1, cA2, cB1, and cB2 and perform modeling and hypothesis testing (such as ANOVA) as if both the bias and the variance of these estimates were zero (see, for example, Makipaa et al., 2017).
doi: 10.1038/ismej.2016.118, Shannon, C. E. (1948). (2018). doi: 10.1086/282541, Shade, A.
10, 1496-1516.
I discuss a statistical perspective on diversity, framing the diversity of an environment as an unknown parameter and discussing the bias and variance of plug-in and rarefied estimates. While the focus of the examples is microbiome data analysis, the issues and discussion are equally applicable to macroecological data analysis. I then take a sample from Environment B, count the number of different taxa in that sample, and compare it to the number of taxa in Environment A. I am likely to observe more distinct taxa in the sample with more microbial reads. Across the sciences, adjusting for sample size without discarding data is standard practice when comparing groups of observations; discarding data to adjust for unequal sample sizes is the exception. While measurement error affects all analyses of microbiome data, alpha diversity is particularly affected because commonly used estimates of alpha diversity are much more heavily biased than the estimates used in other problems in microbial ecology (such as estimating relative abundances).
I describe the state of the statistical literature for addressing these problems, focusing on the analysis of microbial diversity. I describe statistical methodology for alpha diversity analysis that adjusts for missing taxa, which should be used in place of existing common approaches to diversity analysis in ecology. If we could observe every microbe in an environment, alpha diversity could be compared exactly, because we would know the entire microbial populations with perfect precision. In practice, there is unadjusted error in using our samples as proxies for the entire community. The setting in which an estimate of a quantity converges to the correct value as more samples are obtained is also well understood in statistics. There are currently two commonly used methods for comparing alpha diversity. Comparing sample taxonomic richness can therefore often lead to incorrect conclusions about true richness (B,F). AW is supported by start-up funds awarded by the Department of Biostatistics at the University of Washington, and by the National Institutes of Health (R35GM133420).
(2015). Estimating the number of species in microbial diversity studies. 28, 563-575. doi: 10.2307/1934145, Lande, R. (1996).
To this criticism, I add that misapplying statistical tools is undermining many analyses of alpha diversity. To compare microbial diversity, we would define specific environments (e.g., the distal gut of women aged 35 living in the contiguous U.S.) and compare diversity metrics across different ecological gradients (e.g., with or without irritable bowel syndrome diagnoses). This means that as we increase sampling, our calculation of any diversity metric [e.g., richness (Fisher et al., 1943), Shannon index (Shannon, 1948), and Simpson index (Simpson, 1949)] approaches the value of that diversity metric as calculated using the entire population. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice.
(2017). (2017). (2017). The relation between the number of species and the number of individuals in a random sample of an animal population. Oikos 89, 601-605.
While the example employed here concerns microbial richness, the same argument applies to macroecological richness, as well as to other alpha diversity indices.
The strategy outlined here for modeling richness after adjusting for missing species adjusts for both bias and variance, thus accounting for library size differences and incomplete microbial surveys.
Because technical replicates in microbiome experiments yield different numbers of reads, different community compositions, and different levels of alpha diversity, microbial experiments are subject to measurement error. The samples are not of particular interest, except insofar as they reflect the environment from which they were sampled. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
J. R. Stat. Soc. (2002). bioRxiv 123.
However, it is widely believed that diversity depends on the intensity of sampling.
Measurement of diversity.
Expected sample taxonomic richness increases with the number of reads (A,E). The second method, shown in Figure 1C, is to generate a normalized, or rarefied, sample by randomly discarding reads from every sample until each has nA1 reads (the number of reads in the smallest sample).
Ecology 52, 577-586. Improved detection of changes in species richness in high-diversity microbial communities. 10:e1003531. (2003).
To clarify this discussion, I will focus on taxonomic richness (the simplest case) and later generalize the argument to other alpha diversity metrics. Furthermore, not all information collected from the samples was used in making the comparison.
Diversity is the question, not the answer. 11, 1964-1974.
Alpha diversity metrics summarize the structure of an ecological community with respect to its richness (number of taxonomic groups), evenness (distribution of abundances of the groups), or both.
Unfortunately, we do not have knowledge of every microbe.
Unfortunately, determining how to meaningfully estimate and compare alpha diversity is not trivial. Suppose we have two biological replicates of samples from each environment: nA1 and nA2 reads from Environment A, and nB1 and nB2 reads from Environment B, with nA1 < nB1 < nA2 < nB2. Richness estimators with the required properties do exist. For example, the Chao-Bunge (Chao and Bunge, 2002) and breakaway (Willis and Bunge, 2015) estimators of taxonomic richness provide variance estimates, account for unobserved taxa, and are not overly sensitive to the singleton count (the number of species observed exactly once).
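The Chao-Bunge and breakaway estimators are more involved than a short example can show; breakaway is distributed as an R package. As a deliberately simpler stand-in, the sketch below implements the classic Chao1 lower-bound estimator with its commonly used variance approximation. Unlike the estimators named here, Chao1 depends heavily on the singleton and doubleton counts. It is shown only to illustrate the general idea of adding an estimate of the unobserved taxa to the observed richness and reporting a variance alongside it. The function name and count data are hypothetical.

```python
from collections import Counter

def chao1(counts):
    """Classic Chao1 richness estimate and variance approximation.

    counts: dict mapping taxon -> number of reads in one sample.
    Returns (estimated total richness, estimated variance).
    """
    abundances = [n for n in counts.values() if n > 0]
    s_obs = len(abundances)
    freq = Counter(abundances)          # freq[k] = number of taxa seen exactly k times
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 > 0:
        r = f1 / f2
        s_hat = s_obs + f1 ** 2 / (2 * f2)            # observed + estimated unobserved
        var = f2 * (r ** 4 / 4 + r ** 3 + r ** 2 / 2)
    else:
        # Bias-corrected form avoids dividing by zero when no doubletons are seen;
        # the variance formula above does not apply, so none is reported here.
        s_hat = s_obs + f1 * (f1 - 1) / 2
        var = float("nan")
    return s_hat, var

sample = {"a": 10, "b": 5, "c": 2, "d": 2, "e": 1, "f": 1, "g": 1, "h": 1}
estimate, variance = chao1(sample)
print(estimate, variance)   # 12.0 28.0
```

Here 8 taxa are observed, and the 4 singletons and 2 doubletons add an estimated 4 unobserved taxa.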
Because many perturbations to a community affect its alpha diversity, summarizing and comparing community structure via alpha diversity is a ubiquitous approach to analyzing community surveys. Let cij be the observed richness of environment i on replicate j. Attempting to address this problem using rarefaction actually induces more bias. Unfortunately, rarefaction is neither justifiable nor necessary, a view framed statistically by McMurdie and Holmes (2014) in the context of comparing relative abundances. I encourage ecologists to use estimates of diversity that account for unobserved species, and to use measurement error models to compare diversity across ecosystems. In meta-analyses, larger studies need to be given more weight in determining the overall effect size, and this is incorporated into a meta-analysis via the smaller standard errors on the effect size estimates.
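The meta-analytic weighting idea can be sketched minimally under simplifying assumptions:

- Each replicate's richness estimate comes with an estimated variance.
- The replicates within an environment are pooled with inverse-variance (fixed-effect) weights.
- The two pooled values are compared with a normal-approximation Wald test.

The function names and numbers are hypothetical. A fixed-effect pool ignores genuine biological variation between replicates. Models that add a between-replicate heterogeneity term, such as the `betta` function in the breakaway R package, are better suited to real data.

```python
import math

def inverse_variance_mean(estimates, variances):
    """Fixed-effect pooled estimate: each estimate is weighted by 1 / variance,
    so more precise estimates count for more."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return pooled, 1.0 / total_weight   # pooled estimate and its variance

def compare_environments(est_a, var_a, est_b, var_b):
    """Wald test for a difference in pooled richness between environments A and B."""
    mean_a, v_a = inverse_variance_mean(est_a, var_a)
    mean_b, v_b = inverse_variance_mean(est_b, var_b)
    diff = mean_b - mean_a
    se = math.sqrt(v_a + v_b)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return diff, se, z, p_value

# Hypothetical richness estimates (with variances) for two replicates per environment.
diff, se, z, p = compare_environments([100.0, 110.0], [4.0, 4.0],
                                      [120.0, 130.0], [4.0, 4.0])
print(diff, se, z, p)
```

With unequal variances, the pooled value moves toward the more precise estimate. For example, pooling 0 (variance 1) and 10 (variance 4) gives 2, not 5.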