A biologist's guide to statistical thinking and analysis
Introduction
Many studies in our field boil down to generating means and comparing them to each other. Sample means will always differ from each other, even if only slightly, and this is true even when the data are acquired from a single population. The pertinent question that statistics can address is whether or not the differences we inevitably observe reflect a real difference in the populations from which the samples were acquired.
Put another way, are the differences detected by our experiments, which are necessarily based on a limited sample size, likely or not to result from chance effects of sampling?
If chance sampling can account for the observed differences, then our results will not be deemed statistically significant. In contrast, if the observed differences are unlikely to have occurred by chance, then our results may be considered significant insofar as statistics is concerned.
Whether or not such differences are biologically significant is a separate question reserved for the judgment of biologists.
Most biologists, even those leery of statistics, are generally aware that the venerable t-test (Student's t-test) is the standard method for comparing two means. Several factors influence the power of the t-test to detect significant differences. These include the size of the sample and the amount of variation present within the sample. If these sound familiar, they should: both are factors that influence the size of the SEM, discussed in the preceding section. This is not a coincidence, as the heart of a t-test lies in estimating the standard error of the difference between two means (SEDM).
Greater variance in the sample data increases the size of the SEDM, whereas larger sample sizes reduce it. Thus, lower variance and larger samples make it easier to detect differences. If the SEDM is small relative to the absolute difference in means, then the finding will likely hold up as being statistically significant. That said, it is not necessary to deal directly with the SEDM to be perfectly proficient at interpreting results from a t-test.
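For readers curious about what the SEDM actually is, the sketch below computes it in Python. It assumes the unequal-variance (Welch) form, SEDM = sqrt(s1²/n1 + s2²/n2); the pooled, equal-variance version used by the classic Student's t-test gives a slightly different value when the two variances differ. The function names and the toy data are hypothetical. Dividing the difference in means by the SEDM gives the t statistic.

```python
import math
import statistics


def sedm(a, b):
    """Standard error of the difference between two means.

    Assumes the unequal-variance (Welch) form: sqrt(s_a^2/n_a + s_b^2/n_b).
    """
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))


def t_statistic(a, b):
    """Difference between the sample means, expressed in units of the SEDM."""
    return (statistics.mean(a) - statistics.mean(b)) / sedm(a, b)


# Hypothetical toy data (not from the gene expression example).
control = [10, 12, 14]
test = [4, 6, 8]
print(round(sedm(control, test), 3))         # 1.633
print(round(t_statistic(control, test), 2))  # 3.67

# More spread inflates the SEDM: doubling every deviation from the mean doubles it.
print(round(sedm([8, 12, 16], [2, 6, 10]), 3))  # 3.266
```

The sample sizes sit in the denominators, so collecting more embryos per group shrinks the SEDM, while more variable data inflate it.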
We will therefore focus primarily on aspects of the t-test that are most relevant to experimentalists. These include the choice between one- and two-tailed tests and between paired and unpaired tests, whether or not to assume equal variances, and issues related to sample size and normality.
We would also note, in passing, that alternatives to the t-test do exist. These include the computationally intensive bootstrap (see Section 6). For reasonably large sample sizes, however, a t-test will provide virtually the same answer and is currently more straightforward to carry out using available software and websites.
It is also the method most familiar to reviewers, who may be skeptical of approaches that are less commonly used. We will illustrate these points through an example. Imagine that we are interested in knowing whether or not the expression of gene a is altered in comma-stage embryos when gene b has been inactivated by a mutation. To look for an effect, we take total fluorescence intensity measurements of an integrated a::GFP reporter in control embryos and in b mutant (test) embryos. For each condition, we analyze 55 embryos.
Expression of gene a appears to be greater in the control setting; the difference between the two sample means is …
Figure 5. Summary of GFP-reporter expression data for a control and a test group.
Along with the familiar mean and SD, Figure 5 shows some additional information about the two data sets. What we did not mention in Section 1 is that the distribution of the data can have a strong impact, at least indirectly, on whether or not a given statistical test will be valid.
Such is the case for the t-test. Looking at Figure 5, we can see that the data sets are in fact a bit lopsided, having somewhat longer tails on the right. In technical terms, these distributions would be categorized as skewed right.
Although not critical to our present discussion, several parameters are typically used to quantify the shape of the data, including the extent to which the data deviate from normality (e.g., skewness and kurtosis).
In any case, an obvious question now becomes: how can you know whether your data are distributed normally, or at least normally enough, to run a t-test?
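As an illustration of one such shape parameter, the sketch below computes a moment-based sample skewness in Python: values near zero indicate a roughly symmetrical sample, and positive values indicate a right skew like that in Figure 5. It uses the simple, uncorrected estimator; statistical packages often report a bias-adjusted version, so their numbers may differ slightly for small samples. The function name and data are hypothetical.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction).

    Positive for a longer right tail, negative for a longer left tail.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3, 4, 5]))             # 0.0 (symmetrical)
print(round(sample_skewness([1, 1, 1, 2, 10]), 2))  # 1.46 (skewed right)
```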
Before addressing this question, we must first grapple with a bit of statistical theory. The Gaussian curve shown in Figure 6A represents a theoretical distribution of differences between sample means for our experiment. Put another way, if we were to repeat our experiment with our two populations an infinite number of times, this bell-shaped distribution of differences between the two means is what we would expect to obtain.
Note that this theoretical distribution of differences is based on our actual sample means and SDs, as well as on the assumption that our original data sets were derived from populations that are normal, which is something we already know isn't true. So what to do?
Figure 6. Theoretical and simulated sampling distributions of differences between two means. The distributions are from the gene expression example. The black vertical line in each panel is centered on the mean of the differences.
As it happens, this lack of normality in the distribution of the populations from which we derive our samples often does not pose a problem.
The reason is that the distribution of sample means, as well as the distribution of differences between two independent sample means (along with many other conventionally used statistics), is often normal enough for the statistics to still be valid, provided that the sample sizes are large enough. This is a consequence of the Central Limit Theorem.
How large is large enough? That depends on the distribution of the data values in the population from which the sample came.
The more non-normal it is (usually, that means the more skewed), the larger the sample size requirement. Assessing this is a matter of judgment. Figure 7 was derived using a computational sampling approach to illustrate the effect of sample size on the distribution of the sample mean.
In this case, the sample was derived from a population that is sharply skewed right, a common feature of many biological systems where negative values are not encountered (Figure 7A). As can be seen, with a sample size of only 15 (Figure 7B), the distribution of the mean is still skewed right, although much less so than the original population.
By the time we have sample sizes of 30 or 60 (Figure 7C, D), however, the distribution of the mean is indeed very close to being symmetrical.
Figure 7. Illustration of the Central Limit Theorem for a skewed population of values. Panel A shows the population, highly skewed right and truncated at zero; Panels B, C, and D show distributions of the mean for sample sizes of 15, 30, and 60, respectively, as obtained through a computational sampling approach.
As indicated by the x axes, the sample means are approximately 3. The y axes indicate the number of computational samples obtained for a given mean value. As would be expected, larger-sized samples give distributions that are closer to normal and have a narrower range of values.
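A computational sampling experiment in the spirit of Figure 7 can be sketched in Python. The population used in the figure is not specified here, so this sketch assumes, purely for illustration, an exponential distribution (right-skewed and bounded at zero) with a mean of 3; the function names, that mean, and the number of repetitions are hypothetical choices. For each sample size it draws many samples, records each sample mean, and reports the spread and skewness of those means.

```python
import random
import statistics


def simulate_means(sample_size, n_reps=5000, pop_mean=3.0, seed=0):
    """Return n_reps sample means, each from `sample_size` draws of an
    exponential population (an assumed stand-in for a right-skewed,
    non-negative biological measurement)."""
    rng = random.Random(seed)
    rate = 1.0 / pop_mean
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(sample_size))
        for _ in range(n_reps)
    ]


def skewness(values):
    """Moment-based skewness; 0 for a perfectly symmetrical distribution."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


for n in (15, 30, 60):
    means = simulate_means(n)
    print(f"n={n}: mean of means={statistics.fmean(means):.2f}, "
          f"SD={statistics.stdev(means):.3f}, skewness={skewness(means):.2f}")
```

For an exponential population, theory predicts that the SD of the mean falls as 3/√n and its skewness as 2/√n, so both the spread and the asymmetry shrink as n grows, mirroring Panels B to D.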
The Central Limit Theorem having come to our rescue, we can now set aside the caveat that the populations shown in Figure 5 are non-normal and proceed with our analysis.
From Figure 6 we can see that the center of the theoretical distribution (black line) is the observed difference between our two sample means. Furthermore, on either side of this center point, there is a decreasing likelihood that substantially higher or lower values will be observed.
The vertical blue lines show the positions of one and two SDs from the apex of the curve, which in this case could also be referred to as SEDMs. Importantly, for the t-test to be valid, the actual distribution of differences in sample means must come reasonably close to approximating a normal curve.
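For a normal curve, the fraction of the distribution lying within one or two SDs (here, SEDMs) of the center is fixed, which is why those positions are useful landmarks. A short check using the standard library's statistics.NormalDist; the helper name is hypothetical:

```python
from statistics import NormalDist


def fraction_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2):
    print(k, round(fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
```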
But how can we know what this distribution would look like without repeating our experiment hundreds or thousands of times? To address this question, we have generated a complementary distribution, shown in Figure 6B. In contrast to Figure 6A, Figure 6B was generated using a computational resampling method known as bootstrapping (discussed in Section 6).
It shows a histogram of the differences in means obtained by carrying out a large number of in silico repeats of our experiment. Importantly, because this histogram was generated using our actual sample data, it automatically takes skewing effects into account. Notice that this histogram closely approximates a normal curve and that the values obtained for the mean and SDs are virtually identical to those obtained using the theoretical distribution in Figure 6A.
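The resampling idea behind Figure 6B can be sketched in a few lines of Python. This is a minimal illustration assuming the simplest nonparametric bootstrap (each group resampled with replacement, independently, at its original size); the function name, the default of 2,000 repeats, and the right-skewed toy data are hypothetical, and Section 6 remains the place for the details.

```python
import random
import statistics


def bootstrap_mean_differences(control, test, n_boot=2000, seed=0):
    """Return n_boot differences in means, each computed after resampling
    both groups with replacement at their original sizes."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(c) - statistics.fmean(t))
    return diffs


# Hypothetical right-skewed data for two groups of 55 embryos each.
gen = random.Random(42)
control = [gen.lognormvariate(3.0, 0.4) for _ in range(55)]
test = [gen.lognormvariate(2.8, 0.4) for _ in range(55)]

diffs = bootstrap_mean_differences(control, test)
print(f"mean difference: {statistics.fmean(diffs):.2f}")
print(f"bootstrap SD (compare with the SEDM): {statistics.stdev(diffs):.2f}")
```

The SD of the bootstrap differences plays the role of the SEDM; if a histogram of `diffs` looks roughly normal, as in Figure 6B, that supports using a t-test on these data.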
What this tells us is that even though the sample data were indeed somewhat skewed, a t-test will still give a legitimate result. Moreover, from this exercise we can see that with a sufficient sample size, the t-test is quite robust to some degree of non-normality in the underlying population distributions.
Issues related to normality are also discussed further below.
One- versus two-sample tests
Although t-tests always evaluate differences between two means, in some cases only one of the two mean values may be derived from an experimental sample. For example, we may wish to compare the number of vulval cell fates induced in wild-type hermaphrodites versus mutant m.
Because it is broadly accepted that wild type induces on average three progenitor vulval cells (Sulston and Horvitz), we could theoretically dispense with re-measuring this established value and instead measure it only in the mutant m background. In such cases, we would be obliged to run a one-sample t-test to determine if the mean value of mutant m is different from that of wild type.
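The one-sample t statistic compares the mutant mean with the fixed reference value (here, three induced vulval cells) using the standard error of that single mean: t = (mean − μ0) / (s/√n), with n − 1 degrees of freedom. The sketch below computes it with Python's standard library; the function name and the counts are hypothetical, and turning t into a P-value requires the t distribution, which statistical software provides (for example, SciPy's scipy.stats.ttest_1samp).

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t, degrees of freedom) for testing whether the mean of
    `values` differs from the reference value mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.fmean(values) - mu0) / se, n - 1


# Hypothetical counts of induced vulval cells in mutant m animals.
mutant_m = [2, 3, 2, 2, 3, 2, 1, 2, 3, 2]
t, df = one_sample_t(mutant_m, mu0=3)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = -4.00 with 9 degrees of freedom
```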
There is, however, a problem in using the one-sample approach, which is not statistical but experimental. Namely, there is always the possibility that something about the growth conditions, experimental execution, or alignment of the planets could result in a value for wild type that is different from that of the established norm.
Answers to conceptual questions on confidence intervals
To get higher confidence, we need to make the interval wider. This is evident in the multiplier, which increases with the confidence level. Increasing the sample size decreases the width of a confidence interval because it decreases the standard error. Once an interval has been computed, the population percentage either lies between its two endpoints or it does not; hence, the probability that the population percentage lies between those two exact numbers is either zero or one.
True, as long as we're talking about a CI for a population percentage: the standard error is proportional to 1/√n, so increasing the sample size by a factor of 4 divides the standard error by √4 = 2, and the interval will be half as wide. This also works approximately for population averages, as long as the multiplier from the t-curve doesn't change much when the sample size increases (which it won't if the original sample size is large). The central limit theorem is needed for confidence intervals to be valid.
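Both claims, that the multiplier grows with the confidence level and that quadrupling n halves the width, can be checked numerically. The sketch below assumes the large-sample (normal-curve) interval for a population percentage, p̂ ± z·sqrt(p̂(1 − p̂)/n); the function names and the example proportion of 0.4 are hypothetical.

```python
import math
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided normal-curve multiplier, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci_half_width(p_hat, n, confidence=0.95):
    """Half-width of the large-sample CI for a population proportion."""
    return z_multiplier(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: multiplier {z_multiplier(c):.3f}")
# 90% confidence: multiplier 1.645
# 95% confidence: multiplier 1.960
# 99% confidence: multiplier 2.576

# Quadrupling n halves the width.
print(round(ci_half_width(0.4, 400) / ci_half_width(0.4, 100), 6))  # 0.5
```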