# What is the relationship between probability sampling and inferential statistics

### Difference Between Descriptive and Inferential Statistics

Why is probability relevant to inferential statistics? Inferential statistics is concerned with establishing whether differences or associations exist between sets of data. When probability sampling is used, inferential statistics allow estimation of the extent to which an observed relationship between variables in the sample is likely to hold in the population, rather than being an artefact of chance. Samples are used to generate the data, and inferential tests evaluate that data against a null hypothesis: a statement of "no difference".

Inferential statistics attempts to reach conclusions about the population that extend beyond the data immediately available.

**Definition of Descriptive Statistics**

Descriptive statistics refers to a discipline that quantitatively describes the important characteristics of a dataset. For the purpose of describing properties, it uses measures of central tendency, i.e. the mean, median and mode.

The data is summarised by the researcher in a useful way, with the help of numerical and graphical tools such as charts, tables and graphs, to represent the data accurately. Moreover, text is presented in support of the diagrams, to explain what they represent.

**Definition of Inferential Statistics**

Inferential statistics is all about generalising from the sample to the population, i.e. using the results drawn from the sample to reach conclusions about the larger population.

It is a convenient way to draw conclusions about the population when it is not possible to query each and every member of the universe. The sample chosen should be representative of the entire population, and therefore should contain its important features. Inferential statistics is used to determine the probability of properties of the population on the basis of the properties of the sample, by employing probability theory. Its principal methods are the estimation of parameters and the testing of hypotheses.

As an example, suppose we test whether there is a difference between the mean scores for boys and girls in our sample, and whether that difference is sufficiently large to be true of the population, remembering to take into account our sample size.

Imagine we find a difference in the age 14 test scores of boys and girls in our sample such that boys have, on average, lower scores than girls. This could be a fair representation of the wider population or it could be due to chance factors like sampling variation. There is a chance, however small, that we inadvertently selected only the boys with low attainment so our sample does not represent the whole population fairly. The independent t-test, like many statistical analyses, lets us compute a test of statistical significance to find out how likely it is that any difference in scores resulted just from sampling variation.
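The mechanics of such a test can be sketched in Python. This is a minimal illustration, not the exact procedure described here: the scores are invented, and the two-tailed p-value uses a normal approximation to the t-distribution (reasonable for larger samples).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def independent_t(sample_a, sample_b):
    """Pooled two-sample t statistic, with a two-tailed p-value from
    the normal approximation (adequate for larger samples)."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    # Pooled variance weights each group by its degrees of freedom.
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    # Probability of a statistic at least this extreme under the null.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Invented age-14 test scores for two small groups.
boys = [48, 52, 55, 50, 47, 53, 49, 51]
girls = [56, 58, 54, 60, 57, 55, 59, 58]
t, p = independent_t(boys, girls)
```

Here the negative t reflects boys scoring lower on average in these invented data, and the small p-value says such a gap would rarely arise from sampling variation alone.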

To understand this properly you will need to be introduced to the p-value.

**Statistical Significance: What is a p-value?**

A p-value is a probability.


It is usually expressed as a proportion, which can also be easily interpreted as a percentage (e.g. a p-value of 0.05 is equivalent to 5%). P-values become important when we are looking to ascertain how confident we can be in accepting or rejecting our hypotheses.

By using the properties of the normal distribution, we can compute the probability that the result we observed in our sample could have occurred by chance. To clarify, we can calculate the probability that the effect or relationship we observe in our sample, e.g. the difference in mean scores between boys and girls, arose purely from sampling variation. The strength of the effect (the size of the difference between the mean scores for boys and girls), the amount of variation in scores (indicated by the standard deviation) and the sample size are all important in making the decision; we will discuss this in detail when we report completing independent t-tests on Page 1.


This is our confidence level. You are therefore looking for a p-value that is less than 0.05. It is important to remember these are somewhat arbitrary conventions; the most appropriate confidence level will depend on the context of your study (see more on this below). The way that the p-value is calculated varies subtly between different statistical tests, which each generate a test statistic called, for example, t, F or χ², depending on the particular test. This test statistic is derived from your data and compared against a known distribution (commonly a normal distribution) to see how likely it is to have arisen by chance.

Compare this to Figure 1. If we attain such a value we can say that our result is unlikely to have occurred by chance: it is statistically significant.
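The comparison against a known distribution can be sketched with the standard normal: for a two-tailed test at the 95% confidence level, the critical value is about 1.96. This is an illustration only; a real t or F test would use its own distribution rather than the normal.

```python
from statistics import NormalDist

# For a two-tailed test at the 95% confidence level, 2.5% of the
# standard normal curve lies beyond the critical value in each tail.
critical = NormalDist().inv_cdf(0.975)   # about 1.96

def significant(z, crit=critical):
    # A test statistic beyond the critical value corresponds to p < 0.05.
    return abs(z) > crit
```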

### Probability and Inferential Stats

Note that either way we can never be absolutely certain; these are probabilities. There is always a possibility we will make one of two types of error:

- Type I error: concluding that there is a relationship or effect when in fact there is not one (a false positive).
- Type II error: concluding that there is no relationship or effect when in fact there is one (a false negative).

The balance of the consequences of these different types of error determines the level of confidence you might want to accept. For example, if you are testing the efficacy of a new and very expensive drug, or one with lots of unwanted side effects, you might want to be very confident that it worked before you made it widely available, so you might select a very stringent confidence level, e.g. 99% rather than the conventional 95%.
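The long-run meaning of the Type I error rate can be shown by simulation: if the null hypothesis is really true, a test at the 95% confidence level should produce a false positive about 5% of the time. A sketch under illustrative assumptions (invented population values, a simple z test with the standard deviation treated as known):

```python
import random
from statistics import NormalDist, mean

random.seed(1)
ALPHA = 0.05          # 95% confidence level
TRIALS, N, SD = 2000, 30, 10

false_positives = 0
for _ in range(TRIALS):
    # Both groups come from the same population, so the null
    # hypothesis of "no difference" is true by construction.
    a = [random.gauss(50, SD) for _ in range(N)]
    b = [random.gauss(50, SD) for _ in range(N)]
    # z test on the difference in means, treating SD as known.
    z = (mean(a) - mean(b)) / (SD * (2 / N) ** 0.5)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < ALPHA:
        false_positives += 1   # a Type I error

type_i_rate = false_positives / TRIALS
```

Over many trials the observed rate of false positives settles near the chosen alpha of 0.05.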

Before leaving p-values we should note that the p-value tells us nothing about the size of the effect.

In large samples even very small differences may be statistically significant (bigger sample sizes increase the statistical power of the test). Also, remember that statistical significance is not the same as practical importance: you need to interpret your findings and ground them in the context of your field.
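The effect of sample size on significance can be sketched numerically. The one-point difference, standard deviation of 10 and group sizes below are illustrative assumptions, and the helper uses a simple z test with the standard deviation treated as known:

```python
from statistics import NormalDist

def z_test_p(mean_diff, sd, n):
    """Two-tailed p-value for a difference in means between two
    groups of n cases each, treating sd as known (illustrative)."""
    se = sd * (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(mean_diff) / se))

# The same tiny one-point difference in mean score (sd = 10):
p_small_sample = z_test_p(1, 10, 50)     # 50 cases per group
p_large_sample = z_test_p(1, 10, 5000)   # 5,000 cases per group
```

With 50 cases per group the difference is nowhere near significant; with 5,000 cases per group the identical difference is highly significant, even though its practical importance is unchanged.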

**Standard Error and Confidence Intervals**

A core issue in generalising from our sample to the wider population is establishing how well our sample data fit the population from which they came. If we took lots of random samples from our population, each with the same number of cases, and calculated the mean score for each sample, the sample means themselves would vary slightly just by chance. Suppose we take 10 random samples, each composed of 10 students, from the Year 11 group in a large secondary school and calculate the mean exam score for each sample.

It is probable that the sample means will vary slightly just by chance (sampling variation). While some sample means might be exactly at the population mean, it is probable that most will be either somewhat higher or somewhat lower than the population mean.

So these 10 sample means would themselves have a distribution with a mean and a standard deviation; we call this the 'sampling distribution'. If lots of samples are drawn and the mean score calculated for each, the distribution of the means could be plotted as a histogram, as in Figure 1.
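The sampling distribution can be simulated directly. The population mean of 100, standard deviation of 15 and sample size of 25 used here are illustrative assumptions, not the exam-score example above:

```python
import random
from statistics import mean, stdev

random.seed(4)
POP_MEAN, POP_SD, N = 100, 15, 25

# Draw many random samples of N cases and record each sample's mean.
sample_means = [
    mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(3000)
]

# The spread of these means is the standard deviation of the sampling
# distribution, i.e. the standard error; theory predicts
# POP_SD / sqrt(N) = 15 / 5 = 3.
observed_se = stdev(sample_means)
```

The simulated means centre on the population mean, and their standard deviation comes out close to the theoretical standard error of 3.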


*Figure: histogram of mean scores from a large number of samples.*

The standard deviation of the distribution of the sample means is called the standard error (SE). The SE is extremely important in determining how confident we can be about the accuracy of the sample mean as a representation of the population mean.

The standard deviation of the distribution of the sample means (the standard error) is approximately 10 score points. We can use the properties of the normal distribution to calculate the range above or below the population mean within which we would expect any given sample mean to lie, given our sample size.

For the example in Figure 1. Crucially, the SE will vary depending on the size of the samples: with larger samples we are more likely to get sample mean scores that cluster closely around the population mean, while with smaller samples there is likely to be much more variability in the sample means.

Thus the greater the number of cases in the samples, the smaller the SE.
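This relationship can be stated as a one-line formula, SE = sd / sqrt(n). A small sketch, using an illustrative standard deviation of 15 and a hypothetical sample mean of 50:

```python
from math import sqrt

def standard_error(sd, n):
    # SE of the mean: the expected spread of sample means around
    # the population mean for samples of n cases.
    return sd / sqrt(n)

# The same spread of scores (sd = 15) produces a much smaller SE
# once the sample is larger, so sample means cluster more tightly.
se_small = standard_error(15, 10)     # n = 10
se_large = standard_error(15, 100)    # n = 100

# A 95% confidence interval around a hypothetical sample mean of 50:
# the sample mean plus or minus roughly 1.96 standard errors.
ci_low, ci_high = 50 - 1.96 * se_large, 50 + 1.96 * se_large
```

Increasing the sample from 10 to 100 cases cuts the SE from about 4.7 to 1.5, and the confidence interval narrows accordingly.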