What are common measures of spread in statistics?

Answer

Spread (also called variability or dispersion) describes how widely the values in a dataset are scattered. Two common measures of spread are the five-number summary and the standard deviation.

Calculating a five-number summary requires finding the minimum, maximum, median, and first and third quartiles (the 25th and 75th percentiles). The minimum and maximum tell us the range of the data, and the other three values help us draw conclusions about the center and shape of the data. The five-number summary is often represented visually using a boxplot like the one described below. The box stretches from the first quartile to the third quartile, so its length is the Interquartile Range (IQR), which is found by subtracting the first quartile from the third quartile. A line inside the box marks the median, which is not necessarily at the center of the box. In the common Tukey-style boxplot, the IQR is also used to flag outliers: any value smaller than the first quartile minus 1.5 times the IQR, or larger than the third quartile plus 1.5 times the IQR, is drawn as an individual point. The whiskers then extend only to the smallest and largest values that fall inside these limits. When the data contain outliers, the whisker ends are therefore not the true minimum and maximum from the five-number summary.
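As a rough illustration, the short Python sketch below computes a five-number summary and the 1.5 × IQR outlier limits for a small made-up dataset. The function names `five_number_summary` and `tukey_outliers` are hypothetical. Quartiles are taken from `statistics.quantiles(..., method="inclusive")`. Textbooks and software use several quartile conventions, so other tools may report slightly different Q1 and Q3 values.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles use linear interpolation between sorted values
    (method="inclusive"); other quartile conventions can differ slightly.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def tukey_outliers(data, k=1.5):
    """Return (lower limit, upper limit, outliers, (lower whisker, upper whisker))."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1                       # Interquartile Range
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in data if x < lower or x > upper]
    # Whiskers stop at the most extreme values still inside the limits.
    inside = [x for x in data if lower <= x <= upper]
    return lower, upper, outliers, (min(inside), max(inside))

data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 30]   # illustrative values only
print(five_number_summary(data))   # (2, 4.25, 8.0, 11.5, 30)
print(tukey_outliers(data))        # (-6.625, 22.375, [30], (2, 13))
```

Here 30 is flagged as an outlier. A boxplot of this data would therefore end the upper whisker at 13 and show 30 as a separate point, even though the maximum in the five-number summary is 30.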

Here we have a horizontal boxplot that shows the outliers (points beyond the whisker limits), as well as, from left to right, the lower whisker end, the first quartile, the median, the third quartile, and the upper whisker end. The boxplot also shows the Interquartile Range as the length of the box, which is the third quartile minus the first quartile.

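Unlike the five-number summary, the standard deviation is a single value. It can be loosely interpreted as the typical distance between the data points and the mean. More precisely, it is the square root of an average of squared distances, so it is usually somewhat larger than the plain average distance. A smaller standard deviation means the observed data are clustered more closely around the mean, and a larger one means they are more spread out.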
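To see how the standard deviation compares with a literal average distance, the sketch below uses two made-up datasets that share a mean of 50 but differ in spread. The helper `spread_stats` is hypothetical. It relies on `statistics.stdev` (the sample standard deviation) and computes the mean absolute deviation directly for comparison.

```python
import statistics

def spread_stats(data):
    """Return (mean, sample standard deviation, mean absolute deviation)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                         # divides by n - 1
    mad = statistics.mean(abs(x - mean) for x in data)  # average distance to the mean
    return mean, sd, mad

# Two illustrative datasets, both with mean 50.
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

for name, data in [("tight", tight), ("wide", wide)]:
    mean, sd, mad = spread_stats(data)
    print(f"{name}: sd={sd:.3f}, average distance={mad:.3f}")
# tight: sd=1.581, average distance=1.200
# wide: sd=15.811, average distance=12.000
```

The wider dataset has the much larger standard deviation. In both cases the standard deviation is a little larger than the plain average distance, which is why "average distance" is best read as an approximate interpretation.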

The standard deviation is calculated from another measure of spread: the variance. The variance is difficult to interpret on its own because it is measured in squared units (for example, square centimeters if the data are in centimeters). Taking the square root of the variance gives the standard deviation, which is back in the original units of the data. The sample variance can be calculated using the equation below, where \(\bar{x}\) is the sample mean, \(n\) is the sample size, and the \(x_i\) are the individual sample values.

\[s^2 = \frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}\]
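The formula translates almost term by term into code. The sketch below is a hypothetical `sample_variance` helper written from the equation above (not a library function), applied to an illustrative dataset. The standard deviation is then its square root, and the result is cross-checked against Python's built-in `statistics.variance`.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance s^2: sum of squared deviations from the mean over n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(xs) / n                                  # sample mean
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)   # divide by n - 1

xs = [2, 4, 4, 4, 5, 5, 7, 9]        # illustrative data, mean 5
s2 = sample_variance(xs)             # squared deviations sum to 32, so 32 / 7
s = math.sqrt(s2)                    # standard deviation
print(round(s2, 4), round(s, 4))     # 4.5714 2.1381
print(math.isclose(s2, statistics.variance(xs)))  # True
```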

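To distinguish between the sample and population variance, we denote the sample variance by \(s^2\) and the population variance by \(\sigma^2\). The population variance, \(\sigma^2 = \frac{\sum_{i=1}^N (x_i-\mu)^2}{N}\), uses the population mean \(\mu\) and divides by the population size \(N\) rather than by \(n-1\).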
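In Python's standard library the two versions are separate functions. `statistics.pvariance` and `statistics.pstdev` divide by the number of values (the population versions), while `statistics.variance` and `statistics.stdev` divide by one less than the number of values (the sample versions). A minimal comparison on illustrative data:

```python
import statistics

xs = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; squared deviations sum to 32

# Treat xs as an entire population: divide by N = 8.
pop_var = statistics.pvariance(xs)   # 32 / 8 = 4
pop_sd = statistics.pstdev(xs)       # square root of 4 = 2

# Treat xs as a sample from a larger population: divide by n - 1 = 7.
samp_var = statistics.variance(xs)   # 32 / 7, about 4.571
samp_sd = statistics.stdev(xs)       # about 2.138

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

Which version to use depends on whether the values are the whole population or a sample drawn from it. For the same data, the sample variance is never smaller than the population variance.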

  • Last Updated Apr 16, 2021
  • Answered By Lydia Carter
