
Measure of Central Tendency and Measure of Spread

Summarizing quantitative data helps us understand it better. In this article, we’ll see various methods to summarize quantitative data using measures of central tendency (such as the mean, median, and mode) and measures of spread (such as the range, variance, and IQR).

MEASURES OF CENTRAL TENDENCY:

The measures of central tendency attempt to describe the center of the distribution of our data. The three most common estimators of central tendency are the arithmetic mean, the median and the mode.

MEAN:

The mean is the arithmetic average of all the observations in the data set. The arithmetic mean of a population with N elements is represented by the Greek letter μ (mu, pronounced “mew”), and the arithmetic mean of a sample with n elements is represented by \bar{x}, pronounced “x-bar.”

\text { Population Mean: } \mu=\frac{\sum x_{i}}{N}

where x_{i} are the data values and N is the size of the population.

The sample mean of n observations is given by

\text { Sample Mean: } \bar{x}=\frac{\sum x_{i}}{n}

The mean is extremely sensitive to the presence of outliers and may not be a reliable measurement when we have outliers in our data.
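To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard `statistics` module. The salary figures and variable names are made up for illustration.

```python
from statistics import mean

# Hypothetical sample: five monthly salaries, in thousands.
salaries = [30, 32, 35, 31, 32]
mean_without_outlier = mean(salaries)           # 160 / 5 = 32

# Add one extreme value (an outlier) and recompute the mean.
salaries_with_outlier = salaries + [200]
mean_with_outlier = mean(salaries_with_outlier)  # 360 / 6 = 60

print(mean_without_outlier)
print(mean_with_outlier)
```

A single extreme observation nearly doubles the mean here, even though five of the six values lie between 30 and 35.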

MEDIAN:

The median is the middle value of the data when the data is sorted from the least to the greatest.

The median of a variable can be calculated as follows. Let x_{(k)} denote the k-th value after sorting the n observations from least to greatest.

If the number of observations is odd, then the middle value is the median of the data.

\text {Median}=x_{\left(\frac{n+1}{2}\right)}

If the number of observations is even, the median is the average of the two middle values.

\text {Median}=\frac{x_{\left(\frac{n}{2}\right)}+x_{\left(\frac{n}{2}+1\right)}}{2}

Unlike the mean, the median is robust to outliers: changing a few extreme values has little or no effect on it.
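The odd and even cases can be sketched in a few lines of Python. The helper name `median_by_position` and the sample values are illustrative assumptions. In practice the standard library's `statistics.median` does the same job.

```python
from statistics import median

def median_by_position(values):
    """Median from sorted positions, handling odd and even n separately."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are 0-based mid - 1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 1, 5, 3, 9]          # sorted: 1 3 5 7 9, middle value 5
even_data = [7, 1, 5, 3, 9, 200]    # sorted: 1 3 5 7 9 200, (5 + 7) / 2

print(median_by_position(odd_data), median(odd_data))
print(median_by_position(even_data), median(even_data))
```

Adding the extreme value 200 moves the median only from 5 to 6, which illustrates its resistance to outliers.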

MODE:

The mode is the value that occurs most frequently in a data set.

Some data sets can have more than one mode. If two values share the highest frequency, the data set is said to be bimodal. If there are more than two modes, the data set is said to be multimodal.

When describing the shape of a distribution, the term is also used more loosely for its distinct peaks. In that sense, the modes need not all occur the same number of times; each is simply more common than the values around it.

On the other hand, it is also possible that some data sets do not have a mode because each value occurs only once.
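The following sketch finds the mode or modes using the strict "highest frequency" definition. The helper name `modes` is hypothetical, and returning an empty list when every value occurs only once is a convention chosen here to match the text. The standard library's `statistics.multimode` instead returns every value in that case.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) in sorted order, or [] if none repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        # Every value occurs exactly once: treat the data as having no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))        # one mode
print(modes([1, 2, 2, 3, 3]))     # bimodal
print(modes([1, 2, 3]))           # no mode
```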

COMPARISON OF MEAN, MEDIAN AND MODE:

Comparing the mean, median, and mode can indicate the shape of the distribution. The relationships below are rules of thumb that hold for typical unimodal data.

In a symmetric, unimodal distribution, all three measures coincide.

If the mean is higher than the median and the mode, then the distribution is skewed to the right, or positively skewed.

On the other hand, if the mean is smaller than the median and the mode, then the distribution is skewed to the left, or negatively skewed.
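As a quick check of the right-skew pattern, the sketch below computes all three measures for a small made-up data set with a long right tail. The values and variable names are assumptions for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 3, 4, 10, 20]

skew_mean = mean(right_skewed)      # 42 / 7 = 6
skew_median = median(right_skewed)  # middle of the 7 sorted values = 3
skew_mode = mode(right_skewed)      # most frequent value = 2

# Mean > median > mode is the usual signature of right skew.
print(skew_mean, skew_median, skew_mode)
```

The few large values pull the mean toward the tail, while the median and mode stay near the bulk of the data.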


MEASURE OF SPREAD:

Measures of central tendency give us information only about the center of the distribution. However, it is also essential to understand the spread of the distribution.

The spread of the data tells us how much variation there is in the data.
Standard metrics to quantify the spread are the range, variance, and IQR.

RANGE:

A straightforward, but not particularly useful, measure of spread is the range. The range is calculated as the difference between the maximum and minimum values in a data set.

\text { Range }=\operatorname{Max}-\operatorname{Min}

As it only considers the maximum and the minimum values, it is highly impacted by the presence of outliers.
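A short Python sketch shows how one extreme value changes the range. The helper name `value_range` and the readings are hypothetical.

```python
def value_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

temperatures = [12, 15, 11, 14, 13]   # hypothetical readings
range_clean = value_range(temperatures)              # 15 - 11 = 4
range_with_outlier = value_range(temperatures + [90])  # 90 - 11 = 79

print(range_clean)
print(range_with_outlier)
```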

VARIANCE AND STANDARD DEVIATION:

Variance and standard deviation are measures of spread that evaluate how much the data are dispersed around the arithmetic mean.

The higher the variance and standard deviation, the more the data are spread out from the mean.

Variance is calculated as the average of the squared deviations from the mean of X.

The population variance is denoted as \sigma^{2}, and the sample variance is written as S^{2}.

\sigma^{2}=\frac{\sum_{i=1}^{N}\left(X_{i}-\mu\right)^{2}}{N}

X_{i} represents the data values
μ represents the population mean
N represents the number of units in the population

The sample variance is given by,

S^{2}=\frac{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}{n-1}

X_{i} represents the i-th unit, starting from the first observation to the last

\bar{X} represents the sample mean

n represents the number of units in the sample

Since variance is based on squared deviations, it is expressed in squared units of the original data.

To return to the original unit of measurement, we take the square root of the variance, which is called the standard deviation.

The population standard deviation is denoted by σ, and the sample standard deviation is denoted by S.

\sigma=\sqrt{\sigma^{2}}\ \ \ \ (\text{for the population })

S=\sqrt{S^{2}}\ \ \ \ (\text { for samples })
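The formulas above can be sketched directly in Python and cross-checked against the standard `statistics` module. The data values and variable names are illustrative assumptions. The only difference between the population and sample versions is the divisor (N versus n − 1).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
n = len(data)
data_mean = sum(data) / n          # 40 / 8 = 5.0

# Sum of squared deviations from the mean: 9+1+1+1+0+0+4+16 = 32.
squared_deviations = sum((x - data_mean) ** 2 for x in data)

population_variance = squared_deviations / n    # divide by N
sample_variance = squared_deviations / (n - 1)  # divide by n - 1

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)

# The library functions should agree with the manual calculation.
print(pvariance(data), pstdev(data), variance(data), stdev(data))
```

Treating the same eight numbers as a sample rather than a whole population gives a slightly larger variance, because the sum of squared deviations is divided by 7 instead of 8.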

INTER-QUARTILE RANGE:

The inter-quartile range (IQR) is defined as the difference between the upper quartile (Q3) and the lower quartile (Q1).

\text {IQR}=Q_{3}-Q_{1}

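The lower quartile (Q1) is the value below which 25% of the data fall, and the upper quartile (Q3) is the value below which 75% of the data fall. Thus, the IQR gives the spread of the middle 50% of the data around the median. Because it ignores the lowest and highest quarters of the data, the IQR is highly resistant to outliers.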
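As a rough sketch, the IQR can be computed with `statistics.quantiles` from the Python standard library (Python 3.8+). Several conventions exist for computing quartiles, so other tools may give slightly different values. This sketch uses the function's default "exclusive" method. The helper name `iqr` and the data are assumptions for illustration.

```python
from statistics import quantiles

def iqr(values):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (n=4)."""
    q1, _q2, q3 = quantiles(values, n=4)  # default method is "exclusive"
    return q3 - q1

clean = list(range(1, 12))            # 1, 2, ..., 11
with_outlier = clean[:-1] + [1000]    # replace 11 with an extreme value

# Compare how the IQR and the range react to the outlier.
print(iqr(clean), max(clean) - min(clean))
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))
```

Replacing the largest value with 1000 inflates the range from 10 to 999, while the IQR is unchanged because Q1 and Q3 depend only on the middle of the sorted data.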

CONCLUSION:

Summarizing quantitative data helps us understand it better. In this article, we discussed various methods to summarize the data.

We first discussed the measure of central tendency, which describes the center of the distribution of the data.

Metrics like mean, median, and mode can be used to quantify central tendency.

The measure of spread tells us how much our data is spread out. Some of the common metrics used to quantify spread are the range, variance and inter-quartile range.
