The relationship between the above measures of central tendency can be interpreted in terms of a continuous frequency curve. If the number of observations in a frequency distribution is increased gradually, more classes are needed to cover approximately the same range of values of the variable, and the width of each class decreases accordingly. Consequently, the histogram of the frequency distribution is transformed into a smooth frequency curve, as shown in the following figure.
For a given distribution, the mean is the value of the variable at the point of balance, or centre of gravity, of the distribution. The median is the value such that half of the observations lie below it and the remaining half lie above it. In terms of the frequency curve, the ordinate at the median divides the total area under the curve into two equal parts. The mode of a distribution is the value around which the observations are most concentrated, and it is given by the point at which the peak of the curve occurs. For a symmetrical distribution, all three measures of central tendency are equal, i.e., X̄ = Md = Mo, as shown in the following figure.
Imagine a situation in which the symmetrical distribution is made asymmetrical or positively (or negatively) skewed by adding some observations of very high (or very low) magnitudes, so that the right hand (or the left hand) tail of the frequency curve gets elongated.
Consequently, the three measures will depart from each other. Since the mean takes into account the magnitudes of all observations, it is affected the most. Since the total number of observations also increases, the median is affected as well, but to a lesser extent than the mean. Finally, the position of the mode does not change. More specifically, we have Mo < Md < X̄ when skewness is positive and X̄ < Md < Mo when skewness is negative, as shown in the following figure.
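The effect described above can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values are hypothetical and chosen only for illustration) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical symmetrical data: mean, median and mode all equal 4.
symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]

# Add a few very high observations to elongate the right-hand tail.
skewed = symmetric + [20, 25, 30]

for label, data in (("symmetrical", symmetric), ("positively skewed", skewed)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    print(f"{label}: mode={mode}, median={median}, mean={mean}")

skewed_mean = statistics.mean(skewed)
skewed_median = statistics.median(skewed)
skewed_mode = statistics.mode(skewed)
```

In this sketch the mode stays at 4, the median moves slightly to 4.5, and the mean moves to 9.25, which matches the ordering Mo < Md < X̄ for positive skewness.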
Empirical Relation between Mean, Median and Mode
Empirically, it has been observed that for a moderately skewed distribution, the difference between mean and mode is approximately three times the difference between mean and median, i.e.,
X̄ - Mo = 3(X̄ - Md), or equivalently, Mo = 3Md - 2X̄.
This relation can be used to estimate the value of one of the measures when the values of the other two are known.
Example:
(a) The mean and median of a moderately skewed distribution are 42.2 and 41.9 respectively. Find the mode of the distribution.
(b) For a moderately skewed distribution, the median price of men's shoes is Rs 380 and the modal price is Rs 350. Calculate the mean price of shoes.
Solution:
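(a) Here, the mode is determined from the empirical formula Mo = 3Md - 2X̄ = 3 × 41.9 - 2 × 42.2 = 125.7 - 84.4 = 41.3.
(b) Rearranging the same formula for the mean gives X̄ = (3Md - Mo)/2 = (3 × 380 - 350)/2 = 790/2 = Rs 395.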
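The empirical relation, mean - mode = 3 × (mean - median), can also be applied programmatically. The sketch below is illustrative only: the function names `mode_from` and `mean_from` are hypothetical helpers, and the results are approximations that hold only for moderately skewed distributions.

```python
def mode_from(mean, median):
    """Estimate the mode from mean and median: Mo = 3*Md - 2*mean."""
    return 3 * median - 2 * mean


def mean_from(median, mode):
    """Estimate the mean from median and mode: mean = (3*Md - Mo) / 2."""
    return (3 * median - mode) / 2


# (a) mean = 42.2, median = 41.9
estimated_mode = mode_from(42.2, 41.9)

# (b) median = Rs 380, mode = Rs 350
estimated_mean = mean_from(380, 350)

print(round(estimated_mode, 1))  # 41.3
print(estimated_mean)            # 395.0
```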
Choice of a Suitable Average
The choice of a suitable average, for a given set of data, depends upon a number of considerations which can be classified into the following broad categories:
(a) Considerations based on the suitability of the data for an average.
(b) Considerations based on the purpose of investigation.
(c) Considerations based on various merits of an average.
(a) Considerations based on the suitability of the data for an average:
The nature of the given data may itself indicate the type of average that can be selected. For example, the mean or median cannot be calculated if the characteristic is neither measurable nor capable of being arranged in order of its intensity. However, it is possible to calculate the mode in such cases. Suppose that the distribution of votes polled by five candidates of a particular constituency is given as below:
Since the above characteristic, i.e., name of the candidate, is neither measurable nor can be arranged in the order of its intensity, it is not possible to calculate the mean and median. However, the mode of the distribution is D and hence, it can be taken as the representative of the above distribution.
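For a nominal characteristic such as a candidate's name, locating the mode amounts to finding the category with the highest frequency. The sketch below assumes hypothetical vote counts, invented only so that candidate D has the most votes as stated in the text; they are not actual figures.

```python
# Hypothetical vote counts for five candidates (illustrative values only).
votes = {"A": 8200, "B": 6100, "C": 9400, "D": 12750, "E": 4300}

# The mode is the category with the largest frequency.
modal_candidate = max(votes, key=votes.get)
print(modal_candidate)  # D
```

No mean or median is computed here because the candidate names have neither magnitude nor a natural order.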
If the characteristic is not measurable but the items of the distribution can be arranged in order of intensity of the characteristic, it is possible to locate the median in addition to the mode. For example, students of a class can be classified into four categories as poor, intelligent, very intelligent and most intelligent. Here the characteristic, intelligence, is not measurable. However, the data can be arranged in ascending or descending order of intelligence. It is not possible to calculate the mean in this case.
If the characteristic is measurable but class intervals are open at one or both ends of the distribution, it is possible to calculate the median and mode but not a satisfactory value of the mean. However, an approximate value of the mean can be computed by making an assumption about the width of the open-ended class(es).
If the distribution is skewed, the median may represent the data more appropriately than the mean and mode.
If the class intervals are of unequal width, the mean and median can be satisfactorily calculated. However, only an approximate value of the mode can be calculated, by regrouping the data into class intervals of equal width under the assumption that observations in a class are uniformly distributed. The accuracy of the computed mode will depend upon the validity of this assumption.
(b) Considerations based on the purpose of investigation:
The choice of an appropriate measure of central tendency also depends upon the purpose of investigation. If the collected data are the figures of income of the people of a particular region and our purpose is to estimate the average income of the people of that region, computation of mean will be most appropriate. On the other hand, if it is desired to study the pattern of income distribution, the computation of median, quartiles or percentiles, etc., might be more appropriate. For example, the median will give a figure such that 50% of the people have income less than or equal to it.
Similarly, by calculating quartiles or percentiles, it is possible to know the percentage of people having at least a given level of income or the percentage of people having income between any two limits, etc.
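As a rough illustration of these ideas, the sketch below computes quartiles and the share of people between two income limits. The income figures are hypothetical (say, in thousands of rupees), and `statistics.quantiles` with its default method is just one of several conventions for locating quartiles.

```python
import statistics

# Hypothetical incomes of 11 people, in thousands of rupees.
incomes = [12, 15, 18, 20, 22, 25, 28, 32, 40, 55, 90]

# Three cut points dividing the data into four equal parts.
q1, q2, q3 = statistics.quantiles(incomes, n=4)
print(q1, q2, q3)

# Share of people with income between two limits (inclusive).
lower, upper = 20, 40
share_between = sum(lower <= x <= upper for x in incomes) / len(incomes)
print(f"{share_between:.0%} of people earn between {lower} and {upper}")
```

The second quartile coincides with the median, so about half of the people in this made-up sample have income less than or equal to it.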
If the purpose of investigation is to determine the most common or modal size of the distribution, mode is to be computed, e.g., modal family size, modal size of garments, modal size of shoes, etc. The computation of mean and median will provide no useful interpretation of the above situations.
(c) Considerations based on various merits of an average:
The presence or absence of various characteristics of an average may also affect its selection in a given situation.
If the requirement is that an average should be rigidly defined, mean or median can be chosen in preference to mode because mode is not rigidly defined in all the situations.
An average should be easy to understand and easy to interpret. This characteristic is satisfied by all the three averages.
It should be easy to compute. We know that all the three averages are easy to compute. It is to be noted here that, for the location of median, the data must be arranged in order of magnitude. Similarly, for the location of mode, the data should be converted into a frequency distribution. This type of exercise is not necessary for the computation of mean.
It should be based on all the observations. This characteristic is met only by mean and not by median or mode.
It should be least affected by the fluctuations of sampling. If a number of independent random samples of same size are taken from a population, the variations among means of these samples are less than the variations among their medians or modes. These variations are often termed as sampling variations.
Therefore, preference should be given to mean when the requirement of least sampling variations is to be fulfilled. It should be noted here that if the population is highly skewed, the sampling variations in mean may be larger than the sampling variations in median.
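Sampling variation can be illustrated with a small simulation. The sketch below is only a demonstration under stated assumptions: the population is an artificial, roughly normal one generated with a fixed seed, and the sample size and number of samples are arbitrary choices.

```python
import random
import statistics

random.seed(0)

# Artificial, roughly symmetrical population (assumed for illustration).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Draw many independent random samples of the same size.
samples = [random.sample(population, 25) for _ in range(1000)]

sample_means = [statistics.mean(s) for s in samples]
sample_medians = [statistics.median(s) for s in samples]

# Standard deviation across samples measures the sampling variation.
sd_of_means = statistics.pstdev(sample_means)
sd_of_medians = statistics.pstdev(sample_medians)

print(f"variation among means:   {sd_of_means:.2f}")
print(f"variation among medians: {sd_of_medians:.2f}")
```

For a population like this one, the means vary less from sample to sample than the medians. Replacing the population with a highly skewed one may reverse the comparison, as noted above.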
It should not be unduly affected by the extreme observations. The mode is the most suitable average from this point of view. The median is only slightly affected, while the mean is very much affected by the presence of extreme observations.
It should be capable of further mathematical treatment. This characteristic is satisfied only by the mean and, consequently, most statistical theory uses the mean as a measure of central tendency.
It should not be affected by the method of grouping of observations. Very often the data are summarized by grouping observations into class intervals. The chosen average should not be much affected by changes in the size of class intervals.
It can be shown that if the same data are grouped in various ways by taking class intervals of different size, the effect of grouping on mean and median will be very small particularly when the number of observations is very large. Mode is very sensitive to the method of grouping.
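The sensitivity of the mode to grouping can be seen on a small example. The sketch below uses a hypothetical set of 20 observations and a hypothetical helper `grouped_summary`, which takes the mid-point of the modal class as a crude estimate of the mode; it is not the only way to locate the mode of grouped data.

```python
from collections import Counter

# Hypothetical raw observations (illustrative values only).
data = [12, 14, 15, 17, 18, 21, 22, 22, 23, 24,
        26, 27, 28, 31, 33, 34, 36, 38, 41, 47]


def grouped_summary(values, width, start=10):
    """Group values into classes of equal width; return (mean, modal mid-point)."""
    # Class index k covers the interval [start + k*width, start + (k+1)*width).
    counts = Counter((x - start) // width for x in values)
    midpoint = lambda k: start + k * width + width / 2
    mean = sum(f * midpoint(k) for k, f in counts.items()) / len(values)
    modal_class = max(counts, key=counts.get)
    return mean, midpoint(modal_class)


mean_10, mode_10 = grouped_summary(data, width=10)
mean_5, mode_5 = grouped_summary(data, width=5)

print(f"width 10: mean={mean_10}, modal mid-point={mode_10}")
print(f"width 5:  mean={mean_5}, modal mid-point={mode_5}")
```

With these values the grouped mean changes only from 27.0 to 26.75 when the class width is halved, whereas the modal mid-point moves from 25.0 to 22.5.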
It should represent the central tendency of the data. The main purpose of computing an average is to represent the central tendency of the given distribution and, therefore, it is desirable that it should fall in the middle of distribution. Both mean and median satisfy this requirement but in certain cases mode may be at (or near) either end of the distribution.