5.4. Dispersion (Deviation) of data:
5.4.1 Mean, Median, Mode for grouped data
When the number of scores is large, it becomes difficult to calculate the Mean, Median and Mode directly. In such cases we use class intervals to represent the data, as studied in 5.1.1 Example 2. When scores are represented in class intervals, we follow a slightly different method for the calculation of the Mean, Median and Mode. Let us study the method using an example.
5.4.1. Example 1:
Assume that the following data has been collected about the presence of 110 people from different age groups at a marriage function.
Working:
Class Interval (CI) (Age groups)    Frequency (f)
0-10                                7
10-20                               13
20-30                               24
30-40                               26
40-50                               18
50-60                               12
60-70                               10
Note: In the above distribution, the upper limit of each class interval appears again as the lower limit of the next class interval (for example, 10 appears twice: once in CI 0-10 and once in CI 10-20).
Thus the question arises: in which class interval should a score equal to the upper limit (10) be included? By convention, the upper limit is not included in its own class interval and is included in the next class interval (i.e. the score 10 is included in CI 10-20 and not in CI 0-10).
Let us calculate the mean, median and mode for grouped data.
To recollect, if we had ungrouped scores then
Mean = (\sum X)/Number of scores, where \sum X is the sum of all the scores.
Similarly, the Median would be the middle score of the ordered list. Here it would be in the interval '30-40' (which contains the 55^{th} and 56^{th} scores).
Since we do not have individual scores, it will not be possible for us to arrive at the exact mode and exact median easily. In such cases we follow a different method:
We use the following notations to arrive at the values, as shown below:
N = Total number of scores = 110
'Mid point' (or 'Class mark') (x) = \frac{\text{Lower limit} + \text{Upper limit}}{2} of the class interval
f = frequency
f(x) = f*x
The 'Cumulative frequency' (cf) of a class interval is the sum of the frequencies of all the class intervals up to and including this class interval.
CI       Frequency (f)   Cumulative frequency (cf)   Mid Point (x) of CI   f(x) = f*x
0-10     7               7                           5                     35
10-20    13              20 = 7+13                   15                    195
20-30    24              44 = 20+24                  25                    600
30-40    26              70 = 44+26                  35                    910
40-50    18              88 = 70+18                  45                    810
50-60    12              100 = 88+12                 55                    660
60-70    10              110 = 100+10                65                    650
Total    N = 110                                                           \sum f(x) = 3860
By definition, Mean = \frac{\sum f(x)}{N} = \frac{3860}{110} = 35.09 \approx 35.1
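The table and the mean can also be reproduced with a few lines of code. The following is a minimal sketch in Python; the function names grouped_table and grouped_mean are chosen only for this illustration.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [(0, 10, 7), (10, 20, 13), (20, 30, 24), (30, 40, 26),
           (40, 50, 18), (50, 60, 12), (60, 70, 10)]

def grouped_table(classes):
    """Return rows of (CI label, f, cf, mid point x, f*x)."""
    rows, cf = [], 0
    for lower, upper, f in classes:
        cf += f                      # running total of the frequencies
        x = (lower + upper) / 2      # mid point (class mark)
        rows.append((f"{lower}-{upper}", f, cf, x, f * x))
    return rows

def grouped_mean(classes):
    """Mean = (sum of f*x) / N, where N is the total frequency."""
    rows = grouped_table(classes)
    n = sum(row[1] for row in rows)
    return sum(row[4] for row in rows) / n

for row in grouped_table(classes):
    print(row)
print("Mean =", round(grouped_mean(classes), 2))  # Mean = 35.09
```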
Since the number of scores is 110, the Median must be between the 55^{th} and 56^{th} scores, which are in the class interval '30-40' (because up to the class interval 20-30 we have 44 scores (cf) and up to the class interval 30-40 we have 70 scores (cf)).
Let
i = size of the class interval = 10 (each class interval spans 10 units, for example from 30 up to, but not including, 40)
L = Lower limit of the class interval which includes the median score = 30 (this CI, '30-40', is also called the Median class interval)
F = Cumulative frequency up to the median class interval (that is, of the class intervals before it) = 44
m = frequency of the median class interval = 26
Then
Median = L + (\frac{N/2 - F}{m})*i
= 30 + (\frac{55 - 44}{26})*10 = 30 + \frac{11}{26}*10 = 30 + 4.23 = 34.23
Mode lies in the class interval '30-40' (the interval with the highest frequency) and an approximate formula for the mode is
Mode = 3*median - 2*mean
= 3*34.23 - 2*35.1
= 32.49
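The median and mode steps can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the sketch assumes that the class size is the upper limit minus the lower limit of a class interval, so its results can differ slightly from a hand calculation that uses another class size or rounds the intermediate values.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [(0, 10, 7), (10, 20, 13), (20, 30, 24), (30, 40, 26),
           (40, 50, 18), (50, 60, 12), (60, 70, 10)]

def grouped_mean(classes):
    """Mean = (sum of f * mid point) / N."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / n

def grouped_median(classes):
    """Median = L + ((N/2 - F) / m) * i for the median class interval."""
    half = sum(f for _, _, f in classes) / 2
    cf_before = 0  # F: cumulative frequency of the classes before this one
    for lower, upper, f in classes:
        if cf_before + f >= half:
            # Assumption of this sketch: class size i = upper - lower.
            return lower + (half - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("no class intervals given")

def empirical_mode(mean, median):
    """Approximate mode from the relation Mode = 3*Median - 2*Mean."""
    return 3 * median - 2 * mean

median = grouped_median(classes)
mode = empirical_mode(grouped_mean(classes), median)
print("Median =", round(median, 2))
print("Mode =", round(mode, 2))
```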
5.4.2 Measures of dispersion: Range, Deviations
Let us take the following example of the attendance of a class for 2 different weeks in a month.
First week: 45, 44, 41, 10, 40, 60: Mean (average) = 40
Second week: 35, 45, 40, 45, 40, 35: Mean (average) = 40
In both cases, the average attendance is 40. But we also observe the following:
1. The first week has registered a very low attendance of 10 and a high attendance of 60, which are large deviations (dispersions) from the average, whereas
2. In the second week, the deviations from the average are not high. In simple terms we can say that attendance is consistent in the second week.
Thus we conclude that the average alone may not give a correct picture. Therefore we need other measures to arrive at meaningful conclusions.
We introduce the following concepts:
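The difference between the two extreme scores of a distribution is called the 'Range'.
Range = Highest Score - Lowest Score = H - L
Coefficient of Range = \frac{\text{Highest Score} - \text{Lowest Score}}{\text{Highest Score} + \text{Lowest Score}} = \frac{H - L}{H + L}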
We have learnt that the median is a score that divides the distribution of scores into two equal parts. Similarly, we define Quartiles as the scores that divide the distribution into four equal parts. They are:
1st Quartile (Q_{1}), 2nd Quartile (Q_{2}) and 3rd Quartile (Q_{3}). They are the scores at 1/4th, 1/2 and 3/4th of the distribution of scores.
We note that the 2nd Quartile is the Median itself.
Quartile deviation (semi-interquartile range) is calculated as
QD = (Q_{3} - Q_{1})/2
5.4.2 Example 1: Calculate the Range, Coefficient of Range, Quartile deviation and Coefficient of Quartile deviation for the scores 16, 40, 23, 25, 29, 24, 20, 30, 32, 34, 43
Working:
By arranging the scores in ascending order, we get
16, 20, 23, 24, 25, 29, 30, 32, 34, 40, 43.
Note that L = 16, H = 43 and N = 11
Therefore
Range = H - L = 43 - 16 = 27
Coefficient of Range = \frac{H - L}{H + L} = \frac{43 - 16}{43 + 16} = \frac{27}{59} = 0.46
Since there are 11 scores, Q_{1} is the score at position (N+1)/4 and Q_{3} is the score at position 3(N+1)/4 of the ordered list:
for Q_{1} the score to be considered is the 3^{rd} score ((11+1)/4 = 3) = 23.
for Q_{3} the score to be considered is the 9^{th} score (3*(11+1)/4 = 9) = 34.
QD = (Q_{3} - Q_{1})/2 = \frac{34 - 23}{2} = 5.5
Coefficient of QD = (Q_{3} - Q_{1})/(Q_{3} + Q_{1}) = \frac{34 - 23}{34 + 23} = \frac{11}{57} = 0.19
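These measures can be checked with a short Python sketch. It relies on the standard library function statistics.quantiles, whose default method places Q_{1} and Q_{3} at positions (N+1)/4 and 3(N+1)/4 of the ordered scores (interpolating when the position is not a whole number); other conventions for locating quartiles exist and may give slightly different values. The variable names are chosen only for this illustration.

```python
import statistics

scores = [16, 40, 23, 25, 29, 24, 20, 30, 32, 34, 43]

highest, lowest = max(scores), min(scores)
score_range = highest - lowest                      # Range = H - L
coeff_range = score_range / (highest + lowest)      # (H - L) / (H + L)

# quantiles() sorts the data itself and returns [Q1, Q2, Q3] for n=4.
q1, q2, q3 = statistics.quantiles(scores, n=4)
qd = (q3 - q1) / 2                                  # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)                    # coefficient of QD

print("Range =", score_range)
print("Coefficient of Range =", round(coeff_range, 2))
print("Q1, Q2, Q3 =", q1, q2, q3)
print("QD =", qd)
print("Coefficient of QD =", round(coeff_qd, 2))
```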
For grouped data, we have seen earlier that if
N = Total number of scores,
i = Size of the class interval,
L = Lower limit of the Median class interval,
F = Cumulative frequency (cf) up to the median class interval (that is, of the class intervals before it) and
f = frequency of the median class interval (denoted by m earlier)
Then
Median = L + (\frac{N/2 - F}{f})*i = Q_{2}
Similarly for grouped data we calculate
Q_{1} = L + (\frac{N/4 - F}{f})*i
Q_{3} = L + (\frac{3N/4 - F}{f})*i
Where
L = Lower limit of the respective Quartile class interval
F = Cumulative frequency (cf) up to the respective Quartile class interval
f = frequency of the respective Quartile class interval
5.4.2 Example 2: Calculate the Range, Coefficient of Range, Quartile deviation and Coefficient of Quartile deviation for the grouped data of 100 scores
CI       f
4-8      6
9-13     10
14-18    18
19-23    20
24-28    15
29-33    15
34-38    9
39-43    7
Working:
Here we have N = 100 and i = 5. Let us calculate the cumulative frequency as follows:
CI       f     cf
4-8      6     6
9-13     10    16
14-18    18    34
19-23    20    54
24-28    15    69
29-33    15    84
34-38    9     93
39-43    7     100
For Q_{1} we need to find the 25^{th} (1/4^{th} of 100) element, which lies in the class interval '14-18'
L = 14, F = 16, f = 18
Q_{1} = L + (\frac{N/4 - F}{f})*i
= 14 + \frac{25 - 16}{18}*5 = 14 + 2.5 = 16.5
For Q_{3} we need to find the 75^{th} (3/4^{th} of 100) element, which lies in the class interval '29-33'
L = 29, F = 69, f = 15
Q_{3} = L + (\frac{3N/4 - F}{f})*i
= 29 + \frac{75 - 69}{15}*5 = 29 + 2 = 31
QD = (Q_{3} - Q_{1})/2 = \frac{31 - 16.5}{2} = 7.25
Coefficient of QD = (Q_{3} - Q_{1})/(Q_{3} + Q_{1}) = \frac{31 - 16.5}{31 + 16.5} = \frac{14.5}{47.5} = 0.31
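The quartile calculation for grouped data can be sketched in Python as follows. The function name grouped_quantile is hypothetical, and the sketch assumes, as in the calculation above, that L is the stated lower limit of the quartile class interval and that the class size i is passed in explicitly.

```python
# (lower limit, frequency) for each class interval of the example.
classes = [(4, 6), (9, 10), (14, 18), (19, 20),
           (24, 15), (29, 15), (34, 9), (39, 7)]
CLASS_SIZE = 5  # i: each class interval holds 5 scores (4-8, 9-13, ...)

def grouped_quantile(classes, fraction, class_size):
    """Return L + ((fraction*N - F) / f) * i for the class holding the target."""
    target = fraction * sum(f for _, f in classes)
    cf_before = 0  # F: cumulative frequency of the classes before this one
    for lower, f in classes:
        if cf_before + f >= target:
            return lower + (target - cf_before) / f * class_size
        cf_before += f
    raise ValueError("no class intervals given")

q1 = grouped_quantile(classes, 0.25, CLASS_SIZE)
q3 = grouped_quantile(classes, 0.75, CLASS_SIZE)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print("Q1 =", round(q1, 2), "Q3 =", round(q3, 2))
print("QD =", round(qd, 2), "Coefficient of QD =", round(coeff_qd, 2))
```

Calling grouped_quantile with the fraction 0.5 gives the grouped median (Q_{2}) from the same table.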
5.4.3 Mean Deviation for Ungrouped data:
As the name suggests, here we calculate the average deviation of the scores from a central value (the median or the mean).
Note: Mean Deviation can be found in two ways: using the Median method or using the Mean method.
5.4.3 Example 1. Calculate the mean deviation for the scores given below, by BOTH methods.
90, 125, 115, 100, 110.
Working:
By rearranging the scores in increasing order we get
90, 100, 110, 115, 125
Here we have N = 5, \sum X = 90+100+110+115+125 = 540
\therefore The median (M) = 110 (3rd term)
The mean (\bar{X}) of the scores is = \frac{\sum X}{N} = \frac{540}{5} = 108
Scores (X)      I Method: Deviation from Median D = X - M    II Method: Deviation from Mean D = X - \bar{X}
90              -20 (= 90 - 110)                             -18 (= 90 - 108)
100             -10 (= 100 - 110)                            -8 (= 100 - 108)
110             0 (= 110 - 110)                              2 (= 110 - 108)
115             5 (= 115 - 110)                              7 (= 115 - 108)
125             15 (= 125 - 110)                             17 (= 125 - 108)
\sum X = 540    \sum |D| = 20+10+0+5+15 = 50                 \sum |D| = 18+8+2+7+17 = 52
In the above calculation, |D| is the absolute value of D (we always consider the value of D as positive).
By Median method, Mean deviation = \frac{\sum |D|}{N} = \frac{50}{5} = 10
By Mean method, Mean deviation = \frac{\sum |D|}{N} = \frac{52}{5} = 10.4
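Both methods follow the same pattern and differ only in the central value, which a small Python sketch makes explicit. The function name mean_deviation is chosen only for this illustration.

```python
import statistics

def mean_deviation(scores, center):
    """Average of the absolute deviations |X - center|."""
    return sum(abs(x - center) for x in scores) / len(scores)

scores = [90, 125, 115, 100, 110]
median = statistics.median(scores)   # 110
mean = statistics.mean(scores)       # 108

print("By Median method:", mean_deviation(scores, median))  # 10.0
print("By Mean method:", mean_deviation(scores, mean))      # 10.4
```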
5.4.4 Mean Deviation for Grouped data:
Note: As in the case of ungrouped data, Mean Deviation can be found in two ways (using the Median method and the Mean method).
5.4.4 Example 1.
Compute the Mean Deviation of
C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9
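Workings:
Here we have N = 60 and i = 20 (each class interval spans 20 units)
Median (M) = L + (\frac{N/2 - F}{f})*i
= 40 + \frac{30 - 18}{19}*20 = 40 + 12.63 = 52.63 (using the values from the table below)
Mean (\bar{X}) = \frac{\sum fx}{N} = \frac{3120}{60} = 52 (using the values from the table below)
In the table, |D| is the absolute deviation of the mid point x from the median (I Method) or from the mean (II Method).
                                     I Method: Deviation from Median     II Method: Deviation from Mean
C.I       Mid Point (x)   f     cf   |D| = |x - M|   f*|D|               fx     |D| = |x - \bar{X}|   f*|D|
0-20      10              8     8    42.63           341.04              80     42                    336
20-40     30              10    18   22.63           226.30              300    22                    220
40-60     50              19    37   2.63            49.97               950    2                     38
60-80     70              14    51   17.37           243.18              980    18                    252
80-100    90              9     60   37.37           336.33              810    38                    342
Total                     N = 60                     \sum f*|D| = 1196.82   \sum fx = 3120             \sum f*|D| = 1188
By Median method, Mean Deviation = \frac{\sum f*|D|}{N} = \frac{1196.82}{60} = 19.95
By Mean method, Mean Deviation = \frac{\sum f*|D|}{N} = \frac{1188}{60} = 19.8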
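The Mean method for grouped data can be sketched in Python as follows. The function names are hypothetical; the central value is passed in as a parameter, so a median computed from the grouped median formula could be supplied in the same way for the Median method.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [(0, 20, 8), (20, 40, 10), (40, 60, 19), (60, 80, 14), (80, 100, 9)]

def grouped_mean(classes):
    """Mean = (sum of f * mid point) / N."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / n

def grouped_mean_deviation(classes, center):
    """Mean deviation = (sum of f * |x - center|) / N, x being the mid point."""
    n = sum(f for _, _, f in classes)
    total = sum(f * abs((lower + upper) / 2 - center)
                for lower, upper, f in classes)
    return total / n

mean = grouped_mean(classes)
print("Mean =", mean)                                                 # 52.0
print("Mean Deviation (Mean method) =",
      round(grouped_mean_deviation(classes, mean), 2))                # 19.8
```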
5.4.5. Graphical representation of frequency distribution
In earlier sessions we have seen that a graphical representation of data is easy to understand and interpret. Two important types of representation are the histogram and the frequency polygon.
Histogram: Here we represent the distribution as vertical rectangles drawn side by side. The height of each rectangle is proportional to the frequency and is represented on the y-axis. The class intervals are represented on the x-axis.
We need a graph sheet for this type of representation. Class intervals (CI) are marked as the base of each rectangle on the x-axis. Frequencies are marked as the height of each rectangle on the y-axis.
5.4.5 Example 1.
Draw the histogram and frequency polygon for
C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9
Working:
Use a suitable scale for representing the class intervals and the frequency (in this case let 1 C.I. = 1 cm and 2f = 1 cm).
Histogram:
Step 1: Take a graph sheet. Mark 0 and draw the x-axis and y-axis.
Step 2: On the x-axis mark the class intervals adjacent to each other from 0. Use 1 cm as the width of each class interval. (Thus the scale for C.I. is 1 C.I. = 1 cm)
Step 3: Convert frequency to a suitable unit so that the graph fits into one page easily. In this example use the scale 1 cm = 2f. Therefore we have: 8f = 4 cm, 10f = 5 cm, 19f = 9.5 cm, 14f = 7 cm and 9f = 4.5 cm. (Thus the scale for frequency is 2f = 1 cm)
Step 4: Draw a rectangle of height 4 cm representing the first CI (0-20).
Step 5: Draw a rectangle of height 5 cm representing the next CI (20-40), next to the previous one, so that these two vertical bars have a common side. Draw the remaining rectangles for the other class intervals.
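The conversion from frequencies to bar sizes in the steps above can be sketched in Python. The names used here are chosen only for this illustration, and the scale values are the ones assumed in this example (1 C.I. = 1 cm, 2f = 1 cm).

```python
FREQUENCY_PER_CM = 2   # scale for the y-axis: 2f = 1 cm
CI_WIDTH_CM = 1        # scale for the x-axis: 1 C.I. = 1 cm

class_intervals = ["0-20", "20-40", "40-60", "60-80", "80-100"]
frequencies = [8, 10, 19, 14, 9]

def bar_dimensions(frequencies, frequency_per_cm, ci_width_cm):
    """Return (left edge, width, height) in cm for each histogram bar."""
    bars = []
    for k, f in enumerate(frequencies):
        left = k * ci_width_cm          # bars are contiguous, starting at 0
        bars.append((left, ci_width_cm, f / frequency_per_cm))
    return bars

bars = bar_dimensions(frequencies, FREQUENCY_PER_CM, CI_WIDTH_CM)
for ci, (left, width, height) in zip(class_intervals, bars):
    print(f"CI {ci}: starts at {left} cm, width {width} cm, height {height} cm")
```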

Observations:
1. Class intervals are represented on the x-axis and frequency on the y-axis.
2. The scales chosen for the two axes need not be the same.
3. Since the sizes of the class intervals are the same, the widths of the rectangles are also the same.
4. Since there are no gaps between the class intervals, the rectangles are contiguous (no space in between them).
5. The height of each rectangle is proportional to the frequency of the respective C.I.
Note: If there is a break in the class intervals (usually at the beginning), a zigzag curve is drawn on the x-axis between the class intervals.
Frequency Polygon (Method I):
When the mid points of the tops of adjacent rectangles are joined by straight lines, the figure so obtained is called a 'frequency polygon'.
Step 1: Draw the histogram as above.
Step 2: Mark a non-existing class interval (since f = 0, height = 0 cm) at each of the two extreme ends (i.e. (-20) - 0 on the left side and 100 - 120 on the right side).
Step 3: Identify the middle point of the top of each of the class interval bars (at -0.5, 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 cm on the x-axis, with y being 0, 4, 5, 9.5, 7, 4.5 and 0 cm respectively).
Step 4: Join each pair of consecutive mid points by a straight line to get the required polygon.

Frequency Polygon (Method II):
Step 1: Mark non-existing class intervals, one at each of the two extreme ends (i.e. (-20) - 0 on the left side and 100 - 120 on the right side).
Step 2: Identify the middle point of each of the class intervals as per the scale used (in this example 1 C.I. = 1 cm). These points are -0.5, 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 on the x-axis.
Step 3: Identify the height for the frequency of each class interval as per the scale used (2f = 1 cm). These points are 0, 4, 5, 9.5, 7, 4.5 and 0 on the y-axis.
Step 4: Plot and join these points.
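The points to be plotted in Method II can be generated with a short Python sketch. The function name polygon_points is hypothetical, and the default scale values are the ones assumed in this example.

```python
def polygon_points(frequencies, frequency_per_cm=2, ci_width_cm=1):
    """Return the (x, y) points in cm of the frequency polygon.

    A class interval with zero frequency is added at each end, and each
    point sits above the middle of its class interval.
    """
    padded = [0] + list(frequencies) + [0]
    points = []
    for k, f in enumerate(padded):
        # k = 0 is the extra interval on the left, which starts at -1 C.I.
        x = (k - 1) * ci_width_cm + ci_width_cm / 2
        points.append((x, f / frequency_per_cm))
    return points

points = polygon_points([8, 10, 19, 14, 9])
for x, y in points:
    print(f"x = {x} cm, y = {y} cm")
```

Joining consecutive points of this list by straight lines gives the polygon.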

Note: If the mid points of the class intervals are very close, then we get a frequency curve by joining these points with a smooth curve rather than with straight lines.
5.4.5 Cumulative Frequency Curve (Ogive):
In this type of graph we plot the points corresponding to the cumulative frequency for the given data (ungrouped or grouped) and join the points by a smooth curve.
The given data (actual score, or upper class limit in the case of grouped data) is marked along the x-axis. Cumulative frequency is marked along the y-axis.
Let us again consider the same example we have taken in 5.4.5 Example 1.
5.4.5 Example 2.
Draw the Ogive for
C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9
Working:
1. First arrive at an 'imaginary' class interval with 0 frequency (in this case -20 to 0).
2. Prepare the cumulative frequency table as shown below, starting with the imaginary class interval (-20 to 0).
3. Use a suitable scale for the x-axis for representing the upper class limit (in this case let 1 cm = 10 units of the upper class limit).
4. Use a suitable scale for the y-axis for representing the cumulative frequency (in this case let 1 cm = 10 cf).
5. Plot the points corresponding to each upper class limit as shown in the adjacent graph.
6. Join these points by a smooth curve (this curve is the Ogive).
C.I          f     Upper class limit   cf
(-20) - 0    0     0                   0
0-20         8     20                  8
20-40        10    40                  18
40-60        19    60                  37
60-80        14    80                  51
80-100       9     100                 60

From the cumulative frequency curve it is easy to arrive at the cumulative frequency for different scores.
(For example: from the above graph we can conclude that the cumulative frequency for scores up to 30 is 13. This point is circled red in the graph.)
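The points of the Ogive, and a reading such as the one for the score 30, can be sketched in Python. The function names are hypothetical, and the sketch joins neighbouring points by straight lines, which only approximates a reading taken from a smooth hand-drawn curve.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [(0, 20, 8), (20, 40, 10), (40, 60, 19), (60, 80, 14), (80, 100, 9)]

def ogive_points(classes):
    """Return (upper class limit, cumulative frequency) points.

    The first point comes from the imaginary class interval with zero
    frequency that ends where the first real class interval begins.
    """
    points = [(classes[0][0], 0)]
    cf = 0
    for _, upper, f in classes:
        cf += f
        points.append((upper, cf))
    return points

def cumulative_frequency_at(points, score):
    """Read the cumulative frequency at a score by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    raise ValueError("score lies outside the plotted range")

points = ogive_points(classes)
print(points)
print("cf up to 30 =", cumulative_frequency_at(points, 30))  # 13.0
```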
5.4 Summary of learning
No   Points to remember
1    Mean = \frac{\sum f(x)}{N} (For grouped data)
2    Median = L + (\frac{N/2 - F}{f})*i (For grouped data)
3    Mode = 3*median - 2*mean (For grouped data)
4    Coefficient of Range = \frac{H - L}{H + L} (For ungrouped data)
5    Mean deviation = \frac{\sum |D|}{N} (For ungrouped data)
6    Mean Deviation = \frac{\sum f*|D|}{N} (For grouped data)
Additional Points:
5.4.1 Assumed mean method for calculation of mean for grouped data
This method is very useful when the class intervals and their frequencies are very large. In this method we assume one of the mid points to be the mean and find the deviations from that mid point; hence this method is called the 'assumed mean method'.
Let us take the example solved earlier (5.4.1 Example 1) to illustrate this method.
Let A = 25 be the assumed mean (any score can be assumed to be the mean, but we normally take a score in the middle part of the distribution as the assumed mean).
The deviation D (D = mid point - assumed mean = x - A) is calculated for each class interval.
Then
Average (mean) = A + (\sum fD)/Number of scores
CI       Frequency (f)   Mid Point (x) of CI   Deviation D = x - A   fD = f*D
0-10     7               5                     -20 (= 5 - 25)        -140
10-20    13              15                    -10 (= 15 - 25)       -130
20-30    24              25 = A                0                     0
30-40    26              35                    10 (= 35 - 25)        260
40-50    18              45                    20 (= 45 - 25)        360
50-60    12              55                    30 (= 55 - 25)        360
60-70    10              65                    40 (= 65 - 25)        400
Total    N = 110                                                     \sum fD = 1110
Average (mean) = A + (\sum fD)/Number of scores = 25 + 1110/110 = 25 + 10.09 = 35.09
This is (approximately) the same value which we got earlier.
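The assumed mean method can be sketched in Python as follows. The function name assumed_mean_method is chosen only for this illustration. The sketch also shows that the result does not depend on which value is taken as the assumed mean.

```python
# (lower limit, upper limit, frequency) for each class interval of the example.
classes = [(0, 10, 7), (10, 20, 13), (20, 30, 24), (30, 40, 26),
           (40, 50, 18), (50, 60, 12), (60, 70, 10)]

def assumed_mean_method(classes, assumed_mean):
    """Mean = A + (sum of f*D) / N, where D = mid point - A."""
    n = sum(f for _, _, f in classes)
    total_fd = sum(f * ((lower + upper) / 2 - assumed_mean)
                   for lower, upper, f in classes)
    return assumed_mean + total_fd / n

print(round(assumed_mean_method(classes, 25), 2))  # 35.09
print(round(assumed_mean_method(classes, 35), 2))  # 35.09, with another A
```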