5.4. Dispersion (Deviation) of data:

5.4.1 Mean, Median, Mode for grouped data

When the number of scores is large, it becomes difficult to calculate the Mean, Median and Mode directly. In such cases we use class intervals to represent the data, as studied in 5.1.1 Example 2. When scores are represented in class intervals, we follow a slightly different method to calculate the Mean, Median and Mode. Let us study the method using an example.

5.4.1. Example 1:  Assume that the following data about the presence of 110 people from different age groups in a marriage function is collected.

Working:

Class Interval (CI) (Age groups)    Frequency (f)
0-10                                7
10-20                               13
20-30                               24
30-40                               26
40-50                               18
50-60                               12
60-70                               10

Note: In the above distribution, the upper limit of each class interval appears again as the lower limit of the next class interval (for example, 10 appears twice: once in CI 0-10 and once in CI 10-20).

Thus the question arises: in which class interval should a score equal to the upper limit (10) be included? By convention, the upper limit is not included in its own class interval and is included in the next class interval.

(i.e. the score 10 is included in CI: 10-20 and not in CI: 0-10)
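The convention above can be sketched in a few lines of Python. This is only an illustration; the function name class_interval and the list lower_limits are hypothetical names chosen for this sketch, and the class intervals are those of the age-group table.

```python
from bisect import bisect_right

# Lower limits of the class intervals 0-10, 10-20, ..., 60-70.
lower_limits = [0, 10, 20, 30, 40, 50, 60]
class_size = 10  # upper limit - lower limit of each class interval


def class_interval(score):
    """Return (lower limit, upper limit) of the class interval containing score.

    The upper limit is excluded, so a score of 10 falls in 10-20, not 0-10.
    """
    if not 0 <= score < 70:
        raise ValueError("score is outside the range covered by the table")
    index = bisect_right(lower_limits, score) - 1
    lower = lower_limits[index]
    return lower, lower + class_size


print(class_interval(10))   # (10, 20)
print(class_interval(9.5))  # (0, 10)
```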

Let us calculate the mean, median and mode for grouped data.

To recollect, if we had ungrouped scores then

Mean = (Sum of all scores)/Number of scores

Similarly, the Median would lie in the interval '30-40' (which contains the 55th and 56th scores).

Since we do not have individual scores, it will not be possible for us to arrive at the exact mode and exact median easily. In such cases we follow a different method:

We use the following notations to arrive at values as shown below

N = Total number of scores = 110

'Mid point' (or 'Class mark') (x) = (Lower limit + Upper limit)/2 of the class interval

f = frequency

f(x) = f*x

‘Cumulative frequency’ of a class interval is sum of all the frequencies of all the class intervals up to this class interval.

C-I      Frequency (f)   Cumulative frequency (cf)   Mid Point (x) of CI   f(x) = f*x
0-10     7               7                           5                     35
10-20    13              20 = 7+13                   15                    195
20-30    24              44 = 20+24                  25                    600
30-40    26              70 = 44+26                  35                    910
40-50    18              88 = 70+18                  45                    810
50-60    12              100 = 88+12                 55                    660
60-70    10              110 = 100+10                65                    650
Total    N = 110                                                           Σfx = 3860

By definition, Mean = (Σfx)/N = 3860/110 = 35.09 ≈ 35.1
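The table and the mean can be checked with a short Python sketch. The variable names (classes, mid_points and so on) are chosen for this illustration only; the data is the age-group table of this example.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class interval
classes = [(0, 10, 7), (10, 20, 13), (20, 30, 24), (30, 40, 26),
           (40, 50, 18), (50, 60, 12), (60, 70, 10)]

frequencies = [f for _, _, f in classes]
mid_points = [(low + high) / 2 for low, high, _ in classes]  # x
cumulative = list(accumulate(frequencies))                   # cf: running total of f
fx = [f * x for f, x in zip(frequencies, mid_points)]        # f*x

n = sum(frequencies)
mean = sum(fx) / n

print(cumulative)      # [7, 20, 44, 70, 88, 100, 110]
print(sum(fx))         # 3860.0
print(round(mean, 2))  # 35.09
```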

Since the number of scores is 110, the Median must lie between the 55th and 56th scores, which are in the class interval '30-40' (because up to the class interval 20-30 we have 44 scores (cf), and up to the class interval 30-40 we have 70 scores (cf)).

Let

i = size of the class interval = 10 (each class interval spans 10 units, for example from 30 up to, but not including, 40)

L = Lower limit of the class interval which includes the median score = 30 (this CI, '30-40', is also called the Median class interval)

F = Cumulative frequency up to, but not including, the median class interval = 44

m = frequency of the median class interval = 26

Then

Median = L + ((N/2 - F)/m)*i

= 30 + ((55-44)/26)*10 = 30 + (11/26)*10 = 30 + 4.23 = 34.23

The Mode lies in the class interval '30-40' (the interval with the highest frequency), and it can be estimated using the empirical formula

Mode = 3*Median - 2*Mean

= 3*34.23 - 2*35.1

= 32.49
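The median formula can be sketched as a small Python function. This is a minimal illustration: grouped_median is a hypothetical name, and the sketch takes the class size i as the difference between the upper and lower limits of the median class (10 for '30-40'). It works with unrounded values, so the second decimal of the mode may differ from a hand calculation that rounds the median and mean first.

```python
def grouped_median(classes):
    """classes: list of (lower limit, upper limit, frequency), in ascending order."""
    n = sum(f for _, _, f in classes)
    cumulative_before = 0  # F: cf of all classes before the current one
    for low, high, f in classes:
        if cumulative_before + f >= n / 2:
            size = high - low  # i: size of the class interval
            return low + (n / 2 - cumulative_before) / f * size
        cumulative_before += f
    raise ValueError("the distribution is empty")


classes = [(0, 10, 7), (10, 20, 13), (20, 30, 24), (30, 40, 26),
           (40, 50, 18), (50, 60, 12), (60, 70, 10)]

n = sum(f for _, _, f in classes)
mean = sum(f * (low + high) / 2 for low, high, f in classes) / n
median = grouped_median(classes)
mode = 3 * median - 2 * mean  # empirical formula Mode = 3*Median - 2*Mean

print(round(median, 2))  # 34.23
print(round(mode, 2))    # 32.51
```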

5.4.2 Measures of dispersion: Range, Deviations

Let us take the following example of attendance of a class for 2 different weeks in a month.

First week    : 45,44,41,10,40,60 : Mean (average) = 40

Second week:  35,45,40,45,40,35:  Mean (average) = 40

In both the cases, the average attendance is 40. But we also observe the following:

1. The first week has registered a very low attendance of 10 and a high attendance of 60, so the deviations (dispersions) from the average are large, whereas

2. In the second week, the deviations from the average are not high. In simple terms, we can say that attendance is consistent in the second week.

Thus we conclude that the average alone may not give a correct picture.

Therefore we need other measures to arrive at meaningful conclusions.

We introduce the following concepts:

The difference between two extreme scores of a distribution is called the ‘Range’

Range = Highest Score- Lowest Score= H-L
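As a quick check, the Range of the two weeks of attendance given above can be computed as follows. This is only a sketch; the variable names are chosen for illustration.

```python
first_week = [45, 44, 41, 10, 40, 60]
second_week = [35, 45, 40, 45, 40, 35]

for name, scores in [("First week", first_week), ("Second week", second_week)]:
    mean = sum(scores) / len(scores)
    value_range = max(scores) - min(scores)  # Range = H - L
    print(name, "mean =", mean, "range =", value_range)
# First week mean = 40.0 range = 50
# Second week mean = 40.0 range = 10
```

Both weeks have the same mean, but the Range shows that the first week is far more spread out.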

Co-efficient of Range = (H-L)/(H+L)

We have learnt that the median is the score that divides the distribution of scores into two equal parts. Similarly, Quartiles are the scores that divide the distribution into four equal parts. The dividing scores are:

1st Quartile (Q1), 2nd Quartile (Q2), 3rd Quartile (Q3). They are the scores at 1/4th, 1/2 and 3/4th of the distribution of scores.

We note that the 2nd Quartile is the Median itself.

Quartile deviation (Semi-interquartile range) is calculated as

QD = (Q3-Q1)/2

5.4.2 Example 1: Calculate Range, Co-efficient of Range, Quartile deviation and Co-efficient of Quartile deviation for the scores 16,40,23,25,29,24,20,30,32,34,43

Working:

By arranging the scores in ascending order, we get

16,20,23,24,25,29,30,32,34,40,43.

Note that L = 16, H = 43 and N = 11

Therefore

Range = H-L = 43-16 = 27

Co-efficient of Range = (H-L)/(H+L) = 27/59 = 0.46

Since there are 11 scores

- for Q1 the score to be considered is the 3rd score ((N+1)/4 = 12/4 = 3), which is 23.

- for Q3 the score to be considered is the 9th score (3(N+1)/4 = 36/4 = 9), which is 34.

QD = (Q3-Q1)/2

= (34-23)/2 = 5.5

Co-efficient of QD = (Q3-Q1)/(Q3+Q1) = (34-23)/(34+23) = 11/57 = 0.19
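The working of this example can be sketched in Python. The sketch assumes the convention that Q1 is the (N+1)/4-th score and Q3 the 3(N+1)/4-th score of the sorted list; for N = 11 both positions are whole numbers (3 and 9). Other quartile conventions exist and can give slightly different values. All variable names are chosen for illustration.

```python
scores = sorted([16, 40, 23, 25, 29, 24, 20, 30, 32, 34, 43])
n = len(scores)

highest, lowest = scores[-1], scores[0]
value_range = highest - lowest                  # Range = H - L
coeff_range = value_range / (highest + lowest)  # (H - L)/(H + L)

# Assumed convention: positions (N+1)/4 and 3(N+1)/4, counted from 1.
q1 = scores[(n + 1) // 4 - 1]
q3 = scores[3 * (n + 1) // 4 - 1]

qd = (q3 - q1) / 2              # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)

print(value_range, round(coeff_range, 2))  # 27 0.46
print(q1, q3)                              # 23 34
print(qd, round(coeff_qd, 2))              # 5.5 0.19
```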

For grouped data, we have seen earlier that

If N = Total number of scores,

i = Size of the class interval,

L = Lower limit of the Median class interval,

F = Cumulative frequency (cf) up to the median class interval and

f = frequency of the median class interval

Then

Median = L + ((N/2 - F)/f)*i = Q2

Similarly for grouped data we calculate

Q1 = L + ((N/4 - F)/f)*i

Q3 = L + ((3N/4 - F)/f)*i

Where

L = Lower limit of the respective Quartile class interval

F = Cumulative frequency (cf) up to the respective Quartile class interval

f = frequency of the respective Quartile class interval

5.4.2 Example 2: Calculate Range, Co-efficient of Range, Quartile deviation and Co-efficient of Quartile deviation for the grouped data of 100 scores

CI       f
4-8      6
9-13     10
14-18    18
19-23    20
24-28    15
29-33    15
34-38    9
39-43    7

Working:

Here we have N = 100, i = 5 and let us calculate cumulative frequency as follows:

CI       f     cf
4-8      6     6
9-13     10    16
14-18    18    34
19-23    20    54
24-28    15    69
29-33    15    84
34-38    9     93
39-43    7     100

For Q1 we need to find the 25th (1/4th of 100) element, which lies in the class interval '14-18'

L = 14, F = 16, f = 18

Q1 = L + ((N/4 - F)/f)*i

= 14 + ((25-16)/18)*5 = 14 + 2.5 = 16.5

For Q3 we need to find the 75th (3/4th of 100) element, which lies in the class interval '29-33'

L = 29, F = 69, f = 15

Q3 = L + ((3N/4 - F)/f)*i

= 29 + ((75-69)/15)*5 = 29 + 2 = 31

QD = (Q3-Q1)/2

= (31-16.5)/2 = 7.25

Co-efficient of QD = (Q3-Q1)/(Q3+Q1) = (31-16.5)/(31+16.5) = 14.5/47.5 = 0.31
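The quartile formulas for grouped data differ only in the fraction of N used, so they can be sketched as one Python function. grouped_quantile is a hypothetical name. The sketch takes L as the lower limit of the quartile class and i as the distance between successive lower limits (5 here), which is an assumption suited to this table.

```python
def grouped_quantile(classes, fraction):
    """classes: list of (lower limit, upper limit, frequency), in ascending order.

    fraction: 0.25 for Q1, 0.5 for the median (Q2), 0.75 for Q3.
    """
    n = sum(f for _, _, f in classes)
    target = fraction * n                  # N/4, N/2 or 3N/4
    size = classes[1][0] - classes[0][0]   # i: distance between lower limits
    cumulative_before = 0                  # F: cf before the current class
    for low, _, f in classes:
        if cumulative_before + f >= target:
            return low + (target - cumulative_before) / f * size
        cumulative_before += f
    raise ValueError("the distribution is empty")


classes = [(4, 8, 6), (9, 13, 10), (14, 18, 18), (19, 23, 20),
           (24, 28, 15), (29, 33, 15), (34, 38, 9), (39, 43, 7)]

q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 2), round(q3, 2))        # 16.5 31.0
print(round(qd, 2), round(coeff_qd, 2))  # 7.25 0.31
```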

5.4.3 Mean Deviation for Ungrouped data:

As the name suggests, here we calculate the average of the deviations of the scores from a central value (the median or the mean).

Note: Mean Deviation can be found in two ways - using Median method or using Mean method.

5.4.3 Example 1.  Calculate the mean deviation for the scores given below, by BOTH methods.

90,125,115,100,110.

Working:

By rearranging the scores in increasing order we get

90,100,110,115,125

Here we have N = 5, ΣX = 90+100+110+115+125 = 540

The median (M) = 110 (3rd term)

The mean (X̄) of the scores is ΣX/N = 540/5 = 108

Scores (X)   I Method: Deviation from Median (D = X-M)   II Method: Deviation from Mean (D = X-X̄)
90           -20 (90-110)                                -18 (90-108)
100          -10 (100-110)                               -8 (100-108)
110          0 (110-110)                                 2 (110-108)
115          5 (115-110)                                 7 (115-108)
125          15 (125-110)                                17 (125-108)
ΣX = 540     Σ|D| = 20+10+0+5+15 = 50                    Σ|D| = 18+8+2+7+17 = 52

In the above calculation |D| is the absolute value of D (we always take the value of D as positive).

By Median method, Mean deviation = (Σ|D|)/N = 50/5 = 10

By Mean method, Mean deviation = (Σ|D|)/N = 52/5 = 10.4
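Both methods apply the same calculation with a different central value, which a short Python sketch makes explicit. mean_deviation is a hypothetical helper written for this illustration; mean and median come from Python's standard statistics module.

```python
from statistics import mean, median

scores = [90, 125, 115, 100, 110]


def mean_deviation(values, centre):
    """Average of the absolute deviations |D| of the values from centre."""
    return sum(abs(x - centre) for x in values) / len(values)


print(mean_deviation(scores, median(scores)))  # 10.0 (Median method)
print(mean_deviation(scores, mean(scores)))    # 10.4 (Mean method)
```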

5.4.4 Mean Deviation for Grouped data:

Note: As in the case of ungrouped data, Mean Deviation can be found in two ways (Using Median method and Mean method)

5.4.4 Example 1. Compute Mean Deviation of

C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9

Workings:

Here we have N = 60 and i = 20

Median (M) = L + ((N/2 - F)/f)*i

= 40 + ((30-18)/19)*20 = 40 + 12.63 = 52.63 (Use the values from the table given below)

Mean (X̄) = (Σfx)/N = 3120/60 = 52 (Use the values from the table given below)

                                          I Method                  II Method
                                          (Deviation from Median)   (Deviation from Mean)
C.I       Mid Point (x)   f     cf    fx      D = x-M    f*|D|      D = x-X̄    f*|D|
0-20      10              8     8     80      -42.63     341.04     -42        336
20-40     30              10    18    300     -22.63     226.30     -22        220
40-60     50              19    37    950     -2.63      49.97      -2         38
60-80     70              14    51    980     17.37      243.18     18         252
80-100    90              9     60    810     37.37      336.33     38         342
Total                     N=60        3120               1196.82               1188

By Median method, Mean Deviation = (Σf|D|)/N = 1196.82/60 = 19.95

By Mean method, Mean Deviation = (Σf|D|)/N = 1188/60 = 19.8
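The grouped calculation can be sketched in Python as below. This is an illustration only: the names are hypothetical, each class is represented by its mid point, and the class size i is taken as upper limit minus lower limit (20 for these class intervals).

```python
classes = [(0, 20, 8), (20, 40, 10), (40, 60, 19), (60, 80, 14), (80, 100, 9)]

frequencies = [f for _, _, f in classes]
mid_points = [(low + high) / 2 for low, high, _ in classes]
n = sum(frequencies)

mean = sum(f * x for f, x in zip(frequencies, mid_points)) / n

# Median = L + ((N/2 - F)/f) * i, with i = upper limit - lower limit.
cumulative_before = 0  # F
for low, high, f in classes:
    if cumulative_before + f >= n / 2:
        median = low + (n / 2 - cumulative_before) / f * (high - low)
        break
    cumulative_before += f


def mean_deviation(centre):
    """(sum of f*|x - centre|) / N, using the mid point x of each class."""
    return sum(f * abs(x - centre) for f, x in zip(frequencies, mid_points)) / n


print(round(mean, 2))                    # 52.0
print(round(median, 2))                  # 52.63
print(round(mean_deviation(median), 2))  # 19.95
print(round(mean_deviation(mean), 2))    # 19.8
```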

5.4.5. Graphical representation of frequency distribution

In earlier sessions we have seen that, graphical representation of data is always easy to understand and interpret. Two important types of representations are histogram and frequency polygon.

Histogram: Here we represent the distribution as vertical rectangles drawn side by side. The class intervals are represented on the x-axis, and the height of each rectangle, measured along the y-axis, is proportional to the frequency.

We need a graph sheet for this type of representation. Each class interval (CI) is marked as the base of a rectangle on the x-axis, and its frequency is marked as the height of the rectangle on the y-axis.

5.4.5 Example 1. Draw histogram and frequency polygon for

C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9

Working:

Use a suitable scale for representing Class interval and frequency

(In this case let 1C.I = 1cm and 2f=1cm)

Histogram:

Step 1: Take a graph sheet. Mark 0 and draw the x-axis and y-axis.

Step 2: On the x-axis mark the class intervals adjacent to each other, starting from 0.

Use 1cm as the width of each class interval. (Thus the scale for C.I. is 1C.I. = 1cm)

Step 3: Convert frequency to a suitable unit so that the graph fits into one page easily.

In this example use the scale 2f = 1cm. Therefore we have:

8f = 4cm, 10f = 5cm, 19f = 9.5cm, 14f = 7cm and 9f = 4.5cm.

(Thus the scale for frequency is 2f = 1cm)

C.I       f     X (base)      Y (height)
0-20      8     0 to 1 cm     4cm
20-40     10    1 to 2 cm     5cm
40-60     19    2 to 3 cm     9.5cm
60-80     14    3 to 4 cm     7cm
80-100    9     4 to 5 cm     4.5cm

Step 4: Draw a rectangle of height 4cm representing the first CI (0-20).

Step 5: Draw a rectangle of height 5cm representing the next CI (20-40), next to the previous one, so that these two vertical bars have a common side.

Draw the remaining rectangles for the other class intervals.

Observations:

1. Class intervals are represented on the x-axis and frequency on the y-axis.

2. The scales chosen for the two axes need not be the same.

3. Since the sizes of the class intervals are the same, the widths of the rectangles are also the same.

4. Since there are no gaps between the class intervals, the rectangles are contiguous (no space in between them).

5. The height of each rectangle is proportional to the frequency of its C.I.

Note: If there is a break in the class intervals (usually at the beginning), a zig-zag curve is drawn on the x-axis at the break.

Frequency Polygon (Method I):

When the mid points of the tops of adjacent rectangles are joined by straight lines, the figure so obtained is called a 'frequency polygon'.

Step 1: Draw the histogram as above.

Step 2: Mark a non-existing class interval (since f = 0, height = 0cm) at each of the two extreme ends (i.e. (-20)-0 on the left side and 100-120 on the right side).

Step 3: Identify the middle point of the top of each class interval bar (at -0.5, 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 cm on the x-axis, with y being 0, 4, 5, 9.5, 7, 4.5 and 0 respectively).

C.I        f     x (Mid of base)   y      (x,y)
(-20)-0    0     -0.5              0      (-0.5,0)
0-20       8     0.5               4      (0.5,4)
20-40      10    1.5               5      (1.5,5)
40-60      19    2.5               9.5    (2.5,9.5)
60-80      14    3.5               7      (3.5,7)
80-100     9     4.5               4.5    (4.5,4.5)
100-120    0     5.5               0      (5.5,0)

Step 4: Join each pair of consecutive mid points by a straight line to get the required polygon.

Frequency Polygon (Method II):

Step 1: Mark non-existing class intervals, one at each of the two extreme ends (i.e. (-20)-0 on the left side and 100-120 on the right side).

C.I        f     x (Mid of base)   y      (x,y)
(-20)-0    0     -0.5              0      (-0.5,0)
0-20       8     0.5               4      (0.5,4)
20-40      10    1.5               5      (1.5,5)
40-60      19    2.5               9.5    (2.5,9.5)
60-80      14    3.5               7      (3.5,7)
80-100     9     4.5               4.5    (4.5,4.5)
100-120    0     5.5               0      (5.5,0)

Step 2: Identify the middle point of each class interval as per the scale used (in this example 1C.I. = 1cm).

These points are -0.5, 0.5, 1.5, 2.5, 3.5, 4.5 and 5.5 on the x-axis.

Step 3: Identify the height corresponding to the frequency of each class interval as per the scale used (2f = 1cm).

These heights are 0, 4, 5, 9.5, 7, 4.5 and 0 on the y-axis.

Step 4: Plot and join these points.

Note: If the mid points of the class intervals are very close to each other, we get a frequency curve by joining these points with a smooth curve rather than with straight lines.
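The points of the frequency polygon follow directly from the two scales, as the Python sketch below shows. It only computes the coordinates in cm and leaves the drawing to the graph sheet; the names class_cm and freq_per_cm are hypothetical and stand for the scales 1C.I. = 1cm and 2f = 1cm.

```python
classes = [(0, 20, 8), (20, 40, 10), (40, 60, 19), (60, 80, 14), (80, 100, 9)]

class_cm = 1.0     # x-axis scale: 1 C.I. = 1 cm
freq_per_cm = 2.0  # y-axis scale: 2f = 1 cm

# Heights of the histogram rectangles, in cm.
heights = [f / freq_per_cm for _, _, f in classes]

# Polygon points: mid point of each base and the height of the bar, with a
# zero-frequency class interval added at each end.
polygon = [(-0.5 * class_cm, 0.0)]
for k, height in enumerate(heights):
    polygon.append(((k + 0.5) * class_cm, height))
polygon.append(((len(heights) + 0.5) * class_cm, 0.0))

print(heights)
# [4.0, 5.0, 9.5, 7.0, 4.5]
print(polygon)
# [(-0.5, 0.0), (0.5, 4.0), (1.5, 5.0), (2.5, 9.5), (3.5, 7.0), (4.5, 4.5), (5.5, 0.0)]
```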

5.4 Summary of learning

No   Points to remember
1    Mean = (Σfx)/N (For grouped data)
2    Median = L + ((N/2 - F)/f)*i (For grouped data)
3    Mode = 3*Median - 2*Mean (For grouped data)
4    Co-efficient of Range = (H-L)/(H+L) (For ungrouped data)
5    Mean deviation = (Σ|D|)/N (For ungrouped data)
6    Mean Deviation = (Σf|D|)/N (For grouped data)

5.4.1 Assumed mean method for calculation of mean for grouped data

This method is very useful when class intervals and their frequencies are very large. In this method we assume one of the mid-points to be the mean and find the deviation from that mid-point and hence this method is called  ‘assumed mean method’.

Let us take the example solved earlier (5.4.1 Example 1) to illustrate this method.

Let A = 25 be the assumed mean (any score can be assumed to be the mean, but we normally take a score in the middle part of the distribution as the assumed mean).

The Deviation D (D = Mid point - Assumed mean = x - A) is calculated for each class interval.

Then Average (mean) = A + (ΣfD)/Number of scores

C-I      Frequency (f)   Mid Point (x) of CI   Deviation D = x-A   fD = f*D
0-10     7               5                     -20 (= 5-25)        -140
10-20    13              15                    -10 (= 15-25)       -130
20-30    24              25 = A                0                   0
30-40    26              35                    10 (= 35-25)        260
40-50    18              45                    20 (= 45-25)        360
50-60    12              55                    30 (= 55-25)        360
60-70    10              65                    40 (= 65-25)        400
Total    N = 110                                                   ΣfD = 1110

Average (mean) = A + (ΣfD)/Number of scores = 25 + 1110/110 = 25 + 10.09 = 35.09

This is the same value which we got earlier (35.09 ≈ 35.1).
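A short Python sketch confirms that the assumed mean method and the direct method agree. The variable names are chosen for illustration; the data is that of 5.4.1 Example 1.

```python
mid_points = [5, 15, 25, 35, 45, 55, 65]   # x
frequencies = [7, 13, 24, 26, 18, 12, 10]  # f
assumed_mean = 25                          # A

n = sum(frequencies)
deviations = [x - assumed_mean for x in mid_points]           # D = x - A
sum_fd = sum(f * d for f, d in zip(frequencies, deviations))  # sum of f*D

mean_assumed = assumed_mean + sum_fd / n
mean_direct = sum(f * x for f, x in zip(frequencies, mid_points)) / n

print(sum_fd)                  # 1110
print(round(mean_assumed, 2))  # 35.09
print(round(mean_direct, 2))   # 35.09
```

Any other choice of A gives the same mean; a value near the middle simply keeps the products f*D small.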

5.4.5 Cumulative Frequency Curve (Ogive):

In this type of graph we plot the points corresponding to cumulative frequency for the given data (Ungrouped or grouped) and join the points by a smooth curve.

The given data (actual score or Upper class limit in case of grouped data) is marked along the x-axis. Cumulative frequency is marked along the y-axis.
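The points of an Ogive for grouped data can be sketched in Python as follows. The sketch uses a small illustrative distribution (class intervals 0-20 to 80-100 with frequencies 8, 10, 19, 14 and 9) and starts the curve at the lowest lower limit with a cumulative frequency of 0. The variable names are hypothetical.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency) for each class interval
classes = [(0, 20, 8), (20, 40, 10), (40, 60, 19), (60, 80, 14), (80, 100, 9)]

upper_limits = [high for _, high, _ in classes]
cumulative = list(accumulate(f for _, _, f in classes))  # cf

# x = upper class limit, y = cumulative frequency; the first point
# corresponds to an imaginary class interval with frequency 0.
points = [(classes[0][0], 0)] + list(zip(upper_limits, cumulative))

print(points)
# [(0, 0), (20, 8), (40, 18), (60, 37), (80, 51), (100, 60)]
```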

Let us again consider the same example we have taken in 5.4.5 Example 1.

5.4.5 Example 2. Draw Ogive for

C.I       f
0-20      8
20-40     10
40-60     19
60-80     14
80-100    9

Working:

1. First arrive at an 'imaginary' class interval with 0 frequency (in this case -20 to 0).

2. Prepare the cumulative frequency table as shown below, starting with the imaginary class interval (-20 to 0).

C.I        f     cf    Point (x,y)
(-20)-0    0     0     (0,0)
0-20       8     8     (20,8)
20-40      10    18    (40,18)
40-60      19    37    (60,37)
60-80      14    51    (80,51)
80-100     9     60    (100,60)

3. Use a suitable scale for the x-axis for representing the upper class limit (in this case let 1cm = 10 units of the upper class limit).

4. Use a suitable scale for the y-axis for representing the cumulative frequency (in this case let 1cm = 10cf).

5. Plot the points corresponding to each upper class limit as shown in the 