5.5 Standard Deviation:
Introduction:
You must have read newspaper reports comparing the performances of two batsmen in cricket. What do they compare? They say that one is more consistent than the other, or that one is more stylish than the other.
Stylishness is a quality that cannot be measured from the runs scored. Whether one batsman is better or more consistent than the other, however, can be judged from the runs scored over several innings.
Let us study how Statistics
can help us in this regard.
Standard deviation:
You must have heard people talking about deviations (deviation from rules, deviation in work, deviation in results, etc.). A deviation is always measured with respect to a standard.
The standard can be thought of as an average (also called the arithmetic mean).
5.5 Example 1: Let a batsman’s score in 6 innings be 48,50,54,46,48,54
Working:
Notations used:
$X$ = set of scores (48, 50, 54, 46, 48, 54)
$N$ = number of scores (= 6)
$\bar{X}$ = the arithmetic mean (AM) = $(\sum X)/N$
$d$ = deviation from the arithmetic mean = $X - \bar{X}$
Step 1: Find the arithmetic mean (AM) of his scores: $\bar{X} = (48+50+54+46+48+54)/6 = 300/6 = 50$
Step 2: Find $d$ (= $X - \bar{X}$) and $d^{2}$ for each of the scores
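Table of Calculation (with actual AM):
| No | Runs ($X$) | Deviation $d = X - \bar{X}$ | (Deviation)$^{2}$ = $d^{2}$ |
|---|---|---|---|
| 1 | 48 | -2 | 4 |
| 2 | 50 | 0 | 0 |
| 3 | 54 | 4 | 16 |
| 4 | 46 | -4 | 16 |
| 5 | 48 | -2 | 4 |
| 6 | 54 | 4 | 16 |
| | $\sum X = 300$ | $\sum d = 0$ | $\sum d^{2} = 56$ |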
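Step 3: Calculate the variance as $(\sum d^{2})/N$
Step 4: Calculate the standard deviation (SD) as $\sqrt{\text{Variance}} = \sqrt{(\sum d^{2})/N}$
SD is denoted by the Greek letter $\sigma$.
In the above example, Variance = $56/6 = 9.33$ and $\sigma = \sqrt{9.33} \approx 3.05$.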
Definition: ‘Standard deviation’
is the square root of the arithmetic average of the squares of the deviations
from the mean.
Interpretation: In this example we say that, on average, the batsman’s scores deviate from the arithmetic mean (= 50) by 3.05 ($\approx 3$) runs.
It can be predicted that this batsman is likely to score roughly 47 to 53 runs, that is $(50-3)$ to $(50+3)$, in future matches.
Note:
If the batsman’s scores had been 48, 100, 50, 10, 2, 80, it would not have been possible to predict his future scores with reasonable accuracy. Since the batsman was consistent, with scores around 50, a prediction was possible.
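The working of Example 1, and the contrast drawn in the Note, can be checked with a short Python sketch. The function name `population_sd` and the variable names are our own choices for illustration; the divisor is N, as in the definition above.

```python
import math

def population_sd(scores):
    """Return (mean, variance, sd), dividing by N as in the definition above."""
    n = len(scores)
    mean = sum(scores) / n
    # Variance is the average of the squared deviations from the mean.
    variance = sum((x - mean) ** 2 for x in scores) / n
    return mean, variance, math.sqrt(variance)

consistent = [48, 50, 54, 46, 48, 54]   # scores from Example 1
erratic = [48, 100, 50, 10, 2, 80]      # scores from the Note

mean_c, var_c, sd_c = population_sd(consistent)
mean_e, var_e, sd_e = population_sd(erratic)

print(f"consistent: mean={mean_c:.2f} variance={var_c:.2f} sd={sd_c:.3f}")
print(f"erratic:    mean={mean_e:.2f} variance={var_e:.2f} sd={sd_e:.3f}")
```

For the first list the SD comes out at about 3.055, which the text writes as 3.05. For the second list it is more than ten times larger, which is why no useful prediction can be made from those scores.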
General Procedure:
Let $X = \{x_{1}, x_{2}, x_{3}, \ldots, x_{n}\}$ be the scores
$N$ = number of scores
$\bar{X}$ = arithmetic mean (AM) = $(x_{1}+x_{2}+x_{3}+\cdots+x_{n})/N = (\sum X)/N$
Step 1: Calculate the deviation from the AM, $d$ (= $X - \bar{X}$), and $d^{2}$ for each of the scores
Step 2: Calculate Variance = $(\sum d^{2})/N$
Step 3: Calculate the standard deviation (SD)
SD = $\sigma = \sqrt{\text{Variance}} = \sqrt{(\sum d^{2})/N}$
Alternate method of finding $\sigma$ when the AM is not a whole number:
In the above example the arithmetic mean (= 50) happened to be an integer, and our computations were easy. If the arithmetic mean contains decimals, finding $d^{2}$ is tedious, and in such cases we follow a different method.
To start with, we assume the arithmetic mean to be one of the scores itself. Then we calculate $d$ (= $X - A$, where $A$ is the assumed AM) and $d^{2}$ for each of the scores. The actual AM and SD are then derived as follows:
Actual AM = Assumed AM + $(\sum d)/N$
SD ($\sigma$) = $\sqrt{(\sum d^{2})/N - ((\sum d)/N)^{2}}$
Let us take the above
example and find SD using this alternate method.
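Let us assume the AM to be 54 ($A = 54$). Here $N = 6$.
Table of Calculation (with assumed AM):
| No | Runs ($X$) | Deviation $d = X - A$ | (Deviation)$^{2}$ = $d^{2}$ |
|---|---|---|---|
| 1 | 48 | -6 | 36 |
| 2 | 50 | -4 | 16 |
| 3 | 54 | 0 | 0 |
| 4 | 46 | -8 | 64 |
| 5 | 48 | -6 | 36 |
| 6 | 54 | 0 | 0 |
| | | $\sum d = -24$ | $\sum d^{2} = 152$ |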
Actual AM = Assumed AM + $(\sum d)/N = 54 + (-24/6) = 54 - 4 = 50$
SD ($\sigma$) = $\sqrt{(\sum d^{2})/N - ((\sum d)/N)^{2}}$
= $\sqrt{152/6 - (-24/6)^{2}} = \sqrt{25.33 - 16} = \sqrt{9.33} = 3.05$
You will notice that both methods give the same SD. This holds in all cases, whichever score is chosen as the assumed mean.
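The assumed-mean method can be tried with several different choices of A to see that the result does not depend on the choice. The sketch below is illustrative only; the function name `sd_assumed_mean` is our own.

```python
import math

def sd_assumed_mean(scores, assumed):
    """Return (actual mean, sd) computed from deviations about an assumed mean."""
    n = len(scores)
    d = [x - assumed for x in scores]      # d = X - A
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    mean = assumed + sum_d / n             # Actual AM = A + (sum d)/N
    sd = math.sqrt(sum_d2 / n - (sum_d / n) ** 2)
    return mean, sd

scores = [48, 50, 54, 46, 48, 54]
# Try the assumed mean used in the text (54) and two other scores.
results = {a: sd_assumed_mean(scores, a) for a in (54, 48, 50)}
for a, (mean, sd) in results.items():
    print(f"A={a}: mean={mean:.2f} sd={sd:.3f}")
```

Each choice of A gives a mean of 50 and the same SD for this data. With A = 50 the correction term $((\sum d)/N)^{2}$ is zero, and the formula reduces to the one used with the actual AM.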
When the same scores repeat many times in the data, listing the individual scores and calculating the SD becomes tedious, so we follow a slightly different method.
Standard Deviation for grouped data:
Let the scores and frequencies be
| Scores ($X$) → | $X_{1}$ | $X_{2}$ | $X_{3}$ | … | $X_{n}$ |
| Frequency ($f$) → | $f_{1}$ | $f_{2}$ | $f_{3}$ | … | $f_{n}$ |
$N$ = total of the frequencies = $f_{1}+f_{2}+f_{3}+\cdots+f_{n} = \sum f$
Step 1: Find $fX$ for each of the scores
Step 2: Find the arithmetic mean $\bar{X} = (\sum fX)/N$
Step 3: Find the deviation for each of the scores, $d = X - \bar{X}$
Step 4: Find the variance of the distribution = $(\sum (fd^{2}))/N$
Step 5: Calculate SD ($\sigma$) = $\sqrt{(\sum (fd^{2}))/N}$
5.5 Example 2: Marks obtained in a test by 60
students are given below. Find AM and SD.
| Scores ($X$) → | 10 | 20 | 30 | 40 | 50 | 60 |
| Frequency ($f$) → | 8 | 12 | 20 | 10 | 7 | 3 |
Workings:
$N$ (total of the frequencies) = $\sum f$ = 8+12+20+10+7+3 = 60
| Score ($X$) | Frequency ($f$) | $fX$ | Deviation $d = X - \bar{X}$ | $d^{2}$ | $fd^{2}$ |
|---|---|---|---|---|---|
| 10 | 8 | 80 | -20.83 | 433.89 | 3471.11 |
| 20 | 12 | 240 | -10.83 | 117.29 | 1407.47 |
| 30 | 20 | 600 | -0.83 | 0.69 | 13.78 |
| 40 | 10 | 400 | 9.17 | 84.09 | 840.89 |
| 50 | 7 | 350 | 19.17 | 367.49 | 2572.42 |
| 60 | 3 | 180 | 29.17 | 850.89 | 2552.67 |
| | $N = \sum f = 60$ | $\sum fX = 1850$ | | | $\sum (fd^{2}) = 10858.33$ |
Arithmetic mean $\bar{X} = (\sum fX)/N = 1850/60 = 30.83$
Variance = $(\sum (fd^{2}))/N = 10858.33/60 = 180.97$
SD ($\sigma$) = $\sqrt{(\sum (fd^{2}))/N} = \sqrt{180.97} = 13.45$
Interpretation: The average mark of the students is 30.83. The marks of the students deviate from the mean by about 13 marks.
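The five steps for scores with frequencies translate directly into code. The following is a minimal sketch using the data of Example 2; the function name `sd_with_frequencies` is hypothetical and chosen only for this illustration.

```python
import math

def sd_with_frequencies(scores, freqs):
    """Return (N, mean, variance, sd) for scores that occur with given frequencies."""
    n = sum(freqs)                                             # N = sum of f
    mean = sum(f * x for x, f in zip(scores, freqs)) / n       # (sum fX)/N
    variance = sum(f * (x - mean) ** 2
                   for x, f in zip(scores, freqs)) / n         # (sum f d^2)/N
    return n, mean, variance, math.sqrt(variance)

marks = [10, 20, 30, 40, 50, 60]
students = [8, 12, 20, 10, 7, 3]
n, mean, variance, sd = sd_with_frequencies(marks, students)
print(f"N={n} mean={mean:.2f} variance={variance:.2f} sd={sd:.2f}")
```

Because the code keeps full precision for the mean, it avoids the rounding of d to two decimals that the hand calculation needs.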
In the above working you
must have observed that AM had decimals. Because of this reason d, d^{2}
and f*d^{2} were all decimals and calculations were difficult.
In such cases we use an
alternate method which is easier to work with.
Alternate Method
Step 1: Assume any one of the scores as the average ($A$).
Step 2: Find the deviation $d$ from the assumed average for every score ($d = X - A$).
Step 3: Find $fd$, $d^{2}$ and $fd^{2}$ for each of the scores.
Step 4: Arrive at the AM and SD as given below.
Arithmetic mean $\bar{X} = A + (\sum (fd))/N$, where $N = \sum f$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}}$
In the above example let us
assume 30 to be the
Average (A) and by following steps 1 to 3 we get
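| Score ($X$) | Frequency ($f$) | Deviation $d = X - A$ | $fd$ | $d^{2}$ | $fd^{2}$ |
|---|---|---|---|---|---|
| 10 | 8 | -20 | -160 | 400 | 3200 |
| 20 | 12 | -10 | -120 | 100 | 1200 |
| 30 | 20 | 0 | 0 | 0 | 0 |
| 40 | 10 | 10 | 100 | 100 | 1000 |
| 50 | 7 | 20 | 140 | 400 | 2800 |
| 60 | 3 | 30 | 90 | 900 | 2700 |
| | $N = \sum f = 60$ | | $\sum (fd) = 50$ | | $\sum (fd^{2}) = 10900$ |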
We note that AM = $A + (\sum (fd))/N = 30 + 50/60 = 30 + 0.83 = 30.83$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}}$
= $\sqrt{(10900/60) - (50/60)^{2}}$
= $\sqrt{181.67 - 0.69} = \sqrt{180.97} = 13.45$
The average mark of the students is 30.83. The marks of the students deviate from the mean by about 13 marks.
Observe that we got the same results with both methods.
We have seen earlier that data is often collected in class intervals and not as individual scores. In such cases the AM has to be calculated in a different way, and not as a plain average of the scores.
How do we find the SD and interpret the results when the data is grouped into class intervals?
Step 1: Find the midpoint ($x$) of each class interval.
Step 2: Find the product $fx$ for each class interval.
Step 3: Calculate the arithmetic mean $\bar{X} = (\sum fx)/N$, where $N = \sum f$.
Step 4: Find the deviation $d$ from the arithmetic mean ($\bar{X}$) for each of the class intervals ($d = x - \bar{X}$).
Step 5: Find $d^{2}$ and $fd^{2}$ for each class interval.
Step 6: Calculate the SD using the formula SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N}$
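5.5 Example 3: Marks obtained in a test by students are given below.
| Marks | Frequency ($f$) | Midpoint ($x$) | $fx$ | $d = x - \bar{X}$ | $d^{2}$ | $fd^{2}$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | 140 | -9.2 | 84.64 | 423.2 |
| 30-35 | 10 | 33 | 330 | -4.2 | 17.64 | 176.4 |
| 35-40 | 25 | 38 | 950 | 0.8 | 0.64 | 16 |
| 40-45 | 8 | 43 | 344 | 5.8 | 33.64 | 269.12 |
| 45-50 | 2 | 48 | 96 | 10.8 | 116.64 | 233.28 |
| | $N = \sum f = 50$ | | $\sum fx = 1860$ | | | $\sum (fd^{2}) = 1118$ |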
Working:
Arithmetic mean $\bar{X} = (\sum fx)/N = 1860/50 = 37.2$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N} = \sqrt{1118/50} = \sqrt{22.36} = 4.728$
Interpretation:
The average marks scored is 37.2. The marks of students deviate from the Mean
(average) score by about 5 marks.
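The working table of Example 3 can be regenerated row by row with a short Python sketch. The class labels, frequencies and midpoints are taken exactly as listed in the table above; the variable names are our own.

```python
import math

# Class intervals, frequencies and midpoints as listed in the Example 3 table.
classes = ["25-30", "30-35", "35-40", "40-45", "45-50"]
freqs = [5, 10, 25, 8, 2]
midpoints = [28, 33, 38, 43, 48]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n   # (sum f x)/N

sum_fd2 = 0.0
for label, f, x in zip(classes, freqs, midpoints):
    d = x - mean                      # deviation of the midpoint from the AM
    sum_fd2 += f * d * d
    print(f"{label:>6}  f={f:<3} x={x:<3} fx={f * x:<4} "
          f"d={d:6.1f}  d^2={d * d:7.2f}  fd^2={f * d * d:7.2f}")

sd = math.sqrt(sum_fd2 / n)
print(f"N={n}  AM={mean:.1f}  sum(fd^2)={sum_fd2:.2f}  SD={sd:.4f}")
```

The printed SD is about 4.7286, which the text writes as 4.728.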
In the above working you
must have observed that AM had decimals. Because of this reason d, d^{2}
and f*d^{2} had decimals and calculation was difficult. In such cases
we use an alternate method which is easier to work with.
Alternate Method (Step-Deviation Method)
Step 1: Assume one of the midpoints of the class intervals as the arithmetic mean ($A$).
Step 2: Find the ‘step-deviation’ ($d$) from the assumed mean, $d = (x - A)/i$, where $i$ is the size of the class interval.
Step 3: Find $d^{2}$, $fd$ and $fd^{2}$ for each of the class intervals.
Step 4: Compute the AM and SD as follows:
AM = $\bar{X} = A + [\sum (fd)/N] \times i$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$
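Let us work out the above example using this method.
Let us assume the mean ($A$) to be 43. Note that $i$ = size of class interval = 5.
By following steps 1 to 3 we have:
| Marks | Frequency ($f$) | Midpoint ($x$) | $d = (x - A)/i$ | $fd$ | $d^{2}$ | $fd^{2}$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | -3 | -15 | 9 | 45 |
| 30-35 | 10 | 33 | -2 | -20 | 4 | 40 |
| 35-40 | 25 | 38 | -1 | -25 | 1 | 25 |
| 40-45 | 8 | 43 | 0 | 0 | 0 | 0 |
| 45-50 | 2 | 48 | 1 | 2 | 1 | 2 |
| | $N = \sum f = 50$ | | | $\sum (fd) = -58$ | | $\sum (fd^{2}) = 112$ |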
We have
AM = $\bar{X} = A + [\sum (fd)/N] \times i = 43 + [(-58/50) \times 5] = 43 + (-1.16) \times 5 = 43 - 5.8 = 37.2$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$
= $\sqrt{(112/50) - (-58/50)^{2}} \times 5$
= $\sqrt{2.24 - (-1.16)^{2}} \times 5$
= $\sqrt{2.24 - 1.3456} \times 5$
= $\sqrt{0.8944} \times 5$
= $0.9457 \times 5$
= 4.728
Interpretation:
The average marks scored is 37.2. The marks of students deviate from the Mean
(average) score by about 5 marks.
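The step-deviation working can be sketched in Python as follows. The function name `step_deviation` and its parameter names are assumptions made for this illustration; the data is that of Example 3.

```python
import math

def step_deviation(midpoints, freqs, assumed, width):
    """Return (mean, sd) by the step-deviation method, d = (x - A)/i."""
    n = sum(freqs)
    steps = [(x - assumed) / width for x in midpoints]
    mean_fd = sum(f * d for f, d in zip(freqs, steps)) / n        # sum(fd)/N
    mean_fd2 = sum(f * d * d for f, d in zip(freqs, steps)) / n   # sum(fd^2)/N
    mean = assumed + mean_fd * width
    sd = math.sqrt(mean_fd2 - mean_fd ** 2) * width
    return mean, sd

midpoints = [28, 33, 38, 43, 48]
freqs = [5, 10, 25, 8, 2]
mean_43, sd_43 = step_deviation(midpoints, freqs, assumed=43, width=5)
mean_38, sd_38 = step_deviation(midpoints, freqs, assumed=38, width=5)
print(f"A=43: mean={mean_43:.1f} sd={sd_43:.4f}")
print(f"A=38: mean={mean_38:.1f} sd={sd_38:.4f}")
```

Dividing by the class size keeps the step-deviations as small whole numbers for hand calculation. Multiplying by `width` at the end restores the original units, and a different assumed midpoint (38 here) gives the same mean and SD.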
Very often we use the word ‘consistency’ when comparing the performances of individuals, teams, etc. How do we express this quality statistically?
We use the ‘coefficient of variation’ (CV) to measure consistency. It is a relative measure of dispersion. It is calculated as
CV = SD*100/AM.
Thus CV is independent of units and is expressed as a percentage. The lower the percentage, the greater the consistency (if the SD is small compared with the AM, the variation is obviously small).
In the above example, CV = (4.728*100)/37.2 = 12.71%
5.5 Example 4: The runs scored by two batsmen, A and B, in six innings are as follows.
| Batsman A | 48 | 50 | 54 | 46 | 48 | 54 |
| Batsman B | 46 | 44 | 43 | 46 | 45 | 46 |
Determine who is the better scorer and who is more consistent.
Working:
To judge the consistency of these two batsmen we need to find the CV of each.
For Batsman A we have already arrived at the following values of AM and SD in 5.5 Example 1 (worked out earlier):
AM = 50
SD = 3.05
Therefore CV = SD*100/AM = 3.05*100/50 = 6.1%
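Let us calculate these figures for Batsman B.
Table of Calculation (with actual AM): AM = 270/6 = 45
| No | Runs ($X$) | Deviation $d = X - \bar{X}$ | (Deviation)$^{2}$ = $d^{2}$ |
|---|---|---|---|
| 1 | 46 | 1 | 1 |
| 2 | 44 | -1 | 1 |
| 3 | 43 | -2 | 4 |
| 4 | 46 | 1 | 1 |
| 5 | 45 | 0 | 0 |
| 6 | 46 | 1 | 1 |
| | $\sum X = 270$ | $\sum d = 0$ | $\sum d^{2} = 8$ |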
SD = $\sigma = \sqrt{\text{Variance}} = \sqrt{(\sum d^{2})/N} = \sqrt{8/6} = \sqrt{1.33} = 1.15$
Therefore CV = SD*100/AM = 1.15*100/45 = 2.56%
Conclusion:
1. Since A’s AM is more than that of B (50 > 45), we conclude that A is the better scorer.
2. Since B’s CV is less than A’s (2.56 < 6.1), we conclude that B is more consistent.
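The comparison in Example 4 can be reproduced with a small Python sketch. The function name `mean_sd_cv` is our own; CV is computed as SD*100/AM, as defined above.

```python
import math

def mean_sd_cv(scores):
    """Return (mean, sd, cv in percent) for a list of individual scores."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return mean, sd, sd * 100 / mean

batsman_a = [48, 50, 54, 46, 48, 54]
batsman_b = [46, 44, 43, 46, 45, 46]

mean_a, sd_a, cv_a = mean_sd_cv(batsman_a)
mean_b, sd_b, cv_b = mean_sd_cv(batsman_b)

# Higher mean -> better scorer; lower CV -> more consistent.
better_scorer = "A" if mean_a > mean_b else "B"
more_consistent = "A" if cv_a < cv_b else "B"
print(f"A: mean={mean_a:.1f} sd={sd_a:.2f} cv={cv_a:.2f}%")
print(f"B: mean={mean_b:.1f} sd={sd_b:.2f} cv={cv_b:.2f}%")
print("better scorer:", better_scorer, "| more consistent:", more_consistent)
```

Note that the comparison of consistency uses the CV, not the SD alone, because the two batsmen have different means.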
5.5 Example 5: Marks obtained in a test by Standard X students of two sections, A and B, are given below:
| Marks | No of students in Section A | No of students in Section B |
|---|---|---|
| 25-30 | 5 | 5 |
| 30-35 | 10 | 12 |
| 35-40 | 25 | 20 |
| 40-45 | 8 | 8 |
| 45-50 | 2 | 5 |
Which section’s performance is better, and which section’s performance is more variable (less consistent)?
We need to find the AM and CV of each section to answer these questions.
For Section A’s marks we have already arrived at the following values of AM and SD in 5.5 Example 3 (worked out earlier):
AM = 37.2 and SD = 4.728
Therefore CV = SD*100/AM = 4.728*100/37.2 = 12.7%
Now let us arrive at the AM and SD for Section B using the Step-Deviation Method (with an assumed mean $A$).
Step 1: Let us choose the assumed mean $A = 38$ (we could also assume $A$ = 28, 33, 43 or 48).
Step 2: Find the step-deviation ($d$) from the assumed mean, $d = (x - A)/i$, where $i$ is the size of the class interval = 5.
Step 3: Find $d^{2}$, $fd$ and $fd^{2}$ for each of the class intervals.
Step 4: Compute the AM and SD as follows:
AM = $\bar{X} = A + [\sum (fd)/N] \times i$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$
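| Marks | Frequency ($f$) | Midpoint ($x$) | $d = (x - A)/i$ | $fd$ | $d^{2}$ | $fd^{2}$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | -2 | -10 | 4 | 20 |
| 30-35 | 12 | 33 | -1 | -12 | 1 | 12 |
| 35-40 | 20 | 38 | 0 | 0 | 0 | 0 |
| 40-45 | 8 | 43 | 1 | 8 | 1 | 8 |
| 45-50 | 5 | 48 | 2 | 10 | 4 | 20 |
| | $N = \sum f = 50$ | | | $\sum (fd) = -4$ | | $\sum (fd^{2}) = 60$ |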
We have
AM = $\bar{X} = A + [\sum (fd)/N] \times i = 38 + [(-4/50) \times 5] = 38 + (-0.08) \times 5 = 38 - 0.4 = 37.6$
SD ($\sigma$) = $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$
= $\sqrt{(60/50) - (-4/50)^{2}} \times 5$
= $\sqrt{1.2 - (-0.08)^{2}} \times 5$
= $\sqrt{1.2 - 0.0064} \times 5$
= $\sqrt{1.1936} \times 5$
= $1.0925 \times 5 = 5.4625$
Therefore CV = SD*100/AM = 5.4625*100/37.6 = 14.53%
Conclusion:
1. Since Section B’s AM is more than that of Section A (37.6 > 37.2), we conclude that Section B’s performance is better than Section A’s.
2. Since B’s CV is more than A’s (14.53 > 12.7), we conclude that Section B’s performance is less consistent (more variable) than Section A’s.
5.5 Example 6: In two factories, A and B, located in the same industrial area, the average weekly wages (in Rupees) and their SDs are
| Factory | Average wage in Rs. | SD of wage in Rs. |
|---|---|---|
| A | 34.5 | 6.21 |
| B | 28.5 | 4.56 |
Determine which factory has greater variability in wages.
Workings:
We need to find the CV of each factory.
CV of Factory A = SD*100/AM = 6.21*100/34.5 = 18%
CV of Factory B = SD*100/AM = 4.56*100/28.5 = 16%
Conclusion: Since Factory A’s CV > Factory B’s (18 > 16), A has more variability in wages. (Note: Though Factory A pays a higher average wage, the wages of its employees differ more from one another.)
5.5 Summary of learning
$X$ = set of scores
$\bar{X}$ = the arithmetic mean (AM)
$d$ = deviation from the arithmetic mean
$f$ = frequency of a score
$i$ = size of class interval
$x$ = midpoint of class interval
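| No | Cases | Options | $N$ = | AM = | Deviation ($d$) | SD ($\sigma$) |
|---|---|---|---|---|---|---|
| 1 | Individual scores | Actual AM | Number of scores | $\bar{X} = (\sum X)/N$ | $X - \bar{X}$ | $\sqrt{(\sum d^{2})/N}$ |
| | | $A$ = any score | Number of scores | $\bar{X} = A + (\sum d)/N$ | $X - A$ | $\sqrt{(\sum d^{2})/N - ((\sum d)/N)^{2}}$ |
| 2 | Scores with frequency | Actual AM | $\sum f$ | $\bar{X} = (\sum fX)/N$ | $X - \bar{X}$ | $\sqrt{\sum (fd^{2})/N}$ |
| | | $A$ = any score | $\sum f$ | $\bar{X} = A + \sum (fd)/N$ | $X - A$ | $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}}$ |
| 3 | Class interval with frequency | Actual AM | $\sum f$ | $\bar{X} = (\sum fx)/N$ | $x - \bar{X}$ | $\sqrt{\sum (fd^{2})/N}$ |
| | | $A$ = any midpoint | $\sum f$ | $\bar{X} = A + [\sum (fd)/N] \times i$ | $d = (x - A)/i$ | $\sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$ |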
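Hint: For standard deviation always remember the common formula:
$\sigma = \sqrt{\sum (fd^{2})/N - (\sum (fd)/N)^{2}} \times i$
Depending on the option, you can substitute $f = 1$ and $i = 1$ to get the correct formula as per the above table.
Also note that $\sum d = 0$ (and $\sum (fd) = 0$) when the deviations are taken from the actual mean and not from an assumed average, so the second term under the root vanishes.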
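The Hint says that one formula covers every row of the summary table once f = 1 and i = 1 are substituted where they do not apply. A minimal Python sketch of that idea follows; the function `sd_general` and its default arguments are hypothetical, chosen only to mirror the Hint.

```python
import math

def sd_general(values, freqs=None, assumed=0.0, width=1.0):
    """Return (mean, sd) using sigma = i * sqrt(sum(fd^2)/N - (sum(fd)/N)^2).

    d = (value - assumed) / width. Leaving freqs and width at their
    defaults corresponds to substituting f = 1 and i = 1.
    """
    if freqs is None:
        freqs = [1] * len(values)          # individual scores: f = 1
    n = sum(freqs)
    d = [(v - assumed) / width for v in values]
    m1 = sum(f * di for f, di in zip(freqs, d)) / n
    m2 = sum(f * di * di for f, di in zip(freqs, d)) / n
    return assumed + m1 * width, width * math.sqrt(m2 - m1 * m1)

# Case 1: individual scores (Example 1, A = 54)
case1 = sd_general([48, 50, 54, 46, 48, 54], assumed=54)
# Case 2: scores with frequency (Example 2, A = 30)
case2 = sd_general([10, 20, 30, 40, 50, 60], [8, 12, 20, 10, 7, 3], assumed=30)
# Case 3: class intervals with frequency (Example 3, A = 43, i = 5)
case3 = sd_general([28, 33, 38, 43, 48], [5, 10, 25, 8, 2], assumed=43, width=5)
for name, (mean, sd) in (("case 1", case1), ("case 2", case2), ("case 3", case3)):
    print(f"{name}: mean={mean:.2f} sd={sd:.3f}")
```

The three calls reproduce the means and SDs of Examples 1, 2 and 3 up to rounding.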
Additional Points:
Combined Standard deviation of two groups:
If the means and standard
deviations of two series are known, then the mean and the standard deviation of
the combined series can be calculated without considering the actual values of
the data in the series.
Let the means and standard deviations of two series containing $n_{1}$ and $n_{2}$ values be $\bar{X}_{1}$ and $\bar{X}_{2}$, and $SD_{1}$ and $SD_{2}$, respectively.
Then:
1. The combined mean $\bar{X} = (n_{1}\bar{X}_{1} + n_{2}\bar{X}_{2})/(n_{1}+n_{2})$
2. The combined S.D. = $\sqrt{(n_{1}SD_{1}^{2} + n_{2}SD_{2}^{2} + n_{1}d_{1}^{2} + n_{2}d_{2}^{2})/(n_{1}+n_{2})}$, where $d_{1} = \bar{X}_{1} - \bar{X}$ and $d_{2} = \bar{X}_{2} - \bar{X}$
5.5 Example 7: The first of two samples has 100 items with mean 15 and standard deviation 3. The combined group has 250 items with mean 15.6 and standard deviation $\sqrt{13.44}$. Find the mean and the standard deviation of the second group.
Solution:
Here $n_{1} = 100$, $n_{1}+n_{2} = 250$, $\bar{X}_{1} = 15$, $SD_{1} = 3$, combined SD = $\sqrt{13.44}$, $\bar{X} = 15.6$.
We need to find $\bar{X}_{2}$ and $SD_{2}$.
Note that $n_{2} = 150$.
But $\bar{X} = (n_{1}\bar{X}_{1} + n_{2}\bar{X}_{2})/(n_{1}+n_{2})$
$\therefore 15.6 = (100 \times 15 + 150 \times \bar{X}_{2})/250$
i.e. $150 \times \bar{X}_{2} = (15.6 \times 250) - (100 \times 15) = 3900 - 1500 = 2400$
$\therefore \bar{X}_{2} = 2400/150 = 16$
$d_{1} = \bar{X}_{1} - \bar{X} = 15 - 15.6 = -0.6$, $d_{2} = \bar{X}_{2} - \bar{X} = 16 - 15.6 = 0.4$
S.D. = $\sqrt{(n_{1}SD_{1}^{2} + n_{2}SD_{2}^{2} + n_{1}d_{1}^{2} + n_{2}d_{2}^{2})/(n_{1}+n_{2})}$
$\therefore \sqrt{13.44} = \sqrt{(100 \times 9 + 150 \times SD_{2}^{2} + 100 \times 0.36 + 150 \times 0.16)/250}$
$\therefore 13.44 = (900 + 150\,SD_{2}^{2} + 36 + 24)/250$
i.e. $150\,SD_{2}^{2} = 3360 - 960 = 2400$
$\therefore SD_{2}^{2} = 2400/150 = 16$
$\therefore SD_{2} = 4$
Thus the mean of the second group ($\bar{X}_{2}$) is 16 and the standard deviation of the second group ($SD_{2}$) is 4.
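The combined-group formulas, and the way Example 7 inverts them, can be sketched in Python. Both function names (`combined_mean_sd`, `second_group`) are our own and serve only as an illustration of the algebra above.

```python
import math

def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and SD of two groups from their sizes, means and SDs."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean
    var = (n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / n
    return mean, math.sqrt(var)

def second_group(n1, mean1, sd1, n_total, mean_total, var_total):
    """Solve the combined formulas for the second group's size, mean and SD."""
    n2 = n_total - n1
    mean2 = (n_total * mean_total - n1 * mean1) / n2
    d1, d2 = mean1 - mean_total, mean2 - mean_total
    var2 = (n_total * var_total - n1 * sd1**2
            - n1 * d1**2 - n2 * d2**2) / n2
    return n2, mean2, math.sqrt(var2)

# Example 7: combined variance is 13.44 (combined SD = sqrt(13.44)).
n2, mean2, sd2 = second_group(100, 15, 3, 250, 15.6, 13.44)
print(f"n2={n2} mean2={mean2:.2f} sd2={sd2:.2f}")

# Check: recombining the two groups should give back the stated totals.
check_mean, check_sd = combined_mean_sd(100, 15, 3, n2, mean2, sd2)
print(f"combined mean={check_mean:.2f} combined variance={check_sd**2:.2f}")
```

Recombining the two groups is a useful check on the hand calculation, since it should return the combined mean of 15.6 and variance of 13.44 that the problem started from.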