5.5 Standard Deviation:
Introduction:
You must have read newspaper articles comparing the performances
of two batsmen in cricket. What do they compare? They say that one is more
consistent than the other, or that one is more stylish than the other.
Stylishness is a quality that cannot be measured using the runs scored.
However, whether one batsman is better than the other, and how consistent
each one is, can be judged from the runs scored over several innings.
Let us study how Statistics can help us in this
regard.
Standard deviation:
You must have heard people talk about deviations
(deviation from rules, deviation in work, deviation in results, etc.). A deviation
is always measured with respect to a standard.
In Statistics, the standard can be thought of as the average (also
called the arithmetic mean).
5.5 Example 1: Let a batsman's scores in 6 innings be
48, 50, 54, 46, 48, 54.
Working:
Notations used:
X = set of scores (48, 50, 54, 46, 48, 54)
N = number of scores (= 6)
$\bar{X}$ = the arithmetic mean (AM) = $\frac{\sum X}{N}$
d = deviation from the arithmetic mean = $X - \bar{X}$
Step 1: Find the arithmetic mean (AM) of his scores:
$\bar{X} = \frac{48 + 50 + 54 + 46 + 48 + 54}{6} = \frac{300}{6} = 50$
Step 2: Find $d = X - \bar{X}$ and $d^2$ for each of the scores.
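Table of Calculation (with actual AM):
| No | Runs (X) | Deviation (d) = $X - \bar{X}$ | (Deviation)$^2$ = $d^2$ |
|---|---|---|---|
| 1 | 48 | -2 | 4 |
| 2 | 50 | 0 | 0 |
| 3 | 54 | 4 | 16 |
| 4 | 46 | -4 | 16 |
| 5 | 48 | -2 | 4 |
| 6 | 54 | 4 | 16 |
| Total | $\sum X = 300$ | $\sum d = 0$ | $\sum d^2 = 56$ |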
Step 3: Calculate the variance as $\sigma^2 = \frac{\sum d^2}{N} = \frac{56}{6} = 9.33$
Step 4: Calculate the standard deviation (SD) as $\sigma = \sqrt{\frac{\sum d^2}{N}} = \sqrt{9.33} = 3.05$
SD is denoted by the Greek letter σ (sigma).
Definition: 'Standard deviation' is the square root of the
arithmetic average of the squares of the deviations from the mean.
Interpretation: In this example we say that, on
average, the batsman's scores deviate from the arithmetic mean (= 50) by 3.05 (≈ 3).
It can therefore be predicted that, more or less, this batsman is likely to score
between 47 and 53 runs (50 - 3 to 50 + 3) in future matches.
Note: If the batsman's scores had been
48, 100, 50, 10, 2, 80, it would not have been possible to predict his future
scores with reasonable accuracy. Because the batsman was consistent, with
scores around 50, such a prediction was possible.
General Procedure:
Let $X = \{x_1, x_2, x_3, \ldots, x_N\}$ be the scores
N = number of scores
$\bar{X}$ = arithmetic mean (AM) = $\frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum X}{N}$
Step 1: Calculate the deviation from the AM, $d = X - \bar{X}$, and $d^2$ for
each of the scores.
Step 2: Calculate the variance $\sigma^2 = \frac{\sum d^2}{N}$
Step 3: Calculate the standard deviation (SD):
$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum d^2}{N}}$
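The general procedure above translates directly into a few lines of Python. This is a minimal sketch for illustration only: `sd_actual_mean` is a hypothetical helper name of our own, and, like the definition in this section, it divides by N (the population standard deviation).

```python
import math


def sd_actual_mean(scores):
    """Hypothetical helper: return (mean, variance, SD) of individual scores.

    Deviations are taken from the actual arithmetic mean and the variance
    divides by N, as in the definition of standard deviation above.
    """
    n = len(scores)                                # N = number of scores
    mean = sum(scores) / n                         # AM = sum(X) / N
    deviations = [x - mean for x in scores]        # Step 1: d = X - AM
    variance = sum(d * d for d in deviations) / n  # Step 2: sum(d^2) / N
    return mean, variance, math.sqrt(variance)     # Step 3: SD = sqrt(variance)


# Example 1: a consistent batsman
print(sd_actual_mean([48, 50, 54, 46, 48, 54]))
# The erratic scores from the note above
print(sd_actual_mean([48, 100, 50, 10, 2, 80]))
```

For Example 1 this gives a mean of 50, a variance of about 9.33 and an SD of about 3.05. For the erratic scores in the note (48, 100, 50, 10, 2, 80) the SD comes out at roughly 34.9, which is why no useful prediction can be made there.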
Alternate method of finding σ when the AM is not a whole number:
In the above example the arithmetic mean (= 50)
happened to be an integer, so our computations were easy. If the arithmetic mean
contains decimals, finding $d^2$ becomes tedious, and in such cases we
follow a different method. To start
with, we assume the arithmetic mean to be one of the scores itself. Then we
calculate $d = X - A$ (where A is the assumed AM) and $d^2$ for each of the
scores. The actual AM and SD are then derived as follows:
Actual AM = Assumed AM + $\frac{\sum d}{N}$
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
Let us take the above example and find the SD using
this alternate method.
Let us assume the AM to be 54 (A = 54). Here N = 6.
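Table of Calculation (with assumed AM, A = 54):
| No | Runs (X) | Deviation (d) = X - A | (Deviation)$^2$ = $d^2$ |
|---|---|---|---|
| 1 | 48 | -6 | 36 |
| 2 | 50 | -4 | 16 |
| 3 | 54 | 0 | 0 |
| 4 | 46 | -8 | 64 |
| 5 | 48 | -6 | 36 |
| 6 | 54 | 0 | 0 |
| Total | | $\sum d = -24$ | $\sum d^2 = 152$ |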
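Actual AM = Assumed AM + $\frac{\sum d}{N} = 54 + \frac{-24}{6} = 54 - 4 = 50$
$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$
$= \sqrt{\frac{152}{6} - \left(\frac{-24}{6}\right)^2}$
$= \sqrt{25.33 - 16} = \sqrt{9.33} = 3.05$
You will notice that both methods give the same SD. This holds
in all cases, whatever value is assumed for A.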
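A minimal sketch of the assumed-mean shortcut, under the same assumptions as before (a hypothetical helper name of our own, `sd_assumed_mean`, and division by N). Running it with several different values of A illustrates the claim that the SD does not depend on the value assumed.

```python
import math


def sd_assumed_mean(scores, a):
    """Hypothetical helper: return (actual mean, SD) using an assumed mean a."""
    n = len(scores)
    deviations = [x - a for x in scores]            # d = X - A
    sum_d = sum(deviations)                         # sum(d)
    sum_d2 = sum(d * d for d in deviations)         # sum(d^2)
    mean = a + sum_d / n                            # actual AM = A + sum(d)/N
    sd = math.sqrt(sum_d2 / n - (sum_d / n) ** 2)   # sqrt(sum(d^2)/N - (sum(d)/N)^2)
    return mean, sd


scores = [48, 50, 54, 46, 48, 54]
for a in (54, 48, 46):
    print(a, sd_assumed_mean(scores, a))
```

Every choice of A gives a mean of 50 and an SD of about 3.055. The shortcut subtracts two quantities, so when A is far from the data both terms become large and some floating-point precision can be lost; choosing A near the middle of the data, as the text does, keeps the numbers small.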
When the same scores repeat many times in the data, listing the
individual scores and calculating the SD becomes tedious, so we
follow a slightly different method.
Standard Deviation for grouped data:
Let the scores and their frequencies be:
| Scores (X) | $X_1$ | $X_2$ | $X_3$ | ... | $X_n$ |
|---|---|---|---|---|---|
| Frequency (f) | $f_1$ | $f_2$ | $f_3$ | ... | $f_n$ |
N = total of the frequencies = $f_1 + f_2 + f_3 + \cdots + f_n = \sum f$
Step 1: Find f × X for each of the scores.
Step 2: Find the arithmetic mean $\bar{X} = \frac{\sum fX}{N}$
Step 3: Find the deviation of each score from the mean, $d = X - \bar{X}$, and then $d^2$ and $f \times d^2$.
Step 4: Find the variance of the distribution, $\sigma^2 = \frac{\sum f d^2}{N}$
Step 5: Calculate the SD, $\sigma = \sqrt{\frac{\sum f d^2}{N}}$
5.5 Example 2: Marks obtained in a test by 60
students are given below. Find the AM and SD.
| Scores (X) | 10 | 20 | 30 | 40 | 50 | 60 |
|---|---|---|---|---|---|---|
| Frequency (f) | 8 | 12 | 20 | 10 | 7 | 3 |
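Workings:
N (total of the frequencies) = $\sum f$ = 8 + 12 + 20 + 10 + 7 + 3 = 60
| Score (X) | Frequency (f) | fX | Deviation d = $X - \bar{X}$ | $d^2$ | $f \times d^2$ |
|---|---|---|---|---|---|
| 10 | 8 | 80 | -20.83 | 433.89 | 3471.11 |
| 20 | 12 | 240 | -10.83 | 117.29 | 1407.47 |
| 30 | 20 | 600 | -0.83 | 0.69 | 13.78 |
| 40 | 10 | 400 | 9.17 | 84.09 | 840.89 |
| 50 | 7 | 350 | 19.17 | 367.49 | 2572.42 |
| 60 | 3 | 180 | 29.17 | 850.89 | 2552.67 |
| Total | $N = \sum f = 60$ | $\sum fX = 1850$ | | | $\sum f d^2 = 10858.33$ |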
Arithmetic mean $\bar{X} = \frac{\sum fX}{N} = \frac{1850}{60} = 30.83$
Variance $\sigma^2 = \frac{\sum f d^2}{N} = \frac{10858.33}{60} = 180.97$
SD $\sigma = \sqrt{\frac{\sum f d^2}{N}} = \sqrt{180.97} = 13.45$
Interpretation: The average mark of the students is
30.83. The students' marks deviate from the mean score by about 13 marks.
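The same steps for scores with frequencies can be sketched in Python with Example 2's data. `sd_frequency` is a hypothetical helper name; deviations are kept unrounded here, so the result does not depend on the two-decimal rounding used in the table.

```python
import math


def sd_frequency(scores, freqs):
    """Hypothetical helper: AM, variance and SD of scores X with frequencies f."""
    n = sum(freqs)                                             # N = sum(f)
    mean = sum(f * x for x, f in zip(scores, freqs)) / n       # AM = sum(fX)/N
    sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(scores, freqs))  # sum(f*d^2)
    variance = sum_fd2 / n                                     # sum(f*d^2)/N
    return mean, variance, math.sqrt(variance)


mean, var, sd = sd_frequency([10, 20, 30, 40, 50, 60], [8, 12, 20, 10, 7, 3])
print(f"AM = {mean:.2f}, variance = {var:.2f}, SD = {sd:.2f}")
```

This prints `AM = 30.83, variance = 180.97, SD = 13.45`, matching the hand calculation.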
In the above working you must have noticed that the AM
had decimals. Because of this, d, $d^2$ and $f \times d^2$ were
all decimals and the calculations were difficult.
In such cases we use an alternate method that is
easier to work with.
Alternate Method:
Step 1: Assume any one of the scores to be the average (A).
Step 2: Find the deviation d of every score from the assumed
average (d = X - A).
Step 3: Find $f \times d$, $d^2$ and $f \times d^2$
for each of the scores.
Step 4: Arrive at the AM and SD as given below.
Arithmetic mean $\bar{X} = A + \frac{\sum fd}{N}$, where $N = \sum f$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
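In the above example, let us assume 30 to be the average (A). Following steps 1 to 3, we
get:
| Score (X) | Frequency (f) | Deviation d = X - A | $f \times d$ | $d^2$ | $f \times d^2$ |
|---|---|---|---|---|---|
| 10 | 8 | -20 | -160 | 400 | 3200 |
| 20 | 12 | -10 | -120 | 100 | 1200 |
| 30 | 20 | 0 | 0 | 0 | 0 |
| 40 | 10 | 10 | 100 | 100 | 1000 |
| 50 | 7 | 20 | 140 | 400 | 2800 |
| 60 | 3 | 30 | 90 | 900 | 2700 |
| Total | $N = \sum f = 60$ | | $\sum fd = 50$ | | $\sum f d^2 = 10900$ |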
We note that $\bar{X} = A + \frac{\sum fd}{N} = 30 + \frac{50}{60} = 30 + 0.83 = 30.83$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$
$= \sqrt{\frac{10900}{60} - \left(\frac{50}{60}\right)^2}$
$= \sqrt{181.67 - 0.69} = \sqrt{180.97} = 13.45$
The average mark of the students is 30.83. The students'
marks deviate from the mean score by about 13 marks.
Observe that we got the same results with
both methods.
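A sketch of the alternate (assumed-mean) method for frequency data, again under a hypothetical helper name of our own. Because A is one of the scores, every d is a whole number, so $\sum fd$ and $\sum fd^2$ stay exact integers, which is exactly why the method is easier by hand.

```python
import math


def sd_frequency_assumed(scores, freqs, a):
    """Hypothetical helper: sum(fd), sum(fd^2), AM and SD using an assumed mean a."""
    n = sum(freqs)                                          # N = sum(f)
    d = [x - a for x in scores]                             # d = X - A
    sum_fd = sum(f * di for di, f in zip(d, freqs))         # sum(f*d)
    sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))   # sum(f*d^2)
    mean = a + sum_fd / n                                   # AM = A + sum(fd)/N
    sd = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return sum_fd, sum_fd2, mean, sd


print(sd_frequency_assumed([10, 20, 30, 40, 50, 60], [8, 12, 20, 10, 7, 3], 30))
```

With A = 30 the function returns $\sum fd = 50$ and $\sum fd^2 = 10900$, the same totals as the table, together with an AM and SD that round to 30.83 and 13.45.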
We have seen earlier that data are often
collected in class intervals rather than as individual scores. In such cases
the AM has to be calculated differently, not as a simple average of the scores.
How do we find the SD and interpret the results for such
grouped data?
Step 1: Find the midpoint (x) of each class interval.
Step 2: Find the product $f \times x$ for each class interval.
Step 3: Calculate the arithmetic mean $\bar{X} = \frac{\sum fx}{N}$, where $N = \sum f$.
Step 4: Find the deviation d of each class interval's midpoint from the
arithmetic mean, $d = x - \bar{X}$.
Step 5: Find $d^2$ and $f \times d^2$ for
each class interval.
Step 6: Calculate the SD using the formula $\sigma = \sqrt{\frac{\sum f d^2}{N}}$
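5.5 Example 3: Marks obtained in a test by 50
students are as follows. Find the AM and SD.
| Marks | Frequency (f) | Midpoint (x) | $f \times x$ | d = $x - \bar{X}$ | $d^2$ | $f \times d^2$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | 140 | -9.2 | 84.64 | 423.2 |
| 30-35 | 10 | 33 | 330 | -4.2 | 17.64 | 176.4 |
| 35-40 | 25 | 38 | 950 | 0.8 | 0.64 | 16 |
| 40-45 | 8 | 43 | 344 | 5.8 | 33.64 | 269.12 |
| 45-50 | 2 | 48 | 96 | 10.8 | 116.64 | 233.28 |
| Total | $N = \sum f = 50$ | | $\sum fx = 1860$ | | | $\sum f d^2 = 1118$ |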
Working:
Arithmetic mean $\bar{X} = \frac{\sum fx}{N} = \frac{1860}{50} = 37.2$
$\sigma = \sqrt{\frac{\sum f d^2}{N}} = \sqrt{\frac{1118}{50}} = \sqrt{22.36} = 4.728$
Interpretation: The average mark
scored is 37.2. The students' marks deviate from the mean (average) score by
about 5 marks.
In the above working you must have noticed that the AM
had decimals. Because of this, d, $d^2$ and $f \times d^2$ had
decimals and the calculation was difficult. In such cases we use an alternate
method that is easier to work with.
Alternate Method (Step-Deviation Method):
Step 1: Assume one of the midpoints of the class intervals
to be the arithmetic mean (A).
Step 2: Find the 'step-deviation' d from the
assumed mean, $d = \frac{x - A}{i}$, where i is the size of the class interval.
Step 3: Find $d^2$, $f \times d$ and $f \times d^2$
for each of the class intervals.
Step 4: Compute the AM and SD as follows:
$\bar{X} = A + \frac{\sum fd}{N} \times i$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
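Let us work out the above example using this method.
In the above example, let us assume the mean (A) to be
43. Note that i = size of the class interval = 5.
Following steps 1 to 3, we have:
| Marks | Frequency (f) | Midpoint (x) | d = (x - A)/i | $f \times d$ | $d^2$ | $f \times d^2$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | -3 | -15 | 9 | 45 |
| 30-35 | 10 | 33 | -2 | -20 | 4 | 40 |
| 35-40 | 25 | 38 | -1 | -25 | 1 | 25 |
| 40-45 | 8 | 43 | 0 | 0 | 0 | 0 |
| 45-50 | 2 | 48 | 1 | 2 | 1 | 2 |
| Total | $N = \sum f = 50$ | | | $\sum fd = -58$ | | $\sum f d^2 = 112$ |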
We have
$\bar{X} = A + \frac{\sum fd}{N} \times i = 43 + \frac{-58}{50} \times 5 = 43 + (-1.16) \times 5 = 43 - 5.8 = 37.2$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
$= \sqrt{\frac{112}{50} - \left(\frac{-58}{50}\right)^2} \times 5$
$= \sqrt{2.24 - (-1.16)^2} \times 5$
$= \sqrt{2.24 - 1.3456} \times 5$
$= \sqrt{0.8944} \times 5$
$= 0.9457 \times 5$
$= 4.728$
Interpretation: The average mark
scored is 37.2. The students' marks deviate from the mean (average) score by
about 5 marks.
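A minimal sketch of the step-deviation method. The helper name `sd_step_deviation` is our own, and the midpoints are passed in exactly as listed in the table (the function does not derive them from the class limits).

```python
import math


def sd_step_deviation(midpoints, freqs, a, i):
    """Hypothetical helper: step-deviation method for grouped data.

    a: assumed mean (one of the midpoints); i: size of the class interval.
    Returns (sum(fd), sum(fd^2), AM, SD).
    """
    n = sum(freqs)                                          # N = sum(f)
    d = [(x - a) / i for x in midpoints]                    # d = (x - A)/i
    sum_fd = sum(f * di for di, f in zip(d, freqs))         # sum(f*d)
    sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))   # sum(f*d^2)
    mean = a + (sum_fd / n) * i                             # AM = A + (sum(fd)/N)*i
    sd = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * i     # SD = sqrt(...) * i
    return sum_fd, sum_fd2, mean, sd


# Example 3: midpoints as given in the table, A = 43, i = 5
print(sd_step_deviation([28, 33, 38, 43, 48], [5, 10, 25, 8, 2], a=43, i=5))
```

For Example 3 it returns $\sum fd = -58$ and $\sum fd^2 = 112$, an AM of 37.2 and an SD of about 4.73. The same call with Section B's frequencies from Example 5 (5, 12, 20, 8, 5) and A = 38 gives $\sum fd = -4$, $\sum fd^2 = 60$, an AM of 37.6 and an SD of about 5.46.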
We often use the word consistency when comparing the
performances of individuals, teams, etc. How do we express this adjective
statistically?
We use the 'coefficient of variation' (CV) to measure consistency. It is a relative
measure of dispersion, calculated as
$CV = \frac{SD \times 100}{AM}$
Because SD and AM are expressed in the same units, CV is independent of units and is expressed as
a percentage. The lower the percentage, the greater the consistency (if the SD is small
compared with the AM, the variation is obviously small).
In the above example, CV = (4.728 × 100)/37.2 = 12.71%.
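The coefficient of variation is a one-line calculation. The sketch below uses a function name of our own and also refuses a zero mean, for which the ratio is undefined.

```python
def coefficient_of_variation(mean, sd):
    """Hypothetical helper: CV = SD * 100 / AM, expressed as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd * 100 / mean


# Example 3: AM = 37.2, SD = 4.728
print(round(coefficient_of_variation(37.2, 4.728), 2))
```

It prints `12.71`. The same function gives 18.0 and 16.0 (to one decimal) for the two factories of Example 6.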
5.5 Example 4: The runs scored by two
batsmen A and B in six innings are as follows.
| Batsman A | 48 | 50 | 54 | 46 | 48 | 54 |
|---|---|---|---|---|---|---|
| Batsman B | 46 | 44 | 43 | 46 | 45 | 46 |
Determine who is the better scorer. Who is more
consistent?
Working:
To judge the consistency of these two batsmen we
need to find the CV.
We arrived at the following values of AM and SD for Batsman A in Example 1 (worked out earlier):
AM = 50
SD = 3.05
∴ CV = SD × 100/AM = 3.05 × 100/50 = 6.1%
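Let us calculate these figures for Batsman B.
Table of Calculation (with actual AM): AM = 270/6 = 45
| No | Runs (X) | Deviation (d) = $X - \bar{X}$ | (Deviation)$^2$ = $d^2$ |
|---|---|---|---|
| 1 | 46 | 1 | 1 |
| 2 | 44 | -1 | 1 |
| 3 | 43 | -2 | 4 |
| 4 | 46 | 1 | 1 |
| 5 | 45 | 0 | 0 |
| 6 | 46 | 1 | 1 |
| Total | $\sum X = 270$ | $\sum d = 0$ | $\sum d^2 = 8$ |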
$\sigma = \sqrt{\frac{\sum d^2}{N}} = \sqrt{\frac{8}{6}} = \sqrt{1.33} = 1.15$
∴ CV = SD × 100/AM = 1.15 × 100/45 = 2.56%
Conclusion:
1. Since A's AM is greater than B's (50 > 45),
we conclude that A is the better scorer.
2. Since B's CV is less than A's (2.56% < 6.1%), we
conclude that B is more consistent.
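Python's standard `statistics` module can be used to check Example 4. Note that `statistics.pstdev` divides by N, matching this section's definition, whereas `statistics.stdev` divides by N - 1 (the sample standard deviation) and would give slightly larger values. The helper `summarize` is our own.

```python
import statistics


def summarize(scores):
    """Hypothetical helper: (AM, SD, CV%) using the population SD (divides by N)."""
    am = statistics.mean(scores)
    sd = statistics.pstdev(scores)   # pstdev divides by N; stdev would divide by N - 1
    return am, sd, sd * 100 / am


batsman_a = [48, 50, 54, 46, 48, 54]
batsman_b = [46, 44, 43, 46, 45, 46]
a_stats, b_stats = summarize(batsman_a), summarize(batsman_b)

better_scorer = "A" if a_stats[0] > b_stats[0] else "B"      # higher AM
more_consistent = "A" if a_stats[2] < b_stats[2] else "B"    # lower CV
print(f"Better scorer: {better_scorer}, more consistent: {more_consistent}")
```

This prints `Better scorer: A, more consistent: B`. Because the code keeps B's unrounded SD (about 1.1547), B's CV comes out at about 2.57% rather than the figure obtained from the rounded SD of 1.15; the conclusion is unchanged.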
5.5 Example 5: Marks obtained in a test by Standard X
students of two sections, A and B, are given below:
| Marks | No. of students in Section A | No. of students in Section B |
|---|---|---|
| 25-30 | 5 | 5 |
| 30-35 | 10 | 12 |
| 35-40 | 25 | 20 |
| 40-45 | 8 | 8 |
| 45-50 | 2 | 5 |
Which section's performance is better, and which
section's performance is more variable (less consistent)?
We need to find the AM and CV to answer these
questions.
We arrived at the following values of AM and SD
for Section A's marks in Example 3 (worked out earlier):
AM = 37.2 and SD = 4.728
∴ CV = SD × 100/AM = 4.728 × 100/37.2 = 12.7%
Now let us find the AM and SD for Section B using the
step-deviation method (with an assumed mean A).
Step 1: Let us choose the assumed mean A = 38 (we could equally assume
A = 28, 33, 43 or 48).
Step 2: Find the step-deviation d from the
assumed mean, $d = \frac{x - A}{i}$, where i is the size of the class interval (= 5).
Step 3: Find $d^2$, $f \times d$ and $f \times d^2$
for each of the class intervals.
Step 4: Compute the AM and SD as follows:
$\bar{X} = A + \frac{\sum fd}{N} \times i$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
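| Marks | Frequency (f) | Midpoint (x) | d = (x - A)/i | $f \times d$ | $d^2$ | $f \times d^2$ |
|---|---|---|---|---|---|---|
| 25-30 | 5 | 28 | -2 | -10 | 4 | 20 |
| 30-35 | 12 | 33 | -1 | -12 | 1 | 12 |
| 35-40 | 20 | 38 | 0 | 0 | 0 | 0 |
| 40-45 | 8 | 43 | 1 | 8 | 1 | 8 |
| 45-50 | 5 | 48 | 2 | 10 | 4 | 20 |
| Total | $N = \sum f = 50$ | | | $\sum fd = -4$ | | $\sum f d^2 = 60$ |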
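We have
$\bar{X} = A + \frac{\sum fd}{N} \times i = 38 + \frac{-4}{50} \times 5 = 38 - 0.08 \times 5 = 38 - 0.4 = 37.6$
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
$= \sqrt{\frac{60}{50} - \left(\frac{-4}{50}\right)^2} \times 5$
$= \sqrt{1.2 - (-0.08)^2} \times 5$
$= \sqrt{1.2 - 0.0064} \times 5$
$= \sqrt{1.1936} \times 5$
$= 1.0925 \times 5$
$= 5.4625$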
∴ CV = SD × 100/AM = 5.4625 × 100/37.6 = 14.53%
Conclusion:
1. Since Section B's AM is greater than
Section A's (37.6 > 37.2), we conclude that B's performance is better than A's.
2. Since B's CV is greater than A's (14.53% > 12.7%),
we conclude that Section B's performance is less consistent (more variable)
than Section A's.
5.5 Example 6: In two factories A and B, located
in the same industrial area, the average weekly wages (in rupees) and their SDs are as follows:
| Factory | Average wage (Rs.) | SD of wage (Rs.) |
|---|---|---|
| A | 34.5 | 6.21 |
| B | 28.5 | 4.56 |
Determine which factory has greater variability.
Workings:
We need to find the CV.
CV of Factory A = SD × 100/AM = 6.21 × 100/34.5 = 18%
CV of Factory B = SD × 100/AM = 4.56 × 100/28.5 = 16%
Conclusion:
Since Factory A's CV is greater than Factory B's (18% > 16%), Factory A has more variability
in wages. (Note: although Factory A pays higher wages on average, its wages also
differ relatively more from one employee to another.)
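5.5 Summary of learning
X = set of scores
$\bar{X}$ = the arithmetic mean (AM)
d = deviation from the arithmetic mean (or from an assumed mean A)
f = frequency of a score
i = size of the class interval
x = midpoint of a class interval
| No | Case | Option | N | AM ($\bar{X}$) | Deviation (d) | SD (σ) |
|---|---|---|---|---|---|---|
| 1 | Individual scores | Actual mean | Number of scores | $\frac{\sum X}{N}$ | $X - \bar{X}$ | $\sqrt{\frac{\sum d^2}{N}}$ |
| | | A = any score | Number of scores | $A + \frac{\sum d}{N}$ | $X - A$ | $\sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$ |
| 2 | Scores with frequency | Actual mean | $\sum f$ | $\frac{\sum fX}{N}$ | $X - \bar{X}$ | $\sqrt{\frac{\sum f d^2}{N}}$ |
| | | A = any score | $\sum f$ | $A + \frac{\sum fd}{N}$ | $X - A$ | $\sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$ |
| 3 | Class intervals with frequency | Actual mean | $\sum f$ | $\frac{\sum fx}{N}$ | $x - \bar{X}$ | $\sqrt{\frac{\sum f d^2}{N}}$ |
| | | A = any midpoint | $\sum f$ | $A + \frac{\sum fd}{N} \times i$ | $\frac{x - A}{i}$ | $\sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$ |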
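Hint: For the standard deviation, always remember the common formula:
$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times i$
Depending on the option, substitute f = 1 and/or i = 1
to get the correct formula as per the above table.
Also note that $\sum d = 0$ and $\sum fd = 0$ when the actual mean, rather than an
assumed value, is used as the average, so the second term under the square root vanishes.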
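The common formula in the hint can be written as one function that covers every row of the summary table: leave out the frequencies for individual scores (f = 1), and leave i at 1 for ungrouped data. This is a sketch under our own naming (`general_sd`); defaulting to A = 0 is our choice, and any assumed value gives the same SD.

```python
import math


def general_sd(values, freqs=None, a=0.0, i=1.0):
    """Hypothetical helper: sigma = i * sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2),
    with d = (X - A)/i.

    freqs=None means every frequency is 1 (individual scores);
    i=1 is used for ungrouped data. Returns (AM, SD).
    """
    if freqs is None:
        freqs = [1] * len(values)                           # f = 1 for every score
    n = sum(freqs)                                          # N = sum(f)
    d = [(x - a) / i for x in values]                       # d = (X - A)/i
    sum_fd = sum(f * di for di, f in zip(d, freqs))         # sum(f*d)
    sum_fd2 = sum(f * di * di for di, f in zip(d, freqs))   # sum(f*d^2)
    mean = a + (sum_fd / n) * i                             # AM = A + (sum(fd)/N)*i
    sd = i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd


print(general_sd([48, 50, 54, 46, 48, 54]))                                  # row 1
print(general_sd([10, 20, 30, 40, 50, 60], [8, 12, 20, 10, 7, 3], a=30))     # row 2
print(general_sd([28, 33, 38, 43, 48], [5, 10, 25, 8, 2], a=43, i=5))        # row 3
```

The three calls reproduce Examples 1, 2 and 3: roughly (50, 3.05), (30.83, 13.45) and (37.2, 4.73).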
Additional Points:
Combined standard deviation of two groups:
If the means and standard deviations of two series
are known, the mean and standard deviation of the combined series can
be calculated without using the actual values of the data in the series.
Let the means and standard deviations of two series
containing $n_1$ and $n_2$ values be $\bar{X}_1$, $\bar{X}_2$ and
$SD_1$, $SD_2$ respectively.
Then:
1. The combined mean $\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
2. The combined S.D. $= \sqrt{\frac{n_1 SD_1^2 + n_2 SD_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$, where $d_1 = \bar{X}_1 - \bar{X}$ and $d_2 = \bar{X}_2 - \bar{X}$
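The two combined-series formulas can be sketched as follows (the helper name is ours). As the text says, the only inputs are the sizes, means and SDs of the two series.

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Hypothetical helper: combined mean and SD of two series."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n            # combined mean
    d1, d2 = mean1 - mean, mean2 - mean             # d1 = X1bar - Xbar, d2 = X2bar - Xbar
    variance = (n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2) / n
    return mean, math.sqrt(variance)


# Group sizes, means and SDs taken from Example 7
mean, sd = combined_mean_sd(100, 15, 3, 150, 16, 4)
print(round(mean, 2), round(sd ** 2, 2))
```

With the figures of Example 7 (100 items with mean 15 and SD 3, and 150 items with mean 16 and SD 4) it prints `15.6 13.44`, i.e. a combined mean of 15.6 and a combined variance of 13.44.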
5.5 Example 7: The first of two samples has
100 items with mean 15 and standard deviation 3. The combined group has 250
items with mean 15.6 and standard deviation $\sqrt{13.44}$. Find the mean and the standard deviation of the second
group.
Solution:
Here $n_1 = 100$, $n_1 + n_2 = 250$, $\bar{X}_1 = 15$, $SD_1 = 3$, combined SD $= \sqrt{13.44}$ and combined mean $\bar{X} = 15.6$. We
need to find $\bar{X}_2$ and $SD_2$.
Note that $n_2 = 150$. Now
$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
∴ $15.6 = \frac{100 \times 15 + 150 \bar{X}_2}{250}$
i.e. $150 \bar{X}_2 = 15.6 \times 250 - 100 \times 15 = 3900 - 1500 = 2400$
∴ $\bar{X}_2 = 2400/150 = 16$
$d_1 = \bar{X}_1 - \bar{X} = 15 - 15.6 = -0.6$, $d_2 = \bar{X}_2 - \bar{X} = 16 - 15.6 = 0.4$
Combined S.D. $= \sqrt{\frac{n_1 SD_1^2 + n_2 SD_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$
∴ $\sqrt{13.44} = \sqrt{\frac{100 \times 9 + 150 SD_2^2 + 100 \times 0.36 + 150 \times 0.16}{250}}$
∴ $13.44 = \frac{900 + 150 SD_2^2 + 36 + 24}{250}$
i.e. $150 SD_2^2 = 3360 - 960 = 2400$
∴ $SD_2^2 = 2400/150 = 16$
∴ $SD_2 = 4$
Thus the mean of the second group ($\bar{X}_2$) is
16 and the standard deviation of the second group ($SD_2$) is 4.
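Example 7 runs the combined formulas backwards. A sketch of that rearrangement (helper name ours): solve the combined-mean formula for $\bar{X}_2$, then the combined-SD formula for $SD_2^2$.

```python
import math


def second_group(n1, mean1, sd1, n, mean, sd):
    """Hypothetical helper: recover (n2, mean2, sd2) of the second group from the
    first group and the combined group by rearranging the combined formulas."""
    n2 = n - n1
    mean2 = (n * mean - n1 * mean1) / n2           # from n*mean = n1*mean1 + n2*mean2
    d1, d2 = mean1 - mean, mean2 - mean            # deviations of the group means
    # n*sd^2 = n1*sd1^2 + n2*sd2^2 + n1*d1^2 + n2*d2^2, solved for sd2^2
    sd2_sq = (n * sd ** 2 - n1 * sd1 ** 2 - n1 * d1 ** 2 - n2 * d2 ** 2) / n2
    if sd2_sq < 0:
        raise ValueError("inconsistent inputs: group 2 variance would be negative")
    return n2, mean2, math.sqrt(sd2_sq)


print(second_group(100, 15, 3, 250, 15.6, math.sqrt(13.44)))
```

This gives $n_2 = 150$, a mean of 16 and an SD of 4, up to floating-point rounding. The negative-variance check guards against inconsistent inputs, which cannot arise from real data.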