How to determine the sample size in order to estimate a proportion in a survey research study
Researchers are often confronted
with the question, “How large a sample do I need for the present research
study?” The answer depends on a number of factors, and it differs
depending on whether the survey is designed to estimate a population
proportion or a population mean. We therefore consider these
cases separately, starting with the case of estimating a
proportion.
Sample size to estimate a proportion
Suppose one wants to
determine the proportion of mathematics students in a city who smoke. The
question is, “How large a sample do I need?” To answer it, the
researcher needs to decide how accurate the estimate must be (the margin of
error) and at what level of confidence the estimate will be reported. One also
needs a rough prior estimate, from experience, of the
proportion of mathematics students who smoke.
Possible decisions, assumptions or requirements for the above
three factors might be:
A. Margin of error: we need a margin of error of, say, at most
2.5%. Typical surveys have margins of error ranging from less than 1% to
something of the order of 5%. We can choose any margin of error we like but need
to specify it.
B. Confidence level: 95% confidence intervals are typical
but in no way mandatory; we could use 90%, 99% or something else entirely.
For this example, we assume 95%.
C. Prior estimate of the proportion: this may be guided by past surveys or
general knowledge of public opinion. Let us suppose the estimate is
approximately 30%.
Calculation of sample size:
In general the formula is
ME = z × √[p’(1 − p’)/n]
where
- ME is the desired margin of error
- z is the z-score, and it is 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, 2.58 for a 99% confidence interval
- p’ is our prior, well-considered estimate of the true proportion p
- n is the sample size (to be found)
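As a minimal sketch, the formula can be evaluated directly for a given sample size. The function name `margin_of_error` and the inputs used here (n = 1000, z = 1.96, p’ = 0.3) are illustrative assumptions, not values fixed by the formula itself.

```python
import math

def margin_of_error(n, z, p_prior):
    """Margin of error ME = z * sqrt(p'(1 - p') / n) for a sample proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_prior <= 1.0:
        raise ValueError("p_prior must lie between 0 and 1")
    return z * math.sqrt(p_prior * (1.0 - p_prior) / n)

# Hypothetical inputs for illustration: n = 1000, 95% confidence, p' = 0.3.
me = margin_of_error(n=1000, z=1.96, p_prior=0.3)
print(f"margin of error = {me:.4f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.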
So in this case let us set ME equal to
0.025, z = 1.96 and p’ = 0.3 (it could be 0.2 or any other figure), and the
equation becomes
0.025 = 1.96 × √[(0.3 × 0.7)/n]
Dividing both sides by 1.96 and squaring gives (0.3 × 0.7)/n = (0.025/1.96)²
= 0.0001627, which translates into n = (0.3 × 0.7)/0.0001627 ≈ 1291.
So we would need a sample of around
1291, say 1300 mathematics students.
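The same calculation can be sketched in code by solving the margin-of-error formula for n, which gives n = z² × p’(1 − p’)/ME². The function name `sample_size_for_proportion` is a hypothetical helper, and rounding up to the next whole respondent is an assumption made here so that the margin of error is not exceeded.

```python
import math

def sample_size_for_proportion(margin, z, p_prior):
    """Return (exact n, n rounded up) such that z * sqrt(p'(1-p')/n) <= margin."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    n_exact = z ** 2 * p_prior * (1.0 - p_prior) / margin ** 2
    # Round up: a fractional respondent is not possible, and rounding
    # down would leave the margin of error slightly above the target.
    return n_exact, math.ceil(n_exact)

# Values from the worked example: ME = 0.025, z = 1.96, p' = 0.3.
n_exact, n_needed = sample_size_for_proportion(0.025, 1.96, 0.3)
print(f"exact n = {n_exact:.2f}, required sample size = {n_needed}")
```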
We could clearly try varying any of
these elements. For example, maybe the researcher would be satisfied with a
90% confidence interval, for which z = 1.645. In this case the equation becomes
0.025 = 1.645 × √[(0.3 × 0.7)/n]
from which we find n ≈ 909.2, so about 910 after rounding up.
Thus, if we are willing to accept a
lower confidence level, we can get away with a smaller sample size.
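To see the trade-off at a glance, the following sketch recomputes the sample size for the three z-scores quoted earlier, keeping ME = 0.025 and p’ = 0.3. The helper `required_n` is a hypothetical name, and it rounds up, so the 90% case comes out as 910 (the exact value is about 909.2).

```python
import math

# Rounded z-scores as quoted in the text for each confidence level.
Z_SCORES = {"90%": 1.645, "95%": 1.96, "99%": 2.58}

def required_n(margin, z, p_prior):
    """Smallest whole n satisfying z * sqrt(p'(1-p')/n) <= margin."""
    return math.ceil(z ** 2 * p_prior * (1.0 - p_prior) / margin ** 2)

sizes = {level: required_n(0.025, z, 0.3) for level, z in Z_SCORES.items()}
for level, n in sizes.items():
    print(f"{level} confidence -> n = {n}")
```

Since n grows with z², moving from 90% to 99% confidence more than doubles the required sample in this example.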
What happens when we do not have an initial
estimate of p? In this case, we normally assume p’ = 0.5.
The reason is that the standard error
formula √[p’(1 − p’)/n] is largest when p’ = 0.5, so this is a conservative
assumption that allows for p’ being unknown a priori. If we repeat the
calculation with p’ = 0.5 (keeping z = 1.96), we get
0.025 = 1.96 × √[(0.5 × 0.5)/n]
which results in n = 1537.
Thus, the cost of p’ being unknown is
an increase in the sample size, although if the true proportion were already
quite close to 0.5, the increase would be small.
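A short sketch can illustrate why p’ = 0.5 is the conservative choice: with ME = 0.025 and z = 1.96 held fixed, the required sample size is computed over a grid of prior values. The grid of priors and the helper name `required_n` are assumptions chosen for illustration.

```python
import math

Z_95 = 1.96      # z-score for 95% confidence, as used in the text
MARGIN = 0.025   # margin of error from the worked example

def required_n(p_prior, z=Z_95, margin=MARGIN):
    """Smallest whole n satisfying z * sqrt(p'(1-p')/n) <= margin."""
    return math.ceil(z ** 2 * p_prior * (1.0 - p_prior) / margin ** 2)

# Hypothetical grid of prior estimates from 0.1 to 0.9.
priors = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
sizes = {p: required_n(p) for p in priors}
for p, n in sizes.items():
    print(f"p' = {p:.1f} -> n = {n}")

# The prior that demands the largest sample is the conservative choice.
worst_case = max(sizes, key=sizes.get)
print(f"largest n occurs at p' = {worst_case}")
```

The required n is symmetric about p’ = 0.5 and changes slowly near it, which is why a prior already close to 0.5 costs little extra.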