Thursday, August 13, 2015

How to determine the sample size in order to estimate a proportion in a survey research study


Researchers often face the question, “How large a sample do I need for this study?” The answer depends on several factors, and in particular on whether the survey is designed to estimate a population proportion or a population mean. We therefore treat these cases separately, starting with the estimation of a proportion.


Sample size to estimate a proportion

Suppose we want to determine the proportion of mathematics students in a city who smoke. The question is, “How large a sample do I need?” To answer it, the researcher needs to decide how accurate the estimate must be (the margin of error) and at what level of confidence the estimate will be reported. The researcher also needs a rough prior estimate, from experience or earlier studies, of the proportion of mathematics students who smoke.


Possible decisions, assumptions or requirements for these three factors might be:

A. Margin of error: we need a margin of error of, say, no more than 2.5%. Typical surveys have margins of error ranging from less than 1% to about 5%; we can choose any margin of error we like, but we must specify it.
B. Confidence level: 95% confidence intervals are typical but by no means mandatory; we could use 90%, 99% or some other level. For this example, we assume 95%.
C. Prior estimate of the proportion: this may be guided by past surveys or general knowledge of public opinion. Let’s suppose the estimate is approximately 30%.

Calculation of sample size:

In general, the formula is ME = z × √[p’(1 − p’)/n]

where
  • ME is the desired margin of error
  • z is the z-score: 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, and 2.58 for a 99% confidence interval
  • p’ is our prior, carefully considered guess at the true value of p
  • n is the sample size (to be found)
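
Since n is the unknown, it helps to solve the formula for n once before substituting numbers. A short derivation, using only the symbols defined above:

```latex
\begin{align*}
ME &= z \sqrt{\frac{p'(1-p')}{n}} \\
\frac{ME}{z} &= \sqrt{\frac{p'(1-p')}{n}} \\
\left(\frac{ME}{z}\right)^2 &= \frac{p'(1-p')}{n} \\
n &= \frac{p'(1-p')}{(ME/z)^2} = \frac{z^2\, p'(1-p')}{ME^2}
\end{align*}
```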

So in this case let us set ME equal to 0.025, z = 1.96 and p’= 0.3 (it could be 0.2 or any other figure), and the equation becomes
0.025 = 1.96 × √[(0.3 × 0.7)/n], or (0.3 × 0.7)/n = (0.025/1.96)² = 0.0001627, which gives n = (0.3 × 0.7)/0.0001627 ≈ 1291.
So we would need a sample of around 1291, say 1300 mathematics students.
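
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name `sample_size_for_proportion` and its parameter names are illustrative choices, not taken from any library, and the function simply rearranges ME = z × √[p’(1 − p’)/n] for n.

```python
def sample_size_for_proportion(margin_of_error, z, p_guess):
    """Return the unrounded n that solves ME = z * sqrt(p(1-p)/n)."""
    if not 0 < p_guess < 1:
        raise ValueError("p_guess must be strictly between 0 and 1")
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    return (z ** 2) * p_guess * (1 - p_guess) / margin_of_error ** 2


# Worked example from the text: ME = 0.025, z = 1.96, p' = 0.3
n = sample_size_for_proportion(0.025, 1.96, 0.3)
print(round(n))  # rounded to the nearest whole student
```

The unrounded result is about 1290.8, which rounds to the 1291 quoted above. In practice one would often round up rather than to the nearest integer, so that the chosen margin of error is not exceeded.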

We could vary any of these inputs. For example, the researcher might be satisfied with a 90% confidence interval, for which z = 1.645. In this case the equation becomes
0.025 = 1.645 × √[(0.3 × 0.7)/n], from which we find n ≈ 909.
Thus, if we are willing to accept a lower confidence level, we can get away with a smaller sample size.
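
To see how the confidence level drives the sample size, the comparison can be scripted. The sketch below assumes Python 3.8 or later, since it uses `statistics.NormalDist` from the standard library to obtain z from the confidence level; the helper names `z_for_confidence` and `sample_size` are hypothetical. Because it works with unrounded z-values, its results can differ by a few units from hand calculations that use z = 1.645, 1.96 or 2.58.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided z-score for a confidence level, e.g. 0.95 -> about 1.96."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)


def sample_size(margin_of_error, confidence, p_guess):
    """Unrounded n from n = z^2 * p(1-p) / ME^2."""
    z = z_for_confidence(confidence)
    return z ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2


# Same margin of error and prior guess as in the text, three confidence levels
for confidence in (0.90, 0.95, 0.99):
    z = z_for_confidence(confidence)
    n = sample_size(0.025, confidence, 0.3)
    print(f"{confidence:.0%} confidence: z = {z:.3f}, n = {n:.0f}")
```

Higher confidence means a larger z and therefore a larger n, which is the trade-off described above.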

What happens when we do not have an initial estimate of p? In this case, we normally assume p’ = 0.5.
The reason is that the standard error √[p’(1 − p’)/n] is largest when p’ = 0.5, so this is a conservative assumption that allows for p’ being unknown a priori. If we repeat the calculation with p’ = 0.5 (keeping z = 1.96), we get
0.025 = 1.96 × √[(0.5 × 0.5)/n], which results in n ≈ 1537.


Thus, the cost of not knowing p’ is a larger sample size (1537 rather than 1291 in this example), although if the true proportion were already close to 0.5 the increase would be small.
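
The claim that p’ = 0.5 is the most conservative choice can be checked numerically. This sketch keeps ME = 0.025 and z = 1.96 from the example and tabulates the required n over a grid of guesses for p’; the grid and the function name `sample_size` are illustrative assumptions.

```python
def sample_size(margin_of_error, z, p_guess):
    """Unrounded n from n = z^2 * p(1-p) / ME^2."""
    return z ** 2 * p_guess * (1 - p_guess) / margin_of_error ** 2


# Guesses for p' from 0.1 to 0.9 in steps of 0.1
p_grid = [i / 10 for i in range(1, 10)]
sizes = {p: sample_size(0.025, 1.96, p) for p in p_grid}

for p, n in sizes.items():
    print(f"p' = {p:.1f}: n = {n:.0f}")

# The guess that demands the largest sample
worst_case_p = max(sizes, key=sizes.get)
print("largest n at p' =", worst_case_p)
```

The required n is symmetric around 0.5 and changes little near it (about 1475 at p’ = 0.4 against about 1537 at p’ = 0.5), which is why an unknown p’ costs little when the true proportion is already close to one half.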
