Research Methodology: August 2015

Sunday, August 23, 2015

How to determine the sample size in order to estimate a population mean in a survey research study

Researchers often face the question, “How large a sample do I need for this study?” The answer depends on a number of factors, and it differs depending on whether the survey is designed to estimate a population mean or a population proportion. Here we consider the case of estimating a mean.

Sample size to estimate a mean

Suppose we want to determine the mean marks obtained by Economics students of a university in an examination. The question is, “How large a sample do I need?” To answer it, the researcher needs to decide how accurate the estimate must be (the margin of error) and at what level of confidence the estimate will be used. The researcher also needs a rough idea, from past experience, of how much the marks of Economics students at that university vary, that is, a guess at the standard deviation.

Calculation of sample size:

We are designing a survey or an experiment to estimate a population mean. In this case, the formula is ME = t × s / √n, where

ME is the desired margin of error
t is the t-score used to calculate the confidence interval, which depends on both the degrees of freedom and the desired confidence level
s is the standard deviation
n is the sample size (to be found)

A. We need to specify the margin of error, say, less than 1 mark.

B. 95% confidence intervals are typical but not in any way mandatory; we could use 90%, 99% or something else entirely. For this example, we assume 95%. Here the sample size n also affects t, through the degrees of freedom. However, when n ≥ 30, the value of t is quite close to the value of z that we would get by ignoring the distinction between the normal and t distributions, so we often do ignore that distinction and simply use the z value, e.g. 1.96 for a 95% confidence interval.

C. In this case we need to specify s. In practice, s will be the sample standard deviation, computed after the sample is taken, so we cannot know it in advance. At the planning stage s is therefore a guess, based either on past experience or on rough estimates of the variability we would expect.

Taking an example: we would like to estimate the mean Economics marks in the university under consideration, with 95% confidence, to within 0.5 marks. In such situations we often have little idea what s will be, so in the absence of anything better, let us take s = 2 as our guess. The 95% confidence level translates to a z (or t) of 1.96. The equation therefore becomes 0.5 = 1.96 × 2 / √n, which solves to n = (1.96 × 2 / 0.5)² = 61.47. A sample of 61 would leave the margin of error slightly above 0.5, so we round up to 62.
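The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the z approximation; the function name `sample_size_for_mean` and its parameter names are made up for this example and do not come from any library.

```python
import math

def sample_size_for_mean(margin_of_error, std_dev_guess, z=1.96):
    """Smallest n for which z * s / sqrt(n) is at most the margin of error.

    Uses the normal (z) approximation in place of the t-score,
    which the text treats as reasonable once n is 30 or more.
    """
    if margin_of_error <= 0 or std_dev_guess <= 0:
        raise ValueError("margin_of_error and std_dev_guess must be positive")
    raw_n = (z * std_dev_guess / margin_of_error) ** 2
    # Round up: rounding down would leave the margin of error too wide.
    return math.ceil(raw_n)

# Worked example from the text: ME = 0.5 marks, guessed s = 2, 95% confidence.
n = sample_size_for_mean(0.5, 2.0)
print(n)  # 62

# Margin of error actually achieved with that n (should not exceed 0.5).
print(1.96 * 2.0 / math.sqrt(n))
```

Since s is only a guess, it may be worth rerunning the function with a somewhat larger value of s to see how sensitive the required sample size is.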

Thursday, August 13, 2015

How to determine the sample size in order to estimate a proportion in a survey research study

Researchers often face the question, “How large a sample do I need for this study?” The answer depends on a number of factors, and it differs depending on whether the survey is designed to estimate a population proportion or a population mean. We therefore consider these cases separately, beginning with the case of estimating a proportion.

Sample size to estimate a proportion

Suppose we want to determine the proportion of Mathematics students in a city who smoke. The question is, “How large a sample do I need?” To answer it, the researcher needs to decide how accurate the estimate must be (the margin of error) and at what level of confidence the estimate will be used. The researcher also needs a rough idea, from past experience, of the current proportion of Mathematics students who smoke.

Possible decisions, assumptions or requirements for the above three factors might be:

A. We need a margin of error of, say, less than 2.5%. Typical surveys have margins of error ranging from less than 1% to about 5%; we can choose any margin of error we like, but we need to specify it.

B. 95% confidence intervals are typical but not in any way mandatory; we could use 90%, 99% or something else entirely. For this example, we assume 95%.

C. The prior estimate of the proportion may be guided by past surveys or general knowledge. Let us suppose the estimate is approximately 30%.

Calculation of sample size:

In general the formula is ME = z × √[p’(1 − p’)/n]

where

ME is the desired margin of error
z is the z-score, and it is 1.645 for a 90% confidence interval, 1.96 for a 95% confidence interval, 2.58 for a 99% confidence interval
p’ is our prior, well-considered guess at the true value of p
n is the sample size (to be found)

So in this case let us set ME equal to 0.025, z = 1.96 and p’= 0.3 (it could be 0.2 or any other figure), and the equation becomes

0.025 = 1.96 × √[(0.3 × 0.7)/n], or (0.3 × 0.7)/n = (0.025/1.96)² = 0.0001627, which gives n = (0.3 × 0.7)/0.0001627 ≈ 1291.

So we would need a sample of around 1291, say 1300 mathematics students.
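The same calculation can be written as a short Python sketch. The function name `sample_size_for_proportion` and its parameters are hypothetical names chosen for this illustration, not part of any library.

```python
import math

def sample_size_for_proportion(margin_of_error, p_guess, z=1.96):
    """Smallest n for which z * sqrt(p(1 - p) / n) is at most the margin of error."""
    if not 0 < p_guess < 1:
        raise ValueError("p_guess must lie strictly between 0 and 1")
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    raw_n = (z / margin_of_error) ** 2 * p_guess * (1 - p_guess)
    # Round up so the achieved margin of error does not exceed the target.
    return math.ceil(raw_n)

# Worked example from the text: ME = 0.025, p' = 0.3, z = 1.96.
print(sample_size_for_proportion(0.025, 0.3))  # 1291
```

Solving ME = z × √[p’(1 − p’)/n] for n gives n = (z/ME)² × p’(1 − p’), which is the expression the function evaluates before rounding up.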

We could clearly try varying any of these elements. For example, the researcher might be satisfied with a 90% confidence interval, for which z = 1.645. In this case the equation becomes

0.025 = 1.645 × √[(0.3 × 0.7)/n], from which we find n ≈ 909 (910 after rounding up).

Thus, if we are willing to accept a lower confidence level, we can get away with a smaller sample size.
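To see how the confidence level drives the sample size, the following Python sketch derives z from the confidence level using the standard library's `statistics.NormalDist` rather than a rounded table value. The helper names `z_for_confidence` and `sample_size_for_proportion` are hypothetical, chosen only for this illustration.

```python
import math
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided z-score for a confidence level such as 0.95."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Half of the excluded probability sits in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_proportion(margin_of_error, p_guess, confidence):
    """Smallest n meeting the margin of error at the given confidence level."""
    z = z_for_confidence(confidence)
    return math.ceil((z / margin_of_error) ** 2 * p_guess * (1 - p_guess))

# Same survey as in the text (ME = 0.025, p' = 0.3) at three confidence levels.
for level in (0.90, 0.95, 0.99):
    n = sample_size_for_proportion(0.025, 0.3, level)
    print(f"{level:.0%} confidence: z = {z_for_confidence(level):.3f}, n = {n}")
```

Because the z values are not rounded to 1.645, 1.96 and 2.58, the results can differ from the hand calculation by a unit or so.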

What happens when we do not have an initial estimate of p? In this case, we normally assume p’ = 0.5.

The reason is that the standard error formula √[p’(1 − p’)/n] is largest when p’ = 0.5, so this is a conservative assumption that allows for p’ being unknown a priori. If we repeat the calculation with p’ = 0.5 (keeping z = 1.96), we get

0.025 = 1.96 × √[(0.5 × 0.5)/n], which results in n = 1537.

Thus, the cost of p’ being unknown is an increase in the sample size, although if the true p were already quite close to 0.5, the increase would be small.
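A quick way to check that p’ = 0.5 is the conservative choice is to tabulate the required sample size over a range of guesses. The Python sketch below does this for ME = 0.025 and z = 1.96; the function name `required_n` and the grid of p’ values are assumptions made for this illustration.

```python
import math

def required_n(p_guess, margin_of_error=0.025, z=1.96):
    """Sample size from n = (z / ME)^2 * p(1 - p), rounded up."""
    return math.ceil((z / margin_of_error) ** 2 * p_guess * (1 - p_guess))

# Required n for guesses p' = 0.1, 0.2, ..., 0.9.
guesses = [k / 10 for k in range(1, 10)]
sizes = {p: required_n(p) for p in guesses}
for p, n in sizes.items():
    print(f"p' = {p:.1f}: n = {n}")

# The largest requirement occurs at p' = 0.5.
worst_case = max(sizes, key=sizes.get)
print(worst_case, sizes[worst_case])  # 0.5 1537

# Extra respondents needed when 0.5 is assumed instead of a known 0.3.
print(required_n(0.5) - required_n(0.3))  # 246
```

The table is symmetric about 0.5 because p’(1 − p’) takes the same value for p’ and 1 − p’.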