These functions estimate the minimum sample size required to (i) satisfy specific precision constraints in the estimation of a mean and (ii) attain specified levels of significance and power in a statistical test that compares two means. They also address the inverse problems of finding, given a specified sample size, (iii) the expected precision of the estimator of the mean, and (iv) the expected power or (v) the minimum detectable effect of the test that compares two means.

n.mean(prec, prec.ind = c("ME", "RME", "SE", "CV"), sigmaY, muY = NULL,
       DEFF = 1, RR = 1,
       F = 1, hhSize = 1, AVEhh = F * hhSize,
       old.clus.size = NULL, new.clus.size = NULL,
       N = NULL, alpha = 0.05, verbose = TRUE)

prec.mean(n, prec.ind = c("ME", "RME", "SE", "CV"), sigmaY, muY = NULL,
          DEFF = 1, RR = 1,
          F = 1, hhSize = 1, AVEhh = F * hhSize,
          old.clus.size = NULL, new.clus.size = NULL,
          N = NULL, alpha = 0.05, verbose = TRUE)

n.comp2mean(sigmaY, MDE, K1 = 1/2,
            alpha = 0.05, beta = 0.2, sides = c("two-tailed", "one-tailed"),
            DEFF = 1, RR = 1,
            F = 1, hhSize = 1, AVEhh = F * hhSize,
            old.clus.size = NULL, new.clus.size = NULL,
            verbose = TRUE)

pow.comp2mean(n, sigmaY, MDE, K1 = 1/2,
              alpha = 0.05, sides = c("two-tailed", "one-tailed"),
              DEFF = 1, RR = 1,
              F = 1, hhSize = 1, AVEhh = F * hhSize,
              old.clus.size = NULL, new.clus.size = NULL,
              verbose = TRUE)

mde.comp2mean(n, sigmaY, K1 = 1/2,
              alpha = 0.05, beta = 0.2, sides = c("two-tailed", "one-tailed"),
              DEFF = 1, RR = 1,
              F = 1, hhSize = 1, AVEhh = F * hhSize,
              old.clus.size = NULL, new.clus.size = NULL,
              verbose = TRUE)

Arguments

prec

The precision level you want to attain in estimation.

prec.ind

The precision indicator for which the specified precision level prec has to be attained. The following uncertainty measures can be used as precision indicators: Margin of Error ('ME', the default), Relative Margin of Error ('RME'), Standard Error ('SE'), and Coefficient of Variation ('CV', also known as Relative Standard Error).

sigmaY

Anticipated estimate of the standard deviation of interest variable Y for your target (sub)population.

muY

Anticipated estimate of the mean of interest variable Y for your target (sub)population.

DEFF

Anticipated estimate of the design effect of the estimator of the mean.

RR

Anticipated estimate of the response rate.

F

Anticipated estimate of the proportion of individuals in the general population to be sampled who belong to your target subpopulation.

hhSize

Anticipated estimate of the average household size.

AVEhh

Anticipated estimate of the average number of individuals belonging to the target subpopulation per household.

old.clus.size

Average number of households sampled per PSU in the survey you have used to compute the input value for the DEFF argument.

new.clus.size

Average number of households to be sampled per PSU in the survey you are planning.

N

The size of your target (sub)population. Only needed if you do not want to neglect finite population corrections (neglecting them is the default behaviour, corresponding to N = NULL).

alpha

Significance level used to build confidence intervals (with confidence level equal to 1 - alpha) or probability of Type I error (False Positive) for hypothesis testing.

verbose

Do you want some descriptive output printed on screen, or just the results returned?

n

The sample size for which the expected precision or power or MDE must be calculated (see ‘Details’).

MDE

Minimum Detectable Effect for the difference between means.

K1

Ratio between the sample size of group 1 and the total sample size of groups 1 and 2 (defaults to 1/2, i.e. equal sized groups).

beta

Probability of Type II error (False Negative) for hypothesis testing (so that Power = 1 - beta).

sides

Do you want a 'two-tailed' test (the default) or a 'one-tailed' test?

Details

These functions are intended as simple everyday tools for basic sampling design decisions and survey planning. They address sample size requirements and power calculations for means in surveys (typically household surveys) adopting one- or two-stage sampling designs. The means acting as estimation targets can be defined either in terms of individuals or in terms of households. Specific arguments (such as hhSize, AVEhh, old.clus.size, and new.clus.size) can be tweaked to cover different designs (one or two stages) and target populations (individual-level or household-level means); see the ‘Examples’ section.

The formulas needed to guesstimate the minimum sample size required to satisfy specific precision constraints in the estimation of a mean (function n.mean) or the expected precision of the estimator of the mean given a specified sample size (function prec.mean) are available in many textbooks and spreadsheet templates (see, e.g., [Rosner 2006], [Lance, Hattori 2016], [ILO 2014a], [ILO 2014b]).
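
As an illustration, a back-of-the-envelope version of the calculation behind n.mean under a Margin of Error constraint is sketched below, using the inputs of the first run in the ‘Examples’ section. The way the DEFF, RR, and AVEhh adjustments are combined here is the conventional one and reproduces the reported result, but it is only a sketch: the function's internal computation may differ in its details. For the relative indicators, the same logic applies with ME replaced by RME * muY (for 'RME'), or with SE targeted directly (for 'SE', and CV = SE / muY for 'CV'), which is why muY is needed in the relative cases.

z  <- qnorm(1 - 0.05 / 2)        # ~1.96, normal quantile used for the confidence interval
n0 <- (z * 250 / 20)^2           # base size for sigmaY = 250 and ME = 20 (recall ME = z * SE)
n  <- n0 * 4.0 / (0.9 * 0.75)    # inflate by DEFF = 4, deflate by RR = 0.9 and AVEhh = 0.75
ceiling(n)                       # 3557, the figure returned by n.mean() in the first example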

The formula needed to guesstimate the minimum sample size required to attain specified levels of significance and power in a statistical test that compares two means (function n.comp2mean) can be derived from equation 8.27 on page 302 of [Rosner 2006], by extending it to real-world complex sampling designs (see, e.g., arguments RR and DEFF). Functions pow.comp2mean and mde.comp2mean solve the resulting equation for Power (= 1 - beta) and MDE given a specified total sample size, n.
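
For a rough idea of the calculation behind n.comp2mean, the sketch below reproduces (up to rounding) the first comparison run in the ‘Examples’ section, assuming equal allocation (K1 = 1/2), equal anticipated standard deviations in the two groups, and the conventional DEFF and RR adjustments; the exact internal computation, and in particular its final rounding, may differ.

z.a <- qnorm(1 - 0.05)                           # one-tailed test at alpha = 0.05
z.b <- qnorm(1 - 0.2)                            # power = 1 - beta = 0.8
n1  <- (z.a + z.b)^2 * 2 * 362000^2 / 20000^2    # textbook per-group size (in the spirit of
                                                 # eq. 8.27 of Rosner 2006, one-tailed version)
n1 * 3.5 / 0.9                                   # inflate by DEFF = 3.5, deflate by RR = 0.9:
                                                 # about 15754 per group, vs the 15760 reported
                                                 # in the Examples (i.e. 788 whole EAs of 20
                                                 # households each)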

NOTE: For both pow.comp2mean and mde.comp2mean, argument n must be the total sample size, namely the sum of the sample sizes of the two groups (group 1 and group 2) whose estimated means need to be compared. The way the total sample size is allocated to the two groups is controlled by argument K1.

NOTE: Although most of the arguments to the above five functions come with default values, those defaults must not be regarded as suggested values: in general, users will need to specify actual values that suit their specific design needs. For instance, the default value muY = NULL is not valid when relative measures of precision are used ('RME' and 'CV').

NOTE: Arguments F, hhSize, and AVEhh are necessarily related to each other, as illustrated by the expression of the default value of AVEhh; therefore it is not required to specify all of them. If all are explicitly passed, the function will check that they comply with the identity AVEhh = F * hhSize.
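
For instance, with purely hypothetical figures:

## hypothetical figures: the target subpopulation makes up 15% of the general population
## (F = 0.15) and households have 5 members on average (hhSize = 5)
0.15 * 5    # 0.75 individuals of the target subpopulation per household, i.e. AVEhh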

NOTE: Most of the arguments to the above five functions are vectorized, meaning that users can pass numeric vectors to them to investigate multiple design scenarios in a single shot (e.g. via MDE = c(750, 1000, 1250)). When vectors of length greater than 1 are passed to multiple arguments, R's recycling rule will kick in.
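
For instance, a single call like the following (which borrows the anticipated estimates from the first run in the ‘Examples’ section) would explore three candidate Margins of Error at once and return a vector of three required sample sizes:

n.mean(prec = c(10, 20, 40), prec.ind = "ME", sigmaY = 250, muY = 400,
       DEFF = 4.0, AVEhh = 0.75, RR = 0.9)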

NOTE: Input values for sample size, n, and population size (if any), N, should be positive integers. The functions will silently make them so, by taking round(abs()) of the supplied values. When both n and N are specified, an error will be raised if n > N.

NOTE: Since, as shown in the ‘Examples’ section, the functions can be tweaked to cover different designs (one or two stages) and target populations (individual-level or household-level means), users who do not want to neglect finite population corrections must specify the population size N in terms of the analytical units that are relevant to their analysis.

NOTE: Both functions n.mean and n.comp2mean return the ceiling of the analytically calculated minimum sample sizes (which, in general, are not integers). This is conservative and will result in slightly better sampling performance than nominally requested (e.g. higher precision or power, and narrower confidence intervals or MDEs).

Value

For n.mean, prec.mean, pow.comp2mean, and mde.comp2mean, a numeric vector whose length depends on the length of the numeric inputs passed to the function (under R recycling rule).

For n.comp2mean, a list of length 3 with names n1, n2, and n, providing the required sample sizes for group 1, group 2, and the overall sample. Each of the elements of the list is a numeric vector whose length depends on the length of the numeric inputs passed to the function (under R recycling rule).
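
For instance, the components of the list returned by n.comp2mean can be accessed in the usual way (the object name des is arbitrary; the figures refer to the first comparison run in the ‘Examples’ section):

des <- n.comp2mean(sigmaY = 362000, MDE = 20000, sides = "one-tailed", DEFF = 3.5,
                   RR = 0.9, old.clus.size = 20, new.clus.size = 20, verbose = FALSE)
des$n1    # required sample size for group 1 (15760 in that run)
des$n     # required overall sample size (31520 in that run)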

References

ILO (2014a) ILO-IPEC TOOLS. URL: https://www.ilo.org/ipec/ChildlabourstatisticsSIMPOC/Manuals/WCMS_304559/lang--en/index.htm.

ILO (2014b) ILO-IPEC TOOLS USER GUIDE. URL: https://www.ilo.org/ipecinfo/product/download.do?type=document&id=25435.

Rosner, B. (2006) Fundamentals of biostatistics (7th ed.). Boston, MA: Brooks/Cole.

Lance, P., Hattori, A. (2016) Sampling and evaluation. Chapel Hill, North Carolina: MEASURE Evaluation, University of North Carolina.

See also

Functions n.prop, prec.prop, n.comp2prop, pow.comp2prop, and mde.comp2prop provide the same functionalities documented here, when the statistics of interest are proportions rather than means.

Examples

########################
# Reproducible example #
########################
# Solve the problem used as illustrative example in Diagram 1b on page 12 of the ILO-IPEC
# TOOLS USER GUIDE (which is available for free download at the following URL:
# https://www.ilo.org/ipecinfo/product/download.do?type=document&id=25435).
# This is the only example provided by the document for the estimate of a *mean* (which
# the document, somewhat oddly, refers to as an <<indicator of amount>>):
n.mean(prec= 20, prec.ind= "ME", sigmaY= 250, muY= 400, DEFF= 4.0, AVEhh= 0.75, RR= 0.9)
#> # Precision constraint:
#> ME = 20
#> alpha = 0.05
#> # Anticipated estimates:
#> muY = 400
#> sigmaY = 250
#> DEFF = 4
#> RR = 0.9
#> F =
#> hhSize =
#> AVEhh = 0.75
#> # -> Required sample size:
#> n = 3557
#>
# NOTE: We get 3557. The result of the ILO-IPEC template is a bit larger: 3704. This is
# simply because the ILO-IPEC template approximates the 1 - 0.05/2 quantile of the
# standard normal distribution with 2. Indeed, one can see that 3704 is exactly
# equal to:
round( 3557 * (2 / 1.96)^2 )
#> [1] 3704

##################################################
# Impact of Finite Population Corrections (fpc). #
##################################################
# The sample size above was obtained by neglecting finite population corrections. Now let's
# assume we know the population size is in the ballpark of 100 thousand households. If we
# correctly factor in the fpc terms, then the required sample size is reduced as follows:
n.mean(prec= 20, prec.ind= "ME", sigmaY= 250, muY= 400, DEFF= 4.0, AVEhh= 0.75, RR= 0.9,
       N= 100000)
#> # Precision constraint:
#> ME = 20
#> alpha = 0.05
#> # Anticipated estimates:
#> muY = 400
#> sigmaY = 250
#> DEFF = 4
#> RR = 0.9
#> F =
#> hhSize =
#> AVEhh = 0.75
#> N = 1e+05
#> # -> Required sample size:
#> n = 3435
#>
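# NOTE: Up to rounding, the fpc-adjusted size is consistent with shrinking the unadjusted
# figure obtained above by the usual factor 1 / (1 + n/N) (this is only a plausibility
# check, not necessarily the exact internal computation):
ceiling( 3557 / (1 + 3557 / 100000) )
#> [1] 3435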

######################################################
# Power calculations for the comparison of two means #
######################################################
# A new national assistance policy aims to increase by 7% the average monthly household
# consumption, which was estimated to be equal to 362000 (in the country currency) by a
# household survey conducted two years before. The DEFF of that estimate was reported to
# be 3.5, and the same survey estimated the population standard deviation of the monthly
# household consumption to be 461000. To evaluate the impact of the intervention, two
# household surveys with the same sample sizes will be conducted before (T0) and after
# (T1) the implementation of the policy. Assuming that (i) the average monthly household
# consumption at T0 will still be close to 362000 and (ii) the new policy will be the only
# driver of changes, success of the policy would require a value of at least 387340 at T1
# (387340 = 362000 * 1.07), namely an increase of 25340. However, the government would like
# the study to be able to detect a possibly *smaller* consumption increase of 20000.
# Moreover, the government is confident the new policy cannot result in harming household
# consumption (so a one-tailed test is deemed appropriate). Assuming 20 households per EA
# will be sampled, as was the case for the survey conducted two years before, and
# anticipating a response rate of 0.9, how many households must be selected to achieve
# 80% power for the detection of the sought-after consumption increase at significance
# level 5%?
n.comp2mean(sigmaY = 362000, MDE = 20000, sides = "one-tailed", DEFF = 3.5, RR = 0.9,
            old.clus.size= 20, new.clus.size= 20)
#> # Minimum Detectable Effect:
#> MDE = 20000
#> # Significance:
#> alpha = 0.05
#> # Power:
#> 1 - beta = 0.8
#> # Anticipated estimates:
#> sigmaY = 362000
#> DEFF = 3.5
#> RR = 0.9
#> F = 1
#> hhSize = 1
#> AVEhh = 1
#> # Design parameters:
#> K1 = 0.5
#> old.clus.size = 20
#> ROH = 0.1315789
#> new.clus.size = 20
#> new.DEFF = 3.5
#> # -> Required sample size:
#> n1 = 15760
#> n2 = 15760
#> n = 31520
#>
# NOTE: The required sample size is 31520 households, 15760 at baseline (T0) and 15760 at
# endline (T1). The EAs to be visited are 15760 / 20 = 788 per round.
# What would be the implications of using designs with 15 or 12 households per EA, instead
# of 20?
n.comp2mean(sigmaY = 362000, MDE = 20000, sides = "one-tailed", DEFF = 3.5, RR = 0.9,
            old.clus.size= 20, new.clus.size= c(15, 12))
#> # Minimum Detectable Effect:
#> MDE = 20000
#> # Significance:
#> alpha = 0.05
#> # Power:
#> 1 - beta = 0.8
#> # Anticipated estimates:
#> sigmaY = 362000
#> DEFF = 3.5
#> RR = 0.9
#> F = 1
#> hhSize = 1
#> AVEhh = 1
#> # Design parameters:
#> K1 = 0.5
#> old.clus.size = 20
#> ROH = 0.1315789
#> new.clus.size = 15 12
#> new.DEFF = 2.842105 2.447368
#> # -> Required sample size:
#> n1 = 12795 11016
#> n2 = 12795 11016
#> n = 25590 22032
#>
# NOTE: When going from 20 to 15 or 12 households per EA, the required sample size is
# expected to decrease to 25590 households (12795 per round) and 22032 households
# (11016 per round), respectively. Conversely, the EAs to be visited are expected to
# increase to 12795 / 15 = 853 and 11016 / 12 = 918 per round, respectively.
# How much sample size could be saved if the government resorts to detecting as significant
# a larger consumption increase of 25000, which is much closer to the actually intended
# goal of the policy?
n.comp2mean(sigmaY = 362000, MDE = 25000, sides = "one-tailed", DEFF = 3.5, RR = 0.9,
            old.clus.size= 20, new.clus.size= 20)
#> # Minimum Detectable Effect:
#> MDE = 25000
#> # Significance:
#> alpha = 0.05
#> # Power:
#> 1 - beta = 0.8
#> # Anticipated estimates:
#> sigmaY = 362000
#> DEFF = 3.5
#> RR = 0.9
#> F = 1
#> hhSize = 1
#> AVEhh = 1
#> # Design parameters:
#> K1 = 0.5
#> old.clus.size = 20
#> ROH = 0.1315789
#> new.clus.size = 20
#> new.DEFF = 3.5
#> # -> Required sample size:
#> n1 = 10100
#> n2 = 10100
#> n = 20200
#>
# NOTE: The required sample size would become 20200 households, 10100 per round. Therefore,
# 31520 - 20200 = 11320 households would be saved, 5660 per round.
# The above reduced sample size would, of course, imply a lower statistical power (i.e.
# smaller than the ideally desired 80%) for the detection of an increase of 20000 that
# might actually be induced by the policy. What would actually be the expected power for
# the detection of an effect of 20000 for n = 20200?
pow.comp2mean(n = 20200, sigmaY = 362000, MDE = 20000, sides = "one-tailed", DEFF = 3.5,
              RR = 0.9, old.clus.size= 20, new.clus.size= 20)
#> # Sample size(s):
#> n = 20200
#> n1 = 10100
#> n2 = 10100
#> K1 = 0.5
#> # Minimum Detectable Effect:
#> MDE = 20000
#> # Significance:
#> alpha = 0.05
#> # Anticipated estimates:
#> sigmaY = 362000
#> DEFF = 3.5
#> RR = 0.9
#> F = 1
#> hhSize = 1
#> AVEhh = 1
#> # Design parameters:
#> old.clus.size = 20
#> ROH = 0.1315789
#> new.clus.size = 20
#> new.DEFF = 3.5
#> # -> Expected Power:
#> 1 - beta = 0.635
#>
# NOTE: The power would be reduced to about 63.5%.
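# NOTE: As a rough cross-check (not necessarily the exact internal computation), the same
# figure is obtained by treating n1 * RR / DEFF = 10100 * 0.9 / 3.5 as the effective
# per-group sample size and plugging it into the textbook one-tailed power formula:
se.diff <- 362000 * sqrt( 2 / (10100 * 0.9 / 3.5) )
round( pnorm( 20000 / se.diff - qnorm(0.95) ), 3 )
#> [1] 0.635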