Estimation of the Population Standard Deviation of a Variable

Computes estimates and sampling errors of the Population Standard Deviation of a numeric variable, for the whole population or for subpopulations.

svySigma(design, y, by = NULL,
         fin.pop = TRUE,
         vartype = c("se", "cv", "cvpct", "var"),
         conf.int = FALSE, conf.lev = 0.95, deff = FALSE,
         na.rm = FALSE)

# S3 method for svySigma
coef(object, ...)
# S3 method for svySigma
SE(object, ...)
# S3 method for svySigma
VAR(object, ...)
# S3 method for svySigma
cv(object, ...)
# S3 method for svySigma
confint(object, ...)

Arguments

design	Object of class `analytic` (or inheriting from it) containing survey data and sampling design metadata.
y	Formula identifying the numeric interest variable.
by	Formula specifying the variables that define the "estimation domains". If `NULL` (the default option) estimates refer to the whole population.
fin.pop	If `TRUE` (the default) the estimation target is the finite population formula of the standard deviation, i.e. the one with N - 1 at denominator in the expression of the variance. If `FALSE` the estimation target is the standard deviation with N at denominator in the expression of the variance.
vartype	`character` vector specifying the desired variability estimators. It is possible to choose one or more of: standard error (`'se'`, the default), coefficient of variation (`'cv'`), percent coefficient of variation (`'cvpct'`), or variance (`'var'`).
conf.int	Compute confidence intervals for the estimates? The default is `FALSE`.
conf.lev	Probability specifying the desired confidence level: the default value is `0.95`.
deff	Should the design effect be computed? The default is `FALSE` (see ‘Details’).
na.rm	Should missing values (if any) be removed from the variable of interest? The default is `FALSE` (see ‘Details’).
object	An object of class `svySigma`.
...	Additional arguments to `coef`, ..., `confint` methods (if any).
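
The variability measures selectable through `vartype` are linked by simple textbook relations. The base R sketch below illustrates them using the estimate and standard error reported for the binary variable in ‘Examples’. The relations are assumed here as the standard definitions; they are not read from the package source, and the object name `variability` is purely illustrative.

```r
# Estimate and standard error taken from the binary-variable example
# in 'Examples' (Sigma and SE columns)
est <- 0.2726928
se  <- 0.0096675

# Textbook relations between the variability measures (an assumption,
# not taken from the package source)
variability <- c(
  se    = se,               # standard error
  cv    = se / est,         # coefficient of variation
  cvpct = 100 * se / est,   # percent coefficient of variation
  var   = se^2              # sampling variance
)
print(variability)
```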

Details

Function svySigma computes estimates and sampling errors of the Population Standard Deviation of a numeric variable. These estimates play an important role in many contexts, including sample size guesstimation and power calculations.

As the Population Standard Deviation is a complex estimator, svySigma automatically linearizes it to estimate its sampling variance. Automatic linearization is performed as function svystatL would do, along the lines illustrated in [Zardetto, 15]. This, of course, also entails the usage of the residuals technique when the input design object is calibrated (i.e. of class cal.analytic).

The mandatory argument y identifies the variable of interest. The design variable referenced by y must be numeric.

If variable y is binary (i.e. has only values 0 and 1) and fin.pop = FALSE, the estimated Population Standard Deviation coincides with the classical Bernoulli expression sqrt(p*(1 - p)), where p is the estimated proportion of population units with y = 1 (see ‘Examples’).
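
The identity behind the Bernoulli expression can be checked in base R on a toy population. This is a minimal sketch on unweighted, hypothetical data (the vector `y` is invented for illustration); it shows the algebraic identity for the variance with N at denominator, not the design-based computation that svySigma performs.

```r
# Toy 0/1 population (hypothetical values, for illustration only)
y <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
N <- length(y)
p <- mean(y)                               # proportion of units with y = 1

# Standard deviation with N at denominator in the variance
sigma_N <- sqrt(sum((y - p)^2) / N)

# Classical Bernoulli expression
bernoulli <- sqrt(p * (1 - p))

# The two quantities agree up to floating point error
print(all.equal(sigma_N, bernoulli))
```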

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svySigma refer to the whole population. If specified, estimation domains must be defined by a formula, following the usual syntactic and semantic rules (see e.g. svystatTM).

Argument fin.pop allows the users to select which standard deviation formula they prefer as estimation target. If fin.pop = TRUE (the default) the finite population version of the standard deviation formula will be used, namely the one with N - 1 at denominator in the expression of the variance [Sarndal, Swensson, Wretman 92]. If fin.pop = FALSE the standard deviation formula with N at denominator in the expression of the variance will be used.
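
To make the two estimation targets concrete, the base R sketch below computes both versions of the standard deviation on a small, fully enumerated population. The data are hypothetical and the names `sigma_nm1` and `sigma_n` are illustrative; the code describes the targets only, not how svySigma estimates them from a sample.

```r
# Toy finite population (hypothetical values, for illustration only)
y <- c(12, 7, 3, 25, 9, 14, 6, 18)
N <- length(y)
ss <- sum((y - mean(y))^2)       # sum of squared deviations from the mean

# Target with N - 1 at denominator (the fin.pop = TRUE formula);
# this is also what base R's sd() returns
sigma_nm1 <- sqrt(ss / (N - 1))

# Target with N at denominator (the fin.pop = FALSE formula)
sigma_n <- sqrt(ss / N)

print(c(N.minus.1 = sigma_nm1, N = sigma_n))

# The two targets differ only by the factor sqrt((N - 1) / N)
print(all.equal(sigma_n, sigma_nm1 * sqrt((N - 1) / N)))
```

For large N the factor sqrt((N - 1) / N) is very close to 1, so the choice matters mainly for small populations or small estimation domains.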

The conf.int argument allows the user to request confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is chosen to be 0.95.
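
The bounds printed in ‘Examples’ are consistent with a symmetric normal-approximation interval, estimate ± z × SE. The base R sketch below works under that assumption; the helper `normal_ci` is hypothetical and is not part of the package.

```r
# Normal-approximation interval: estimate +/- z * SE. This is an assumption
# about how the bounds are built, not taken from the package source.
normal_ci <- function(est, se, conf.lev = 0.95) {
  stopifnot(conf.lev >= 0, conf.lev <= 1)
  z <- qnorm(1 - (1 - conf.lev) / 2)     # two-sided standard normal quantile
  c(lower = est - z * se, upper = est + z * se)
}

# Sigma and SE taken from the binary-variable example in 'Examples'
ci <- normal_ci(est = 0.2726928, se = 0.0096675)
print(ci)
```

Up to rounding of the inputs, this reproduces the CI.l(95%) and CI.u(95%) values shown for is.widowed in that example.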

The optional argument deff allows the user to request the design effect [Kish 1995] for the estimates. By default deff=FALSE, that is, the design effect is not provided. The design effect of an estimator is defined as the ratio between the sampling variance of the estimator under the actual sampling design and the sampling variance that would be obtained for an 'equivalent' estimator under a hypothetical simple random sampling without replacement of the same size. To obtain an estimate of the design effect relative to simple random sampling “with replacement”, one must use deff="replace". See svystatTM for further details.

Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates get computed on non missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.

Value

An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.

References

Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.

Kish, L. (1995). “Methods for design effects”. Journal of Official Statistics, Vol. 11, pp. 55-77.

Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi:10.1515/jos-2015-0013 .

Examples

## Load sbs data and create a design object:
data(sbs)
sbsdes <- e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,
          fpc=~fpc)

# Estimation of the population standard deviation of value added (variable
# 'va.imp2'):
svySigma(sbsdes, ~va.imp2, vartype = "cvpct", conf.int = TRUE, deff = TRUE)
#>            Sigma CI.l(95%) CI.u(95%)      CV%      DEff
#> va.imp2 9408.726  9101.409  9716.042 1.666508 0.3351002

# Compare with the true value computed from the sampling frame ('sbs.frame'):
sqrt(var(sbs.frame$va.imp2))
#> [1] 9211.955

# The same as above, by classes of macro-class of economic activity ('nace.macro'):
svySigma(sbsdes, ~va.imp2, ~nace.macro, vartype = "cvpct", conf.int = TRUE)
#>              nace.macro Sigma.va.imp2 CI.l(95%) CI.u(95%)      CV%
#> Agriculture Agriculture      7212.586  6536.069  7889.102 4.785633
#> Industry       Industry      9327.768  9140.093  9515.442 1.026547
#> Commerce       Commerce     10947.441  9632.713 12262.168 6.127384
#> Services       Services      8429.758  8254.179  8605.336 1.062692

# Compare with the true value computed from the sampling frame ('sbs.frame'):
sqrt(tapply(sbs.frame$va.imp2, sbs.frame$nace.macro, var))
#> Agriculture    Industry    Commerce    Services 
#>    6982.681    9278.931   10252.990    8384.435 

## An example with a binary variable
# Load household data and create a design object:
data(data.examples)
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Build the indicator variable of the 'widowed' marital status:
des<-des.addvars(des, is.widowed = as.numeric(marstat == "widowed"))

# Estimate the population proportion of widowed people:
svystatTM(des, ~is.widowed, estimator = "Mean")
#>                  Mean          SE
#> is.widowed 0.08090736 0.006290394
# which of course is equal to what one would get directly:
svystatTM(des, ~marstat, estimator = "Mean")
#>                        Mean          SE
#> marstatmarried   0.58075906 0.010294795
#> marstatunmarried 0.33833358 0.010238546
#> marstatwidowed   0.08090736 0.006290394

# Store only the estimated proportion
p.widowed <- coef(svystatTM(des, ~is.widowed, estimator = "Mean"))

# Now estimate the population standard deviation of the binary variable
# 'is.widowed' *with fin.pop = FALSE*, and verify that it *exactly* equals the
# Bernoulli expression sqrt(p.widowed * (1 - p.widowed))
svySigma(des, ~is.widowed, fin.pop = FALSE, conf.int = TRUE)
#>                Sigma        SE CI.l(95%) CI.u(95%)
#> is.widowed 0.2726928 0.0096675 0.2537448 0.2916408
sqrt(p.widowed * (1 - p.widowed))
#> is.widowed 
#>  0.2726928 

# ...as it must be.
