svySigma2.Rd
Computes estimates and sampling errors of the Population Variance of a numeric variable, for the whole population or for subpopulations.
svySigma2(design, y, by = NULL,
          fin.pop = TRUE,
          vartype = c("se", "cv", "cvpct", "var"),
          conf.int = FALSE, conf.lev = 0.95, deff = FALSE,
          na.rm = FALSE)

# S3 method for svySigma2
coef(object, ...)
# S3 method for svySigma2
SE(object, ...)
# S3 method for svySigma2
VAR(object, ...)
# S3 method for svySigma2
cv(object, ...)
# S3 method for svySigma2
confint(object, ...)
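Argument | Description |
---|---|
design | Object of class |
y | Formula identifying the numeric interest variable. |
by | Formula specifying the variables that define the "estimation domains". If NULL (the default), the estimates refer to the whole population. |
fin.pop | If TRUE (the default), the finite population version of the variance formula (N - 1 denominator) is the estimation target; if FALSE, the version with the N denominator is used. |
vartype | Character vector selecting the variability measures to report: one or more of "se", "cv", "cvpct", "var". |
conf.int | Compute confidence intervals for the estimates? The default is FALSE. |
conf.lev | Probability specifying the desired confidence level: the default value is 0.95. |
deff | Should the design effect be computed? The default is FALSE. |
na.rm | Should missing values (if any) be removed from the variable of interest? The default is FALSE. |
object | An object of class svySigma2. |
... | Additional arguments to the coef, SE, VAR, cv and confint methods. |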
Function svySigma2 computes estimates and sampling errors of the Population Variance of a numeric variable. These estimates play an important role in many contexts, including sample size guesstimation and power calculations.
As the Population Variance is a complex estimator, svySigma2 automatically linearizes it to estimate its sampling variance. Automatic linearization is performed as function svystatL would do, along the lines illustrated in [Zardetto, 15]. This, of course, also entails the usage of the residuals technique when the input design object is calibrated (i.e. of class cal.analytic).
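The linearization step can be sketched for the N-denominator version of the variance. This is only an illustration of standard Taylor linearization: the notation (population U of size N, totals t_y and t_{y^2}, mean \bar{Y}, linearized variable z_k) is assumed here and is not taken from the package, whose internal expressions may differ.

```latex
% Assumed notation: t_y = \sum_{k \in U} y_k, \; t_{y^2} = \sum_{k \in U} y_k^2, \; \bar{Y} = t_y / N
\sigma^2 = \frac{t_{y^2}}{N} - \left(\frac{t_y}{N}\right)^2
\qquad\Longrightarrow\qquad
z_k = \frac{\partial \sigma^2}{\partial t_{y^2}}\, y_k^2
    + \frac{\partial \sigma^2}{\partial t_y}\, y_k
    + \frac{\partial \sigma^2}{\partial N}
    = \frac{(y_k - \bar{Y})^2 - \sigma^2}{N}
```

Under this sketch, the sampling variance of the estimated Population Variance is approximated by the sampling variance of the estimated total of z_k, with the unknown population quantities replaced by their sample estimates.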
The mandatory argument y identifies the variable of interest. The design variable referenced by y must be numeric.
If variable y is binary (i.e. has only values 0 and 1), the estimated Population Variance coincides with the classical Bernoulli expression p*(1 - p), where p is the estimated proportion of population units with y = 1 (see ‘Examples’, where the equality is verified exactly with fin.pop = FALSE).
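The Bernoulli identity can be checked with a few lines of base R. This is a minimal sketch that assumes the usual weighted plug-in estimator of the variance; the toy values of `y` and `w` are made up for illustration, and the code does not reproduce the package internals.

```r
# Hypothetical toy sample: a 0/1 variable and made-up sampling weights
y <- c(1, 0, 0, 1, 0, 0, 0, 1)
w <- c(10, 20, 15, 5, 30, 10, 25, 5)

N.hat <- sum(w)               # estimated population size
p.hat <- sum(w * y) / N.hat   # estimated proportion of units with y = 1

# Plug-in variance with the N denominator (the fin.pop = FALSE target)
sigma2.N <- sum(w * (y - p.hat)^2) / N.hat

# Plug-in variance with the N - 1 denominator (the fin.pop = TRUE target)
sigma2.N1 <- sum(w * (y - p.hat)^2) / (N.hat - 1)

# TRUE: for a 0/1 variable y^2 = y, so the N version reduces to p * (1 - p)
all.equal(sigma2.N, p.hat * (1 - p.hat))

# The N - 1 version is larger by the factor N.hat / (N.hat - 1)
all.equal(sigma2.N1, p.hat * (1 - p.hat) * N.hat / (N.hat - 1))
```

The equality is exact only for the N-denominator version, which is why the binary-variable example on this page sets fin.pop = FALSE.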
The optional argument by specifies the variables that define the "estimation domains", that is, the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svySigma2 refer to the whole population. If specified, estimation domains must be defined by a formula, following the usual syntactic and semantic rules (see e.g. svystatTM).
Argument fin.pop lets users select which population variance formula to use as the estimation target. If fin.pop = TRUE (the default), the finite population version of the variance formula is used, namely the one with the N - 1 denominator [Sarndal, Swensson, Wretman 92]. If fin.pop = FALSE, the population variance formula with the N denominator is used.
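Written out, the two estimation targets differ only in their denominator. The symbols below (population U of size N, population mean \bar{Y}, and the names S^2_y and \sigma^2_y) are notation assumed for this sketch rather than taken from the package.

```latex
S^2_y = \frac{1}{N-1} \sum_{k \in U} (y_k - \bar{Y})^2
\quad (\texttt{fin.pop = TRUE}),
\qquad
\sigma^2_y = \frac{1}{N} \sum_{k \in U} (y_k - \bar{Y})^2 = \frac{N-1}{N}\, S^2_y
\quad (\texttt{fin.pop = FALSE})
```

Base R's var() uses the N - 1 form, which is why the ‘Examples’ compare the default estimate with var() computed on the sampling frame.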
The conf.int argument requests confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.
Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is 0.95.
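The interval printed in the binary-variable example is consistent with a symmetric normal-approximation interval built from the estimate and its standard error. The sketch below assumes that construction (it is inferred from the printed numbers, not from the package source) and plugs in the values shown in ‘Examples’.

```r
# Values copied from the printed output of the 'is.widowed' example
est      <- 0.07436136    # estimated population variance (Sigma2)
se       <- 0.005272515   # its estimated standard error (SE)
conf.lev <- 0.95          # requested confidence level

# Assumed construction: estimate +/- normal quantile * standard error
z  <- qnorm(1 - (1 - conf.lev) / 2)
ci <- c(lower = est - z * se, upper = est + z * se)

round(ci, 8)
```

Up to rounding of the printed inputs, the result agrees with the CI.l(95%) and CI.u(95%) columns of that example.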
The optional argument deff requests the design effect [Kish 1995] for the estimates. By default deff=FALSE, that is, the design effect is not provided. The design effect of an estimator is defined as the ratio between the sampling variance of the estimator under the actual sampling design and the sampling variance that would be obtained for an 'equivalent' estimator under a hypothetical simple random sampling without replacement of the same size. To obtain an estimate of the design effect relative to simple random sampling “with replacement”, one must use deff="replace". See svystatTM for further details.
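In symbols, the definition above reads as follows. The notation is assumed for this sketch: p denotes the actual sampling design, SRS a simple random sample of the same size n, and \hat{\theta}_{SRS} the 'equivalent' estimator under that hypothetical design.

```latex
\mathrm{Deff}(\hat{\theta}) =
  \frac{V_{p}(\hat{\theta})}{V_{\mathrm{SRS}}(\hat{\theta}_{\mathrm{SRS}})}
```

A value below 1, such as the DEff of about 0.34 printed in the first example, indicates that the actual design is estimated to be more efficient than simple random sampling of the same size for that estimate.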
Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates get computed on non-missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.
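The effect of dropping missing values can be illustrated in base R with a weighted mean. This is only an analogue of the behaviour described above, using made-up values of `y` and `w`; it does not call the package.

```r
# Hypothetical toy sample with two missing values and made-up weights
y <- c(2, NA, 5, 7, NA, 4)
w <- c(10, 10, 20, 20, 5, 5)

# Analogue of na.rm = FALSE: the missing values propagate to the estimate
mean.with.na <- sum(w * y) / sum(w)
is.na(mean.with.na)   # TRUE

# Analogue of na.rm = TRUE: drop incomplete observations together with
# their weights, then estimate on the non-missing values only
keep <- !is.na(y)
mean.complete <- sum(w[keep] * y[keep]) / sum(w[keep])
mean.complete
```

The second estimate describes the whole population only if the dropped units do not differ systematically from the retained ones, which is the missing-completely-at-random assumption stated above.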
svySigma2 returns an object inheriting from the data.frame class, whose detailed structure depends on the values of the input parameters.
Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”. Springer Verlag.
Kish, L. (1995) “Methods for design effects”. Journal of Official Statistics, 11, 55-77.
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.
See also function svySigma to estimate the Population Standard Deviation of a numeric variable, and svystatL for estimators of Complex Analytic Functions of Totals and/or Means. Related estimators: svystatTM for Totals and Means, svystatR for Ratios between Totals, svystatS for Shares, svystatSR for Ratios between Shares, svystatB for Multiple Regression Coefficients, svystatQ for Quantiles, and svystat for all of the above.
## Load sbs data and create a design object:
data(sbs)
sbsdes <- e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,
                      fpc=~fpc)

# Estimation of the population variance of value added (variable 'va.imp2'):
svySigma2(sbsdes, ~va.imp2, vartype = "cvpct", conf.int = TRUE, deff = TRUE)
#>           Sigma2 CI.l(95%) CI.u(95%)      CV%      DEff
#> va.imp2 88524116  82741195  94307036 3.333017 0.3351002

# Compare with the true value computed from the sampling frame ('sbs.frame'):
var(sbs.frame$va.imp2)
#> [1] 84860120

# The same as above, by classes of macro-class of economic activity ('nace.macro'):
svySigma2(sbsdes, ~va.imp2, ~nace.macro, vartype = "cvpct", conf.int = TRUE)
#>              nace.macro Sigma2.va.imp2 CI.l(95%) CI.u(95%)       CV%
#> Agriculture Agriculture       52021393  42262524  61780262  9.571267
#> Industry       Industry       87007247  83506084  90508411  2.053094
#> Commerce       Commerce      119846457  91060651 148632263 12.254768
#> Services       Services       71060814  68100650  74020978  2.125384

# Compare with the true value computed from the sampling frame ('sbs.frame'):
tapply(sbs.frame$va.imp2, sbs.frame$nace.macro, var)
#> Agriculture    Industry    Commerce    Services
#>    48757828    86098554   105123801    70298756

## An example with a binary variable
# Load household data and create a design object:
data(data.examples)
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight)

# Build the indicator variable of the 'widowed' marital status:
des<-des.addvars(des, is.widowed = as.numeric(marstat == "widowed"))

# Estimate and store the population proportion of widowed people:
svystatTM(des, ~is.widowed, estimator = "Mean")
#>                  Mean          SE
#> is.widowed 0.08090736 0.006290394

# which of course is equal to what one would get directly:
svystatTM(des, ~marstat, estimator = "Mean")
#>                        Mean          SE
#> marstatmarried   0.58075906 0.010294795
#> marstatunmarried 0.33833358 0.010238546
#> marstatwidowed   0.08090736 0.006290394

# Store only the estimated proportion
p.widowed <- coef(svystatTM(des, ~is.widowed, estimator = "Mean"))

# Now estimate the population variance of the binary variable 'is.widowed' *with
# fin.pop = FALSE*, and verify that it *exactly* equals the Bernoulli expression
# p.widowed * (1 - p.widowed)
svySigma2(des, ~is.widowed, fin.pop = FALSE, conf.int = TRUE)
#>                Sigma2          SE  CI.l(95%) CI.u(95%)
#> is.widowed 0.07436136 0.005272515 0.06402742 0.0846953
p.widowed * (1 - p.widowed)
#> is.widowed
#> 0.07436136
# ...as it must be.