Calculates estimates, standard errors and confidence intervals for Shares of a numeric variable within subpopulations.

svystatS(design, y, classes, by = NULL,
         vartype = c("se", "cv", "cvpct", "var"),
         conf.int = FALSE, conf.lev = 0.95, deff = FALSE,
         na.rm = FALSE)

# S3 method for svystatS
coef(object, ...)
# S3 method for svystatS
SE(object, ...)
# S3 method for svystatS
VAR(object, ...)
# S3 method for svystatS
cv(object, ...)
# S3 method for svystatS
deff(object, ...)
# S3 method for svystatS
confint(object, ...)

Arguments

design

Object of class analytic (or inheriting from it) containing survey data and sampling design metadata.

y

Formula defining the interest variable.

classes

Formula defining the population groups whose y shares must be estimated.

by

Formula specifying the variables that define the "estimation domains". If NULL (the default option) estimates refer to the whole population.

vartype

Character vector specifying the desired variability estimators. Choose one or more of: standard error ('se', the default), coefficient of variation ('cv'), percent coefficient of variation ('cvpct'), or variance ('var').

conf.int

Compute confidence intervals for the estimates? The default is FALSE.

conf.lev

Probability specifying the desired confidence level: the default value is 0.95.

deff

Should the design effect be computed? The default is FALSE (see ‘Details’).

na.rm

Should missing values (if any) be removed from the variables of interest? The default is FALSE (see ‘Details’).

object

An object of class svystatS.

...

Additional arguments to coef, ..., confint methods (if any).

Details

This function computes weighted estimates for Shares of a numeric variable, using suitable weights depending on the class of design: calibrated weights for class cal.analytic and direct weights otherwise. Standard errors are calculated using the Taylor linearization technique.

Shares are a special case of Ratios. Therefore, at the price of some additional (and possibly heavy) data preparation effort, shares could also be estimated using function svystatR. However, svystatS makes estimation by far easier, in particular when shares have to be estimated for many population groups and/or within many domains.
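To make the "share as a ratio" relationship concrete, the following base R sketch computes point estimates of shares by hand on a small made-up data frame. The data frame toy and its columns w, y and g are hypothetical names introduced only for illustration. The code does not reproduce the internals of svystatS and yields no variance estimates.

```r
# Toy data (hypothetical): weight w, numeric variable y, factor g
toy <- data.frame(
  w = c(10, 10, 20, 20, 5),
  y = c(100, 200, 50, 150, 400),
  g = factor(c("a", "b", "a", "b", "b"))
)

# Weighted total of y within each class of g
tot.by.class <- tapply(toy$w * toy$y, toy$g, sum)

# Share of each class = class total / overall total (a ratio of two totals)
shares <- tot.by.class / sum(toy$w * toy$y)
shares
```

Because the class totals add up to the overall total, the shares computed this way always sum to 1.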

The mandatory argument y identifies the variable of interest, that is the variable for which estimates of shares have to be calculated. The design variable referenced by y must be numeric.

The mandatory argument classes identifies the population groups whose shares of y have to be estimated. The design variables referenced by classes must be of class factor. Groups can be identified by crossing factors: e.g. the statement classes = ~C1:C2 selects as groups the subpopulations determined by crossing the levels of factors C1 and C2.

The optional argument by specifies the variables defining the "estimation domains", that is the subpopulations within which shares of y by classes must be estimated. If by=NULL (the default option), the estimates produced by svystatS refer to the whole population. Estimation domains must be defined by a formula: for instance the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. Notice that a formula like by=~B1+B2 will be automatically translated into the factor-crossing formula by=~B1:B2: if you need to compute estimates for domains B1 and B2 separately, you have to call svystatS twice. The design variables referenced by by (if any) should be of type factor, otherwise they will be coerced.
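As a conceptual illustration of domains obtained by crossing two variables, the base R sketch below computes shares of y by class within each crossed domain. All names (toy, w, y, g, b1, b2, dom) are hypothetical, and the code only mimics the point estimates. It is not how svystatS is implemented.

```r
# Toy data (hypothetical): weight w, variable y, class factor g,
# and two domain variables b1 and b2
toy <- data.frame(
  w  = c(1, 1, 2, 2, 1, 3),
  y  = c(10, 30, 10, 10, 30, 10),
  g  = factor(c("a", "b", "a", "b", "a", "b")),
  b1 = factor(c("x", "x", "x", "x", "z", "z")),
  b2 = factor(c("p", "p", "q", "q", "p", "p"))
)

# Crossing b1 and b2 gives one domain per observed combination of levels
toy$dom <- interaction(toy$b1, toy$b2, drop = TRUE)

# Weighted totals of y by domain (rows) and class (columns)
tot <- tapply(toy$w * toy$y, list(toy$dom, toy$g), sum)

# Shares within each domain: every row sums to 1
dom.shares <- prop.table(tot, margin = 1)
dom.shares
```

Each row of dom.shares refers to one crossed domain, which mirrors the layout of the by-domain output shown in the Examples.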

The conf.int argument requests confidence intervals for the estimates. By default conf.int=FALSE, i.e. confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is 0.95.
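The sketch below shows how a confidence interval at level conf.lev could be derived from an estimate and its standard error. It assumes the usual normal-approximation (Wald) interval, which this page does not state explicitly, so treat it as an illustration rather than a description of what svystatS does. The estimate and percent CV are the sexf values reported in the Examples output.

```r
est   <- 0.5087441          # income share for sexf (from the Examples output)
cvpct <- 1.930302           # percent CV from the same output
se    <- est * cvpct / 100  # standard error implied by the percent CV

conf.lev <- 0.95
# Assumption: normal quantile for a two-sided interval
z  <- qnorm(1 - (1 - conf.lev) / 2)
ci <- c(lower = est - z * se, upper = est + z * se)
ci
```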

The optional argument deff requests the design effect [Kish 1995] for the estimates. By default deff=FALSE, i.e. the design effect is not provided. The design effect of an estimator is defined as the ratio between the variance of the estimator under the actual sampling design and the variance that would be obtained for an 'equivalent' estimator under a hypothetical simple random sampling without replacement of the same size. To obtain an estimate of the design effect relative to simple random sampling “with replacement”, use deff="replace".
Since Ratios are nonlinear estimators, the design effect is estimated on the linearized version of the estimator (that is, for the estimator of the total of the linearized variable, aka "Woodruff transform").
In domain estimation, the design effects referring to a given subpopulation are currently computed as the ratios between the actual variance estimates and those that would have been obtained had simple random sampling been carried out within that subpopulation. This is the same as the srssubpop option for Stata's estat command.
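In symbols, the definition of the design effect given above can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the package: \hat{\theta} is the estimator under the actual design p, and \hat{\theta}_{srs} is the 'equivalent' estimator under simple random sampling without replacement with the same sample size.

```latex
\mathrm{deff}(\hat{\theta}) \;=\;
  \frac{V_{p}(\hat{\theta})}{V_{\mathrm{srswor}}(\hat{\theta}_{\mathrm{srs}})}
```

A value above 1 indicates that the actual design is less efficient than simple random sampling of the same size, while a value below 1 indicates a gain in efficiency.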

Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates are computed on non-missing values only. This implicitly assumes that the values of the interest variables are missing completely at random: should this not be the case, the computed estimates would be biased.
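The following base R sketch illustrates, on hypothetical toy data, why a single NA propagates to the estimated shares and what dropping incomplete observations amounts to. It only mimics the behaviour described above at the level of point estimates. The actual handling inside svystatS may differ.

```r
# Toy data (hypothetical) with one missing value in y
toy <- data.frame(
  w = c(10, 10, 20, 20),
  y = c(100, NA, 50, 150),
  g = factor(c("a", "b", "a", "b"))
)
wy <- toy$w * toy$y

# Keeping NAs: the overall total is NA, so every share is NA
shares.na <- tapply(wy, toy$g, sum) / sum(wy)

# Dropping incomplete observations first, then computing shares
ok <- !is.na(toy$y)
shares.ok <- tapply(wy[ok], toy$g[ok], sum) / sum(wy[ok])

shares.na
shares.ok
```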

Value

An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.

References

Särndal, C.E., Swensson, B., Wretman, J. (1992). “Model Assisted Survey Sampling”, Springer Verlag.

Kish, L. (1995). “Methods for design effects”. Journal of Official Statistics, Vol. 11, pp. 55-77.

European Commission, Eurostat, (2013). “Handbook on precision requirements and variance estimation for ESS households surveys: 2013 edition”, Publications Office. doi: 10.2785/13579

See also

Estimators of Totals and Means svystatTM, Ratios between Totals svystatR, Ratios between Shares svystatSR, Multiple Regression Coefficients svystatB, Quantiles svystatQ, Complex Analytic Functions of Totals and/or Means svystatL, and all of the above svystat.

Examples

# Load household data:
data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight)

# Add convenience variable 'ones' to estimate counts:
des<-des.addvars(des,ones=1)

### Simple examples to illustrate the syntax:
# Shares of income for sex classes:
svystatS(des, y=~income, classes=~sex, vartype="cvpct")
#>      income.Share      CV%
#> sexf    0.5087441 1.930302
#> sexm    0.4912559 1.999019

# Shares of income for sex and 5 age classes:
svystatS(des, y=~income, classes=~age5c:sex, vartype="cvpct")
#>             income.Share       CV%
#> age5c1:sexf   0.05796082  7.805591
#> age5c2:sexf   0.14847459  4.479104
#> age5c3:sexf   0.20256471  4.483051
#> age5c4:sexf   0.08597464  7.877755
#> age5c5:sexf   0.01376936 21.802443
#> age5c1:sexm   0.04895332  7.902200
#> age5c2:sexm   0.14261140  4.720679
#> age5c3:sexm   0.20645056  3.965306
#> age5c4:sexm   0.07578320  8.730504
#> age5c5:sexm   0.01745741 16.912053

# Shares of income for sex classes within region domains:
svystatS(des, y=~income, classes=~sex, by=~regcod, vartype="cvpct")
#>    regcod income:sexf income:sexm CV%.income:sexf CV%.income:sexm
#> 6       6   0.5039098   0.4960902        3.445298        3.499604
#> 7       7   0.5192129   0.4807871        2.883144        3.113573
#> 10     10   0.4950567   0.5049433        4.025673        3.946851

# Shares of income for sex classes within domains defined by crossing region and
# 5 age classes:
svystatS(des, y=~income, classes=~sex, by=~age5c:regcod, vartype="cvpct")
#>      age5c regcod income:sexf income:sexm CV%.income:sexf CV%.income:sexm
#> 1.6      1      6   0.5475741   0.4524259        8.479840       10.263207
#> 2.6      2      6   0.4951517   0.5048483        6.420218        6.296906
#> 3.6      3      6   0.4980648   0.5019352        6.727295        6.675421
#> 4.6      4      6   0.5379676   0.4620324        9.822718       11.437085
#> 5.6      5      6   0.3248639   0.6751361       40.234002       19.359913
#> 1.7      1      7   0.5509027   0.4490973        8.265477       10.139171
#> 2.7      2      7   0.5175089   0.4824911        5.334786        5.721970
#> 3.7      3      7   0.4994485   0.5005515        5.175540        5.164136
#> 4.7      4      7   0.5366222   0.4633778        7.796564        9.028936
#> 5.7      5      7   0.5511434   0.4488566       17.142948       21.049532
#> 1.10     1     10   0.5202883   0.4797117       10.134536       10.991769
#> 2.10     2     10   0.5147879   0.4852121        6.464970        6.859038
#> 3.10     3     10   0.4848407   0.5151593        5.370243        5.054188
#> 4.10     4     10   0.5074344   0.4925656       11.400020       11.744144
#> 5.10     5     10   0.2730298   0.7269702       43.017406       16.156146

# MARGINAL, CONDITIONAL and JOINT relative frequencies (see also ?svystatTM)
# MARGINAL: e.g. proportions of people by provinces:
svystatS(des, y=~ones, classes=~procod, vartype="cvpct")
#>          ones.Share       CV%
#> procod8  0.05068687 17.050743
#> procod9  0.07938924  7.993672
#> procod10 0.24593613  2.735905
#> procod11 0.06838915  5.393165
#> procod30 0.13702924  5.691987
#> procod31 0.03560746  5.435348
#> procod32 0.06561856  3.551743
#> procod54 0.17151042  3.150978
#> procod55 0.06652745  8.344750
#> procod93 0.07930548  4.148163

# CONDITIONAL: e.g. proportions of people by sex within provinces:
svystatS(des, y=~ones, classes=~sex, by=~procod, vartype="cvpct")
#>    procod ones:sexf ones:sexm CV%.ones:sexf CV%.ones:sexm
#> 8       8 0.6008480 0.3991520      4.955975      7.460285
#> 9       9 0.5207357 0.4792643      5.370137      5.834823
#> 10     10 0.5145582 0.4854418      3.439184      3.645464
#> 11     11 0.4627800 0.5372200     12.260925     10.561986
#> 30     30 0.5015948 0.4984052      5.211764      5.245117
#> 31     31 0.4808858 0.5191142      8.465081      7.841700
#> 32     32 0.5000000 0.5000000      7.363144      7.363144
#> 54     54 0.4995583 0.5004417      3.852386      3.845586
#> 55     55 0.5019625 0.4980375      8.317523      8.383072
#> 93     93 0.5161715 0.4838285      4.733737      5.050178

# JOINT: e.g. proportions of people cross-classified by sex and procod:
svystatS(des, y=~ones, classes=~sex:procod, vartype="cvpct")
#>               ones.Share       CV%
#> sexf:procod8  0.03045510 16.758473
#> sexm:procod8  0.02023176 19.955734
#> sexf:procod9  0.04134081  9.171583
#> sexm:procod9  0.03804843 10.359006
#> sexf:procod10 0.12654846  4.208379
#> sexm:procod10 0.11938767  4.740613
#> sexf:procod11 0.03164913 10.460703
#> sexm:procod11 0.03674002 14.175111
#> sexf:procod30 0.06873316  8.946338
#> sexm:procod30 0.06829608  6.269223
#> sexf:procod31 0.01712312 12.009082
#> sexm:procod31 0.01848434  7.154418
#> sexf:procod32 0.03280928  8.088211
#> sexm:procod32 0.03280928  8.260894
#> sexf:procod54 0.08567946  5.468345
#> sexm:procod54 0.08583096  4.426399
#> sexf:procod55 0.03339428 15.091163
#> sexm:procod55 0.03313316  7.091033
#> sexf:procod93 0.04093523  5.785082
#> sexm:procod93 0.03837025  7.019324

### One more complicated example:
########################################################
# Shares of income held by people for income quintiles #
########################################################

# First: estimate income quintiles
inc.Q5 <- svystatQ(des, y=~income, probs=seq(0.2, 0.8, 0.2), ties="rounded")
inc.Q5
#>           income.Q[p]        SE CI.l(95%) CI.u(95%)
#> p = 0.200     888.019  8.790575  868.0873  902.5458
#> p = 0.400    1134.873  9.562784 1117.9118 1155.3972
#> p = 0.600    1359.643 11.348459 1335.7082 1380.1933
#> p = 0.800    1606.796 14.910089 1581.4703 1639.9168

# Second: add a convenience factor variable classifying people by income
#         quintiles
des<-des.addvars(des, quintile = cut(income, breaks = c(0, coef(inc.Q5), Inf),
                                     labels = 1:5, include.lowest=TRUE)
                )

# Third: estimate income shares by income quintiles
svystatS(des, y=~income, classes=~quintile, vartype="cvpct")
#>           income.Share      CV%
#> quintile1    0.1053057 4.223871
#> quintile2    0.1605442 3.928511
#> quintile3    0.1993865 3.936444
#> quintile4    0.2345513 3.372641
#> quintile5    0.3002123 3.719747

### NOTE: Procedure above yields *correct point estimates* of income shares by
###       income quintiles, while *variance estimation is approximated* since
###       we neglected the sampling variability of the estimated quintiles.