svystatSR.Rd
Calculates estimates, standard errors and confidence intervals for Ratios between Shares of a numeric variable in subpopulations.
svystatSR(design, y, classes, by = NULL,
          vartype = c("se", "cv", "cvpct", "var"),
          conf.int = FALSE, conf.lev = 0.95, deff = FALSE,
          na.rm = FALSE)

# S3 method for svystatSR
coef(object, ...)
# S3 method for svystatSR
SE(object, ...)
# S3 method for svystatSR
VAR(object, ...)
# S3 method for svystatSR
cv(object, ...)
# S3 method for svystatSR
deff(object, ...)
# S3 method for svystatSR
confint(object, ...)
design | Object of class |
---|---|
y | Formula defining the interest variable. |
classes | Formula defining the population groups among which ratios of y shares must be estimated. |
by | Formula specifying the variables that define the "estimation domains". If NULL (the default), estimates refer to the whole population. |
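vartype | Character vector selecting the variability estimators to be returned, chosen among "se", "cv", "cvpct" and "var". |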
conf.int | Compute confidence intervals for the estimates? The default is FALSE. |
conf.lev | Probability specifying the desired confidence level: the default value is 0.95. |
deff | Should the design effect be computed? The default is FALSE. |
na.rm | Should missing values (if any) be removed from the variables of interest? The default is FALSE. |
object | An object of class svystatSR. |
... | Additional arguments to |
This function computes weighted estimates for Ratios between Shares of a numeric variable, using suitable weights depending on the class of design: calibrated weights for class cal.analytic and direct weights otherwise. Standard errors are calculated using the Taylor linearization technique.
Ratios of Shares are a special case of Ratios. Therefore, at the price of some additional and heavy data preparation effort, ratios of shares could also be estimated using function svystatR. However, svystatSR makes estimation far easier, in particular when share ratios have to be estimated for many population groups and/or within many domains.
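The equivalence between a ratio of shares and a plain ratio of totals can be checked with a small base R sketch. It uses made-up toy data (the vectors w, y and grp are hypothetical and not part of the package), and it only illustrates the point estimate, not the way svystatSR is implemented: the share ratio of two groups equals the ratio between the totals of two derived variables, which is the kind of data preparation that svystatR would require.

```r
# Toy data (hypothetical): weights, interest variable, grouping factor
w   <- c(10, 10, 20, 20, 15, 25)
y   <- c(100, 200, 150, 50, 300, 120)
grp <- factor(c("a", "b", "a", "c", "b", "c"))

# Weighted total of y within each group
tot <- tapply(w * y, grp, sum)

# Share of the overall weighted total held by each group
shares <- tot / sum(tot)

# Ratio between the shares of groups "a" and "b" ...
ratio_of_shares <- unname(shares["a"] / shares["b"])

# ... equals the ratio between the totals of the derived variables
# y * I(grp == "a") and y * I(grp == "b"): the overall total cancels out
ratio_of_totals <- sum(w * y * (grp == "a")) / sum(w * y * (grp == "b"))

stopifnot(isTRUE(all.equal(ratio_of_shares, ratio_of_totals)))
```

With many groups or domains, one such derived variable per group would have to be built by hand, which is the effort the paragraph above refers to.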
The mandatory argument classes identifies the population groups whose ratios of y shares have to be estimated. Note that ratios of shares will be estimated and returned for all the ordered pairs of population groups defined by classes. Therefore, if classes defines G groups, svystatSR will have to compute estimates and sampling errors for G * (G - 1) share ratios. To prevent combinatorial explosions (e.g. G = 20 would generate 380 share ratios), the classes formula can reference just a single design variable, which must be a factor.
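The growth in the number of share ratios can be seen by enumerating the ordered pairs directly. The helper ordered_pairs below is hypothetical (it is not a package function) and only mimics the numerator/denominator labelling visible in the output; it makes no claim about the order in which svystatSR returns the ratios.

```r
# Hypothetical helper: all ordered pairs "num/den" of distinct group labels
ordered_pairs <- function(groups) {
  grid <- expand.grid(num = groups, den = groups, stringsAsFactors = FALSE)
  grid <- grid[grid$num != grid$den, ]   # a group is never compared to itself
  paste(grid$num, grid$den, sep = "/")
}

n_pairs_2  <- length(ordered_pairs(c("f", "m")))          # G = 2
n_pairs_5  <- length(ordered_pairs(as.character(1:5)))    # G = 5
n_pairs_20 <- length(ordered_pairs(as.character(1:20)))   # G = 20

# Each count matches G * (G - 1)
stopifnot(n_pairs_2 == 2, n_pairs_5 == 20, n_pairs_20 == 380)
```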
The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svystatSR refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the modalities of variables B1 and B2. Notice that a formula like by=~B1+B2 will be automatically translated into the factor-crossing formula by=~B1:B2: if you need to compute estimates for domains B1 and B2 separately, you have to call svystatSR twice. The design variables referenced by by (if any) should be of type factor, otherwise they will be coerced.
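What "crossing the modalities" means can be illustrated in base R. The sketch below uses hypothetical vectors B1, B2 and B3 and the base function interaction; it is only an analogy for the domains that by=~B1:B2 identifies and does not reproduce the package's internal handling of the formula.

```r
# Hypothetical domain variables observed on five units
B1 <- factor(c("north", "north", "south", "south", "south"))
B2 <- factor(c("urban", "rural", "urban", "urban", "rural"))

# Crossing the modalities: one domain per observed (B1, B2) combination
domains <- interaction(B1, B2, sep = ":", drop = TRUE)
n_domains <- nlevels(domains)
table(domains)

# A non-factor variable has to be turned into a factor before it can
# define domains; here the coercion is done explicitly
B3 <- c(1, 2, 1, 2, 2)
B3_levels <- levels(factor(B3))
```

Estimates for B1 and B2 taken separately would instead need two distinct calls, one with by=~B1 and one with by=~B2.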
The conf.int argument allows you to request confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.
Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is chosen to be 0.95.
The optional argument deff allows you to request the design effect [Kish 1995] for the estimates. By default deff=FALSE, that is, the design effect is not provided. The design effect of an estimator is defined as the ratio between the variance of the estimator under the actual sampling design and the variance that would be obtained for an 'equivalent' estimator under a hypothetical simple random sampling without replacement of the same size. To obtain an estimate of the design effect relative to simple random sampling “with replacement”, one must use deff="replace".
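In symbols, the definition above can be sketched as follows. The notation is introduced here for illustration only and does not come from the package: the estimator is written as \hat{\theta}, V_p denotes its variance under the actual sampling design, and V_{SRS} its variance under simple random sampling without replacement of the same size n from a population of size N. The second line is the usual textbook variance of the estimated total of a variable z under that hypothetical design, with S_z^2 the population variance of z.

```latex
\begin{aligned}
\mathrm{Deff}(\hat{\theta}) &= \frac{V_{p}(\hat{\theta})}{V_{SRS}(\hat{\theta})} \\
V_{SRS}(\hat{Z}) &= N^{2} \left( 1 - \frac{n}{N} \right) \frac{S_{z}^{2}}{n}
\end{aligned}
```

The "with replacement" variant changes only the denominator, whose textbook expression does not contain the finite population correction 1 - n/N.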
Since Ratios are nonlinear estimators, the design effect is estimated on the linearized version of the estimator (that is, for the estimator of the total of the linearized variable, also known as the "Woodruff transform").
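For a ratio between the shares of two groups g and h, the linearized variable takes the standard form used for any ratio of totals. The symbols below are assumptions made for this sketch, not the package's notation: \delta_g(k) is the indicator that unit k belongs to group g, Y_g is the population total of y within group g, and hats denote sample estimates.

```latex
\begin{aligned}
R_{g/h} &= \frac{Y_g}{Y_h}, \qquad Y_g = \sum_{k \in U} y_k \, \delta_g(k) \\
z_k &= \frac{1}{\hat{Y}_h} \left( y_k \, \delta_g(k) - \hat{R}_{g/h} \, y_k \, \delta_h(k) \right)
\end{aligned}
```

Under Taylor linearization, the variance of the estimated ratio is approximated by the variance of the estimated total of z_k, so the design effect compares two variance estimates for that total.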
When dealing with domain estimation, the design effects referring to a given subpopulation are currently computed by taking the ratios between the actual variance estimates and those that would have been obtained if a simple random sampling were carried out within that subpopulation. This is the same as the srssubpop option for Stata's function estat.
Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates get computed on non missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.
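The effect of the two settings on a point estimate can be mimicked in base R with toy data. The vectors w, y and grp are hypothetical, and the sketch only reproduces the arithmetic of weighted totals; it does not call the package or reproduce the error raised for calibrated designs.

```r
# Toy data (hypothetical) with one missing value in the interest variable
w   <- c(10, 20, 30, 40)
y   <- c(5, NA, 15, 20)
grp <- factor(c("a", "a", "b", "b"))

# Analogue of na.rm=FALSE: the NA propagates to the total of group "a",
# and therefore to every share ratio involving that group
tot_na <- tapply(w * y, grp, sum)

# Analogue of na.rm=TRUE: incomplete observations are dropped first
keep     <- !is.na(y)
tot_ok   <- tapply(w[keep] * y[keep], grp[keep], sum)
ratio_ok <- unname(tot_ok["a"] / tot_ok["b"])
```

Note that the dropped unit carried weight 20 in group "a": unless its value is missing completely at random, the resulting ratio is biased, as stated above.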
An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.
Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.
Kish, L. (1995). “Methods for design effects”. Journal of Official Statistics, Vol. 11, pp. 55-77.
European Commission, Eurostat, (2013). “Handbook on precision requirements and variance estimation for ESS households surveys: 2013 edition”, Publications Office. doi: 10.2785/13579
Estimators of Totals and Means svystatTM, Ratios between Totals svystatR, Shares svystatS, Multiple Regression Coefficients svystatB, Quantiles svystatQ, Complex Analytic Functions of Totals and/or Means svystatL, and all of the above svystat.
# Load household data:
data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight)

# Add convenience variable 'ones' to estimate counts:
des<-des.addvars(des,ones=1)

### Simple examples to illustrate the syntax:
# Population sex ratios:
svystatSR(des, y=~ones, classes=~sex, vartype="cvpct")
#>           ones.ShareRatio      CV%
#> sexf/sexm       1.0352839 3.683917
#> sexm/sexf       0.9659187 3.683917

# Population sex ratios within provinces:
svystatSR(des, y=~ones, classes=~sex, by=~procod, vartype="cvpct")
#>    procod ones:sexf/sexm ones:sexm/sexf CV%.ones:sexf/sexm CV%.ones:sexm/sexf
#> 8       8      1.5053112      0.6643144          12.416260          12.416260
#> 9       9      1.0865315      0.9203599          11.204960          11.204960
#> 10     10      1.0599794      0.9434146           7.084647           7.084647
#> 11     11      0.8614347      1.1608541          22.822911          22.822911
#> 30     30      1.0063997      0.9936410          10.456881          10.456881
#> 31     31      0.9263585      1.0794957          16.306781          16.306781
#> 32     32      1.0000000      1.0000000          14.726289          14.726289
#> 54     54      0.9982349      1.0017682           7.697972           7.697972
#> 55     55      1.0078809      0.9921807          16.700595          16.700595
#> 93     93      1.0668481      0.9373406           9.783914           9.783914

# Ratios of population shares for 5 age classes:
# NOTE: This yields 5*(5-1)=20 ratios
svystatSR(des, y=~ones, classes=~age5c, vartype="cvpct")
#>               ones.ShareRatio       CV%
#> age5c1/age5c2      0.43767583  5.791164
#> age5c1/age5c3      0.36168712  6.141410
#> age5c1/age5c4      1.04907175  8.526818
#> age5c1/age5c5      6.07113292 14.308045
#> age5c2/age5c3      0.82638130  4.421771
#> age5c2/age5c4      2.39691499  7.073318
#> age5c2/age5c5     13.87130056 13.109706
#> age5c3/age5c4      2.90049521  7.083608
#> age5c3/age5c5     16.78559354 12.963318
#> age5c4/age5c5      5.78714748 14.607945
#> age5c2/age5c1      2.28479606  5.791164
#> age5c3/age5c1      2.76482063  6.141410
#> age5c4/age5c1      0.95322365  8.526818
#> age5c5/age5c1      0.16471390 14.308045
#> age5c3/age5c2      1.21009515  4.421771
#> age5c4/age5c2      0.41720295  7.073318
#> age5c5/age5c2      0.07209129 13.109706
#> age5c4/age5c3      0.34476871  7.083608
#> age5c5/age5c3      0.05957490 12.963318
#> age5c5/age5c4      0.17279670 14.607945

### One more complicated example:
#######################################################################
# Ratios between shares of income held by people for income quintiles #
#######################################################################
# First: estimate income quintiles
inc.Q5 <- svystatQ(des, y=~income, probs=seq(0.2, 0.8, 0.2), ties="rounded")
inc.Q5
#>           income.Q[p]        SE CI.l(95%) CI.u(95%)
#> p = 0.200     888.019  8.790575  868.0873  902.5458
#> p = 0.400    1134.873  9.562784 1117.9118 1155.3972
#> p = 0.600    1359.643 11.348459 1335.7082 1380.1933
#> p = 0.800    1606.796 14.910089 1581.4703 1639.9168

# Second: add a convenience factor variable classifying people by income
#         quintiles
des<-des.addvars(des, quintile = cut(income, breaks = c(0, coef(inc.Q5), Inf),
                                     labels = 1:5, include.lowest=TRUE)
                 )

# Third: estimate income shares by income quintiles
QS5 <- svystatSR(des, y=~income, classes=~quintile, vartype="cvpct")
QS5
#>                     income.ShareRatio      CV%
#> quintile1/quintile2         0.6559301 5.960354
#> quintile1/quintile3         0.5281489 6.185028
#> quintile1/quintile4         0.4489668 5.630257
#> quintile1/quintile5         0.3507709 6.146778
#> quintile2/quintile3         0.8051908 5.517612
#> quintile2/quintile4         0.6844736 5.620953
#> quintile2/quintile5         0.5347687 6.447332
#> quintile3/quintile4         0.8500763 5.624728
#> quintile3/quintile5         0.6641515 6.687639
#> quintile4/quintile5         0.7812846 6.032199
#> quintile2/quintile1         1.5245527 5.960354
#> quintile3/quintile1         1.8934055 6.185028
#> quintile4/quintile1         2.2273361 5.630257
#> quintile5/quintile1         2.8508639 6.146778
#> quintile3/quintile2         1.2419417 5.517612
#> quintile4/quintile2         1.4609768 5.620953
#> quintile5/quintile2         1.8699674 6.447332
#> quintile4/quintile3         1.1763650 5.624728
#> quintile5/quintile3         1.5056806 6.687639
#> quintile5/quintile4         1.2799433 6.032199

### Therefore, for instance, the *S80/S20 income quintile share ratio* is:
S80.20 <- QS5["quintile5/quintile1",]
S80.20
#>                     income.ShareRatio      CV%
#> quintile5/quintile1          2.850864 6.146778

### NOTE: Procedure above yields *correct point estimates* of income quintile
###       share ratios, while *variance estimation is approximated* since
###       we neglected the sampling variability of the estimated quintiles.