Estimation of Ratios in Subpopulations

Calculates estimates, standard errors and confidence intervals for Ratios between Totals in subpopulations.

svystatR(design, num, den, by = NULL, cross = FALSE,
         vartype = c("se", "cv", "cvpct", "var"),
         conf.int = FALSE, conf.lev = 0.95, deff = FALSE,
         na.rm = FALSE)

# S3 method for svystatR
coef(object, ...)
# S3 method for svystatR
SE(object, ...)
# S3 method for svystatR
VAR(object, ...)
# S3 method for svystatR
cv(object, ...)
# S3 method for svystatR
deff(object, ...)
# S3 method for svystatR
confint(object, ...)

Arguments

design	Object of class `analytic` (or inheriting from it) containing survey data and sampling design metadata.
num	Formula defining the numerator variables for the ratios.
den	Formula defining the denominator variables for the ratios.
by	Formula specifying the variables that define the "estimation domains". If `NULL` (the default option) estimates refer to the whole population.
cross	Should ratios be estimated for all the pairs of variables in `'num'` and `'den'`? The default is `FALSE`, meaning that ratios get estimated parallel-wise (see ‘Details’).
vartype	`character` vector specifying the desired variability estimators. It is possible to choose one or more of: standard error (`'se'`, the default), coefficient of variation (`'cv'`), percent coefficient of variation (`'cvpct'`), or variance (`'var'`).
conf.int	Compute confidence intervals for the estimates? The default is `FALSE`.
conf.lev	Probability specifying the desired confidence level: the default value is `0.95`.
deff	Should the design effect be computed? The default is `FALSE` (see ‘Details’).
na.rm	Should missing values (if any) be removed from the variables of interest? The default is `FALSE` (see ‘Details’).
object	An object of class `svystatR`.
...	Additional arguments to `coef`, ..., `confint` methods (if any).

Details

This function computes weighted estimates for Ratios between Totals using suitable weights depending on the class of design: calibrated weights for class cal.analytic and direct weights otherwise. Standard errors are calculated using the Taylor linearization technique.
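As a rough illustration of the linearization idea, the base R sketch below estimates a single Ratio and its standard error on toy data. The data, the equal weights, and the choice of a single-stage, unstratified, with-replacement variance formula are all assumptions made for brevity; this is not the package's internal code, which handles general designs.

```r
# Toy data (hypothetical): y = numerator variable, x = denominator variable
y <- c(12, 7, 30, 18, 9, 25)
x <- c(3, 2, 6, 4, 2, 5)
w <- rep(10, length(y))          # assumed sampling weights

# Weighted estimate of the Ratio between the two Totals
R.hat <- sum(w * y) / sum(w * x)

# Linearized variable for a ratio: z_k = (y_k - R.hat * x_k) / X.hat
z <- (y - R.hat * x) / sum(w * x)

# Variance of the estimated total of z, under the ASSUMED simple
# with-replacement, single-stage, unstratified design
n <- length(y)
u <- w * z
var.R <- n / (n - 1) * sum((u - mean(u))^2)
se.R <- sqrt(var.R)

c(Ratio = R.hat, SE = se.R)
```

By construction the weighted total of the linearized variable is zero, which is a handy sanity check when coding the transform by hand.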

The mandatory argument num (den) identifies the variables whose totals appear as numerators (denominators) in the Ratios: the corresponding formula must be of the type num = ~num.1 + ... + num.k (den = ~den.1 + ... + den.l). The design variables referenced by num (den) must be numeric.

If cross=TRUE, the function computes estimates for all the Ratios between pairs of variables coming from num and den (that is k*l estimates for the formulae above). If, on the contrary, cross=FALSE (the default), Ratios get estimated parallel-wise and R's recycling rule is applied whenever k!=l: for the formulae above, this generates r Ratios, where r=max(k,l).
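The two pairing schemes can be mimicked on plain variable names with base R. The sketch below only reproduces the pairing logic described above; the vectors `num` and `den` are illustrative names, and nothing here calls svystatR itself.

```r
num <- c("y1", "y2")   # k = 2 numerator variables
den <- c("x1", "x2")   # l = 2 denominator variables

# cross = TRUE: all k*l pairs
pairs <- expand.grid(num = num, den = den, stringsAsFactors = FALSE)
cross.names <- paste(pairs$num, pairs$den, sep = "/")

# cross = FALSE: parallel-wise pairing, recycling the shorter side
r <- max(length(num), length(den))
par.names <- paste(rep_len(num, r), rep_len(den, r), sep = "/")

# Recycling with k = 1 and l = 3
rec.names <- paste(rep_len("z", 3), c("x1", "x2", "x3"), sep = "/")
```

Here `cross.names` holds four labels, `par.names` two, and `rec.names` three, matching the counts k*l and max(k,l) given above.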

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svystatR refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. Notice that a formula like by=~B1+B2 will be automatically translated into the factor-crossing formula by=~B1:B2: if you need to compute estimates for domains B1 and B2 separately, you have to call svystatR twice. The design variables referenced by by (if any) should be of type factor, otherwise they will be coerced.
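To make the notion of crossed domains concrete, the following base R sketch computes weighted Ratios within the cells obtained by crossing two classification variables. The data frame `d` and its columns are hypothetical, and the code only illustrates the point estimates, not the variance estimation performed by svystatR.

```r
# Hypothetical sample: two classification variables, two numeric
# variables and a weight
d <- data.frame(
  B1 = c("a", "a", "b", "b", "a", "b"),
  B2 = c("u", "v", "u", "v", "u", "v"),
  y  = c(10, 4, 9, 8, 6, 12),
  x  = c(2, 1, 3, 2, 2, 4),
  w  = c(5, 10, 5, 10, 5, 10)
)

# Domains obtained by crossing B1 and B2 (the analogue of by = ~B1:B2)
dom <- interaction(d$B1, d$B2, drop = TRUE)

# Weighted totals of numerator and denominator within each domain
num.tot <- tapply(d$w * d$y, dom, sum)
den.tot <- tapply(d$w * d$x, dom, sum)

ratios <- num.tot / den.tot
ratios
```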

The conf.int argument requests confidence intervals for the estimates. By default conf.int=FALSE, that is the confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and defaults to 0.95.
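The intervals printed in the Examples section are numerically consistent with a normal approximation, i.e. estimate plus or minus a standard normal quantile times the standard error. The sketch below reproduces one of them from its printed estimate and SE (the z/x1 row of the recycling example); that svystatR computes its intervals in exactly this way is an assumption, not something stated on this page.

```r
est      <- 2135.696   # printed Ratio for z/x1
se       <- 194.8578   # printed SE for z/x1
conf.lev <- 0.95

# Two-sided standard normal quantile for the requested level
q  <- qnorm(1 - (1 - conf.lev) / 2)
ci <- c(lower = est - q * se, upper = est + q * se)
ci
```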

The optional argument deff requests the design effect [Kish 1995] for the estimates. By default deff=FALSE, that is the design effect is not provided. The design effect of an estimator is defined as the ratio between the variance of the estimator under the actual sampling design and the variance that would be obtained for an 'equivalent' estimator under a hypothetical simple random sampling without replacement of the same size. To obtain an estimate of the design effect relative to simple random sampling “with replacement”, one must use deff="replace".
Since Ratios are nonlinear estimators, the design effect is estimated on the linearized version of the estimator (that is: for the estimator of the total of the linearized variable, aka "Woodruff transform").
When dealing with domain estimation, the design effects referring to a given subpopulation are currently computed by taking the ratios between the actual variance estimates and those that would have been obtained if simple random sampling had been carried out within that subpopulation. This is the same as the srssubpop option for Stata's function estat.
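The definition of the design effect as a ratio of two variances can be sketched in base R for a toy one-stage cluster sample. Everything below is an assumption made for illustration: the data, the equal weights, the with-replacement variance formulas on both sides of the ratio, and the use of a generic variable `z` standing in for the linearized variable. It is not the estimator implemented by the package.

```r
# Hypothetical cluster sample: 3 PSUs with 2 elements each
psu <- c(1, 1, 2, 2, 3, 3)
z   <- c(1, 1.2, 3, 3.1, 5, 5.3)   # stands in for the linearized variable
w   <- rep(10, length(z))          # assumed equal weights
u   <- w * z

# Variance of the estimated total under the cluster design
# ("ultimate cluster" formula, with-replacement approximation)
t.psu   <- tapply(u, psu, sum)
m       <- length(t.psu)
var.clu <- m / (m - 1) * sum((t.psu - mean(t.psu))^2)

# Variance under simple random sampling with replacement of the
# same number of elements
n       <- length(u)
var.srs <- n / (n - 1) * sum((u - mean(u))^2)

deff.wr <- var.clu / var.srs
deff.wr
```

Because the toy values are similar within each PSU and different across PSUs, the ratio comes out above 1, which is the typical direction of a clustering effect.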

Missing values (NA) in variables of interest should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates get computed on non-missing values only. This implicitly assumes that missing values affect the variables of interest completely at random: should this not be the case, computed estimates would be biased. Notice that the na.rm=TRUE option is only allowed for a single Ratio, i.e. if num and den each reference a single variable of interest.
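The effect of the two na.rm settings on a single Ratio can be imitated in base R as follows. The vectors are hypothetical, and the sketch only covers the point estimate: it drops observations where either variable is missing and leaves the weights of the remaining observations untouched, which is an assumption about the details.

```r
# Hypothetical data with missing values in both variables
y <- c(10, NA, 30, 25)
x <- c(2, 4, NA, 5)
w <- c(1.5, 2, 1, 2.5)

# Analogue of na.rm = FALSE: the NA propagates to the estimate
ratio.raw <- sum(w * y) / sum(w * x)

# Analogue of na.rm = TRUE: keep only observations where both
# numerator and denominator variables are observed
keep <- complete.cases(y, x)
ratio.cc <- sum(w[keep] * y[keep]) / sum(w[keep] * x[keep])

c(raw = ratio.raw, complete.cases = ratio.cc)
```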

Value

An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.

Warning

It can happen that, in some subpopulations, the estimate of the Total of some den variables turns out to be zero. In such cases svystatR estimates are either NaN or Inf, and NaN is returned for the corresponding SE estimates.
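The NaN and Inf values follow from ordinary R arithmetic on a zero denominator, as the short sketch below shows on made-up domain totals. The helper vector `bad` is just one possible way to flag such domains before further processing.

```r
# Hypothetical estimated totals in three domains
num.tot <- c(d1 = 0, d2 = 5, d3 = 8)
den.tot <- c(d1 = 0, d2 = 0, d3 = 4)

ratios <- num.tot / den.tot   # 0/0 gives NaN, 5/0 gives Inf

# Flag domains whose Ratio is not a finite number
bad <- !is.finite(ratios)
names(ratios)[bad]
```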

References

Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.

Kish, L. (1995). “Methods for design effects”. Journal of Official Statistics, Vol. 11, pp. 55-77.

European Commission, Eurostat, (2013). “Handbook on precision requirements and variance estimation for ESS households surveys: 2013 edition”, Publications Office. doi: 10.2785/13579

Examples

# Creation of a design object:
data(sbs)
des<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,
     fpc=~fpc)

# Estimation of the average value added per employee
# at the nation level:
svystatR(des,~va.imp2,~emp.num)
#>                    Ratio       SE
#> va.imp2/emp.num 57.14199 1.018897

# The same as above by economic activity macro-sector:
svystatR(des,~va.imp2,~emp.num,~nace.macro,vartype="cvpct")
#>              nace.macro va.imp2/emp.num CV%.va.imp2/emp.num
#> Agriculture Agriculture        58.59297            5.500097
#> Industry       Industry        47.93647            1.372657
#> Commerce       Commerce       226.99062            5.683984
#> Services       Services        36.89123            1.563336


# Another design object:
data(data.examples)
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Estimation of the ratios y1/x1, y1/x2, y2/x1 and y2/x2 by region,
# notice the use of argument cross:
svystatR(des,~y1+y2,~x1+x2,by=~regcod,cross=TRUE)
#>    regcod    y1/x1    y2/x1    y1/x2    y2/x2  SE.y1/x1  SE.y2/x1 SE.y1/x2
#> 6       6 7.164188 6.793907 17.44359 16.54202 1.0930880 1.0331919 3.790573
#> 7       7 5.441836 5.020335 19.52299 18.01082 0.9005887 0.8241623 3.873493
#> 10     10 9.041784 8.507911 22.57800 21.24488 1.4328263 1.3608361 5.279342
#>    SE.y2/x2
#> 6  3.610181
#> 7  3.601568
#> 10 4.986415

# ... compare the latter with the default (i.e. cross=FALSE)
svystatR(des,~y1+y2,~x1+x2,by=~regcod)
#>    regcod    y1/x1    y2/x2  SE.y1/x1 SE.y2/x2
#> 6       6 7.164188 16.54202 1.0930880 3.610181
#> 7       7 5.441836 18.01082 0.9005887 3.601568
#> 10     10 9.041784 21.24488 1.4328263 4.986415


# Estimation of the ratios z/x1, z/x2 and z/x3
# for the whole population (notice the recycling rule):
svystatR(des,~z,~x1+x2+x3,conf.int=TRUE)
#>         Ratio       SE CI.l(95%) CI.u(95%)
#> z/x1 2135.696 194.8578  1753.782  2517.610
#> z/x2 6244.631 772.9561  4729.665  7759.597
#> z/x3 5965.892 851.0741  4297.817  7633.966

# Estimators of means can be thought of as 
# estimators of ratios:
svystatTM(des,~income,estimator="Mean")
#>            Mean       SE
#> income 1256.166 8.552545
svystatR(des.addvars(des,ones=1),num=~income,den=~ones)
#>                Ratio       SE
#> income/ones 1256.166 8.552545


##################################################
# Household-level averages in household surveys. #
##################################################

# For an introduction to this topic, see ?svystatTM examples.

  # Load survey data:
  data(data.examples)

  # Define the survey design (variable famcod identifies households) 
  exdes<-e.svydesign(data=example,ids=~towcod+famcod,strata=~stratum,
         weights=~weight)

  # Collapse strata to eliminate lonely PSUs
  exdes<-collapse.strata(design=exdes,block.vars=~sr:procod)
#> 
#> # All lonely strata (45) successfully collapsed!
#> 
#> Warning: No similarity score specified: achieved strata aggregation depends on the ordering of sample data

  # Now add new convenience variables to the design object:
    ## 'ones':       to estimate individuals counts
    ## 'housize':    to classify individuals by household size
    ## 'houdensity': to estimate households counts
  exdes<-des.addvars(exdes,
                     ones=1,
                     housize=factor(ave(famcod,famcod,FUN = length)),
                     houdensity=ave(famcod,famcod,FUN = function(x) 1/length(x))
                    )

  # Estimate the average number of household members by region:
  svystatR(exdes,num=~ones,den=~houdensity,by=~regcod,
           vartype="cvpct",conf.int=TRUE)
#>    regcod ones/houdensity CI.l(95%).ones/houdensity CI.u(95%).ones/houdensity
#> 6       6        1.260202                  1.226598                  1.293806
#> 7       7        1.240827                  1.204895                  1.276759
#> 10     10        1.293958                  1.252664                  1.335251
#>    CV%.ones/houdensity
#> 6             1.360513
#> 7             1.477467
#> 10            1.628209

  # Estimate the average household income for the whole population:
  svystatR(exdes,num=~income,den=~houdensity,vartype="cvpct",
           conf.int=TRUE)
#>                      Ratio CI.l(95%) CI.u(95%)      CV%
#> income/houdensity 1581.869  1546.801  1616.937 1.131079

  # ...for household size categories:
  svystatR(exdes,num=~income,den=~houdensity,by=~housize,
           vartype="cvpct",conf.int=TRUE)
#>   housize income/houdensity CI.l(95%).income/houdensity
#> 1       1          1262.978                    1241.878
#> 2       2          2494.997                    2437.738
#> 3       3          3632.002                    3462.927
#> 4       4          5862.479                    4853.610
#>   CI.u(95%).income/houdensity CV%.income/houdensity
#> 1                    1284.078             0.8523761
#> 2                    2552.256             1.1709157
#> 3                    3801.077             2.3751243
#> 4                    6871.348             8.7802161

  # ...and for province and household size:
  svystatR(exdes,num=~income,den=~houdensity,by=~housize:procod,
           vartype="cvpct")
#>      housize procod income/houdensity CV%.income/houdensity
#> 1.8        1      8          1218.423          3.498595e+00
#> 2.8        2      8          2662.211          5.793224e+00
#> 3.8        3      8          3319.570          1.394928e+01
#> 1.9        1      9          1292.456          3.882137e+00
#> 2.9        2      9          2542.949          2.233150e+00
#> 3.9        3      9          3554.366          4.319602e+00
#> 1.10       1     10          1292.177          1.711626e+00
#> 2.10       2     10          2504.872          2.373636e+00
#> 3.10       3     10          3742.118          4.581306e+00
#> 1.11       1     11          1324.953          3.256715e+00
#> 2.11       2     11          2365.077          7.840995e+00
#> 3.11       3     11          4159.863          3.792977e+00
#> 4.11       4     11          7483.000          1.139451e-14
#> 1.30       1     30          1191.891          3.324862e+00
#> 2.30       2     30          2386.506          3.914073e+00
#> 3.30       3     30          3476.722          1.061944e+01
#> 4.30       4     30          5185.000          1.754088e-14
#> 1.31       1     31          1253.714          2.745500e+00
#> 2.31       2     31          2319.011          3.459736e+00
#> 3.31       3     31          3219.664          8.640756e+00
#> 4.31       4     31          5326.000          0.000000e+00
#> 1.32       1     32          1247.437          2.969107e+00
#> 2.32       2     32          2607.850          4.750956e+00
#> 3.32       3     32          3114.500          1.829131e+01
#> 1.54       1     54          1242.540          1.415123e+00
#> 2.54       2     54          2479.523          2.146544e+00
#> 3.54       3     54          3786.966          3.931424e+00
#> 4.54       4     54          5566.346          5.220366e+00
#> 1.55       1     55          1229.519          3.380565e+00
#> 2.55       2     55          2526.219          2.260006e+00
#> 3.55       3     55          3860.897          6.782238e+00
#> 1.93       1     93          1321.382          2.385572e+00
#> 2.93       2     93          2608.374          3.130341e+00
#> 3.93       3     93          3364.158          4.813151e+00
#> 4.93       4     93          4656.000          1.949567e-14
