Estimates the covariance and the correlation of Complex Estimators in subpopulations. A Complex Estimator can be any analytic function of (Horvitz-Thompson or Calibration) estimators of Totals and Means.

CoV(design, expr1, expr2,
     by = NULL, na.rm = FALSE)
Corr(design, expr1, expr2,
     by = NULL, na.rm = FALSE)

Arguments

design

Object of class analytic (or inheriting from it) containing survey data and sampling design metadata.

expr1

R expression defining the first Complex Estimator (see ‘Details’).

expr2

R expression defining the second Complex Estimator (see ‘Details’).

by

Formula specifying the variables that define the "estimation domains". If NULL (the default option) estimates refer to the whole population.

na.rm

Should missing values (if any) be removed from the variables of interest? The default is FALSE (see ‘Details’).

Details

This function allows to estimate the covariance and the correlation of two arbitrary Complex Estimators. Estimates are calculated using the Taylor linearization technique.

The mandatory arguments expr1 and expr2 identify the Complex Estimators: both must be of class expression. For further details on the syntax and the semantics of such expressions, see svystatL.

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by CoV (Corr) refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the modalities of variables B1 and B2. Notice that a formula like by=~B1+B2 will be automatically translated into the factor-crossing formula by=~B1:B2: if you need to compute estimates for domains B1 and B2 separately, you have to call CoV (Corr) twice. The design variables referenced by by (if any) should be of type factor, otherwise they will be coerced.

Missing values (NA) in interest variables should be avoided. If na.rm=FALSE (the default) they generate NAs in estimates (or even an error, if design is calibrated). If na.rm=TRUE, observations containing NAs are dropped, and estimates get computed on non missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.

Value

An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.

References

Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.

See also

Estimators of Totals and Means svystatTM, Ratios between Totals svystatR, Shares svystatS, Ratios between Shares svystatSR, Multiple Regression Coefficients svystatB, Quantiles svystatQ, and Complex Analytic Functions of Totals and/or Means svystatL.

Examples

######################################## # Some checks and some simple examples # # to illustrate the syntax. # ######################################## # Load survey data: data(data.examples) # Creation of a design object: des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM, weights=~weight) # Let's start with some natural checks: ## The covariance of any estimator with itself is its variance ## (use mean income as an example): CoV(des,expression(income/ones),expression(income/ones))
#> CoV(income/ones, income/ones) #> 1 73.14603
VAR(svystatL(des,expression(income/ones)))
#> VAR.income/ones #> 1 73.14603
VAR(svystatTM(des,~income,estimator="Mean"))
#> VAR.income #> 1 73.14603
## The correlation of any estimator with itself is 1 ## (use mean income as an example): Corr(des,expression(income/ones),expression(income/ones))
#> Corr(income/ones, income/ones) #> 1 1
# Switch to non trivial examples: ## Correlation of mean income with population size: Corr(des,expression(income/ones),expression(ones))
#> Corr(income/ones, ones) #> 1 -0.081836
## Correlation of mean income with total income: # at population level: Corr(des,expression(income/ones),expression(income))
#> Corr(income/ones, income) #> 1 0.2745306
# for regions: Corr(des,expression(income/ones),expression(income),by=~regcod)
#> regcod Corr(income/ones, income) #> 6 6 0.030855148 #> 7 7 0.448395063 #> 10 10 0.007503017
## Correlation of a product of two totals and a ratio of two totals: # at population level: Corr(des,expression(y1*y2),expression(x1/x2))
#> Corr(y1 * y2, x1/x2) #> 1 -0.1763557
# for provinces: Corr(des,expression(income/ones),expression(income),by=~procod)
#> procod Corr(income/ones, income) #> 8 8 0.740222988 #> 9 9 0.433180988 #> 10 10 0.172926549 #> 11 11 0.827502672 #> 30 30 -0.059964482 #> 31 31 0.777242593 #> 32 32 0.580070961 #> 54 54 0.137998445 #> 55 55 -0.180946481 #> 93 93 0.003210321
###################################################### # A more meaningful and complex example: correlation # # between Geometric, Harmonic and Arithmetic Means. # ###################################################### # Creation of another design object: data(sbs) sbsdes<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight, fpc=~fpc) # Let's use variable emp.num, which is ok as it is always strictly positive: ## Add a convenience variable for estimating the harmonic mean (see ?svystatL ## for details) and prepare the formal estimator expression: sbsdes<-des.addvars(sbsdes,emp.num.m1=1/emp.num) h<-expression(ones/emp.num.m1) ## Add a convenience variable for estimating the geometric mean (see ?svystatL ## for details) and prepare the formal estimator expression: sbsdes<-des.addvars(sbsdes,log.emp.num=log(emp.num)) g<-expression(exp(log.emp.num/ones)) ## prepare the formal estimator expression for the arithmetic mean: m<-expression(emp.num/ones) # Now compute correlations: ## Harmonic with Arithmetic: Corr(sbsdes,h,m)
#> Corr(ones/emp.num.m1, emp.num/ones) #> 1 0.4305941
## Geometric with Arithmetic: Corr(sbsdes,g,m)
#> Corr(exp(log.emp.num/ones), emp.num/ones) #> 1 0.8017319
## Harmonic with Geometric: Corr(sbsdes,h,g)
#> Corr(ones/emp.num.m1, exp(log.emp.num/ones)) #> 1 0.8510011
## Hence, while correlations g-m and g-h are high, correlation h-m is low. #################################################### # Another example: is a ratio estimator of a total # # expected to be more efficient than an HT one? # #################################################### # Let's recall that the ratio estimator of a total is # expected to be more efficient than HT, if the # correlation of numerator and denominator exceeds # half of the ratio between the CVs of denominator # and numerator. # Compute the HT estimate of the total of value added (variable va.imp2): VA<-svystatTM(sbsdes,~va.imp2) VA
#> Total SE #> va.imp2 56293880 1011167
# Compute the HT estimate of the total of emp.num: EMP<-svystatTM(sbsdes,~emp.num) EMP
#> Total SE #> emp.num 985157.9 1421.315
# Now estimate the correlation of the numerator # and denominator totals: corr <- Corr(sbsdes,expression(va.imp2),expression(emp.num)) corr
#> Corr(va.imp2, emp.num) #> 1 0.1308511
# and compare it with (1/2)*( CV(den)/CV(num) ) stopifnot( corr > 0.5*cv(EMP)/cv(VA) ) # As the comparison holds TRUE, we expect an efficiency gain # of the ratio estimator of the total compared to HT. # Let's check...: # Compute the ratio estimate of the total of value added using # as auxiliary variable the number of emloyees, whose total # is 984394: TOT.emp.num <- sum(sbs.frame$emp.num) TOT.emp.num
#> [1] 984394
VA.ratio<-svystatL(sbsdes, expression(TOT.emp.num * (va.imp2/emp.num)))
#> Error in get(x): object 'TOT.emp.num' not found
VA.ratio
#> Error in eval(expr, envir, enclos): object 'VA.ratio' not found
# Compare standard errors sizes: SE(VA.ratio)
#> Error in eval(expr, envir, enclos): object 'VA.ratio' not found
SE(VA)
#> SE.va.imp2 #> 1 1011167
stopifnot( SE(VA.ratio) < SE(VA) )
#> Error in eval(expr, envir, enclos): object 'VA.ratio' not found
# ...as expected.