Corr.Rd
Estimates the covariance and the correlation of Complex Estimators in subpopulations. A Complex Estimator can be any analytic function of (Horvitz-Thompson or Calibration) estimators of Totals and Means.
CoV(design, expr1, expr2, by = NULL, na.rm = FALSE) Corr(design, expr1, expr2, by = NULL, na.rm = FALSE)
design | Object of class |
---|---|
expr1 | R expression defining the first Complex Estimator (see ‘Details’). |
expr2 | R expression defining the second Complex Estimator (see ‘Details’). |
by | Formula specifying the variables that define the "estimation domains". If |
na.rm | Should missing values (if any) be removed from the variables of interest? The default is
|
This function allows to estimate the covariance and the correlation of two arbitrary Complex Estimators. Estimates are calculated using the Taylor linearization technique.
The mandatory arguments expr1
and expr2
identify the Complex Estimators: both must be of class expression
. For further details on the syntax and the semantics of such expressions, see svystatL
.
The optional argument by
specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL
(the default option), the estimates produced by CoV
(Corr
) refer to the whole population. Estimation domains must be defined by a formula: for example the statement by=~B1:B2
selects as estimation domains the subpopulations determined by crossing the modalities of variables B1
and B2
. Notice that a formula like by=~B1+B2
will be automatically translated into the factor-crossing formula by=~B1:B2
: if you need to compute estimates for domains B1
and B2
separately, you have to call CoV
(Corr
) twice. The design
variables referenced by by
(if any) should be of type factor
, otherwise they will be coerced.
Missing values (NA
) in interest variables should be avoided. If na.rm=FALSE
(the default) they generate NAs in estimates (or even an error, if design
is calibrated). If na.rm=TRUE
, observations containing NAs are dropped, and estimates get computed on non missing values only. This implicitly assumes that missing values hit interest variables completely at random: should this not be the case, computed estimates would be biased.
An object inheriting from the data.frame
class, whose detailed structure depends on input parameters' values.
Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.
Estimators of Totals and Means svystatTM
, Ratios between Totals svystatR
, Shares svystatS
, Ratios between Shares svystatSR
, Multiple Regression Coefficients svystatB
, Quantiles svystatQ
, and Complex Analytic Functions of Totals and/or Means svystatL
.
######################################## # Some checks and some simple examples # # to illustrate the syntax. # ######################################## # Load survey data: data(data.examples) # Creation of a design object: des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM, weights=~weight) # Let's start with some natural checks: ## The covariance of any estimator with itself is its variance ## (use mean income as an example): CoV(des,expression(income/ones),expression(income/ones))#> CoV(income/ones, income/ones) #> 1 73.14603#> VAR.income/ones #> 1 73.14603#> VAR.income #> 1 73.14603## The correlation of any estimator with itself is 1 ## (use mean income as an example): Corr(des,expression(income/ones),expression(income/ones))#> Corr(income/ones, income/ones) #> 1 1# Switch to non trivial examples: ## Correlation of mean income with population size: Corr(des,expression(income/ones),expression(ones))#> Corr(income/ones, ones) #> 1 -0.081836## Correlation of mean income with total income: # at population level: Corr(des,expression(income/ones),expression(income))#> Corr(income/ones, income) #> 1 0.2745306#> regcod Corr(income/ones, income) #> 6 6 0.030855148 #> 7 7 0.448395063 #> 10 10 0.007503017## Correlation of a product of two totals and a ratio of two totals: # at population level: Corr(des,expression(y1*y2),expression(x1/x2))#> Corr(y1 * y2, x1/x2) #> 1 -0.1763557#> procod Corr(income/ones, income) #> 8 8 0.740222988 #> 9 9 0.433180988 #> 10 10 0.172926549 #> 11 11 0.827502672 #> 30 30 -0.059964482 #> 31 31 0.777242593 #> 32 32 0.580070961 #> 54 54 0.137998445 #> 55 55 -0.180946481 #> 93 93 0.003210321###################################################### # A more meaningful and complex example: correlation # # between Geometric, Harmonic and Arithmetic Means. # ###################################################### # Creation of another design object: data(sbs) sbsdes<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight, fpc=~fpc) # Let's use variable emp.num, which is ok as it is always strictly positive: ## Add a convenience variable for estimating the harmonic mean (see ?svystatL ## for details) and prepare the formal estimator expression: sbsdes<-des.addvars(sbsdes,emp.num.m1=1/emp.num) h<-expression(ones/emp.num.m1) ## Add a convenience variable for estimating the geometric mean (see ?svystatL ## for details) and prepare the formal estimator expression: sbsdes<-des.addvars(sbsdes,log.emp.num=log(emp.num)) g<-expression(exp(log.emp.num/ones)) ## prepare the formal estimator expression for the arithmetic mean: m<-expression(emp.num/ones) # Now compute correlations: ## Harmonic with Arithmetic: Corr(sbsdes,h,m)#> Corr(ones/emp.num.m1, emp.num/ones) #> 1 0.4305941## Geometric with Arithmetic: Corr(sbsdes,g,m)#> Corr(exp(log.emp.num/ones), emp.num/ones) #> 1 0.8017319## Harmonic with Geometric: Corr(sbsdes,h,g)#> Corr(ones/emp.num.m1, exp(log.emp.num/ones)) #> 1 0.8510011## Hence, while correlations g-m and g-h are high, correlation h-m is low. #################################################### # Another example: is a ratio estimator of a total # # expected to be more efficient than an HT one? # #################################################### # Let's recall that the ratio estimator of a total is # expected to be more efficient than HT, if the # correlation of numerator and denominator exceeds # half of the ratio between the CVs of denominator # and numerator. # Compute the HT estimate of the total of value added (variable va.imp2): VA<-svystatTM(sbsdes,~va.imp2) VA#> Total SE #> va.imp2 56293880 1011167#> Total SE #> emp.num 985157.9 1421.315# Now estimate the correlation of the numerator # and denominator totals: corr <- Corr(sbsdes,expression(va.imp2),expression(emp.num)) corr#> Corr(va.imp2, emp.num) #> 1 0.1308511# and compare it with (1/2)*( CV(den)/CV(num) ) stopifnot( corr > 0.5*cv(EMP)/cv(VA) ) # As the comparison holds TRUE, we expect an efficiency gain # of the ratio estimator of the total compared to HT. # Let's check...: # Compute the ratio estimate of the total of value added using # as auxiliary variable the number of emloyees, whose total # is 984394: TOT.emp.num <- sum(sbs.frame$emp.num) TOT.emp.num#> [1] 984394#> Error in get(x): object 'TOT.emp.num' not foundVA.ratio#> Error in eval(expr, envir, enclos): object 'VA.ratio' not found#> Error in eval(expr, envir, enclos): object 'VA.ratio' not foundSE(VA)#> SE.va.imp2 #> 1 1011167#> Error in eval(expr, envir, enclos): object 'VA.ratio' not found# ...as expected.