ext.calibrated.Rd
Enables ReGenesees to provide correct variance estimates of (functions of) calibration estimators, even if the survey weights have not been calibrated by ReGenesees.
ext.calibrated(data, ids, strata = NULL, weights, fpc = NULL, self.rep.str = NULL, check.data = TRUE, weights.cal, calmodel, partition = FALSE, sigma2 = NULL)
data | The same as in function |
---|---|
ids | The same as in function |
strata | The same as in function |
weights | The same as in function |
fpc | The same as in function |
self.rep.str | The same as in function |
check.data | The same as in function |
weights.cal | Formula identifying the externally calibrated weights. |
calmodel | The same as in function |
partition | The same as in function |
sigma2 | The same as in function |
Owing to ReGenesees's ability to provide proper variance estimates for (complex functions of) calibration estimators, some users may be tempted to exploit ReGenesees in the estimation phase even if they did not use ReGenesees for calibration.
This result cannot be achieved naively, by simply passing to ReGenesees function e.svydesign
the survey data and supplying the externally calibrated weights through its weights
argument.
Indeed, variance estimation methods of ReGenesees's summary statistics functions (svystatTM
, svystatR
, svystatS
, svystatSR
, svystatB
, svystatQ
, svystatL
and svystat
) are dispatched according to the class of the input design
object:
If the design object is un-calibrated (i.e. its class is ‘analytic’), variance formulas are appropriate to Horvitz-Thompson estimators (and functions of them).
If the design object is calibrated (i.e. its class is ‘cal.analytic’), variance formulas are appropriate to Calibration estimators (and functions of them).
Therefore, the naive approach of passing the externally calibrated weights weights.cal
to e.svydesign
as if they were initial or design weights cannot succeed, since it would result in HT-like variance estimates, leading generally to variance overestimation (with bigger upward bias for variables that are better explained by the calibration model).
Function ext.calibrated
has been designed exactly to avoid the aforementioned pitfalls and to allow ReGenesees provide correct variance estimates of (functions of) calibration estimators, even if the survey weights have been calibrated externally by other software.
Argument weights.cal
identifies the externally calibrated weights of the units included in the sample. The data
variable referenced by weights.cal
must be numeric. Currently, only positive externally calibrated weights can be handled (see the dedicated section below).
Other arguments to ext.calibrated
derive either from function e.svydesign
or from function e.calibrate
. The former serve the purpose of passing the survey data and the corresponding sampling design metadata, the latter are meant to tell ext.calibrated
how the externally calibrated weights have been obtained.
From a methodological perspective, negative calibration weights are legitimate. However, owing to software implementation details whose modification would not be trivial, function ext.calibrated
is not yet able to cope with this case. Note that the problem is actually due to the external origin of the negative calibration weights. In fact, ReGenesees calibration and estimation facilities are entirely able to cope with possibly negative calibration weights, provided they were computed internally.
An object of class cal.analytic
, storing the original survey data plus all the sampling design and calibration metadata needed for proper variance estimation.
Exactly as ReGenesees's base functions e.svydesign
and e.calibrate
would do, ext.calibrated
too will wrap inside its return value a local copy of data
. As usual, this copy will be stored inside the variables
slot of the output list. As usual, again, the calibrated weights will be accessible by using the weights
function.
e.svydesign
to bind survey data and sampling design metadata, and e.calibrate
for calibrating survey weights within ReGenesees.
# Load data sbs data data(sbs) ######################################################################### # Simulate an external calibration procedure and compute some benchmark # # estimates and errors to test function ext.calibrated # ######################################################################### # Define a survey design sbsdes <- e.svydesign(data= sbs, ids= ~id, strata= ~strata, weights= ~weight, fpc= ~fpc) # Build a template for population totals pop <- pop.template(data= sbsdes, calmodel= ~y:nace.macro + emp.cl + emp.num - 1, partition= ~dom3) # Have a look at the template structure pop.desc(pop)#> # Data frame of known totals for a *partitioned* calibration task #> - Number of rows: 4 #> - Number of columns: 11 #> - Number of known totals: 40 #> #> # The data frame structure is as follows #> ## Column 1 identifies 4 *calibration domains* (one for each row) #> #> dom3 #> 1 A #> 2 B #> 3 C #> 4 D #> #> ## Columns 2-11 identify known totals organized into *3 BLOCKS* #> #> - BLOCK 1 ---------------------------------------------------------------- #> - Benchmark: Counts of emp.cl #> - Known totals: 20 #> - Auxiliary variables: 5 #> - Columns: 2-6 #> #> 2 3 4 5 6 #> aux "[6,9]" "(9,19]" "(19,49]" "(49,99]" "(99,Inf]" #> #> #> - BLOCK 2 ---------------------------------------------------------------- #> - Benchmark: Total of emp.num #> - Known totals: 4 #> - Auxiliary variables: 1 #> - Column: 7 #> #> 7 #> aux "emp.num" #> #> #> - BLOCK 3 ---------------------------------------------------------------- #> - Benchmark: Totals of y by nace.macro #> - Known totals: 16 #> - Auxiliary variables: 4 #> - Columns: 8-11 #> #> 8 9 10 11 #> aux "y:Agriculture" "y:Industry" "y:Commerce" "y:Services" #> #>#> #> # Coherence check between 'universe' and 'template': OK #># Calibrate sbscal <- e.calibrate(design= sbsdes, df.population= pop, calfun= "logit", bounds= c(0.8, 1.3), sigma2= ~ emp.num) # Compute benchmark estimates and errors (average value added per employee by # region) to be later compared with those obtained by using ext.calibrated benchmark <- svystatR(design= sbscal, num= ~va.imp2, den= ~emp.num, by= ~region) benchmark#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 0.9344696 #> Center Center 44.99534 1.5700976 #> South South 65.85808 2.8296036# Extract the 'externally' calibrated weights... w <- weights(sbscal) #...and add these 'externally' calibrated weights to the original survey data sbs.ext <- data.frame(sbs, w.ext = w) # NOTE: Now sbs.ext is just a data frame, without any knowledge of the # calibration metadata formerly stored inside sbscal (i.e. the object # calibrated by ReGenesees) ############################################################## # Let ReGenesees digest the 'externally' calibrated weights, # # then re-compute benchmark estimates and errors for testing # ############################################################## # Simply pass survey data along with sampling design and calibration model # metadata sbscal.ext <- ext.calibrated(data= sbs.ext, ids= ~id, strata= ~strata, weights= ~weight, fpc = ~fpc, weights.cal= ~w.ext, calmodel= ~y:nace.macro + emp.cl + emp.num - 1, partition= ~dom3, sigma2= ~emp.num) # Have a look at the output sbscal.ext#> Calibrated, Stratified Independent Unit Sampling Design #> - [664] strata #> - [6909] units #> #> Call: #> ext.calibrated(data = sbs.ext, ids = ~id, strata = ~strata, weights = ~weight, #> fpc = ~fpc, weights.cal = ~w.ext, calmodel = ~y:nace.macro + #> emp.cl + emp.num - 1, partition = ~dom3, sigma2 = ~emp.num)# Now re-compute benchmark estimates and errors by means of new object # ext.sbscal test <- svystatR(design= sbscal.ext, num= ~va.imp2, den= ~emp.num, by= ~region) test#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 0.9344696 #> Center Center 44.99534 1.5700976 #> South South 65.85808 2.8296036################################################################ # Compare benchmark estimates and errors to those derived from # # ext.calibrated return object # ################################################################ benchmark#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 0.9344696 #> Center Center 44.99534 1.5700976 #> South South 65.85808 2.8296036test#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 0.9344696 #> Center Center 44.99534 1.5700976 #> South South 65.85808 2.8296036# ...and they are identical, as it must be. # NOTE: All utility tools yield exactly the same results, e.g. identical(weights(sbscal), weights(sbscal.ext))#> [1] TRUE#> [1] TRUE########################################################################## # Show that the naive idea of directly passing the externally calibrated # # weights to e.svydesign does NOT work properly for variance estimation # ########################################################################## naive <- e.svydesign(data= sbs.ext, ids= ~id, strata= ~strata, weights= ~w.ext, fpc = ~fpc) # Estimated sampling errors derived by this naive design object... svystatR(design= naive, num= ~va.imp2, den= ~emp.num, by= ~region)#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 1.174274 #> Center Center 44.99534 1.639185 #> South South 65.85808 2.870172#...do NOT match benchmark values, overestimating them: benchmark#> region va.imp2/emp.num SE.va.imp2/emp.num #> North North 56.51349 0.9344696 #> Center Center 44.99534 1.5700976 #> South South 65.85808 2.8296036