get.residuals.Rd
Computes (scaled) residuals of a set of interest variables w.r.t. the calibration model adopted to build a calibrated object.
get.residuals(cal.design, y, scale = c("no", "w", "d", "g"))
cal.design | Object of class |
---|---|
y | Formula defining the variables of interest. |
scale |
|
This function has been designed mainly for programmers or advanced users willing to build upon ReGenesees: typical users are not expected to feel much need of it.
The residuals of an interest variable w.r.t. the linear model defined by the auxiliary variables used for calibration play a central role in estimating the variance of Calibration Estimators (see, e.g., [Deville, 1999]). Notice that if object cal.design
has been generated by running a partitioned calibration task (see e.calibrate
), the residuals will be correctly computed using the different estimated regression coefficients pertaining to the different domains belonging to the partition.
The mandatory argument y
behaves exactly the same way as it does in function svystatTM
.
The scale
argument allows to scale the computed residuals by multiplying them by different factors. By default scale="no"
, that is unscaled residuals are returned. Value "w"
returns the residuals times the calibrated weights; value "d"
returns the residuals times the initial weights; finally, value "g"
returns the residuals times the g-weights (i.e. the ratios between calibrated and initial weights). Notice that the semantics of argument scale
are slightly modified when the input object cal.design
has been obtained by a multi-step calibration procedure (see Section 'Note' below).
A matrix of residuals.
If cal.design
has undergone k
subsequent calibration steps (with k >= 1
), the function will return the residuals computed w.r.t. the linear assisting model underlying the last (i.e. k
-th) calibration step.
If k >= 2
, the scale parameter will be interpreted as follows:
SCALE MEANING "no"........no scale; "w"........last calibration weight (i.e. at step k); "d"........second to last calibration weight (i.e. at step k - 1); "g"........ratio between last and second to last calibration weights.
Deville, J. C. (1999) “Variance Estimation for Complex Statistics and Estimators: Linearization and Residual Techniques.”. Survey Methodology 25: 193-203.
weights
to extract the weights from a design object, e.calibrate
for calibrating weights and g.range
to get the range of the g-weights.
################################################################## # Just some checks on the consistency of the numerical results # # obtained by ReGenesees with well known theoretical properties. # ################################################################## # Load sbs data: data(sbs) # Create a design object to be calibrated: sbsdes<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,fpc=~fpc) ################ ## Property 1 ## ################ ## If the calibration model has (implicitly or explicitly) an intercept ## the weighted sum of residuals must be zero. # Suppose you calibrate on enterprise counts inside areas, i.e. a calibration # model WITH intercept (though implicitly): # calmodel= ~ area - 1 # First, build and fill the known totals template: pop<-pop.template(sbsdes, calmodel= ~ area - 1) pop<-fill.template(pop, universe=sbs.frame)#> #> # Coherence check between 'universe' and 'template': OK #># Then, calibrate: sbscal<-e.calibrate(sbsdes, pop) # Now, get the residuals of any variable (e.g. y and emp.num) scaled with the # direct weights: de <- get.residuals(sbscal, ~ y + emp.num, scale="d") # Lastly, compute the column sums... colSums(de)#> y emp.num #> 9.149915e-09 6.385954e-10#...which is (numerically) zero, as it must be. ################ ## Property 2 ## ################ ## If the calibration model does not have (implicitly or explicitly) an ## intercept term the weighted sum of residuals is generally different from zero. # Suppose you calibrate on employees counts inside areas, i.e. a calibration # model WITHOUT intercept: # calmodel= ~ emp.num:area - 1 # First, build and fill the known totals template: pop<-pop.template(sbsdes, calmodel= ~ emp.num:area - 1) pop<-fill.template(pop, universe=sbs.frame)#> #> # Coherence check between 'universe' and 'template': OK #># Then, calibrate: sbscal<-e.calibrate(sbsdes, pop) # Now, get the residuals of any variable (e.g. y and region) scaled with the # direct weights: de <- get.residuals(sbscal, ~ y + region, scale="d") # Lastly, compute the column sums... colSums(de)#> y regionNorth regionCenter regionSouth #> 8247599.055 9292.902 3062.676 2933.356#...which is far from zero, as expected. ################ ## Property 3 ## ################ ## In the Taylor linearization approach, estimating the variance of a ## Calibration Estimator amounts to estimating the variance of the HT total ## of its linearized variable (i.e. the g-scaled residual), under the sampling ## design at hand. # Suppose you calibrate on the total number of employees and enterprises # inside the domains obtained by: # i) crossing nace.macro and region; # ii) crossing emp.cl and region; # First, build and fill the population totals template: pop<-pop.template(sbsdes, calmodel=~(emp.num+ent):(nace.macro+emp.cl)-1, partition=~region) pop<-fill.template(universe=sbs.frame,template=pop)#> #> # Coherence check between 'universe' and 'template': OK #># Then, calibrate: sbscal<-e.calibrate(sbsdes,pop) # Now, compute the linearized variable of the Calibration Estimator of the # total of any variable (e.g. va.imp2): z_va.imp2 <- get.residuals(sbscal, ~ va.imp2, scale="g") # Now, treat z_va.imp2 as an ordinary variable and compute the standard error # of its HT total: sbsdes<-des.addvars(sbsdes, z_va.imp2 = z_va.imp2) SE(svystatTM(sbsdes, ~z_va.imp2))#> SE.z_va.imp2 #> 1 1004846# Lastly, compute directly the standard error of the Calibration Estimator... SE(svystatTM(sbscal, ~va.imp2))#> SE.va.imp2 #> 1 1004846#...and they are identical, as it must be. # Obviously the same result would hold for domain estimates (e.g. the total # of va.imp2 for the "Agriculture" nace.macro). # Compute the linearized variable of the Calibration Estimator of the domain # total: z_va.imp2.Agr <- get.residuals(sbscal, ~ I(va.imp2*(nace.macro=="Agriculture")), scale="g") # Now, treat z_va.imp2.Agr as an ordinary variable and compute the standard # error of its HT total: sbsdes<-des.addvars(sbsdes, z_va.imp2.Agr = z_va.imp2.Agr) SE(svystatTM(sbsdes, ~z_va.imp2.Agr))#> SE.z_va.imp2.Agr #> 1 54820.87# Lastly, compute directly the standard error of the Calibration Estimator # by domain... SE(svystatTM(sbscal, ~va.imp2, ~nace.macro))#> SE.Total.va.imp2 #> Agriculture 54820.87 #> Industry 384453.64 #> Commerce 910938.64 #> Services 185327.76#...and the "Agriculture" SEs are identical, as it must be.