bounds.hint.Rd
Suggests a sound bounds
value for which e.calibrate
is likely to converge.
bounds.hint(design, df.population, calmodel = if (inherits(df.population, "pop.totals")) attr(df.population, "calmodel"), partition = if (inherits(df.population, "pop.totals")) attr(df.population, "partition") else FALSE, msg = TRUE)
Argument | Description |
---|---|
design | Object of class |
df.population | Data frame containing the known population totals for the auxiliary variables. |
calmodel | Formula defining the linear structure of the calibration model. |
partition | Formula specifying the variables that define the "calibration domains" for the model; |
msg | Enables printing of a summary description of the result (the default is |
Function bounds.hint
returns a bounds
value for which e.calibrate
is likely to converge. This interval is just a sound hint, not an exact result (see ‘Note’).
The mandatory argument design
identifies the analytic
object on which the calibration problem is defined.
The mandatory argument df.population
identifies the known totals data frame.
The argument calmodel
symbolically defines the calibration model you want to use: it identifies the auxiliary variables and the constraints for the calibration problem. The design
variables referenced by calmodel
must be numeric
or factor
and must not contain any missing value (NA
). The argument can be omitted provided df.population
is an object of class pop.totals
(see population.check
).
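The requirements on the model variables can be checked before calling the function. The following is a minimal base-R sketch, not part of the package: the helper `check_model_vars` and the toy data frame are hypothetical, and the check only mirrors the two conditions stated above (numeric or factor type, no missing values).

```r
# Hypothetical helper (not a package function): for every variable referenced
# by a model formula, report whether it is numeric/factor and free of NAs.
check_model_vars <- function(formula, data) {
  vars <- all.vars(formula)
  not_found <- setdiff(vars, names(data))
  if (length(not_found) > 0) {
    stop("variables not found in data: ", paste(not_found, collapse = ", "))
  }
  ok_type <- vapply(data[vars], function(x) is.numeric(x) || is.factor(x), logical(1))
  no_na <- vapply(data[vars], function(x) !anyNA(x), logical(1))
  data.frame(variable = vars,
             ok_type = unname(ok_type),
             no_na = unname(no_na),
             stringsAsFactors = FALSE)
}

# Toy data, invented for illustration only.
toy <- data.frame(age5c = factor(c("1", "2", "3", "2")),
                  income = c(10, NA, 30, 25),
                  name = c("a", "b", "c", "d"),
                  stringsAsFactors = FALSE)

chk <- check_model_vars(~ age5c + income + name - 1, toy)
print(chk)
```

In this toy case `income` fails the missing-value condition and `name` fails the type condition, so neither could be used in a calibration model as it stands.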
The optional argument partition
specifies the variables that define the calibration domains for the model. The default value (FALSE
) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). The design
variables referenced by partition
(if any) must be factor
and must not contain any missing value (NA
). The argument can be omitted provided df.population
is an object of class pop.totals
(see population.check
).
The optional argument msg
enables/disables printing of a summary description of the achieved result.
A numeric vector of length 2, representing the suggested value for the bounds
argument of e.calibrate
. The attributes of that vector store additional information, which can help explain why a given calibration problem is (un)feasible (see ‘Examples’).
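As a rough illustration of how such per-domain information can be used, the sketch below works on plain vectors copied by hand from the `attr(hint,"bounds")` output printed in ‘Examples’ (row `all`). The helper `uncovered` is hypothetical and the exact structure of the real attribute is not assumed beyond what that output shows.

```r
# Per-domain ratio ranges copied by hand from the 'Examples' output;
# names are the procod domain codes.
domains <- c("8", "9", "10", "11", "30", "31", "32", "54", "55", "93")
lower <- setNames(c(0.6773532, 0.651573, 0.8013544, 0.8404024, 0.6109341,
                    0.6937631, 1.002463, 0.8647034, 0.6833283, 0.6906802),
                  domains)
upper <- setNames(c(1.394315, 1.313149, 1.322646, 1.278943, 0.840914,
                    1.389327, 1.298084, 1.226745, 1.210841, 0.8432462),
                  domains)

# Hypothetical helper: domains whose ratio range is NOT covered by a trial
# bounds value, i.e. domains where calibration cannot succeed.
uncovered <- function(bounds, lower, upper) {
  names(lower)[lower < bounds[1] | upper > bounds[2]]
}

bad <- uncovered(c(0.62, 1.50), lower, upper)

# Smallest interval that covers every per-domain range.
overall <- c(min(lower), max(upper))

print(bad)
print(overall)
```

With the trial value `c(0.62, 1.50)` only domain 30 is left uncovered, which is the sub-task that fails in ‘Examples’.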
Assessing the feasibility of an arbitrary calibration problem is not an easy task. The problem is even more difficult whenever additional “range restrictions” are imposed. Indeed, even if one assumes that the calibration constraints define a consistent system, one also has to choose the bounds
such that the feasible region is non-empty.
One can argue that there must exist a minimum-length interval \(I=[L,U]\) such that, if it is covered by bounds
, the specified calibration problem is feasible. Unfortunately, computing that minimum-length interval \(I\) exactly would require solving a large linear programming problem [Vanderhoeft 2001]. As an alternative, a trial-and-error procedure has frequently been proposed [Deville et al. 1993; Sautory 1993]: (i) start with a very large interval bounds.0
; (ii) if convergence is achieved, shrink it so as to obtain a new interval bounds.1
; (iii) repeat until you get a sufficiently tight feasible interval bounds.n
. The drawback is that this procedure can cost a lot of computer time since, for each choice of the bounds
, the full calibration problem has to be solved.
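The trial-and-error procedure can be sketched as a simple loop. This is only an illustration under loud assumptions: `shrink_bounds` is a hypothetical helper, the shrinking rule (pulling both endpoints toward 1 by a fixed factor) is one arbitrary choice among many, and `toy_converges` is a stand-in for a full calibration run. In the toy predicate, covering a fixed interval is treated as sufficient for convergence, which is not true in general.

```r
# Hypothetical helper: shrink a feasible interval toward 1 until the
# supplied predicate reports that calibration no longer converges.
shrink_bounds <- function(start, converges, factor = 0.8, max_iter = 50) {
  stopifnot(length(start) == 2, start[1] < 1, start[2] > 1,
            factor > 0, factor < 1)
  if (!converges(start)) stop("the starting interval is not feasible")
  current <- start
  for (i in seq_len(max_iter)) {
    # Pull both endpoints toward 1 (the value of an unchanged weight).
    candidate <- 1 + factor * (current - 1)
    # Each test stands for solving the full calibration problem once.
    if (!converges(candidate)) break
    current <- candidate
  }
  current
}

# Toy stand-in for a calibration run (assumption, for illustration only):
# "converges" if and only if the bounds cover [0.611, 1.394].
toy_converges <- function(b) b[1] <= 0.611 && b[2] >= 1.394

res <- shrink_bounds(c(0.01, 10), toy_converges)
print(res)
```

In real use the predicate would have to run the calibration itself for every candidate interval, which is exactly the cost the text describes.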
However, when the benchmark population totals and the corresponding Horvitz-Thompson estimates are all non-negative, it is easy to find a specific interval \(I^*=[L^*,U^*]\) such that, if it is not covered by bounds
, the current calibration problem is surely unfeasible. This means that any feasible bounds
value must necessarily contain the \(I^*\) interval. Function bounds.hint
: (i) first identifies such an \(I^*\) interval (by computing the range of the ratios between known population totals and corresponding direct Horvitz-Thompson estimates), (ii) then builds a new interval \(I^{sugg}\) with the same midpoint and double the length. The latter is the suggested value for the bounds
argument of e.calibrate
. The return value of bounds.hint
should be understood as a useful starting guess for bounds
, even though there is no guarantee that the calibration algorithm will actually converge.
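The two steps can be written out in a few lines of base R. This is a minimal sketch of the arithmetic described in this note, not the package's implementation: `suggest_bounds` is a hypothetical name, calibration domains are ignored, estimates are required to be strictly positive to avoid division by zero, and the toy totals are invented so that the ratio range reproduces the interval reported in ‘Examples’.

```r
# Hypothetical helper: build I* (range of the ratios) and I^sugg
# (same midpoint, double length) from totals and direct estimates.
suggest_bounds <- function(pop_totals, ht_estimates) {
  stopifnot(length(pop_totals) == length(ht_estimates),
            all(pop_totals >= 0),
            all(ht_estimates > 0))  # assumption: strictly positive estimates
  ratios <- pop_totals / ht_estimates
  i_star <- range(ratios)           # I* = [L*, U*]
  mid <- mean(i_star)               # midpoint of I*
  len <- diff(i_star)               # length of I*
  # Same midpoint, double length: half of the new length equals len.
  i_sugg <- c(mid - len, mid + len)
  list(i_star = i_star, i_sugg = i_sugg)
}

# Invented totals and estimates (illustration only).
res <- suggest_bounds(pop_totals = c(610.9341, 1000, 1394.315),
                      ht_estimates = c(1000, 1000, 1000))
print(round(res$i_star, 3))
print(round(res$i_sugg, 3))
```

With \(I^* = [0.611, 1.394]\) the midpoint is about 1.003 and the length about 0.783, so the sketch returns roughly `c(0.219, 1.786)`, consistent with the suggestion printed in ‘Examples’.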
Vanderhoeft, C. (2001) “Generalized Calibration at Statistics Belgium”, Statistics Belgium Working Paper n. 3.
Deville, J.C., Särndal, C.E. and Sautory, O. (1993) “Generalized Raking Procedures in Survey Sampling”, Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1013-1020.
Sautory, O. (1993) “La macro CALMAR: Redressement d'un Echantillon par Calage sur Marges”, Document de travail de la Direction des Statistiques Demographiques et Sociales, no. F9310.
e.calibrate
for calibrating weights, pop.template
for constructing known totals data frames in compliance with the standard required by e.calibrate
, population.check
to check that the known totals data frame satisfies that standard, g.range
to compute the range of the obtained g-weights, and check.cal
to check if calibration constraints have been fulfilled.
# Creation of the object to be calibrated:
data(data.examples)
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
  weights=~weight)

# Calibration (partitioned solution) on the marginal distribution
# of age in 5 classes (age5c) inside provinces (procod)
# (totals in pop06p). Get a hint for feasible bounds:
hint<-bounds.hint(des,pop06p,~age5c-1,~procod)
#>
#> A starting suggestion: try to calibrate with bounds=c(0.219, 1.786)
#>
#> Remark: this is just a hint, not an exact result
#> Feasible bounds for calibration problem must cover the interval [0.611, 1.394]
#>

# Let's verify if calibration converges with the suggested
# value for the bounds argument (i.e. c(0.219, 1.786) ):
descal06p<-e.calibrate(design=des,df.population=pop06p,
  calmodel=~age5c-1,partition=~procod,calfun="logit",
  bounds=hint,aggregate.stage=2)

# Now let's verify that calibration fails, if bounds don't cover
# the interval [0.611, 1.394]:
if (FALSE) {
descal06p<-e.calibrate(design=des,df.population=pop06p,
  calmodel=~age5c-1,partition=~procod,calfun="logit",
  bounds=c(0.62,1.50),aggregate.stage=2,force=FALSE)
}

# The warning message raised by e.calibrate tells that
# the population total of variable age5c5 (i.e. the fifth
# age class frequency) was not matched.
# By analysing ecal.status one understands that calibration
# failed due to the sub-task identified by procod 30:
ecal.status
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  0  0  0  0  0  0
#>

# this is easily explained by inspecting the "bounds"
# attribute of the bounds.hint output object:
attr(hint,"bounds")
#> $lower
#>                  8        9        10        11        30        31       32
#> original 0.6773532 0.651573 0.8013544 0.8404024 0.6109341 0.6937631 1.002463
#> all      0.6773532 0.651573 0.8013544 0.8404024 0.6109341 0.6937631 1.002463
#>                 54        55        93
#> original 0.8647034 0.6833283 0.6906802
#> all      0.8647034 0.6833283 0.6906802
#>
#> $upper
#>                 8        9       10       11       30       31       32
#> original 1.394315 1.313149 1.322646 1.278943 0.840914 1.389327 1.298084
#> all      1.394315 1.313149 1.322646 1.278943 0.840914 1.389327 1.298084
#>                54       55        93
#> original 1.226745 1.210841 0.8432462
#> all      1.226745 1.210841 0.8432462
#>

# indeed the specified lower bound (0.62) was too high
# for procod 30, where instead a value ~0.61 was required.

# Recall that you can always "force" a calibration task that
# would not converge:
descal06p.forced<-e.calibrate(design=des,df.population=pop06p,
  calmodel=~age5c-1,partition=~procod,calfun="logit",
  bounds=c(0.62,1.50),aggregate.stage=2,force=TRUE)
#> Warning: Failed to converge: worst achieved epsilon= 0.0148268907563025 in 51 iterations (variable age5c5), see ecal.status.

# Notice, also, that forced sub-tasks can be tracked down by
# directly looking at ecal.status...
ecal.status
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  1  0  0  0  0  0
#>
#> $fail.diagnostics
#> $fail.diagnostics$`30`
#>   Variable Population.Total Achieved.Estimate Difference Relative.Difference
#> 5   age5c5             1189          1206.644     17.644          0.01482689
#>
#>
#> Calibration Constraints missed (at tolerance level epsilon = 1e-07): 1 out of 50
#> - Summary of mismatches:
#>
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  1  0  0  0  0  0
#>
#> $fail.diagnostics
#> $fail.diagnostics$`30`
#>   Variable Population.Total Achieved.Estimate Difference Relative.Difference
#> 5   age5c5             1189          1206.644     17.644          0.01482689
#>
#>
#>