A Hint for Range Restricted Calibration

Suggests a sound bounds value for which e.calibrate is likely to converge.

bounds.hint(design, df.population,
    calmodel = if (inherits(df.population, "pop.totals"))
                   attr(df.population, "calmodel"),
    partition = if (inherits(df.population, "pop.totals"))
                    attr(df.population, "partition") else FALSE,
    msg = TRUE)

Arguments

design	Object of class `analytic` (or inheriting from it) containing survey data and sampling design metadata.
df.population	Data frame containing the known population totals for the auxiliary variables.
calmodel	Formula defining the linear structure of the calibration model.
partition	Formula specifying the variables that define the "calibration domains" for the model; `FALSE` (the default) implies no calibration domains.
msg	Enables printing of a summary description of the result (the default is `TRUE`).

Details

Function bounds.hint returns a bounds value for which e.calibrate is likely to converge. The returned interval is only a reasonable hint, not an exact result (see ‘Note’).

The mandatory argument design identifies the analytic object on which the calibration problem is defined.

The mandatory argument df.population identifies the known totals data frame.

The argument calmodel symbolically defines the calibration model you want to use: it identifies the auxiliary variables and the constraints for the calibration problem. The design variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA). The argument can be omitted provided df.population is an object of class pop.totals (see population.check).

The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). The design variables referenced by partition (if any) must be factor and must not contain any missing value (NA). The argument can be omitted provided df.population is an object of class pop.totals (see population.check).

The optional argument msg enables/disables printing of a summary description of the achieved result.

Value

A numeric vector of length 2, representing the suggested value for the bounds argument of e.calibrate. The attributes of that vector store additional information, which can help explain why a given calibration problem is (un)feasible (see ‘Examples’).

Note

Assessing the feasibility of an arbitrary calibration problem is not an easy task. The problem is even more difficult whenever additional “range restrictions” are imposed. Indeed, even if one assumes that the calibration constraints define a consistent system, one also has to choose the bounds such that the feasible region is non-empty.

One can argue that there must exist a minimum-length interval \(I=[L,U]\) such that, if it is covered by bounds, the specified calibration problem is feasible. Unfortunately, computing that minimum-length interval \(I\) exactly requires solving a large linear programming problem [Vanderhoeft 2001]. As an alternative, a trial and error procedure has frequently been proposed [Deville et al. 1993; Sautory 1993]: (i) start with a very large interval bounds.0; (ii) if convergence is achieved, shrink it so as to obtain a new interval bounds.1; (iii) repeat until you get a sufficiently tight feasible interval bounds.n. The drawback is that this procedure can be computationally expensive since, for each choice of the bounds, the full calibration problem has to be solved.
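The trial and error procedure described above can be sketched as follows. This is only an illustration: shrink.bounds and toy.converges are hypothetical names that do not belong to the package, and the toy predicate merely stands in for a full calibration run (in practice each check would require solving the whole calibration problem, which is exactly what makes the procedure expensive).

```r
# Hypothetical helper (not part of the package): shrinks an interval
# towards 1 for as long as the user-supplied predicate `converges`
# reports success, and returns the last interval that passed.
shrink.bounds <- function(converges, start = c(0.01, 100),
                          rate = 0.5, max.iter = 50) {
  current <- start
  if (!converges(current)) stop("the starting interval is already unfeasible")
  for (i in seq_len(max.iter)) {
    # Move both endpoints towards 1 by a fixed fraction
    candidate <- 1 + (current - 1) * rate
    if (!converges(candidate)) break
    current <- candidate
  }
  current
}

# Toy stand-in for a full calibration run (an assumption made only
# for this sketch): "convergence" is declared whenever the candidate
# interval covers [0.611, 1.394]
toy.converges <- function(b) b[1] <= 0.611 && b[2] >= 1.394

trial.result <- shrink.bounds(toy.converges)
trial.result
```

Because this sketch shrinks both endpoints at the same rate, it stops as soon as either endpoint becomes too tight; a more careful variant would shrink the two endpoints separately.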

However, when both the benchmark population totals and the corresponding Horvitz-Thompson estimates are all non-negative, it is easy to find a specific interval \(I^*=[L^*,U^*]\) such that, if it is not covered by bounds, the current calibration problem is surely unfeasible. This means that any feasible bounds value must necessarily contain the \(I^*\) interval. Function bounds.hint: (i) first identifies such an \(I^*\) interval (by computing the range of the ratios between known population totals and corresponding direct Horvitz-Thompson estimates), (ii) then builds a new interval \(I^{sugg}\) with the same midpoint and double the length. The latter is the suggested value for the bounds argument of e.calibrate. The return value of bounds.hint should be understood as a useful starting guess for bounds, with no guarantee that the calibration algorithm will actually converge.
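The two steps described above can be illustrated with a minimal sketch. The function name suggest.interval is hypothetical and the code is not the package's implementation; it only reproduces the arithmetic stated in this note, using the ratio range reported in the ‘Examples’ section.

```r
# Hypothetical helper (not part of the package): given the ratios
# between known population totals and Horvitz-Thompson estimates,
# returns an interval with the same midpoint as their range and
# double its length.
suggest.interval <- function(ratios) {
  core <- range(ratios)   # the interval I* = [L*, U*]
  mid  <- mean(core)      # midpoint of I*
  len  <- diff(core)      # length of I*
  # Half of the doubled length equals the original length
  c(mid - len, mid + len)
}

# Extreme ratios taken from the 'Examples' output
sugg <- suggest.interval(c(0.6109341, 1.394315))
round(sugg, 3)
```

With these inputs the rounded result is c(0.219, 1.786), which agrees with the suggestion printed by bounds.hint in the ‘Examples’ section.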

References

Vanderhoeft, C. (2001) “Generalized Calibration at Statistics Belgium”, Statistics Belgium Working Paper n. 3.

Deville, J.C., Särndal, C.E. and Sautory, O. (1993) “Generalized Raking Procedures in Survey Sampling”, Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1013-1020.

Sautory, O. (1993) “La macro CALMAR: Redressement d'un Echantillon par Calage sur Marges”, Document de travail de la Direction des Statistiques Demographiques et Sociales, no. F9310.

Examples

# Creation of the object to be calibrated:
data(data.examples)
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Calibration (partitioned solution) on the marginal distribution
# of age in 5 classes (age5c) inside provinces (procod)
# (totals in pop06p). Get a hint for feasible bounds:
hint<-bounds.hint(des,pop06p,~age5c-1,~procod)
#> 
#> A starting suggestion: try to calibrate with bounds=c(0.219, 1.786)
#> 
#> Remark: this is just a hint, not an exact result
#> Feasible bounds for calibration problem must cover the interval [0.611, 1.394]
#> 

# Let's verify whether calibration converges with the suggested
# value for the bounds argument (i.e. c(0.219, 1.786)):
descal06p<-e.calibrate(design=des,df.population=pop06p,
           calmodel=~age5c-1,partition=~procod,calfun="logit",
           bounds=hint,aggregate.stage=2)

# Now let's verify that calibration fails, if bounds don't cover
# the interval [0.611, 1.394]:
if (FALSE) {
descal06p<-e.calibrate(design=des,df.population=pop06p,
           calmodel=~age5c-1,partition=~procod,calfun="logit",
           bounds=c(0.62,1.50),aggregate.stage=2,force=FALSE)
}
# The warning message raised by e.calibrate reports that
# the population total of variable age5c5 (i.e. the fifth
# age class frequency) was not matched.

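# By analysing ecal.status one understands that calibration
# failed due to the sub-task identified by procod 30.
# (The output shown below reports no failures, presumably because
# the call above is wrapped in if (FALSE) and was not run here.)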
ecal.status
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  0  0  0  0  0  0
#> 

# this is easily explained by inspecting the "bounds"
# attribute of the bounds.hint output object:
attr(hint,"bounds")
#> $lower
#>                  8        9        10        11        30        31       32
#> original 0.6773532 0.651573 0.8013544 0.8404024 0.6109341 0.6937631 1.002463
#> all      0.6773532 0.651573 0.8013544 0.8404024 0.6109341 0.6937631 1.002463
#>                 54        55        93
#> original 0.8647034 0.6833283 0.6906802
#> all      0.8647034 0.6833283 0.6906802
#> 
#> $upper
#>                 8        9       10       11       30       31       32
#> original 1.394315 1.313149 1.322646 1.278943 0.840914 1.389327 1.298084
#> all      1.394315 1.313149 1.322646 1.278943 0.840914 1.389327 1.298084
#>                54       55        93
#> original 1.226745 1.210841 0.8432462
#> all      1.226745 1.210841 0.8432462
#> 

# indeed the specified lower bound (0.62) was too high
# for procod 30, where instead a value ~0.61 was required.

# Recall that you can always "force" a calibration task that
# would not converge:
descal06p.forced<-e.calibrate(design=des,df.population=pop06p,
                  calmodel=~age5c-1,partition=~procod,calfun="logit",
                  bounds=c(0.62,1.50),aggregate.stage=2,force=TRUE)
#> Warning: Failed to converge: worst achieved epsilon= 0.0148268907563025 in 51 iterations (variable age5c5), see ecal.status.

# Notice, also, that forced sub-tasks can be tracked down by
# directly looking at ecal.status...
ecal.status
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  1  0  0  0  0  0
#> 
#> $fail.diagnostics
#> $fail.diagnostics$`30`
#>   Variable Population.Total Achieved.Estimate Difference Relative.Difference
#> 5   age5c5             1189          1206.644     17.644          0.01482689
#> 
#> 

# ...or by using function check.cal:
check.cal(descal06p.forced)
#> Calibration Constraints missed (at tolerance level epsilon = 1e-07): 1 out of 50
#> - Summary of mismatches: 
#> 
#> $return.code
#>      8 9 10 11 30 31 32 54 55 93
#> code 0 0  0  0  1  0  0  0  0  0
#> 
#> $fail.diagnostics
#> $fail.diagnostics$`30`
#>   Variable Population.Total Achieved.Estimate Difference Relative.Difference
#> 5   age5c5             1189          1206.644     17.644          0.01482689
#> 
#> 
#>
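The diagnosis carried out in the examples can also be reproduced by hand. The sketch below copies the per-domain values printed by attr(hint,"bounds") into plain named vectors (the names lower, upper, core, trial and violated are chosen here for illustration only), then recovers the interval that feasible bounds must cover and the domain that rules out the trial bounds c(0.62, 1.50).

```r
# Per-domain lower and upper ratio limits, copied from the printed
# output of attr(hint, "bounds") (row "original")
lower <- c("8" = 0.6773532, "9" = 0.651573, "10" = 0.8013544,
           "11" = 0.8404024, "30" = 0.6109341, "31" = 0.6937631,
           "32" = 1.002463, "54" = 0.8647034, "55" = 0.6833283,
           "93" = 0.6906802)
upper <- c("8" = 1.394315, "9" = 1.313149, "10" = 1.322646,
           "11" = 1.278943, "30" = 0.840914, "31" = 1.389327,
           "32" = 1.298084, "54" = 1.226745, "55" = 1.210841,
           "93" = 0.8432462)

# Interval that any feasible bounds value must cover
core <- c(min(lower), max(upper))
round(core, 3)

# Domains whose limits are not covered by the trial bounds
trial <- c(0.62, 1.50)
violated <- names(lower)[lower < trial[1] | upper > trial[2]]
violated
```

The rounded interval is [0.611, 1.394], as reported by bounds.hint, and the only domain not covered by the trial bounds is procod 30.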
