A hint for range restricted calibration

Suggests a sound bounds value for which kottcalibrate is likely to converge.

bounds.hint(deskott, df.population,
    calmodel = if (inherits(df.population, "pop.totals"))
                   attr(df.population, "calmodel"),
    partition = if (inherits(df.population, "pop.totals"))
                    attr(df.population, "partition") else FALSE)

Arguments

deskott	Object of class `kott.design` containing the replicated survey data.
df.population	Data frame containing the known population totals for the auxiliary variables.
calmodel	Formula defining the linear structure of the calibration model.
partition	Formula specifying the variables that define the "calibration domains" for the model; `FALSE` (the default) implies no calibration domains.

Details

The function bounds.hint returns a bounds value for which kottcalibrate is likely to converge. The returned interval is only a sound hint, not an exact result (see 'Note').

The mandatory argument deskott identifies the kott.design object on which the calibration problem is defined.

The mandatory argument df.population identifies the known totals data frame.

The argument calmodel symbolically defines the calibration model you want to use: it identifies the auxiliary variables and the constraints for the calibration problem. The deskott variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA). The argument can be omitted provided df.population is an object of class pop.totals (see population.check).

The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorised). The deskott variables referenced by partition (if any) must be factor and must not contain any missing value (NA). The argument can be omitted provided df.population is an object of class pop.totals (see population.check).

Value

A numeric vector of length 2, representing the suggested value for the bounds argument of kottcalibrate. The attributes of that vector store additional information, which can help explain why a given calibration problem is (un)feasible (see 'Examples').

Note

Assessing the feasibility of an arbitrary calibration problem is not an easy task. The problem is even more difficult whenever additional "range restrictions" are imposed. Indeed, even if one assumes that the calibration constraints define a consistent system, one also has to choose the bounds such that the feasible region is non-empty.

One can argue that there must exist a minimum-length interval \(I=[L,U]\) such that, if it is covered by bounds, the specified calibration problem is feasible. Unfortunately, computing that minimum-length interval \(I\) exactly requires solving a large linear programming problem [Vanderhoeft 2001]. As an alternative, a trial and error procedure has frequently been proposed [Deville et al 1993; Sautory 1993]: (i) start with a very large interval bounds.0; (ii) if convergence is achieved, shrink it so as to obtain a new interval bounds.1; (iii) repeat until you get a sufficiently tight feasible interval bounds.n. The drawback is that this procedure can cost a lot of computer time since, for each choice of the bounds, the full calibration problem has to be solved.
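The trial and error procedure can be sketched as follows. This is only an illustration under stated assumptions: `toy_converges()` is a made-up stand-in for a full calibration run (it does not call kottcalibrate), `shrink_bounds()` is a hypothetical helper, and shrinking the interval towards 1 by a fixed factor is just one possible shrinking rule.

```r
# Toy stand-in for a full calibration run (assumption): the problem is
# treated as feasible only when bounds cover a hidden interval.
hidden <- c(0.8, 1.4)
n_calls <- 0
toy_converges <- function(bounds) {
  n_calls <<- n_calls + 1   # each call stands for one full calibration
  bounds[1] <= hidden[1] && bounds[2] >= hidden[2]
}

# Hypothetical helper: shrink the interval towards 1 until the next
# candidate no longer converges, then return the last feasible interval.
shrink_bounds <- function(bounds0, converges, shrink = 0.9, max_iter = 100) {
  if (!converges(bounds0)) stop("starting bounds are not feasible")
  current <- bounds0
  for (i in seq_len(max_iter)) {
    candidate <- 1 + (current - 1) * shrink
    if (!converges(candidate)) break
    current <- candidate
  }
  current
}

tight <- shrink_bounds(c(0, 10), toy_converges)
tight     # last interval that still "converged"
n_calls   # number of (simulated) full calibration runs that were needed
```

Every evaluation of the convergence check corresponds to one complete calibration, which is why the procedure becomes expensive on real problems.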

By contrast, it is rather easy to find a specific interval \(I^*=[L^*,U^*]\) such that, if it is not covered by bounds, the current calibration problem is certainly unfeasible. This means that any feasible bounds value must necessarily contain the \(I^*\) interval. The function bounds.hint: (i) first identifies such an \(I^*\) interval (by computing the range of the ratios between known population totals and corresponding direct Horvitz-Thompson estimates), (ii) then builds a new interval \(I^{sugg}\) with the same midpoint and double the length. The latter is the suggested value for the bounds argument of kottcalibrate. The return value of bounds.hint should be understood as a useful starting guess for bounds, with no guarantee that the calibration algorithm will actually converge.
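The two steps described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the package's internal code: `star_interval()` and `widen_interval()` are hypothetical helper names, and the totals and Horvitz-Thompson estimates in the toy part are made-up numbers.

```r
# Hypothetical helpers, not part of the package.

# Step (i): I* is the range of the ratios between known totals
# and the corresponding direct (Horvitz-Thompson) estimates.
star_interval <- function(known_totals, ht_estimates) {
  range(known_totals / ht_estimates)
}

# Step (ii): build an interval with the same midpoint and double length.
widen_interval <- function(star) {
  mid <- mean(star)
  len <- diff(star)
  c(mid - len, mid + len)
}

# Made-up totals and estimates (assumption, for illustration only)
known_totals <- c(1200, 950, 430)
ht_estimates <- c(1000, 1000, 400)
star_toy <- star_interval(known_totals, ht_estimates)
sugg_toy <- widen_interval(star_toy)

# Star interval reported in the second example of 'Examples'
star_doc <- c(0.7085247, 2.0491790)
sugg_doc <- round(widen_interval(star_doc), 3)
sugg_doc
```

Applied to the star interval printed in 'Examples', the doubling rule gives c(0.038, 2.72) after rounding to three decimals, which agrees with the suggestion printed there.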

References

Vanderhoeft, C. (2001) "Generalized Calibration at Statistics Belgium", Statistics Belgium Working Paper n. 3, http://www.statbel.fgov.be/studies/paper03_en.asp.

Deville, J.C., Särndal, C.E. and Sautory, O. (1993) "Generalized Raking Procedures in Survey Sampling", Journal of the American Statistical Association, Vol. 88, No. 423, pp. 1013-1020.

Sautory, O. (1993) "La macro CALMAR: Redressement d'un Echantillon par Calage sur Marges", Document de travail de la Direction des Statistiques Demographiques et Sociales, no. F9310.

Examples

# Load sample data (the only reason for fixing
# the RNG seed is to achieve reproducible examples)
data(data.examples)
set.seed(123)

# Creation of the object to be calibrated:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
      weights=~weight,nrg=15)

# Calibration (global solution) on the joint distribution
# of sex and marstat (totals in pop03). Get a hint for feasible bounds:
hint<-bounds.hint(kdes,pop03,~marstat:sex-1)
#> 
#> A starting suggestion: try to calibrate with bounds=c(0.899, 1.129)
#> 
#> Remark: this is just a hint, not an exact result
#> Feasible bounds for calibration problem must cover the interval [0.956, 1.071]
#> 

# Let's first verify that calibration converges with the suggested
# value for the bounds argument (i.e. c(0.899, 1.129)):
kdescal03<-kottcalibrate(deskott=kdes,df.population=pop03,
           calmodel=~marstat:sex-1,calfun="logit",bounds=hint)

# Now let's verify that calibration fails if bounds don't cover
# the interval [0.956, 1.071]:
# NOT RUN {
kdescal03<-kottcalibrate(deskott=kdes,df.population=pop03,
           calmodel=~marstat:sex-1,calfun="logit",bounds=c(0.95, 1.03))
# }

# Calibration (iterative solution) on the totals for the quantitative
# variables x1, x2 and x3 in the subpopulations defined by the
# regcod variable (totals in pop04p): Get a hint for feasible bounds:
hint<-bounds.hint(kdes,pop04p,~x1+x2+x3-1,~regcod)
#> 
#> A starting suggestion: try to calibrate with bounds=c(0.038, 2.72)
#> 
#> Remark: this is just a hint, not an exact result
#> Feasible bounds for calibration problem must cover the interval [0.709, 2.049]
#> 

# Let's verify that calibration converges with the suggested
# value for the bounds argument (i.e. c(0.038, 2.72)):
kdescal04p<-kottcalibrate(deskott=kdes,df.population=pop04p,
            calmodel=~x1+x2+x3-1,partition=~regcod,calfun="logit",
            bounds=hint,aggregate.stage=2)

# Now let's verify that calibration fails if bounds don't cover
# the interval [0.709, 2.049]:
# NOT RUN {
kdescal04p<-kottcalibrate(deskott=kdes,df.population=pop04p,
            calmodel=~x1+x2+x3-1,partition=~regcod,calfun="logit",
            bounds=c(0.71,1.89),aggregate.stage=2)
# }
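# By analysing kottcal.status one can identify the sub-task (replicate
# and regcod) responsible for a calibration failure. Since the failing
# call above is not run, the status printed here refers to the last
# executed call (bounds = hint), where every return code is 0: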
kottcal.status
#> $call
#> kottcalibrate(deskott = kdes, df.population = pop04p, calmodel = ~x1 + 
#>     x2 + x3 - 1, partition = ~regcod, calfun = "logit", bounds = hint, 
#>     aggregate.stage = 2)
#> 
#> $return.code
#>              6 7 10
#> original     0 0  0
#> replicate.1  0 0  0
#> replicate.2  0 0  0
#> replicate.3  0 0  0
#> replicate.4  0 0  0
#> replicate.5  0 0  0
#> replicate.6  0 0  0
#> replicate.7  0 0  0
#> replicate.8  0 0  0
#> replicate.9  0 0  0
#> replicate.10 0 0  0
#> replicate.11 0 0  0
#> replicate.12 0 0  0
#> replicate.13 0 0  0
#> replicate.14 0 0  0
#> replicate.15 0 0  0
#> 

# this is easily explained by inspecting the "bounds"
# attribute of the bounds.hint output object:
hint
#> [1] 0.038 2.720
#> attr(,"star.interval")
#> [1] 0.7085247 2.0491790
#> attr(,"bounds")
#> attr(,"bounds")$call
#> bounds.hint(kdes, pop04p, ~x1 + x2 + x3 - 1, ~regcod)
#> 
#> attr(,"bounds")$lower
#>                      6         7        10
#> original     0.8045835 0.7735073 0.8987247
#> replicate.1  0.7443905 0.7530853 0.8052112
#> replicate.2  0.8692697 0.7212123 0.9761906
#> replicate.3  0.8202755 0.7743616 0.8669947
#> replicate.4  0.8175836 0.7633970 0.8639573
#> replicate.5  0.7598176 0.7085247 0.9005086
#> replicate.6  0.8380082 0.7850858 0.9016069
#> replicate.7  0.7831603 0.7501212 0.8892474
#> replicate.8  0.7577942 0.7527572 0.9105671
#> replicate.9  0.7803752 0.7705280 0.9600142
#> replicate.10 0.8982550 0.7813897 0.8809964
#> replicate.11 0.7340459 0.7687987 0.8885818
#> replicate.12 0.9029446 0.8027554 0.9701130
#> replicate.13 0.8579082 0.7359546 0.8810956
#> replicate.14 0.7635730 0.7884327 0.8419929
#> replicate.15 0.7915788 0.7716666 0.9854769
#> all          0.7340459 0.7085247 0.8052112
#> 
#> attr(,"bounds")$upper
#>                     6         7       10
#> original     1.534247 0.9326981 1.297280
#> replicate.1  1.549311 0.9464416 1.308133
#> replicate.2  1.526246 1.0000166 1.249711
#> replicate.3  1.585542 0.8927598 1.272591
#> replicate.4  1.558306 0.9737890 1.269757
#> replicate.5  1.481951 0.8543082 1.273368
#> replicate.6  1.490087 0.9053758 1.370347
#> replicate.7  2.049179 0.9615852 1.307158
#> replicate.8  1.299465 0.8923319 1.303436
#> replicate.9  1.356716 0.9831481 1.251063
#> replicate.10 1.972342 0.9286247 1.322551
#> replicate.11 1.457476 0.8923319 1.307297
#> replicate.12 1.498696 1.1089917 1.330477
#> replicate.13 1.497552 0.8899994 1.387637
#> replicate.14 1.497552 0.9740486 1.251564
#> replicate.15 1.497552 0.8476070 1.281982
#> all          2.049179 1.1089917 1.387637
#> 
#> attr(,"class")
#> [1] "bounds.hint" "numeric"    

# indeed the specified upper bound (1.89) was too low for regcod 6
# in replicate.7 (2.049) and replicate.10 (1.972)
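The sub-tasks that rule out a given upper bound can also be located programmatically. The sketch below is only an illustration: it copies a few rows of the `upper` matrix printed above into a plain matrix (rather than extracting them from the hint object) and uses base R's `which()` with `arr.ind = TRUE` to find the entries exceeding the attempted upper bound of 1.89.

```r
# A few rows copied from the "upper" component printed above
upper <- rbind(
  original     = c(1.534247, 0.9326981, 1.297280),
  replicate.7  = c(2.049179, 0.9615852, 1.307158),
  replicate.10 = c(1.972342, 0.9286247, 1.322551),
  replicate.12 = c(1.498696, 1.1089917, 1.330477)
)
colnames(upper) <- c("6", "7", "10")   # regcod values

attempted_upper <- 1.89

# Row/column positions of the sub-tasks whose required upper bound
# exceeds the attempted one
hits <- which(upper > attempted_upper, arr.ind = TRUE)

# Translate positions into replicate and regcod labels
data.frame(
  replicate = rownames(upper)[hits[, "row"]],
  regcod    = colnames(upper)[hits[, "col"]],
  required  = upper[hits]
)
```

With these rows, only regcod 6 in replicate.7 and replicate.10 requires an upper bound above 1.89.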