Set, Reset or Switch Off Contrasts for Calibration Models

These functions control the way ReGenesees translates a symbolic calibration model (as specified by the calmodel formula in e.calibrate, pop.template, fill.template, aux.estimates, ...) to its numeric encoding (i.e. the model-matrix used by the internal algorithms to perform actual computations).

contrasts.RG()
contrasts.off()
contrasts.reset()
contr.off(n, base = 1, contrasts = TRUE, sparse = FALSE)

Arguments

n	Formally as in function `contr.treatment` (see ‘Details’).
base	Formally as in function `contr.treatment` (see ‘Details’).
contrasts	Fictitious, but formally as in function `contr.treatment`. (see ‘Details’)
sparse	Formally as in function `contr.treatment`. (see ‘Details’)

Details

All the calibration facilities in package ReGenesees transform symbolic calibration models (as specified by the user via calmodel) into numeric model-matrices. Factor variables occurring in calmodel play a special role in such transformations, as the encoding of a factor can (and, by default, do) depend on the structure of the formula in which it occurs. The ReGenesees functions documented below control the way factor levels are translated into auxiliary variables and mapped to columns of population totals data frames. The underlying technical tools are contrasts handling functions (see Section 'Technical Remarks and Warnings' for further details).

Under the calibration perspective, ordered and unordered factors appearing in calmodel must be treated the same way. This obvious constraint defines the ReGenesees default for contrasts handling. Such a default is silently set when loading the package. Moreover, you can set it also by calling contrasts.RG(). As can be understood by reading Section 'Technical Remarks and Warnings' below, the default setup can be seen as “efficient-but-slightly-risky”.

A call to contrasts.off() simply disables all contrasts and imposes a complete dummy coding of factors. Under this setup, all levels of factors occurring in calmodel generate a distinct model-matrix column, even if some of these columns can be linearly dependent. To be very concise, the contrasts.off() setup can be seen as “safe-but-less-efficient” as compared to the default one (read Section 'Technical Remarks and Warnings' for more details).

Function contr.off is not meant to be called directly by users: it serves only the purpose of enabling the contrasts.off() setup.

A call to contrasts.reset() restores R factory-fresh defaults for contrasts (which do distinguish ordered and unordered factors). Users may want to use this function after having completed a ReGenesees session, e.g. before switching to other R functions relying on contrasts (such as lm, glm, ...).

Technical Remarks and Warnings

“[...] the corner cases of model.matrix and friends is some of the more impenetrable code in the R sources.”
Peter Dalgaard

Contrasts handling functions tell R how to encode the model-matrix associated to a given model-formula on specific data (see, e.g., contr.treatment, contrasts, model.matrix, formula, and references therein). More specifically, contrasts control the way factor-terms and interaction-terms occurring in formulae get actually represented in the model matrix. For instance, R (by default) avoids the complete dummy coding of a factor whenever it is able to understand, on the basis of the structure of the model-formula, that some of the factor levels would generate linearly dependent (i.e. redundant) columns in the model-matrix (see Section ‘Examples’).

The usage of contrasts to build smaller, full-rank calibration model-matrices would be a good opportunity for ReGenesees, provided it comes without any information loss. Indeed, smaller model-matrices mean less population totals to be provided by users, and higher efficiency in computations.

Unfortunately, few controversial cases have been signalled in which R ability to "simplify" a model-matrix on the basis of the structure of the related model-formula seems to lead to strange, unexpected results (see, e.g., this R-help thread). No matter whether such R behaviour is or not an actual bug with respect to its impact on R linear model fitting or ANOVA facilities, it surely represents a concern for ReGenesees with respect to calibration (see Section ‘Examples’). The risk is the following: there could be rare cases in which exploiting R contrasts handling functions inside ReGenesees ends up with a wrong (i.e. incomplete) population totals template, and (eventually) with wrong calibration results.

Though one could adopt several ad-hoc countermeasures to sterilize the risk described above while still taking advantage of contrasts (see Section ‘Examples’), the choice of completely disabling contrasts via contrasts.off() would result in a 100% safety guarantee. If computational efficiency is not a serious concern for you, switching off contrasts may determine the best ReGenesees setup for your analyses.

References

Chambers, J. M. and Hastie, T. J. (1992) Statistical models. Chapter 2 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

"Why does the order of terms in a formula translate into different models/model matrices?", R-help thread.

Examples

######################
# Easy things first: #
######################

  # 1) When ReGenesees is loaded, its standard way of handling contrasts
  #    (i.e. no ordered-unordered factor distinction) is silently set:
options("contrasts")
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#> 

  # 2) To switch off contrasts (i.e. apply always dummy coding to factors),
  #    simply type:
contrasts.off()
#> 
#> # Contrasts treatment has been switched off:
#> 
#> $contrasts
#>   unordered     ordered 
#> "contr.off" "contr.off" 
#> 

  # 3) To restore R factory-fresh defaults for contrasts, simply type:
contrasts.reset()
#> 
#> # Factory-fresh defaults for contrasts treatment have been restored:
#> 
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment"      "contr.poly" 
#> 

  # 4) To switch on again standard ReGenesees contrasts, simply type:
contrasts.RG()
#> 
#> # Standard ReGenesees contrasts treatment has been set:
#> 
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#> 


#############################################################
# A simple calibration example to understand the effects of #
# switching off contrasts.                                  #
#############################################################

# Load sbs data:
data(sbs)

# Create a design object:
sbsdes<-e.svydesign(data=sbs,ids=~id,strata=~strata,weights=~weight,fpc=~fpc)

# Suppose you want to calibrate on the marginals of 'region' (a factor
# with 3 levels: "North", "Center", and "South") and 'dom3' (a factor
# with 4 levels: "A", "B", "C", and "D").
# Let's see how things go under the 'contrast on' (default) and 'contrasts off'
# setups:

  ########################################
  # 1) ReGenesees default: contrasts ON. #
  ########################################
  # As you see contrasts are ON:
  options("contrasts")
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#> 

  # Build and fill the population totals template:
  temp1<-pop.template(data=sbsdes,calmodel=~region+dom3-1)
  pop1<-fill.template(universe=sbs.frame,template=temp1)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 

  # Now inspect the obtained known totals data.frame:
  pop1
#>   regionNorth regionCenter regionSouth dom3B dom3C dom3D
#> 1       10374         3541        3403  3459  5144  7020

  # As you see: (i) it has only 6 columns, and (ii) the "A" level of
  # factor 'dom3' is missing. This is because contrasts are ON, so that
  # R is able to understand that only 6 out of the 3 + 4 marginal counts
  # are actually independent. Indeed, the "A" counts...
  sum(sbs.frame$dom3=="A")
#> [1] 1695

  # ...are actually redundant, since they can be deduced by pop1:
  sum(pop1[,1:3])-sum(pop1[,4:6])
#> [1] 1695

  # Now calibrate:
  cal1<-e.calibrate(sbsdes,pop1)


  ##################################################
  # 2) Switch OFF contrasts: dummy coding for all! #
  ##################################################
  # To switch off contrasts simply call:
  contrasts.off()
#> 
#> # Contrasts treatment has been switched off:
#> 
#> $contrasts
#>   unordered     ordered 
#> "contr.off" "contr.off" 
#> 

  # Build and fill the population totals template:
  temp2<-pop.template(data=sbsdes,calmodel=~region+dom3-1)
  pop2<-fill.template(universe=sbs.frame,template=temp2)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 

  # Now inspect the obtained known totals data.frame:
  pop2
#>   regionNorth regionCenter regionSouth dom3A dom3B dom3C dom3D
#> 1       10374         3541        3403  1695  3459  5144  7020

  # As you see: (1) it has now 7 columns, and (2) the "A" level of factor
  # 'dom3' has been resurrected. This is because contrasts are OFF,
  # so that each level of factors in calmodel are coded to dummies.

  # Now calibrate. Since only 6 out of 7 dummy auxiliary variables are
  # actually independent, the model.matrix computed by e.calibrate will not be
  # full-rank. As a consequence, e.calibrate would use the Moore-Penrose
  # generalized inverse (in practice, this could depend on the machine R
  # is running on):
  cal2<-e.calibrate(sbsdes,pop2)

  # Compare the calibration weights generated under setups 1) and 2):
  all.equal(weights(cal2),weights(cal1))
#> [1] TRUE

  # Lastly set back contrasts to ReGenesees default:
  contrasts.RG()
#> 
#> # Standard ReGenesees contrasts treatment has been set:
#> 
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#> 


#############################################
# Weird results, risks and countermeasures. #
# ("When the going gets tough...")          #
#############################################

# Suppose you want to calibrate on: (A) the joint distribution of 'region' (a
# factor with 3 levels: "North", "Center", and "South") and 'nace.macro' (a
# factor with 4 levels: "Agriculture", "Industry", "Commerce", and "Services")
# and, at the same time, on (B) the total number of employees ('emp.num', a
# numeric variable) by 'nace.macro'.
#
# You rightly expect that 3*4 + 4 = 16 population totals are needed for such a
# calibration task. Indeed, knowing the enterprise counts for the 3*4 cells of
# the joint distribution (A) doesn't tell anything on the number of employees
# working in the 4 nace macrosectors (B), and vice-versa.
#
# Moreover, you might expect that calibration models:
  # (i)  calmodel = ~region:nace.macro  + emp.num:nace.macro - 1
  # (ii) calmodel = ~emp.num:nace.macro + region:nace.macro  - 1
#  
# should produce the same results.
# Unfortunately, WHEN CONTRASTS ARE ON, this is not the case: only model (i)
# leads to the expected, right results. Let's see.

  ###########################################
  # A strange result when contrasts are ON: #
  # the order of terms in calmodel matters! #
  ###########################################
  # As you see contrasts are ON:
  options("contrasts")
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#> 

  # Start with (i) calmodel = ~region:nace.macro + emp.num:nace.macro - 1
  # Build and fill the population totals template:
  temp1<-pop.template(data=sbsdes,~region:nace.macro+emp.num:nace.macro-1)
  pop1<-fill.template(universe=sbs.frame,template=temp1)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 

  # Now inspect the obtained known totals data.frame:
  pop1
#>   regionNorth:nace.macroAgriculture regionCenter:nace.macroAgriculture
#> 1                               283                                 61
#>   regionSouth:nace.macroAgriculture regionNorth:nace.macroIndustry
#> 1                               114                           5080
#>   regionCenter:nace.macroIndustry regionSouth:nace.macroIndustry
#> 1                            2290                           1925
#>   regionNorth:nace.macroCommerce regionCenter:nace.macroCommerce
#> 1                           1902                             554
#>   regionSouth:nace.macroCommerce regionNorth:nace.macroServices
#> 1                            604                           3109
#>   regionCenter:nace.macroServices regionSouth:nace.macroServices
#> 1                             636                            760
#>   nace.macroAgriculture:emp.num nace.macroIndustry:emp.num
#> 1                         17168                     579580
#>   nace.macroCommerce:emp.num nace.macroServices:emp.num
#> 1                      69525                     318121

  # and verify it stores the right, expected number of totals (i.e. 16):
  dim(pop1)
#> [1]  1 16

  # Now calibrate:
  cal1<-e.calibrate(sbsdes,pop1)


  # Now compare with (ii) calmodel = ~emp.num:nace.macro + region:nace.macro - 1
  # Build and fill the population totals template:
  temp2<-pop.template(data=sbsdes,~emp.num:nace.macro+region:nace.macro-1)
  pop2<-fill.template(universe=sbs.frame,template=temp2)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 

  # First check if it stores the right, expected number of totals (i.e. 16):
  dim(pop2)
#> [1]  1 12

  # Apparently 4 totals are missing; let's inspect the known totals data.frame
  # to understand which ones:
  pop2
#>   emp.num:nace.macroAgriculture emp.num:nace.macroIndustry
#> 1                         17168                     579580
#>   emp.num:nace.macroCommerce emp.num:nace.macroServices
#> 1                      69525                     318121
#>   nace.macroAgriculture:regionCenter nace.macroIndustry:regionCenter
#> 1                                 61                            2290
#>   nace.macroCommerce:regionCenter nace.macroServices:regionCenter
#> 1                             554                             636
#>   nace.macroAgriculture:regionSouth nace.macroIndustry:regionSouth
#> 1                               114                           1925
#>   nace.macroCommerce:regionSouth nace.macroServices:regionSouth
#> 1                            604                            760

  # Thus we are missing the 4 'nace.macro' totals for 'region' level "North".
  # Everything goes as if R contrasts functions mistakenly treated the term
  # emp.num:nace.macro as a factor-factor interaction (i.e. a 2 way joint
  # distribution), which would have justified to eliminate the 4 missing totals
  # as redundant.

  # Notice that calibrating on pop2 would generate wrong results...
  cal2<-e.calibrate(sbsdes,pop2)

  # ...indeed the 4 estimates of 'nace.macro' for 'region' level "North" are not
  # actually calibrated (look at the magnitude of SE estimates):
  svystatTM(cal2,~region,~nace.macro)
#>              nace.macro Total.regionNorth Total.regionCenter Total.regionSouth
#> Agriculture Agriculture          283.3861                 61               114
#> Industry       Industry         5078.5898               2290              1925
#> Commerce       Commerce         1902.6949                554               604
#> Services       Services         3109.4091                636               760
#>             SE.Total.regionNorth SE.Total.regionCenter SE.Total.regionSouth
#> Agriculture            0.2837772          1.048106e-14         2.898936e-14
#> Industry               0.8421128          5.905246e-13         7.752573e-14
#> Commerce               1.3567159          1.164382e-14         1.732362e-14
#> Services               0.3829344          4.196158e-14         6.501151e-15


  ################################################################
  # A possible countermeasure (still working with contrasts ON). #
  ################################################################
  # Empirical evidence tells that the weird case above is extremely rare
  # and that it manifests whenever a numeric (say X) and a factor (say F) both
  # interact with the same factor (say D), i.e. calmodel=~(X+F):D-1.
  #
  # The risky order-dependent nature of such models can be sterilized (while
  # still taking advantage of contrasts-driven simplifications for large,
  # complex calibrations) by using a numeric variable with values 1 for
  # all sample units.
  #
  # For instance, one could use variable 'ent' in the sbs data.frame, to
  # handle the (A) part of the calibration constraints. Indeed you may easily
  # verify that both the calmodel formulae below:
  # (i)  calmodel = ~ent:region:nace.macro  + emp.num:nace.macro - 1
  # (ii) calmodel = ~emp.num:nace.macro + ent:region:nace.macro  - 1
  #
  # produce exactly the same, right results.


  ##################################################################
  # THE ULTIMATE, 100% SAFE, COUNTERMEASURE: switch contrasts OFF! #
  ##################################################################
  # No contrasts means no model-matrix simplifications at all, hence
  # also no unwanted, wrong simplifications. Let's see:

  # To switch off contrasts simply call:
  contrasts.off()
#> 
#> # Contrasts treatment has been switched off:
#> 
#> $contrasts
#>   unordered     ordered 
#> "contr.off" "contr.off" 
#> 

  # Compare again, with contrasts OFF, the calibration models:
  # (i)  calmodel = ~region:nace.macro  + emp.num:nace.macro - 1
  # (ii) calmodel = ~emp.num:nace.macro + region:nace.macro  - 1

  # Build and fill the population totals templates:
  temp1<-pop.template(data=sbsdes,~region:nace.macro+emp.num:nace.macro-1)
  pop1<-fill.template(universe=sbs.frame,template=temp1)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 
  pop1
#>   regionNorth:nace.macroAgriculture regionCenter:nace.macroAgriculture
#> 1                               283                                 61
#>   regionSouth:nace.macroAgriculture regionNorth:nace.macroIndustry
#> 1                               114                           5080
#>   regionCenter:nace.macroIndustry regionSouth:nace.macroIndustry
#> 1                            2290                           1925
#>   regionNorth:nace.macroCommerce regionCenter:nace.macroCommerce
#> 1                           1902                             554
#>   regionSouth:nace.macroCommerce regionNorth:nace.macroServices
#> 1                            604                           3109
#>   regionCenter:nace.macroServices regionSouth:nace.macroServices
#> 1                             636                            760
#>   nace.macroAgriculture:emp.num nace.macroIndustry:emp.num
#> 1                         17168                     579580
#>   nace.macroCommerce:emp.num nace.macroServices:emp.num
#> 1                      69525                     318121

  temp2<-pop.template(data=sbsdes,~emp.num:nace.macro+region:nace.macro-1)
  pop2<-fill.template(universe=sbs.frame,template=temp2)
#> 
#> # Coherence check between 'universe' and 'template': OK
#> 
  pop2
#>   emp.num:nace.macroAgriculture emp.num:nace.macroIndustry
#> 1                         17168                     579580
#>   emp.num:nace.macroCommerce emp.num:nace.macroServices
#> 1                      69525                     318121
#>   nace.macroAgriculture:regionNorth nace.macroIndustry:regionNorth
#> 1                               283                           5080
#>   nace.macroCommerce:regionNorth nace.macroServices:regionNorth
#> 1                           1902                           3109
#>   nace.macroAgriculture:regionCenter nace.macroIndustry:regionCenter
#> 1                                 61                            2290
#>   nace.macroCommerce:regionCenter nace.macroServices:regionCenter
#> 1                             554                             636
#>   nace.macroAgriculture:regionSouth nace.macroIndustry:regionSouth
#> 1                               114                           1925
#>   nace.macroCommerce:regionSouth nace.macroServices:regionSouth
#> 1                            604                            760

  # Verify they store the same, right number of totals (i.e. 16):
  dim(pop1)
#> [1]  1 16
  dim(pop2)
#> [1]  1 16

  # Verify they lead to right calibrated objects...
  cal1<-e.calibrate(sbsdes,pop1)
  cal2<-e.calibrate(sbsdes,pop2)

  # ...with the same calibrated weights:
  all.equal(weights(cal2),weights(cal1))
#> [1] TRUE

  # Lastly set back contrasts to ReGenesees default:
  contrasts.RG()
#> 
#> # Standard ReGenesees contrasts treatment has been set:
#> 
#> $contrasts
#>         unordered           ordered 
#> "contr.treatment" "contr.treatment" 
#>

Set, Reset or Switch Off Contrasts for Calibration Models

Arguments

Details

Technical Remarks and Warnings

References

See also

Examples

Contents

Author