pop.template.Rd
Constructs a “template” data frame to store known population totals for a calibration problem.
pop.template(data, calmodel, partition = FALSE)
data | Data frame of survey data (or an analytic object built upon that data frame). |
---|---|
calmodel | Formula defining the linear structure of the calibration model. |
partition | Formula specifying the variables that define the "calibration domains" for the model. |
This function creates an object of class pop.totals. A pop.totals object combines a data frame (whose structure conforms to the standard required by e.calibrate for the known totals) with the metadata describing the calibration problem.
The mandatory argument data must identify the survey data frame on which the calibration problem is defined (or, alternatively, an analytic object built upon that data frame). Any empty levels found in the factor variables of data are dropped.
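The effect of dropping empty factor levels can be illustrated in base R. This is only a sketch on a made-up data frame (`survey` is a hypothetical name); it uses `droplevels()` to mimic the behaviour described above and does not claim to show how pop.template implements it internally.

```r
# Toy survey data (hypothetical): 'marstat' declares a level that never occurs.
survey <- data.frame(
  marstat = factor(c("married", "unmarried", "married"),
                   levels = c("married", "unmarried", "widowed"))
)

# Three declared levels, but "widowed" has no observations.
print(levels(survey$marstat))

# Base R: remove unused levels from every factor column.
cleaned <- droplevels(survey)
print(levels(cleaned$marstat))
```

An empty level would otherwise ask for a known total that no sample unit can contribute to, so removing it keeps the template aligned with the data actually observed.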
The mandatory argument calmodel symbolically defines the calibration model you intend to use: it identifies the auxiliary variables and the constraints for the calibration problem. The data variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA).
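The requirements on the calmodel variables can be checked in advance with a few lines of base R. The helper below, `check_calmodel_vars`, is a hypothetical name introduced for illustration; it is not part of the package and only approximates the checks described above.

```r
# Hypothetical pre-flight check (not a package function): every variable
# referenced by the formula must exist, be numeric or factor, and have no NA.
check_calmodel_vars <- function(data, calmodel) {
  for (v in all.vars(calmodel)) {
    x <- data[[v]]
    if (is.null(x)) {
      stop("variable not found in data: ", v)
    }
    if (!(is.numeric(x) || is.factor(x))) {
      stop("variable must be numeric or factor: ", v)
    }
    if (anyNA(x)) {
      stop("variable contains missing values: ", v)
    }
  }
  invisible(TRUE)
}

# Toy data (hypothetical): 'region' is character, so it fails the type check.
toy <- data.frame(
  x1      = c(1.5, 2.0, 3.5),
  marstat = factor(c("married", "unmarried", "widowed")),
  region  = c("north", "south", "north"),
  stringsAsFactors = FALSE
)

ok  <- check_calmodel_vars(toy, ~ marstat + x1 - 1)
bad <- tryCatch(check_calmodel_vars(toy, ~ region), error = function(e) "error")
```

Converting character columns with `factor()` before building the template is one way to satisfy the type requirement.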
The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). If a formula is passed through the partition argument, the program checks that calmodel actually describes a "reduced model", that is, that it does not reference any of the partition variables; if it does, the program stops and prints an error message. Notice that a formula like partition=~D1+D2 will be automatically translated into the factor-crossing formula partition=~D1:D2. The data variables referenced by partition (if any) must be factor and must not contain any missing value (NA). Note that, if the partition formula involves two or more factors, their crossed levels will be ordered according to the : operator (that is, the levels of the rightmost variable vary fastest).
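Two points from this paragraph can be sketched in base R: the "reduced model" condition and the ordering of crossed levels. The code below is an illustration under stated assumptions: the variable names are made up, and the check via `all.vars()` is a guess at an equivalent test, not the package's own code. The level ordering shown is that of the base R `:` operator on factors.

```r
calmodel  <- ~ marstat - 1
partition <- ~ sex

# Reduced-model condition: calmodel must not mention any partition variable.
shared <- intersect(all.vars(calmodel), all.vars(partition))
if (length(shared) > 0) {
  stop("calmodel references partition variable(s): ",
       paste(shared, collapse = ", "))
}

# Crossing two factors with `:`; the rightmost factor's levels vary fastest.
d <- expand.grid(sex     = c("f", "m"),
                 marstat = c("married", "unmarried", "widowed"),
                 stringsAsFactors = TRUE)
domains <- levels(d$sex:d$marstat)
print(domains)
```

With sex on the left and marstat on the right, all marstat levels are listed for "f" before any level for "m", which is the row order a template partitioned by sex:marstat would follow.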
An object of class pop.totals. The data frame it contains is a “template” in the sense that all the known totals it must be able to store are missing (NA). However, this data frame has a structure that complies with the standard required by e.calibrate (provided the latter is invoked with the same calmodel and partition values used to create the template).
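The column layout of a template can be anticipated with base R's `model.matrix()`, whose column names follow the same pattern as the template columns printed in the examples on this page. That the function relies on `model.matrix()` internally is an assumption made here for illustration; the data frame `toy` is made up.

```r
# Toy data (hypothetical) with the same factor levels used on this page.
toy <- expand.grid(sex     = c("f", "m"),
                   marstat = c("married", "unmarried", "widowed"),
                   stringsAsFactors = TRUE)

# With an intercept, the first factor level is absorbed by "(Intercept)".
cols_with_intercept <- colnames(model.matrix(~ marstat, data = toy))
print(cols_with_intercept)

# Removing the intercept gives one column per level of marstat.
cols_no_intercept <- colnames(model.matrix(~ marstat - 1, data = toy))
print(cols_no_intercept)
```

Each column name corresponds to one known total that the template expects to be filled in.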
Filling the template's NAs with the actual values of the corresponding population totals is left to the user. If the user has access to a “sampling frame” (that is, a data frame containing the complete list of the units belonging to the target population along with the corresponding values of the auxiliary variables), they can use the function fill.template to fill the template automatically.
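What "filling the template from a sampling frame" amounts to can be sketched in base R: each known total is the column sum of the model matrix evaluated on the whole population. The code below works on a plain data frame standing in for a template and on a made-up frame (`frame` and `template` are hypothetical objects); it does not call fill.template and is not a substitute for it.

```r
# Hypothetical sampling frame: one row per population unit.
frame <- data.frame(
  sex     = factor(c("f", "m", "f", "m", "f")),
  marstat = factor(c("married", "married", "unmarried", "widowed", "married"))
)

calmodel <- ~ marstat - 1

# Population totals of the auxiliary variables defined by calmodel.
totals <- colSums(model.matrix(calmodel, data = frame))

# Plain one-row data frame of NAs standing in for a global template.
template <- as.data.frame(
  matrix(NA_real_, nrow = 1, ncol = length(totals),
         dimnames = list(NULL, names(totals)))
)

# Replace each NA with the matching total, column by column.
template[1, names(totals)] <- unname(totals)
print(template)
```

For a partitioned problem the same sums would be computed separately within each calibration domain, giving one row per domain.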
The pop.totals class is a specialization of the data.frame class; this means that an object built by pop.template inherits from the data.frame class, and every method defined for that class can be used on it.
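The consequence of this inheritance can be shown with a hand-built stand-in. The object below only imitates the class vector of a pop.totals object; a real one also carries the metadata of the calibration problem, which this sketch omits.

```r
# Stand-in object (hypothetical): a one-row data frame tagged with the
# class vector c("pop.totals", "data.frame"). Real objects hold more.
tmpl <- structure(
  data.frame("(Intercept)" = NA_real_, check.names = FALSE),
  class = c("pop.totals", "data.frame")
)

# S3 dispatch falls back to the data.frame methods.
is_df <- inherits(tmpl, "data.frame")
print(dim(tmpl))
print(names(tmpl))

# Ordinary data frame assignment can be used to store a known total.
tmpl[1, "(Intercept)"] <- 1000
print(tmpl[["(Intercept)"]])
```

In practice this means that familiar tools such as `dim()`, `names()` and `[<-` are available for inspecting and filling a template.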
e.calibrate for calibrating weights, population.check to check that the known totals data frame satisfies the standard required by e.calibrate, pop.desc to provide a natural language description of the template structure, and fill.template to automatically fill the template when a sampling frame is available.
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.
# Creation of population totals template data frames for different
# calibration problems (if the calibration models can be factorized
# both a global and a partitioned solution are given):
data(data.examples)

# 1) Calibration on the total number of units in the population:
pop.template(data=example,calmodel=~1)
#>   (Intercept)
#> 1          NA

# 2) Calibration on the total number of units in the population
#    and on the marginal distribution of marstat (notice that the
#    total for the first level "married" of the marstat factor
#    variable is missing because it can be deduced from
#    the remaining totals):
pop.template(data=example,calmodel=~marstat)
#>   (Intercept) marstatunmarried marstatwidowed
#> 1          NA               NA             NA

# 3) Calibration on the marginal distribution of marstat (you
#    must explicitly remove the intercept term in the
#    calibration model adding -1 to the calmodel formula):
pop.template(data=example,calmodel=~marstat-1)
#>   marstatmarried marstatunmarried marstatwidowed
#> 1             NA               NA             NA

# 4) Calibration (global solution) on the joint distribution of sex
#    and marstat:
pop.template(data=example,calmodel=~sex:marstat-1)
#>   sexf:marstatmarried sexm:marstatmarried sexf:marstatunmarried
#> 1                  NA                  NA                    NA
#>   sexm:marstatunmarried sexf:marstatwidowed sexm:marstatwidowed
#> 1                    NA                  NA                  NA

# 4.1) Calibration (partitioned solution) on the joint distribution
#      of sex and marstat:
# 4.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~marstat-1,partition=~sex)
#>   sex marstatmarried marstatunmarried marstatwidowed
#> 1   f             NA               NA             NA
#> 2   m             NA               NA             NA

# 4.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~sex-1,partition=~marstat)
#>     marstat sexf sexm
#> 1   married   NA   NA
#> 2 unmarried   NA   NA
#> 3   widowed   NA   NA

# 4.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~1,partition=~sex:marstat)
#>   sex   marstat (Intercept)
#> 1   f   married          NA
#> 2   f unmarried          NA
#> 3   f   widowed          NA
#> 4   m   married          NA
#> 5   m unmarried          NA
#> 6   m   widowed          NA

# 5) Calibration (global solution) on the total for the quantitative
#    variable x1 and on the marginal distribution of the qualitative
#    variable age5c, in the subpopulations defined by crossing sex
#    and marstat:
pop.template(data=example,calmodel=~(age5c+x1-1):sex:marstat)
#>   age5c1:sexf:marstatmarried age5c2:sexf:marstatmarried
#> 1                         NA                         NA
#>   age5c3:sexf:marstatmarried age5c4:sexf:marstatmarried
#> 1                         NA                         NA
#>   age5c5:sexf:marstatmarried age5c1:sexm:marstatmarried
#> 1                         NA                         NA
#>   age5c2:sexm:marstatmarried age5c3:sexm:marstatmarried
#> 1                         NA                         NA
#>   age5c4:sexm:marstatmarried age5c5:sexm:marstatmarried
#> 1                         NA                         NA
#>   age5c1:sexf:marstatunmarried age5c2:sexf:marstatunmarried
#> 1                           NA                           NA
#>   age5c3:sexf:marstatunmarried age5c4:sexf:marstatunmarried
#> 1                           NA                           NA
#>   age5c5:sexf:marstatunmarried age5c1:sexm:marstatunmarried
#> 1                           NA                           NA
#>   age5c2:sexm:marstatunmarried age5c3:sexm:marstatunmarried
#> 1                           NA                           NA
#>   age5c4:sexm:marstatunmarried age5c5:sexm:marstatunmarried
#> 1                           NA                           NA
#>   age5c1:sexf:marstatwidowed age5c2:sexf:marstatwidowed
#> 1                         NA                         NA
#>   age5c3:sexf:marstatwidowed age5c4:sexf:marstatwidowed
#> 1                         NA                         NA
#>   age5c5:sexf:marstatwidowed age5c1:sexm:marstatwidowed
#> 1                         NA                         NA
#>   age5c2:sexm:marstatwidowed age5c3:sexm:marstatwidowed
#> 1                         NA                         NA
#>   age5c4:sexm:marstatwidowed age5c5:sexm:marstatwidowed x1:sexf:marstatmarried
#> 1                         NA                         NA                     NA
#>   x1:sexm:marstatmarried x1:sexf:marstatunmarried x1:sexm:marstatunmarried
#> 1                     NA                       NA                       NA
#>   x1:sexf:marstatwidowed x1:sexm:marstatwidowed
#> 1                     NA                     NA

# 5.1) The same problem with partitioned solutions:
# 5.1.1) Using sex to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):marstat,partition=~sex)
#>   sex age5c1:marstatmarried age5c2:marstatmarried age5c3:marstatmarried
#> 1   f                    NA                    NA                    NA
#> 2   m                    NA                    NA                    NA
#>   age5c4:marstatmarried age5c5:marstatmarried age5c1:marstatunmarried
#> 1                    NA                    NA                      NA
#> 2                    NA                    NA                      NA
#>   age5c2:marstatunmarried age5c3:marstatunmarried age5c4:marstatunmarried
#> 1                      NA                      NA                      NA
#> 2                      NA                      NA                      NA
#>   age5c5:marstatunmarried age5c1:marstatwidowed age5c2:marstatwidowed
#> 1                      NA                    NA                    NA
#> 2                      NA                    NA                    NA
#>   age5c3:marstatwidowed age5c4:marstatwidowed age5c5:marstatwidowed
#> 1                    NA                    NA                    NA
#> 2                    NA                    NA                    NA
#>   x1:marstatmarried x1:marstatunmarried x1:marstatwidowed
#> 1                NA                  NA                NA
#> 2                NA                  NA                NA

# 5.1.2) Using marstat to define calibration domains:
pop.template(data=example,calmodel=~(age5c+x1-1):sex,partition=~marstat)
#>     marstat age5c1:sexf age5c2:sexf age5c3:sexf age5c4:sexf age5c5:sexf
#> 1   married          NA          NA          NA          NA          NA
#> 2 unmarried          NA          NA          NA          NA          NA
#> 3   widowed          NA          NA          NA          NA          NA
#>   age5c1:sexm age5c2:sexm age5c3:sexm age5c4:sexm age5c5:sexm x1:sexf x1:sexm
#> 1          NA          NA          NA          NA          NA      NA      NA
#> 2          NA          NA          NA          NA          NA      NA      NA
#> 3          NA          NA          NA          NA          NA      NA      NA

# 5.1.3) Using sex and marstat to define calibration domains:
pop.template(data=example,calmodel=~age5c+x1-1,partition=~sex:marstat)
#>   sex   marstat age5c1 age5c2 age5c3 age5c4 age5c5 x1
#> 1   f   married     NA     NA     NA     NA     NA NA
#> 2   f unmarried     NA     NA     NA     NA     NA NA
#> 3   f   widowed     NA     NA     NA     NA     NA NA
#> 4   m   married     NA     NA     NA     NA     NA NA
#> 5   m unmarried     NA     NA     NA     NA     NA NA
#> 6   m   widowed     NA     NA     NA     NA     NA NA