population.check.Rd
Checks whether a known population totals data frame conforms to the standard required by e.calibrate
for a specific calibration problem.
population.check(df.population, data, calmodel, partition = FALSE)
Argument | Description |
---|---|
df.population | Data frame of known population totals. |
data | Data frame of survey data (or an object inheriting from class analytic). |
calmodel | Formula defining the linear structure of the calibration model. |
partition | Formula specifying the variables that define the "calibration domains" for the model. |
The behaviour of this function depends on the outcome of the test. If df.population conforms to the standard, the function converts it into an object of class pop.totals and returns it invisibly. Otherwise, the function stops and prints an error message whose text should help the user diagnose the cause of the problem.
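The check-then-return-invisibly behaviour described above can be sketched in base R. The function toy_check below is a hypothetical stand-in, not the ReGenesees implementation: it only compares column names, whereas population.check tests the full standard. The sketch shows why a successful call prints nothing unless its result is assigned, and how tryCatch can capture the message of a failed check.

```r
# Hypothetical stand-in for population.check (illustration only).
toy_check <- function(df, expected_cols) {
  if (!identical(names(df), expected_cols)) {
    stop("known totals columns differ from the template: ",
         paste(expected_cols, collapse = ", "))
  }
  invisible(df)  # returned, but not auto-printed at the console
}

good <- data.frame(sex = c("f", "m"), x1 = c(123, 456))
bad  <- good[, c("x1", "sex")]  # same content, different column order

# A successful check returns its value invisibly.
checked <- toy_check(good, c("sex", "x1"))
vis <- withVisible(toy_check(good, c("sex", "x1")))$visible

# A failed check stops; tryCatch turns the error into a message string.
msg <- tryCatch(
  {
    toy_check(bad, c("sex", "x1"))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
```

In an interactive session, wrapping the real call in tryCatch in the same way lets a script log the diagnostic message instead of halting.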
The mandatory argument df.population identifies the known totals data frame whose compliance with the standard is to be checked.
The mandatory argument data identifies the survey data frame on which the calibration problem is defined (or, alternatively, an analytic object built upon that data frame).
The mandatory argument calmodel symbolically defines the calibration model you intend to use: it identifies the auxiliary variables and the constraints for the calibration problem. The data variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA).
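The requirements on the calmodel variables (numeric or factor, no missing values) can be verified before calling the function. The helper find_bad_vars below is hypothetical and not part of ReGenesees; it is a minimal base R sketch that assumes every variable named in the formula exists in the data frame.

```r
# Hypothetical helper (not part of ReGenesees): report the variables
# referenced by a formula that are neither numeric nor factor, and those
# that contain missing values.
find_bad_vars <- function(data, calmodel) {
  vars <- all.vars(calmodel)
  ok_type <- vapply(data[vars],
                    function(v) is.numeric(v) || is.factor(v),
                    logical(1))
  has_na <- vapply(data[vars], anyNA, logical(1))
  list(bad_type = vars[!ok_type], has_na = vars[has_na])
}

# Toy survey data: x1 has an NA, id is a character column.
toy <- data.frame(
  x1  = c(1.5, 2.0, NA),
  age = factor(c("young", "old", "old")),
  id  = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

res <- find_bad_vars(toy, ~ x1 + age + id - 1)
res$bad_type
res$has_na
```

Here res$bad_type flags id and res$has_na flags x1; both would have to be fixed before the data could support this calibration model.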
The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). If a formula is passed through the partition argument, the program checks that calmodel actually describes a "reduced model", that is, it does not reference any of the partition variables; if this is not the case, the program stops and prints an error message. Notice that a formula like partition=~D1+D2 will be automatically translated into the factor-crossing formula partition=~D1:D2. The data variables referenced by partition (if any) must be factor and must not contain any missing value (NA). Note that, if the partition formula involves two or more factors, their crossed levels will be ordered according to the operator : (that is, those from the rightmost variable will vary fastest).
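The level ordering produced by the : operator, and the "reduced model" condition, can both be illustrated in base R. The factors D1 and D2 are toy data, and shared_vars is a hypothetical helper, not a ReGenesees function.

```r
D1 <- factor(c("a", "b", "a", "b"))
D2 <- factor(c("x", "x", "y", "y"))

# Crossing two factors with ':' makes the levels of the rightmost
# factor vary fastest.
crossed <- D1:D2
levels(crossed)
#> [1] "a:x" "a:y" "b:x" "b:y"

# Hypothetical pre-check: a reduced model shares no variable with the
# partition formula, so the intersection must be empty.
shared_vars <- function(calmodel, partition) {
  intersect(all.vars(calmodel), all.vars(partition))
}

ok_model  <- shared_vars(~ x1 - 1, ~ D1:D2)       # empty: reduced model
bad_model <- shared_vars(~ x1 + D1 - 1, ~ D1:D2)  # "D1": not reduced
```

The order of levels(crossed) is the row order to expect in a known totals data frame partitioned by both factors.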
An invisible object of class pop.totals. The pop.totals class is a specialization of the data.frame class; this means that an object of this class inherits from the data.frame class, so every method defined for that class can be used on it.
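The practical meaning of this inheritance can be shown with a mock object. The sketch below only tags a plain data frame with the class vector reported by class(pop) in the examples; a real pop.totals object also carries ReGenesees-specific attributes that are not reproduced here.

```r
# Mock object for illustration only: a data frame carrying the class
# vector c("pop.totals", "data.frame").
mock <- data.frame(sex = factor(c("f", "m")), x1 = c(123, 456))
class(mock) <- c("pop.totals", "data.frame")

# Because "data.frame" is in the class vector, data frame methods and
# accessors dispatch as usual.
is_df    <- inherits(mock, "data.frame")
n_rows   <- nrow(mock)
total_x1 <- sum(mock$x1)
males    <- mock[mock$sex == "m", ]
```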
The population.check function can be used to convert a known totals data frame that conforms to the standard required by e.calibrate into an object of class pop.totals. This conversion is useful because, once the known totals are in this "certified format", you can invoke e.calibrate without specifying the calmodel and partition arguments: the function extracts them directly from the attributes of the pop.totals object.
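The idea of reading the model specification from the object's attributes can be sketched as follows. This is not the e.calibrate implementation: resolve_model is a hypothetical function, and the attribute names "calmodel" and "partition" are assumptions made for illustration.

```r
# Hypothetical sketch: fall back to attributes stored on a certified
# object when the caller does not pass calmodel explicitly.
resolve_model <- function(df.population, calmodel = NULL, partition = NULL) {
  if (is.null(calmodel)) {
    if (!inherits(df.population, "pop.totals")) {
      stop("calmodel must be specified for a plain data frame")
    }
    calmodel  <- attr(df.population, "calmodel")   # assumed attribute name
    partition <- attr(df.population, "partition")  # assumed attribute name
  }
  list(calmodel = calmodel, partition = partition)
}

totals <- data.frame(sex = factor(c("f", "m")), x1 = c(123, 456))

# Mock "certified" object: class tag plus the two assumed attributes.
certified <- structure(totals,
                       class = c("pop.totals", "data.frame"),
                       calmodel = ~ x1 - 1,
                       partition = ~ sex)

spec <- resolve_model(certified)  # model recovered from the attributes
plain <- tryCatch(resolve_model(totals), error = function(e) e)
```

The second call fails in the same spirit as the last example on this page, where a plain data.frame copy of pop04p cannot be passed to e.calibrate without calmodel and partition.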
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.
e.calibrate for calibrating weights, pop.template for the definition of the class pop.totals and to build a "template" data frame for known population totals, fill.template to automatically fill the template when a sampling frame is available.
data(data.examples)

# Suppose you have to calibrate the example survey data frame
# on the totals of x1 by sex and you want the partitioned solution.
# Start creating a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Then build a template data frame for the known totals:
pop<-pop.template(data=example,calmodel=~x1-1,partition=~sex)
pop
#>   sex x1
#> 1   f NA
#> 2   m NA
class(pop)
#> [1] "pop.totals" "data.frame"

# Now fill NAs with the actual values for the population
# totals (suppose 123 for sex="f" and 456 for sex="m"):
pop[,"x1"]<-c(123,456)
pop
#>   sex  x1
#> 1   f 123
#> 2   m 456
class(pop)
#> [1] "pop.totals" "data.frame"

# Finally check if pop complies with the e.calibrate standard:
population.check(df.population=pop,data=example,calmodel=~x1-1,
                 partition=~sex)
#>
#> # Checking Known Totals dataframe: OK
#>

# If, despite keeping the content unchanged, we altered the
# structure of the data frame (for example, by changing the
# order of its rows)...
pop.mod<-pop ; pop.mod[1,]<-pop[2,] ; pop.mod[2,]<-pop[1,]
pop
#>   sex  x1
#> 1   f 123
#> 2   m 456
pop.mod
#>   sex  x1
#> 1   m 456
#> 2   f 123

# ...we would obtain an error:
if (FALSE) {
population.check(df.population=pop.mod,data=example,calmodel=~x1-1,
                 partition=~sex)
}

# Remember that, if the known totals have been converted
# into the pop.totals "format" by means of population.check,
# it is possible to invoke e.calibrate without specifying
# calmodel and partition:
class(pop04p)
#> [1] "pop.totals" "data.frame"
pop04p
#>   regcod    x1   x2   x3
#> 1      6 18403 5870 6525
#> 2      7 22484 7557 8092
#> 3     10 13726 4884 5659
descal04p<-e.calibrate(design=des,df.population=pop04p,
           calfun="logit",bounds=bounds,aggregate.stage=2)

# ...this option is not allowed if the known totals
# are not of class 'pop.totals' even if they conform to the
# standard:
pop04p.mod<-data.frame(pop04p)
class(pop04p.mod)
#> [1] "data.frame"
pop04p.mod
#>   regcod    x1   x2   x3
#> 1      6 18403 5870 6525
#> 2      7 22484 7557 8092
#> 3     10 13726 4884 5659
if (FALSE) {
e.calibrate(design=des,df.population=pop04p.mod,calfun="logit",
            bounds=bounds,aggregate.stage=2)
}