Checks whether a known population totals data frame conforms to the standard required by e.calibrate for a specific calibration problem.

population.check(df.population, data, calmodel, partition = FALSE)

Arguments

df.population

Data frame of known population totals.

data

Data frame of survey data (or an object inheriting from class analytic).

calmodel

Formula defining the linear structure of the calibration model.

partition

Formula specifying the variables that define the "calibration domains" for the model. FALSE (the default) implies no calibration domains.

Details

The behaviour of this function depends on the outcome of the test. If df.population is found to conform to the standard, the function first converts it into an object of class pop.totals and then invisibly returns it. Otherwise, the function stops with an error message whose text is meant to help the user diagnose the cause of the problem.
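Because a failed check stops execution, it can be convenient to trap the error when the check runs inside a longer script. The following is only a minimal sketch: safe_population_check is a hypothetical helper, not part of ReGenesees, and the example runs only if the package is installed.

```r
# Hypothetical helper (NOT part of ReGenesees): returns the pop.totals
# object when the check succeeds, or NULL after reporting the error message.
safe_population_check <- function(...) {
  tryCatch(
    population.check(...),
    error = function(e) {
      message("population.check failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Run the example only when ReGenesees is available.
if (requireNamespace("ReGenesees", quietly = TRUE)) {
  library(ReGenesees)
  data(data.examples)
  pop <- pop.template(data = example, calmodel = ~x1 - 1, partition = ~sex)
  pop[, "x1"] <- c(123, 456)
  checked <- safe_population_check(df.population = pop, data = example,
                                   calmodel = ~x1 - 1, partition = ~sex)
  print(class(checked))
}
```

Returning NULL on failure is just one possible design choice; re-raising the error with extra context would work equally well.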

The mandatory argument df.population identifies the known totals data frame for which compliance with the standard is to be checked.

The mandatory argument data identifies the survey data frame on which the calibration problem is defined (or, as an alternative, an analytic object built upon that data frame).

The mandatory argument calmodel symbolically defines the calibration model you intend to use: it identifies the auxiliary variables and the constraints for the calibration problem. The data variables referenced by calmodel must be numeric or factor and must not contain any missing value (NA).
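The requirements on the calmodel variables can be screened in advance with base R. The sketch below is an assumption-laden illustration, not the check that population.check performs internally: precheck_calmodel_vars is a hypothetical helper and the survey data frame is made up.

```r
# Hypothetical helper (NOT part of ReGenesees): for each variable referenced
# by a calibration formula, report whether it is numeric or factor and
# whether it contains missing values.
precheck_calmodel_vars <- function(calmodel, data) {
  vars <- all.vars(calmodel)
  type_ok <- vapply(data[vars],
                    function(v) is.numeric(v) || is.factor(v),
                    logical(1))
  has_na <- vapply(data[vars], anyNA, logical(1))
  data.frame(variable = vars, type_ok = type_ok, has_na = has_na,
             row.names = NULL)
}

# Made-up survey data, for illustration only.
survey <- data.frame(
  x1   = c(1.5, 2, 3),
  x2   = c(1, NA, 3),
  sex  = factor(c("f", "m", "f")),
  note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

good <- precheck_calmodel_vars(~ x1 + sex - 1, survey)   # both variables pass
bad  <- precheck_calmodel_vars(~ x2 + note - 1, survey)  # x2 has NA, note is character
print(good)
print(bad)
```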

The optional argument partition specifies the variables that define the calibration domains for the model. The default value (FALSE) means either that there are no calibration domains or that you want to solve the problem globally (even though it could be factorized). If a formula is passed through the partition argument, the program checks that calmodel actually describes a "reduced model", that is, it does not reference any of the partition variables; if this is not the case, the program stops and prints an error message. Notice that a formula like partition=~D1+D2 will be automatically translated into the factor-crossing formula partition=~D1:D2. The data variables referenced by partition (if any) must be factor and must not contain any missing value (NA). Note that, if the partition formula involves two or more factors, their crossed levels will be ordered according to operator : (that is, those from the rightmost variable will vary fastest).
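Two of the points above can be illustrated with base R alone: the "reduced model" condition and the ordering of crossed levels produced by the : operator. This is a sketch on made-up data with hypothetical variable names (x1, D1, D2); it does not reproduce the internal checks of population.check.

```r
# A reduced model shares no variables with the partition formula.
calmodel  <- ~ x1 - 1
partition <- ~ D1 + D2
shared <- intersect(all.vars(calmodel), all.vars(partition))
print(length(shared) == 0)   # TRUE means calmodel is a reduced model

# Made-up factors of equal length, for illustration only.
survey <- data.frame(
  D1 = factor(c("a", "a", "b", "b", "a", "b")),
  D2 = factor(c("x", "y", "z", "x", "z", "y"))
)

# Crossing with ':' orders the levels so that the rightmost factor (D2)
# varies fastest.
crossed <- with(survey, D1:D2)
crossed_levels <- levels(crossed)
print(crossed_levels)
```

The row order of a partitioned known totals data frame is expected to follow this same convention, which is why building it with pop.template is usually safer than assembling it by hand.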

Value

An invisible object of class pop.totals. The pop.totals class is a specialization of the data.frame class; this means that an object of class pop.totals inherits from the data.frame class and you can use on it every method defined for that class.

Note

The population.check function can be used to convert a known totals data frame that conforms to the standard required by e.calibrate into an object of class pop.totals. The usefulness of this conversion lies in the fact that, once you have known totals with this "certified format", you can invoke e.calibrate without specifying the values for the calmodel and partition arguments (this means that the function is able to extract them directly from the attributes of the pop.totals object).

References

Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.

See also

e.calibrate for calibrating weights, pop.template for the definition of the class pop.totals and to build a "template" data frame for known population totals, fill.template to automatically fill the template when a sampling frame is available.

Examples

data(data.examples)
# Suppose you have to calibrate the example survey data frame
# on the totals of x1 by sex and you want the partitioned solution.
# Start creating a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)
# Then build a template data frame for the known totals:
pop<-pop.template(data=example,calmodel=~x1-1,partition=~sex)
pop
#>   sex x1
#> 1   f NA
#> 2   m NA
class(pop)
#> [1] "pop.totals" "data.frame"
# Now fill NAs with the actual values for the population
# totals (suppose 123 for sex="f" and 456 for sex="m"):
pop[,"x1"]<-c(123,456)
pop
#>   sex  x1
#> 1   f 123
#> 2   m 456
class(pop)
#> [1] "pop.totals" "data.frame"
# Finally check if pop complies with the e.calibrate standard:
population.check(df.population=pop,data=example,calmodel=~x1-1,
                 partition=~sex)
#>
#> # Checking Known Totals dataframe: OK
#>
# If, despite keeping the content unchanged, we altered the
# structure of the data frame (for example, by changing the
# order of its rows)...
pop.mod<-pop ; pop.mod[1,]<-pop[2,] ; pop.mod[2,]<-pop[1,]
pop
#>   sex  x1
#> 1   f 123
#> 2   m 456
pop.mod
#>   sex  x1
#> 1   m 456
#> 2   f 123
# ...we would obtain an error:
if (FALSE) {
population.check(df.population=pop.mod,data=example,calmodel=~x1-1,
                 partition=~sex)
}
# Remember that, if the known totals have been converted
# into the pop.totals "format" by means of population.check,
# it is possible to invoke e.calibrate without specifying
# calmodel and partition:
class(pop04p)
#> [1] "pop.totals" "data.frame"
pop04p
#>   regcod    x1   x2   x3
#> 1      6 18403 5870 6525
#> 2      7 22484 7557 8092
#> 3     10 13726 4884 5659
descal04p<-e.calibrate(design=des,df.population=pop04p,
           calfun="logit",bounds=bounds,aggregate.stage=2)
# ...this option is not allowed if the known totals
# are not of class 'pop.totals' even if they conform to the
# standard:
pop04p.mod<-data.frame(pop04p)
class(pop04p.mod)
#> [1] "data.frame"
pop04p.mod
#>   regcod    x1   x2   x3
#> 1      6 18403 5870 6525
#> 2      7 22484 7557 8092
#> 3     10 13726 4884 5659
if (FALSE) {
e.calibrate(design=des,df.population=pop04p.mod,calfun="logit",
            bounds=bounds,aggregate.stage=2)
}