This help page documents the options that control the behaviour of the ReGenesees package with respect to standard error estimation.

Details

The ReGenesees package provides four options for variance estimation, which can be freely set and modified by the user:

- RG.ultimate.cluster
- RG.lonely.psu
- RG.adjust.domain.lonely
- RG.warn.domain.lonely
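
Since these are ordinary R options, they can be set with options() and read with getOption(). The sketch below shows one possible way to change them only for the duration of a single computation. The helper with_rg_options is hypothetical (it is not part of ReGenesees) and relies only on base R.

```r
# Hypothetical helper, NOT part of ReGenesees: evaluate `expr` under
# temporary option settings and restore the previous values afterwards,
# even if `expr` raises an error.
with_rg_options <- function(new.opts, expr) {
  old.op <- options(new.opts)           # options() returns the old values
  on.exit(options(old.op), add = TRUE)  # restore them when the function exits
  force(expr)                           # evaluated lazily, i.e. after the switch
}

# Usage: the option is "adjust" only while the expression is evaluated
with_rg_options(list(RG.lonely.psu = "adjust"),
                getOption("RG.lonely.psu"))
```

Restoring through on.exit() mirrors the old.op <- options(...) / options(old.op) idiom, with the difference that the previous values are restored even when the computation fails.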

When options("RG.ultimate.cluster") is TRUE, the ReGenesees package adopts the so-called “Ultimate Cluster Approximation” [Kalton 79]. Under this approximation, the overall sampling variance of a multistage sampling design is estimated from the estimated PSU totals alone, ignoring any available information about subsequent sampling stages. For without-replacement sampling designs, this approach is known to underestimate the true multistage variance while, at the same time, overestimating its true first-stage component. The underestimation error becomes negligible, however, if the PSUs' sampling fractions across strata are very small. When sampling with replacement, the Ultimate Cluster approach is no longer an approximation, but rather an exact result. Hence, if first-stage finite population corrections are not specified, ReGenesees produces exactly the same variance estimates whether options("RG.ultimate.cluster") is TRUE or FALSE.
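
To illustrate the idea on a small scale, the following base R sketch computes an ultimate cluster variance estimate for an estimated total from weighted PSU totals. It uses the standard textbook form of the estimator (between-PSU variability within each stratum, optionally multiplied by a first-stage finite population correction) and is not code taken from ReGenesees. The function uc.var, its argument f, and the toy PSU totals are all hypothetical.

```r
# Toy data: estimated (weighted) PSU totals in two strata (made-up numbers)
psu.tot <- data.frame(
  stratum = c("A", "A", "A", "B", "B"),
  total   = c(10, 14, 12, 20, 26)
)

# Ultimate cluster variance estimate of the overall total.
# f: first-stage sampling fraction, either a single value or one value per
#    stratum (in sorted stratum order); f = 0 means no fpc, i.e. the
#    with-replacement case.
uc.var <- function(total, stratum, f = 0) {
  v.h <- tapply(total, stratum, function(t) {
    n <- length(t)                      # number of sampled PSUs in the stratum
    n / (n - 1) * sum((t - mean(t))^2)  # between-PSU contribution
  })
  sum((1 - f) * v.h)
}

uc.var(psu.tot$total, psu.tot$stratum)           # 12 + 36 = 48
uc.var(psu.tot$total, psu.tot$stratum, f = 0.1)  # 0.9 * 48 = 43.2
```

Only PSU-level totals enter the computation: whatever happened at later sampling stages affects the result solely through those totals.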

When options("RG.ultimate.cluster") is FALSE, each sampling stage contributes, and variances are estimated by means of a recursive algorithm [Bellhouse 85] inherited and adapted from package survey [Lumley 06]. Note that the results obtained with this setting can differ from those that would be obtained under the “Ultimate Cluster Approximation” only if first-stage finite population corrections are specified.

Lonely PSUs (i.e. PSUs which are alone inside a not self-representing stratum) are a concern from the viewpoint of variance estimation. The suggested ReGenesees facility for handling the lonely PSU problem is the strata aggregation technique (see e.g. [Wolter 07] and [Rust, Kalton 87]) provided by function collapse.strata. Alternatively, lonely PSUs can be dealt with by setting an appropriate variance estimation option via options("RG.lonely.psu"). The default setting is "fail", which raises an error if a lonely PSU is encountered. Option "remove" simply causes the software to ignore lonely PSUs for variance computation purposes. Option "adjust" means that deviations from the population mean will be used in variance estimation formulae, instead of deviations from the stratum mean (a conservative choice). Finally, option "average" causes the software to replace the variance contribution of the stratum by the average variance contribution across strata (this can be appropriate e.g. when one believes that lonely PSU strata occur at random due to uniform nonresponse among strata).
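
Before choosing among these settings it can help to check which strata are affected. The base R sketch below counts the distinct sampled PSUs per stratum and lists the strata with exactly one. The data frame dat and its column names are hypothetical, and the check does not distinguish self-representing strata, which would have to be excluded separately.

```r
# Toy sample data: one row per final unit (hypothetical values)
dat <- data.frame(
  stratum = c("S1", "S1", "S1", "S2", "S2", "S3"),
  psu     = c(1, 1, 2, 3, 3, 4)
)

# Number of distinct sampled PSUs in each stratum
n.psu <- tapply(dat$psu, dat$stratum, function(p) length(unique(p)))

# Strata containing a single sampled PSU
lonely <- names(n.psu)[n.psu == 1]
lonely  # "S2" "S3"
```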

The variance formulae for domain estimation give well-defined, positive results when a stratum contains only a single PSU with observations falling in the domain, but these results are not unbiased.
If options("RG.adjust.domain.lonely") is TRUE and options("RG.lonely.psu") is "average" or "adjust", the same adjustment for lonely PSUs will be used within a domain. Note that this adjustment is not available for calibrated designs.

If options("RG.warn.domain.lonely") is set to TRUE, a warning message is raised whenever an estimation domain happens to contain just a single PSU belonging to a stratum. The default is FALSE.
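
As a sketch of how the two domain-related options combine with options("RG.lonely.psu"), the settings can be switched on together and restored afterwards. Only base R is executed here. The commented svystatTM call is illustrative and assumes that ReGenesees is loaded and that a design object such as des.lpsu (see Examples) is available.

```r
# Switch on the domain-level adjustment and the related warning together
old.op <- options(RG.lonely.psu           = "adjust",
                  RG.adjust.domain.lonely = TRUE,
                  RG.warn.domain.lonely   = TRUE)

# Domain estimates would be computed here, for instance (assumption:
# ReGenesees is loaded and des.lpsu is defined as in the Examples):
# svystatTM(des.lpsu, ~x, by = ~pl.domain, vartype = "se")

# Restore the previous settings
options(old.op)
```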

References

Kalton, G. (1979). “Ultimate cluster sampling”, Journal of the Royal Statistical Society, Series A, 142, pp. 210-222.

Bellhouse, D. R. (1985). “Computing Methods for Variance Estimation in Complex Surveys”. Journal of Official Statistics, Vol. 1, No. 3, pp. 323-329.

Lumley, T. (2006). “survey: analysis of complex survey samples”, https://CRAN.R-project.org/package=survey.

Wolter, K.M. (2007). “Introduction to Variance Estimation”, Second Edition, Springer-Verlag, New York.

Rust, K., Kalton, G. (1987). “Strategies for Collapsing Strata for Variance Estimation”, Journal of Official Statistics, Vol. 3, No. 1, pp. 69-81.

See also

e.svydesign and its self.rep.str argument for a "compromise solution" that can be adopted when the sampling design involves self-representing (SR) strata, collapse.strata for the suggested way of handling lonely PSUs, and fpcdat for useful data examples.

Examples

# Define a two-stage stratified cluster sampling without
# replacement:
data(fpcdat)
des<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,weights=~w,
                 fpc=~fpc1+fpc2)

# Now compare SE (or CV%) sizes under different settings:

## 1) Default setting, i.e. Ultimate Cluster Approximation is off
svystatTM(des,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  158.6328 15.78174
#> y   976.2417  235.1718 24.08951
#> z 19180.1349 3306.6124 17.23978

## 2) Turn on the Ultimate Cluster Approximation, thus missing
##    the variance contribution from the second stage
##    (hence SR strata give no contribution at all):
old.op <- options("RG.ultimate.cluster"=TRUE)
svystatTM(des,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  152.2222 15.14398
#> y   976.2417  231.5724 23.72081
#> z 19180.1349 3258.6512 16.98972
options(old.op)

## 3) The "compromise solution" (see ?e.svydesign) i.e. retaining
##    only the leading contribution to the sampling variance (namely
##    the one arising from SSUs in SR strata and PSUs in not-SR strata):
des2<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,weights=~w,
                  fpc=~fpc1+fpc2, self.rep.str=~sr)
#> Warning: Sampling variance estimation for this design will take into account only leading contributions, i.e. PSUs in not-SR strata and SSUs in SR strata (see ?e.svydesign and ?ReGenesees.options for details)
svystatTM(des2,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  152.4536 15.16700
#> y   976.2417  231.7568 23.73970
#> z 19180.1349 3268.6730 17.04197

# Therefore, sampling variances come out in the expected
# hierarchy: 1) > 3) > 2).

# Under default settings lonely PSUs produce errors in standard
# errors estimation (notice we didn't pass the fpcs):
data(fpcdat)
des.lpsu<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,
                      weights=~w)
if (FALSE) {
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
}

# This can be circumvented in different ways, namely:
old.op <- options("RG.lonely.psu"="adjust")
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  199.0398 19.80167
#> y   976.2417  277.1037 28.38474
#> z 19180.1349 4079.4900 21.26935
options(old.op)

# or:
old.op <- options("RG.lonely.psu"="average")
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  220.1591 21.90275
#> y   976.2417  328.8805 33.68843
#> z 19180.1349 4741.9620 24.72330
options(old.op)

# or otherwise by collapsing strata inside planned
# estimation domains:
des.clps<-collapse.strata(design=des.lpsu,block.vars=~pl.domain)
#>
#> # All lonely strata (2) successfully collapsed!
#>
#> Warning: No similarity score specified: achieved strata aggregation depends on the ordering of sample data
svystatTM(des.clps,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  199.2386 19.82145
#> y   976.2417  272.4241 27.90540
#> z 19180.1349 4154.4085 21.65995