ReGenesees.options.Rd
This help page documents the options that control the behaviour of the ReGenesees package with respect to standard error estimation.
The ReGenesees package provides four options for variance
estimation, which can be freely set and modified by the user:
- RG.ultimate.cluster
- RG.lonely.psu
- RG.adjust.domain.lonely
- RG.warn.domain.lonely
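As a minimal sketch of how these options can be set, inspected, and restored, the snippet below uses only base R's options() and getOption(). The option names are those listed above; the values chosen are purely illustrative.

```r
# Base R only: set two of the options listed above, keeping the
# previous values so that they can be restored afterwards.
old.op <- options(RG.ultimate.cluster = TRUE, RG.lonely.psu = "adjust")

# Inspect the values currently in force
getOption("RG.ultimate.cluster")
getOption("RG.lonely.psu")

# ... run the estimation functions of interest here ...

# Restore whatever was in force before
options(old.op)
```

Saving the return value of options() and passing it back is the same pattern used in the examples at the end of this page.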
When options("RG.ultimate.cluster") is TRUE, the ReGenesees package
adopts the so-called “Ultimate Cluster Approximation” [Kalton 79]. Under
this approximation, the overall sampling variance for a multistage sampling
design is estimated by taking into account only the contribution arising
from the estimated PSU totals (thus ignoring any available information
about subsequent sampling stages). For without-replacement sampling designs,
this approach is known to underestimate the true multistage variance while,
at the same time, overestimating its true first-stage component. The
underestimation error, however, becomes negligible if the PSUs' sampling
fractions across strata are very small. When sampling with replacement, the
Ultimate Cluster approach is no longer an approximation, but rather an exact
result. Hence, whether options("RG.ultimate.cluster") is TRUE or FALSE,
if one does not specify first-stage finite population corrections,
ReGenesees will produce exactly the same variance estimates.
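To make the idea concrete, here is a small base R sketch of the textbook with-replacement (ultimate cluster) variance estimator for a total: the sum over strata of n_h times the sample variance of the weighted PSU totals. The data frame smp and its columns are hypothetical toy data, and this is not the code ReGenesees runs internally.

```r
# Hypothetical two-stage sample: 2 strata, 5 PSUs, 2 units per PSU
smp <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  psu     = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5),
  w       = c(10, 10, 12, 12, 8, 8, 9, 9, 7, 7),
  y       = c(3, 5, 4, 6, 2, 7, 5, 5, 1, 4)
)

# Weighted PSU totals: only these enter the ultimate cluster estimator
smp$wy <- smp$w * smp$y
z <- aggregate(wy ~ stratum + psu, data = smp, FUN = sum)

# Stratum contribution: n_h * s2_h, with s2_h the sample variance
# of the PSU totals in stratum h (no finite population correction)
contrib <- tapply(z$wy, z$stratum, function(zh) length(zh) * var(zh))

v.uc  <- sum(contrib)   # estimated variance of the total
se.uc <- sqrt(v.uc)     # corresponding standard error
```

Note that the second-stage units enter only through the PSU totals, which is exactly what "ignoring subsequent sampling stages" means here.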
When options("RG.ultimate.cluster") is FALSE, each sampling stage
contributes and variances are estimated by means of a recursive algorithm
[Bellhouse 85] inherited and adapted from package survey [Lumley 06].
Notice that the results obtained by choosing this option can differ from
those that would be obtained under the "Ultimate Cluster Approximation"
only if first-stage finite population corrections are specified.
Lonely PSUs (i.e. PSUs which are alone inside a not self-representing
stratum) are a concern from the viewpoint of variance estimation. The
suggested ReGenesees facility to handle the lonely PSUs problem is
the strata aggregation technique (see e.g. [Wolter 07] and [Rust, Kalton 87])
provided in function collapse.strata.
As a possible alternative, you can also deal with lonely PSUs by setting
proper variance estimation options via options("RG.lonely.psu").
The default setting is "fail", which raises an error if a lonely PSU
is met. Option "remove" simply causes the software to ignore lonely PSUs
for variance computation purposes. Option "adjust" means that
deviations from the population mean will be used in variance
estimation formulae, instead of deviations from the stratum mean
(a conservative choice). Finally, option "average" causes the
software to replace the variance contribution of the stratum by the average
variance contribution across strata (this can be appropriate e.g. when one
believes that lonely PSU strata occur at random due to uniform nonresponse
among strata).
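The four settings can be illustrated with a toy base R function that follows the verbal description above. It is only one plausible reading of that description, applied to hypothetical PSU totals with no finite population corrections; the function name lonely.variance and its arguments are invented for this sketch, and the actual ReGenesees computation may differ in its details.

```r
# Toy illustration (hypothetical helper, not part of ReGenesees):
# z       = weighted PSU totals
# stratum = stratum label of each PSU
lonely.variance <- function(z, stratum,
                            lonely = c("fail", "remove", "adjust", "average")) {
  lonely <- match.arg(lonely)
  groups <- split(z, stratum)
  is.lonely <- lengths(groups) == 1

  # Regular strata contribute n_h * s2_h; lonely strata are filled in below
  contrib <- vapply(groups, function(zh) {
    if (length(zh) > 1) length(zh) * var(zh) else NA_real_
  }, numeric(1))

  if (any(is.lonely)) {
    if (lonely == "fail") stop("a stratum contains a single PSU")
    if (lonely == "remove") contrib[is.lonely] <- 0
    if (lonely == "adjust") {
      # Deviation from the overall mean instead of the stratum mean
      contrib[is.lonely] <- vapply(groups[is.lonely],
                                   function(zh) (zh - mean(z))^2, numeric(1))
    }
    if (lonely == "average") {
      # Average contribution of the strata that are not lonely
      contrib[is.lonely] <- mean(contrib[!is.lonely])
    }
  }
  sum(contrib)
}

# Hypothetical PSU totals: stratum "C" holds a lonely PSU
z       <- c(80, 120, 72, 90, 35, 60)
stratum <- c("A", "A", "B", "B", "B", "C")

v.remove  <- lonely.variance(z, stratum, "remove")
v.adjust  <- lonely.variance(z, stratum, "adjust")
v.average <- lonely.variance(z, stratum, "average")
```

In this toy case "remove" gives the smallest variance, since the lonely stratum contributes nothing, while "adjust" and "average" both add a positive contribution for it.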
The variance formulae for domain estimation give well-defined,
positive results when a stratum contains only a single PSU with
observations falling in the domain, but are not unbiased.
If options("RG.adjust.domain.lonely") is TRUE and
options("RG.lonely.psu") is "average" or "adjust", the same
adjustment for lonely PSUs will be used within a domain. Note that this
adjustment is not available for calibrated designs.
If options("RG.warn.domain.lonely") is set to TRUE, a
warning message is raised whenever an estimation domain happens to
contain just a single PSU belonging to a stratum. The default is FALSE.
Kalton, G. (1979). “Ultimate cluster sampling”, Journal of the Royal Statistical Society, Series A, 142, pp. 210-222.
Bellhouse, D. R. (1985). “Computing Methods for Variance Estimation in Complex Surveys”, Journal of Official Statistics, Vol. 1, No. 3, pp. 323-329.
Lumley, T. (2006). “survey: analysis of complex survey samples”, https://CRAN.R-project.org/package=survey.
Wolter, K. M. (2007). “Introduction to Variance Estimation”, Second Edition, Springer-Verlag, New York.
Rust, K., Kalton, G. (1987). “Strategies for Collapsing Strata for Variance Estimation”, Journal of Official Statistics, Vol. 3, No. 1, pp. 69-81.
e.svydesign and its self.rep.str argument for a
"compromise solution" that can be adopted when the sampling design
involves self-representing (SR) strata, collapse.strata
for the suggested way of handling lonely PSUs, and fpcdat
for useful data examples.
# Define a two-stage stratified cluster sampling without
# replacement:
data(fpcdat)
des<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,weights=~w,
     fpc=~fpc1+fpc2)

# Now compare SE (or CV%) sizes under different settings:

## 1) Default setting, i.e. Ultimate Cluster Approximation is off
svystatTM(des,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  158.6328 15.78174
#> y   976.2417  235.1718 24.08951
#> z 19180.1349 3306.6124 17.23978

## 2) Turn on the Ultimate Cluster Approximation, thus missing
##    the variance contribution from the second stage
##    (hence SR strata give no contribution at all):
old.op <- options("RG.ultimate.cluster"=TRUE)
svystatTM(des,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  152.2222 15.14398
#> y   976.2417  231.5724 23.72081
#> z 19180.1349 3258.6512 16.98972
options(old.op)

## 3) The "compromise solution" (see ?e.svydesign) i.e. retaining
##    only the leading contribution to the sampling variance (namely
##    the one arising from SSUs in SR strata and PSUs in not-SR strata):
des2<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,weights=~w,
      fpc=~fpc1+fpc2, self.rep.str=~sr)
#> Warning: Sampling variance estimation for this design will take into account only leading contributions, i.e. PSUs in not-SR strata and SSUs in SR strata (see ?e.svydesign and ?ReGenesees.options for details)
svystatTM(des2,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  152.4536 15.16700
#> y   976.2417  231.7568 23.73970
#> z 19180.1349 3268.6730 17.04197

# Therefore, sampling variances come out in the expected
# hierarchy: 1) > 3) > 2).

# Under default settings lonely PSUs produce errors in standard
# error estimation (notice we didn't pass the fpcs):
data(fpcdat)
des.lpsu<-e.svydesign(data=fpcdat,ids=~psu+ssu,strata=~stratum,
          weights=~w)
if (FALSE) {
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
}

# This can be circumvented in different ways, namely:
old.op <- options("RG.lonely.psu"="adjust")
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  199.0398 19.80167
#> y   976.2417  277.1037 28.38474
#> z 19180.1349 4079.4900 21.26935
options(old.op)

# or:
old.op <- options("RG.lonely.psu"="average")
svystatTM(des.lpsu,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  220.1591 21.90275
#> y   976.2417  328.8805 33.68843
#> z 19180.1349 4741.9620 24.72330
options(old.op)

# or otherwise by collapsing strata inside planned
# estimation domains:
des.clps<-collapse.strata(design=des.lpsu,block.vars=~pl.domain)
#>
#> # All lonely strata (2) successfully collapsed!
#>
#> Warning: No similarity score specified: achieved strata aggregation depends on the ordering of sample data
svystatTM(des.clps,~x+y+z,vartype=c("se","cvpct"))
#>        Total        SE      CV%
#> x  1005.1667  199.2386 19.82145
#> y   976.2417  272.4241 27.90540
#> z 19180.1349 4154.4085 21.65995