Computes estimates and sampling errors of a Measure of Change from two not necessarily independent samples. The Measure of Change can be any analytic function of Horvitz-Thompson or Calibration estimators derived from the two samples.

svyDelta(expr, design1, design2, by = NULL,
         des.INDEP = FALSE, rho.STRAT = c("Full", "noJump", "noStrat"),
         vartype = c("se", "cv", "cvpct", "var"),
         conf.int = FALSE, conf.lev = 0.95)

details(object, print.call = TRUE, ...)

# S3 method for svyDelta
coef(object, ...)
# S3 method for svyDelta
SE(object, ...)
# S3 method for svyDelta
VAR(object, ...)
# S3 method for svyDelta
cv(object, ...)
# S3 method for svyDelta
confint(object, ...)

Arguments

expr

R expression defining the Measure of Change (see svystatL for the basic syntax; see also section ‘Details’).

design1

Object of class analytic (or inheriting from it) containing survey data and sampling design metadata related to the first sample. See ‘Details’.

design2

Object of class analytic (or inheriting from it) containing survey data and sampling design metadata related to the second sample. See ‘Details’.

by

Formula specifying the variables that define the "estimation domains". If NULL (the default option) estimates refer to the whole population. See ‘Details’.

des.INDEP

Have the samples in design1 and design2 been selected independently? The default is FALSE. See ‘Details’.

rho.STRAT

To which extent should stratification be considered when estimating correlations? The default is "Full", which accommodates even dynamic stratification but can sometimes be too computationally demanding. See ‘Details’ and ‘Methodological and Computational Remarks’ for alternatives.

vartype

character vector specifying the desired variability estimators. It is possible to choose one or more of: standard error ('se', the default), coefficient of variation ('cv'), percent coefficient of variation ('cvpct'), or variance ('var').

conf.int

Compute confidence intervals for the estimates? The default is FALSE.

conf.lev

Probability specifying the desired confidence level: the default value is 0.95.

object

An object created by invoking function svyDelta, about which details are sought.

print.call

Print on screen the call of svyDelta that generated the input object?

...

Additional arguments to coef, ..., confint methods (if any). For function details, arguments for future extensions (currently unused).

Details

Function svyDelta computes estimates and sampling errors of a Measure of Change from two not necessarily independent samples.

For ‘Measure of Change’ we mean here any function that can be used to compare cross-sectional estimates of two population parameters derived from two samples. The easiest Measure of Change is the simple difference between the estimators of the same population parameter referred to two distinct survey occasions. Estimating this difference and its sampling variance is central to hypothesis testing (e.g. for impact evaluation). More complex Measures of Change can of course be considered, for instance nonlinear ones, like the relative difference between two estimators. In addition, the cross-sectional estimators being compared in the Measure of Change can themselves be complex (i.e. expressed as nonlinear functions of Horvitz-Thompson or Calibration estimators).

Function svyDelta can accommodate all the cases sketched above.

When the two samples being analyzed are independent, estimating the sampling variance of a simple Measure of Change is fairly straightforward: one just has to add variance estimates derived from the two samples. The problem becomes non-trivial when the two samples are not independent, because sampling covariance terms come into play. Yet, the analysis of non-independent samples is often required in concrete applications, including panel studies (with or without sample rotation), repeated cross-sectional surveys making use of coordinated samples, quasi-experimental designs in which control and treatment groups share some level of sampling information (e.g. observations are taken within same PSUs), etc.

If the samples in design1 and design2 are not independent, function svyDelta estimates the needed sampling covariance terms following the method of [Berger, Priam 16]. Note that this method derives design-based covariance estimates from estimates of correlation (rho), which are computed in the first place.

If the Measure of Change is complex, svyDelta automatically linearizes it. Automatic linearization is performed as function svystatL would do, along the lines illustrated in [Zardetto, 15], see also section ‘Methodological and Computational Remarks’ below.

The mandatory argument expr, which identifies the Measure of Change, must be an object of class expression. You can specify just a single Measure of Change at a time, i.e. length(expr) must be equal to 1. Any analytic function of estimators of Totals and Means derived from the two input design objects design1 and design2 is allowed.

The basic syntax follows the same rules described for function svystatL. Inside expr, the estimator of the Total of a variable is simply represented by the name of the variable itself. The reserved name ones can be used to reference an artificial variable (which will be created on-the-fly, if not already present) whose value is 1 for each sampling unit, so that its Total estimator actually estimates the size of the population in terms of elementary units. Therefore, expression y/ones represents the estimator of the Mean of variable y. Variables referenced inside expr must be numeric and belong to design1 and/or design2.

To unequivocally associate each variable either to design1 or to design2, suffixes '.1' and '.2' must always be used. This makes it possible to reuse the same variable names across the two samples. For instance, expression(y.1 - y.2), represents the difference between the Totals of y estimated from design1 and design2, respectively. Similarly, expression((y.1 - y.2) / y.1) represents the relative difference of the same Totals (using the estimate derived from the first sample as reference value). Along the same vein, the difference between the Means of y estimated from design1 and design2 can be specified as expression(y.1/ones.1 - y.2/ones.2). Likewise, expression(y.1/x.1 - y.2/x.2) represents the difference between estimates of a Ratio derived from design1 and design2.

The mathematical expression of the Measure of Change, as specified by argument expr, can involve ‘parameters’, that is symbols representing given, non-random, scalar, numeric values. For each parameter appearing in expr, the value corresponding to its symbol will be searched following R standard scoping rules.

Although function svyDelta is very general and flexible, when the samples are non-independent (i.e. des.INDEP = FALSE, see below), some restrictions exist on arguments design1 and design2:

  1. Objects design1 and design2 must be either both stratified or both unstratified: mixed cases are not allowed.

  2. Objects design1 and design2 must be either both element sampling designs or both cluster sampling designs: mixed cases are not allowed.

  3. Objects design1 and design2 can be both uncalibrated, both calibrated, or even one uncalibrated and the other calibrated (i.e. mixed cases are allowed).

The method of [Berger, Priam 16] can handle non-independent samples with fixed (i.e. non-random) overlap (including the limiting cases of no overlap and full overlap, for which it reproduces “classical” covariance estimates). Therefore, in order for function svyDelta to work properly, it is a fundamental requirement that the overlapping portion of design1 and design2 can be unequivocally identified. To this end, the identifiers of sampling units originally used by function e.svydesign to define objects design1 and design2 (or, if they are calibrated, their uncalibrated counterparts) must be consistent. Note that function svyDelta will use those same identifiers as keys to merge the survey data of design1 and design2 and thereby identify their overlapping subsample. Since, for multi-stage designs, the method of [Berger, Priam 16] resorts to the ultimate cluster approximation, in those cases the fundamental piece of information actually concerns the identifier of the PSUs of design1 and design2: it is crucial that the latter can be used to identify PSUs that are common to both samples.

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by svyDelta refer to the whole population. If specified, estimation domains must be defined by a formula, following the usual syntactic and semantic rules (see e.g. svystatL). Variables referenced in by must be common to both design1 and design2, and suffixes '.1' and '.2' must not be used for them.

Concerning domain estimation, a methodological caveat is in order. Since the method of [Berger, Priam 16] handles non-independent samples assuming their overlap is fixed, in case design1 and design2 are not independent one should use svyDelta only for planned estimation domains, namely domains that can be obtained by aggregation of sampling strata. In any case, even considering unplanned estimation domains, point estimates produced by svyDelta will still be correct.

Argument des.INDEP must be used to tell svyDelta whether the samples in design1 and design2 were selected independently or not (the default is FALSE). Note that this argument is indeed necessary, because in finite population sampling there is always a chance that two independent samples have a non-null overlap. Therefore, in case of any overlap in design1 and design2, function svyDelta will behave differently depending on the value you specify for des.INDEP:

  • If des.INDEP = FALSE, then any overlap will be regarded as non-accidental and intended by design, thereby triggering the method of [Berger, Priam 16] for calculating the correlation terms needed to estimate the sampling variance of the Measure of Change.

  • If des.INDEP = TRUE, then any overlap will be regarded as accidental and not intended by design, thereby leading to the sampling variance estimate of the Measure of Change that is appropriate for independent samples.

Note that, when the samples are declared to be independent by setting des.INDEP = TRUE, even the above restrictions 1. and 2. on objects design1 and design2 are obviously lifted, and function svyDelta can be used to compare estimators coming from arbitrarily mixed sampling designs (e.g. stratified element vs. unstratified cluster).

When non-independent samples with sizable overlap are concerned (e.g. for rotating panel surveys), the method of [Berger and Priam 2016] may become computationally demanding, especially for highly stratified unit sampling designs (see section ‘Methodological and Computational Remarks’). In such cases, if the invocation of svyDelta with the default setting rho.STRAT = "Full" resulted in unaffordable computation time or memory-failure on your machine, you might want to resort to lighter, but approximate, alternatives:

  • If rho.STRAT = "noJump", then function svyDelta will disregard the contribution of units that changed stratum from design1 to design2 (if any) when estimating correlation terms (rho). Note that, in case no such units actually exist, the result will be the same that would be obtained with the default setting rho.STRAT = "Full", only it will be achieved using much less computing time and memory.

  • If rho.STRAT = "noStrat", then function svyDelta will entirely disregard stratification when estimating correlation terms (rho). This sometimes results in huge computing time and memory savings.

In the light of the above, setting rho.STRAT = "noJump" seems a very sound option in many concrete large-scale applications (see section ‘Methodological and Computational Remarks’).

Note, in any case, that both the approximations rho.STRAT = "noJump" and rho.STRAT = "noStrat" are restricted to correlation terms only: all the other terms involved in the estimator of the sampling variance of the Measure of Change will still be calculated by properly and fully taking into account the stratification of design1 and design2.

The conf.int argument allows to request the confidence intervals for the estimates. By default conf.int=FALSE, that is the confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is chosen to be 0.95.

Function details provides more details about the results of function svyDelta. It takes in input an object returned by svyDelta, and prints on screen that object together with additional diagnostic information. This additional information (see also section ‘Methodological and Computational Remarks’) is stored in a data frame with one row and the following columns:

  Column    Meaning
  n1........Sample size of design1 (in terms of elements or PSUs, depending
            on the sampling design)
  n2........Sample size of design2 (in terms of elements or PSUs, depending
            on the sampling design)
  nc........Size of the overlapping sample between design1 and design2
  overlap...Rate of overlap: nc / ( (n1 + n2) / 2 )
  V1........Estimated sampling variance of the linearized Measure of Change,
            arranged in standard difference form, w.r.t. design1: V1(L.1)
  V2........Estimated sampling variance of the linearized Measure of Change,
            arranged in standard difference form, w.r.t. design2: V2(L.2)
  Vind......Estimated sampling variance of the linearized Measure of Change,
            as would be obtained if design1 and design2 were independent: V1 + V2
  rho.......Estimated sampling correlation between L.1 and L.2
  CoV.......Estimated sampling covariance between L.1 and L.2:
            rho * sqrt(V1) * sqrt(V2)
  V.........Estimated sampling variance of the linearized Measure of Change
            w.r.t. design1 and design2 jointly: V1 + V2 - 2 * CoV

Note that the same data frame is also invisibly returned by function details.

Value

An object inheriting from the data.frame class, whose detailed structure depends on input parameters' values.

Methodological and Computational Remarks

The method proposed by [Berger and Priam 2016] estimates correlations (rho) using the estimated residual variance-covariance matrix of a suitable multivariate regression model. The explanatory variables of this multivariate regression model encode information about stratification, and overlap of the two samples. However, the estimation approach remains fully design-based (i.e. not model-based), because the multivariate regression model is exploited only to make calculations easier, whereas its validity is never assumed, nor used, for inferential purposes.

As already mentioned, the method of [Berger and Priam 2016] assumes instead that (i) the size of both samples and, importantly, of their overlapping subsample is fixed, and that (ii) sampling fractions are negligible in both samples. For multi-stage designs, both conditions (i) and (ii) should be fulfilled at the PSU-level.

When stratified designs are concerned, the method is flexible enough to handle dynamic stratification. More explicitley, the sample at time 2 can include new strata that were not present at time 1, and sampling units that are common to both samples can belong to different strata at time 1 and 2. However, as already mentioned, this flexibility comes with a computational cost, which can become prohibitive for highly stratified sampling designs.

If no units exist that changed stratum from design1 to design2, then correlations can be estimated exactly using a more parsimonious multivariate regression model than the one of [Berger and Priam 2016] (i.e. a model with less interaction terms). This is triggered by rho.STRAT = "noJump". This option allows for remarkable savings in computing time and memory usage, and typically results in a very good approximation when a moderate number of units actually changed stratum.

In extreme cases, the option of entirely disregarding stratification when estimating correlations (triggered by setting rho.STRAT = "noStrat") can sometimes be the only viable solution, at least in ordinary computing environments. Unfortunately, however, the inferential effect of this approximation (e.g. whether it leads to over-estimation or under-estimation) is unclear.

Complex Measures of Change, expressed as nonlinear functions of Horvitz-Thompson or Calibration estimators derived from the two samples, are inherently challenging, and would still be so even for independent samples. To estimate the sampling variance of a complex Measure of Change, svyDelta automatically linearizes the estimator specified by argument expression, call it Delta. Automatic linearization is performed as in function svystatL, along the lines illustrated in [Zardetto, 15]. Put briefly:

  1. Woodruff transforms of the estimator Delta are derived with respect to design1 and design2, respectively.

  2. In case any of the two input designs is calibrated, the Woodruff transform associated to it and computed in step 1. includes the appropriate residuals of the involved variables with respect to the calibration model.

  3. The linearized expression of the Measure of Change Delta is re-arranged in standard difference form, L.Delta = L.1 - L.2, where L.1 and L.2 are linear functions of the previously computed Woodruff transforms with suitable signs.

  4. The method of [Berger and Priam 2016] is applied to L.Delta to derive the estimate of the correlation between L.1 and L.2 (this is the rho value that can be extracted using function details).

  5. The other terms of the variance of L.Delta are estimated using the respective sampling designs separately.

  6. The overall estimate of the variance of L.Delta is re-constructed using the terms computed in steps 4. and 5.

References

Berger, Y. G., Priam, R. (2016). “A simple variance estimator of change for rotating repeated surveys: an application to the European Union Statistics on Income and Living Conditions household surveys”. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(1), 251-272.

Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.

Sarndal, C.E., Swensson, B., Wretman, J. (1992) “Model Assisted Survey Sampling”, Springer Verlag.

See also

Estimators of Complex Analytic Functions of Totals and/or Means svystatL. Estimators of Totals and Means svystatTM, Ratios between Totals svystatR, Shares svystatS, Ratios between Shares svystatSR, Multiple Regression Coefficients svystatB, Quantiles svystatQ, and all of the above svystat.

Examples

############################################### # Basic usage and trivial consistency checks. # ############################################### ## Load two non-independent stratified samples of *elementary units*, s1 and s2, ## with 40% overlap (see ?Delta.el) data(Delta.el) # Define the design objects # s1 des1 <- e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = s1) des1
#> Stratified Independent Unit Sampling Design (with replacement) #> - [2] strata #> - [20] units #> #> Call: #> e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = s1)
# s2 des2 <- e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = s2) des2
#> Stratified Independent Unit Sampling Design (with replacement) #> - [2] strata #> - [20] units #> #> Call: #> e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = s2)
# Estimate the difference between cross-sectional estimates of the Total of # variable y d <- svyDelta(expression(y.2 - y.1), des1, des2) d
#> Delta SE #> y.2 - y.1 484.2381 202.1024
# Now, to know if the difference is statistically significant at alpha = 0.05, # just see whether the confidence interval covers 0 confint(d)
#> 2.5 % 97.5 % #> y.2 - y.1 88.12473 880.3515
# it does not, thus the difference is significant # Have a look at the details, e.g. the estimated correlation rho details(d)
#> Delta SE #> y.2 - y.1 484.2381 202.1024 #> #> Call: #> svyDelta(expression(y.2 - y.1), des1, des2) #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 21316.52 51182.57 0.2048452 5168.599 40845.37
# ...so rho ~ 0.2 # Can check some of that figures in the details above: Y1 <- svystatTM(des1, ~y) Y1
#> Total SE #> y 2897.317 172.818
VAR(Y1) # Equal to V1 in details above
#> VAR.y #> 1 29866.05
Y2 <- svystatTM(des2, ~y) Y2
#> Total SE #> y 3381.555 146.0018
VAR(Y2) # Equal to V2 in details above
#> VAR.y #> 1 21316.52
coef(d)
#> y.2 - y.1 #> 484.2381
coef(Y2) - coef(Y1) # Exactly matched, as it must be
#> y #> 484.2381
## Function svyDelta handles *cluster samples* essentially the same way. The key ## is this time the identifier of PSUs. ## Load two non-independent stratified *cluster samples*, sclus1 and sclus2, ## with 50% overlap at the PSU-level (see ?Delta.clus) data(Delta.clus) # Define the design objects # sclus1 dclus1 <- e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = sclus1) dclus1
#> Stratified 1 - Stage Cluster Sampling Design (with replacement) #> - [2] strata #> - [6] clusters #> #> Call: #> e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = sclus1)
# sclus2 dclus2 <- e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = sclus2) dclus2
#> Stratified 1 - Stage Cluster Sampling Design (with replacement) #> - [2] strata #> - [6] clusters #> #> Call: #> e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = sclus2)
# Estimate the difference between cross-sectional estimates of the Total of # variable y d <- svyDelta(expression(y.2 - y.1), dclus1, dclus2) d
#> Delta SE #> y.2 - y.1 1067.583 270.0636
# Check statistical significance at alpha = 0.05 confint(d)
#> 2.5 % 97.5 % #> y.2 - y.1 538.2676 1596.898
# strongly significant (would be so at alpha = 0.01 too) # Have a look at the details, e.g. the estimated correlation rho details(d)
#> Delta SE #> y.2 - y.1 1067.583 270.0636 #> #> Call: #> svyDelta(expression(y.2 - y.1), dclus1, dclus2) #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 6 6 3 0.5 393854.8 487496.1 881350.9 0.9224685 404208.3 72934.36
# ...so, this time, rho ~ 0.9 # NOTE: If one compared the estimates assuming independence, i.e. disregarding # sampling correlations, the difference would be wrongly considered # statistically not significant (see the size of Vind above). # In fact: d <- svyDelta(expression(y.2 - y.1), dclus1, dclus2, des.INDEP = TRUE) details(d)
#> Delta SE #> y.2 - y.1 1067.583 938.8029 #> #> Call: #> svyDelta(expression(y.2 - y.1), dclus1, dclus2, des.INDEP = TRUE) #> #> Details: #> #> Input design objects were declared to be independent
# Check statistical significance at alpha = 0.05 confint(d)
#> 2.5 % 97.5 % #> y.2 - y.1 -772.4373 2907.602
# non-significant, as anticipated. ############################### # Complex Measures of Change. # ############################### ## Relative difference (percent) between Totals of y at time 2 and time 1: d <- svyDelta(expression( 100 * (y.2 - y.1) / y.1 ), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> 100 * (y.2 - y.1)/y.1 16.71333 7.712714 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 48.46482 25.3936 73.85842 0.2048452 7.186232 59.48596
## Difference between Means of y at time 2 and time 1: d <- svyDelta(expression( y.2/ones.2 - y.1/ones.1 ), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> y.2/ones.2 - y.1/ones.1 0.9787263 0.4069125 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 0.1188257 0.08791268 0.2067384 0.2013588 0.0205803 0.1655778
## Relative difference between Ratios Y/X at time 2 and time 1: d <- svyDelta(expression( (y.2/x.2 - y.1/x.1) / (y.1/x.1) ), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> (y.2/x.2 - y.1/x.1)/(y.1/x.1) 0.2643152 0.1120682 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV #> 1 20 20 8 0.4 0.005299806 0.007121985 0.01242179 -0.01118908 -6.874246e-05 #> V #> 1 0.01255928
# NOTE: For the cases above, you may want to perform the same checks illustrated # for the simple difference between Totals, using suitable ReGenesees # functions ##################################### # Examples with calibrated objects. # ##################################### # Suppose that: # (1) Population size was N = 500 at time 1, and N = 501 at time 2 # (2) Total of x was X = 1810 at time 1, and X = 1770 at time 2 # Calibrate on the auxiliary information (1) and (2) above: # design1 des1 <- des.addvars(des1, ones = 1) pop1 <- pop.template(des1, calmodel = ~ ones + x - 1) pop1[,] <- c(500, 1810) pop1
#> ones x #> 1 500 1810
descal1 <- e.calibrate(des1, pop1) # design2 des2 <- des.addvars(des2, ones = 1) pop2 <- pop.template(des2, calmodel = ~ ones + x - 1) pop2[,] <- c(501, 1770) pop2
#> ones x #> 1 501 1770
descal2 <- e.calibrate(des2, pop2) # Compare the estimates of change: ## Calibration estimators d <- svyDelta(expression(y.2 - y.1), descal1, descal2) details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 632.0379 230.7435 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 26275.34 29218.82 55494.16 0.04063073 1125.797 53242.57
## Horvitz-Thompson estimators d <- svyDelta(expression(y.2 - y.1), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 484.2381 202.1024 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 21316.52 51182.57 0.2048452 5168.599 40845.37
# Technically, you can estimate changes in mixed situations, comparing e.g. # Horvitz-Thompson estimators at time 1 with Calibration estimators at time 2: ## Mix HT and CAL d <- svyDelta(expression(y.2 - y.1), des1, descal2) details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 554.618 208.4125 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 29218.82 59084.87 0.2648737 7824.545 43435.78
######################################################## # Dynamic stratification and computational efficiency. # ######################################################## ## Stratification of s1 and s2 is static (see ?Delta.el). # Therefore the full complexity setting (rho.STRAT = "Full", the default), and # the alternative and *much more efficient setting* which disregards stratum- # changer units (if any) in estimating correlations (rho.STRAT = "noJump") will # yield identical results: ## rho.STRAT = "Full" d <- svyDelta(expression(y.2 - y.1), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 484.2381 202.1024 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 21316.52 51182.57 0.2048452 5168.599 40845.37
## rho.STRAT = "noJump" d <- svyDelta(expression(y.2 - y.1), des1, des2, rho.STRAT = "noJump") details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 484.2381 202.1024 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 21316.52 51182.57 0.2048452 5168.599 40845.37
## identical results, as it must be. ## Now check the implications of dynamic stratification. # Simulate new strata at time 2 (new level "C"): levels(s2$strata) <- c("A", "B", "C") # and stratum-changer units from s1 to s2: s2$strata[4] <- "C" s2$strata[14] <- "A" s2$strata[16] <- "C" s2$strata[20] <- "C" # Have a look at the resulting rotation structure of s1 and s2: s <- merge(s1, s2, by = "id", all = TRUE, suffixes = c("1", "2")) s <- s[order(s$strata1, s$strata2), ] s
#> id strata1 w1 y1 x1 strata2 w2 y2 x2 #> 2 2 A 21.06586 1.0 1.265054 A 21.06586 3.308235 3.087006 #> 6 6 A 20.28786 3.0 2.734145 A 20.28786 3.165896 1.143721 #> 8 8 A 20.08048 4.0 3.302226 A 20.08048 4.540171 1.879891 #> 4 4 A 19.98340 2.0 1.281605 C 19.98340 2.977190 2.171311 #> 1 1 A 18.55982 0.5 1.162374 <NA> NA NA NA #> 3 3 A 21.93166 1.5 2.030517 <NA> NA NA NA #> 5 5 A 18.51504 2.5 2.364694 <NA> NA NA NA #> 7 7 A 19.47544 3.5 3.179758 <NA> NA NA NA #> 9 9 A 19.35298 4.5 2.565109 <NA> NA NA NA #> 10 10 A 21.64282 5.0 2.227502 <NA> NA NA NA #> 14 14 B 30.75910 7.0 3.942981 A 30.75910 8.740280 5.599236 #> 12 12 B 29.99442 6.0 5.148122 B 29.99442 7.067339 3.314140 #> 18 18 B 30.54008 9.0 4.980502 B 30.54008 9.596294 2.891738 #> 16 16 B 29.97200 8.0 3.757939 C 29.97200 8.900935 4.677009 #> 11 11 B 30.33911 5.5 2.710793 <NA> NA NA NA #> 13 13 B 31.15830 6.5 4.952516 <NA> NA NA NA #> 15 15 B 29.87766 7.5 4.827640 <NA> NA NA NA #> 17 17 B 29.66927 8.5 3.687931 <NA> NA NA NA #> 19 19 B 30.21610 9.5 7.083637 <NA> NA NA NA #> 20 20 B 30.05322 10.0 5.710445 <NA> NA NA NA #> 21 21 <NA> NA NA NA A 20.78341 2.512918 2.696096 #> 22 23 <NA> NA NA NA A 18.76287 5.003790 3.670774 #> 23 25 <NA> NA NA NA A 18.07945 4.510136 1.465740 #> 24 27 <NA> NA NA NA A 19.98976 3.504128 2.627141 #> 25 29 <NA> NA NA NA A 20.28916 4.985899 2.365247 #> 26 30 <NA> NA NA NA A 20.57337 6.370713 3.869849 #> 27 31 <NA> NA NA NA B 30.46191 6.681330 2.236143 #> 28 33 <NA> NA NA NA B 29.54631 6.551161 4.297401 #> 29 35 <NA> NA NA NA B 31.98209 8.118962 4.011871 #> 30 37 <NA> NA NA NA B 29.38564 9.446146 5.044527 #> 31 39 <NA> NA NA NA B 30.13084 9.444555 2.989280 #> 32 40 <NA> NA NA NA C 29.54043 10.408366 5.223684
# Check that strata are dynamic: one unit jumped from "A" to "C", one from "B" # to "A", and one from "B" to "C": with(s, table(strata1, strata2, useNA = "ifany"))
#> strata2 #> strata1 A B C <NA> #> A 3 0 1 6 #> B 1 2 1 6 #> <NA> 6 5 1 0
# Now recreate object des2 using the updated s2 data des2 <- e.svydesign(ids = ~id, strata = ~strata, weights = ~w, data = s2) # And re-do the comparison: ## rho.STRAT = "Full" d <- svyDelta(expression(y.2 - y.1), des1, des2) details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 484.2381 330.669 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 104589.1 134455.2 0.2246668 12556.57 109342
## rho.STRAT = "noJump" d <- svyDelta(expression(y.2 - y.1), des1, des2, rho.STRAT = "noJump") details(d, print.call = FALSE)
#> Delta SE #> y.2 - y.1 484.2381 329.7205 #> #> Details: #> #> n1 n2 nc overlap V1 V2 Vind rho CoV V #> 1 20 20 8 0.4 29866.05 104589.1 134455.2 0.2302706 12869.77 108715.6
## The results are slightly different, as expected. ## However, the approximation obtained adopting option rho.STRAT = "noJump" ## is *very good*, and - importantly - *far more efficient* (computation is way ## less memory-hungry and remarkably faster). ## You may want to run the same comparison using real-world stratified samples ## to really appreciate the efficiency gain.