Merge New Survey Data into Design Objects

Modifies an analytic object by joining the original survey data with a new data frame via a common key.

des.merge(design, data, key)

Arguments

design	Object of class `analytic` (or inheriting from it) containing survey data and sampling design metadata.
data	Data frame containing a key variable plus the new variables to be merged into the `design` data.
key	Formula identifying the common key variable to be used for merging.

Details

This function updates the survey variables stored in design (i.e. design$variables) by merging the original data with those contained in the data frame data. The merge relies on a single key variable, which must be present in both design and data.

The function preserves both the original row order of the survey data stored in design and all the original sampling design metadata.
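Why order preservation needs care can be seen in a small base R sketch. It does not show how des.merge is implemented; the data frames `old` and `new` are hypothetical stand-ins for design$variables and data. A plain merge() call sorts the result on the key by default, whereas aligning rows with match() keeps the original row order.

```r
# Toy stand-ins for design$variables and the new data (hypothetical values)
old <- data.frame(key = c(3, 1, 2), y = c(30, 10, 20))
new <- data.frame(key = c(1, 2, 3), z = c("a", "b", "c"))

# merge() sorts on the key by default, so the original row order is lost
sorted_keys <- merge(old, new, by = "key")$key

# match() returns, for each row of 'old', the position of its key in 'new'
idx <- match(old$key, new$key)

# Bind the new column to 'old' without touching the row order of 'old'
merged <- cbind(old, new[idx, "z", drop = FALSE])
rownames(merged) <- NULL

merged$key   # same order as old$key
```

Here `sorted_keys` comes back as 1, 2, 3, while `merged$key` keeps the original 3, 1, 2 ordering.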

The variable referenced by key must be a valid unique key for both design and data: it must contain neither duplicated values nor NAs. Moreover, the values of key in design and data must be in 1:1 correspondence. These requirements ensure that the new (i.e. merged) survey data have exactly the same number of rows as the old survey data stored in design.
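The key requirements above can be checked before calling des.merge with a few lines of base R. The helper `check_key` below is hypothetical (it is not part of the package) and takes the key as a column name rather than a formula; it only illustrates the three stated conditions: no NAs, no duplicates, and the same set of key values on both sides.

```r
# Hypothetical helper, not part of the package: tests the stated key rules
check_key <- function(design_vars, data, key) {
  k_old <- design_vars[[key]]
  k_new <- data[[key]]
  c(
    no_NA_design  = !anyNA(k_old),
    no_NA_data    = !anyNA(k_new),
    unique_design = !anyDuplicated(k_old),
    unique_data   = !anyDuplicated(k_new),
    # With unique keys on both sides, equal value sets imply a 1:1 match
    same_values   = setequal(k_old, k_new)
  )
}

# Toy data (hypothetical values)
old <- data.frame(key = 1:4, y = c(5, 6, 7, 8))
ok  <- data.frame(key = c(4, 2, 1, 3), z = letters[1:4])
bad <- data.frame(key = c(1, 2, 2, NA), z = letters[1:4])

res_ok  <- check_key(old, ok, "key")    # every check passes
res_bad <- check_key(old, bad, "key")   # NA, duplicate and mismatch flagged
```

In the `bad` case the checks on data fail because the key holds an NA and a repeated value, and its values do not cover keys 3 and 4 of `old`.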

Should design and data share further variables besides the key, only their original design version will be retained. Thus, des.merge cannot modify any pre-existing design columns. This is an intentional feature, intended to safeguard the integrity of the relations between the survey data and the sampling design metadata stored in design.
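The rule on common variables can be mimicked in base R as follows. This is only an illustrative sketch under assumed toy data, not the package's own code: columns that appear in both data frames (other than the key) are dropped from the new data before binding, so the original values survive.

```r
# Toy stand-ins (hypothetical values); 'income' appears in both data frames
old <- data.frame(key = 1:3, income = c(100, 200, 300))
new <- data.frame(key    = c(2, 3, 1),
                  income = c(9999, 9999, 9999),   # altered values, to be ignored
                  NEW.n  = c(2.5, 3.5, 1.5))

# Variables shared by both data frames, key excluded
common <- setdiff(intersect(names(old), names(new)), "key")

# Only genuinely new columns are taken from 'new'
keep <- setdiff(names(new), c("key", common))

# Align rows of 'new' to 'old' by key and bind the new columns
merged <- cbind(old, new[match(old$key, new$key), keep, drop = FALSE])
rownames(merged) <- NULL
```

After this, `merged$income` still holds the original 100, 200, 300, and `NEW.n` is attached in the row order of `old`.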

Practical Purpose

In Official Statistics, calibration weights must often be computed several months before the survey's target variables become available for estimation. This time lag arises because target variables typically undergo much more thorough editing and imputation procedures than auxiliary variables.

In such production scenarios, des.merge makes it possible to compute estimates and sampling errors for the newly released target variables without repeating the calibration step. Indeed, the function lets one join the data contained in an already calibrated design object with new data that became available only after calibration. The merge operation is easy and safe, and it preserves all the original calibration metadata (e.g. those needed for variance estimation).

Value

An object of the same class as design, containing additional survey data but carrying exactly the same metadata.

Examples

data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Create a calibrated design object as well (e.g. using population totals
# stored inside pop03p):
cal<-e.calibrate(design=des,df.population=pop03p,
                 calmodel=~marstat-1,partition=~sex,calfun="logit",
                 bounds=bounds)

# Lastly create a new data frame to be merged into des and cal:
set.seed(12345)    # RNG seed fixed for reproducibility
new.data<-example[,c("income","key")]
new.data$income <- 1000 + new.data$income    # altered income values
new.data$NEW.f<-factor(sample(c("A","B"),nrow(new.data),replace=TRUE))
new.data$NEW.n<-rnorm(nrow(new.data),10,2)
new.data <- new.data[sample(1:nrow(new.data)), ]    # rows ordering changed
head(new.data)
#>      income  key NEW.f     NEW.n
#> 724    1471  724     A  9.515012
#> 1161   1549 1161     B  7.222381
#> 664    1849  664     B 12.730018
#> 2872   1367 2872     A 11.567735
#> 1026   2310 1026     A 12.503473
#> 1499   2563 1499     B 10.144981

###########################################################
# Example 1: merge new data into a non-calibrated design. #
###########################################################

# Merge new data inside des (note the warning on income):
des2<-des.merge(design=des,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

# Compare visually:
## before:
head(des$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income
#> 1  0     3      5   f unmarried 148.32432   1158
#> 2  0     2      4   f   married  88.57746   1268
#> 3  0     3      6   f   married 115.07377    108
#> 4  0     4      7   f   married  86.37647   1700
#> 5  0     2      4   f   married 110.52172    537
#> 6  0     3      5   f   married 134.40092   2143
## after:
head(des2$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268     A  8.658767
#> 3  0     3      6   f   married 115.07377    108     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700     B 12.494869
#> 5  0     2      4   f   married 110.52172    537     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143     B  6.130563

# New data can be used as usual:
svystatTM(des2,~NEW.n,~NEW.f,vartype="cvpct")
#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4754340        2.776665
#> B     B     4466486        2.701605

# Old data are unaffected, as they must be:
svystatTM(des,~income,estimator="Mean",vartype="cvpct")
#>            Mean       CV%
#> income 1256.166 0.6808451
svystatTM(des2,~income,estimator="Mean",vartype="cvpct")
#>            Mean       CV%
#> income 1256.166 0.6808451

#######################################################
# Example 2: merge new data into a calibrated design. #
#######################################################

# Merge new data inside cal (note the warning on income):
cal2<-des.merge(design=cal,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

# Compare visually:
## before:
head(cal$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240
#> 2  0     2      4   f   married  88.57746   1268   483.3182
#> 3  0     3      6   f   married 115.07377    108   483.3182
#> 4  0     4      7   f   married  86.37647   1700   483.3182
#> 5  0     2      4   f   married 110.52172    537   483.3182
#> 6  0     3      5   f   married 134.40092   2143   483.3182
## after:
head(cal2$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268   483.3182     A  8.658767
#> 3  0     3      6   f   married 115.07377    108   483.3182     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700   483.3182     B 12.494869
#> 5  0     2      4   f   married 110.52172    537   483.3182     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143   483.3182     B  6.130563

# New data can be used as usual:
svystatTM(cal2,~NEW.n,~NEW.f,vartype="cvpct")
#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4725583        1.979358
#> B     B     4437172        2.050854

# Old data are unaffected, as they must be:
svystatTM(cal,~income,estimator="Mean",vartype="cvpct")
#>            Mean      CV%
#> income 1255.989 0.681657
svystatTM(cal2,~income,estimator="Mean",vartype="cvpct")
#>            Mean      CV%
#> income 1255.989 0.681657
