Modifies an analytic object by joining the original survey data with a new data frame via a common key.

des.merge(design, data, key)

Arguments

design

Object of class analytic (or inheriting from it) containing survey data and sampling design metadata.

data

Data frame containing a key variable, plus new variables to be merged into the design data.

key

Formula identifying the common key variable to be used for merging.

Details

This function updates the survey variables contained in design (i.e. design$variables) by merging the original data with those contained in the data frame data. The merge operation relies on a single key variable, which must be common to both design and data.

The function preserves both the original ordering of the survey data stored in design and all the original sampling design metadata.
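As a rough illustration of what order preservation means, the following base R sketch aligns new values to the existing row order with match(), and contrasts this with base merge(), which sorts the result on the key by default. The data frames old and new are hypothetical toy stand-ins, and this is not meant to reproduce the function's actual implementation.

```r
# Toy stand-ins for design$variables and for the 'data' argument
# (hypothetical values, for illustration only).
old <- data.frame(key = c(3, 1, 2), y = c(30, 10, 20))
new <- data.frame(key = c(1, 2, 3), z = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# For each row of 'old', find the position of its key in 'new'.
pos <- match(old$key, new$key)

# Append the new column, reordered to follow the rows of 'old'.
merged <- old
merged$z <- new$z[pos]
print(merged)

# By contrast, base merge() sorts the result on the key by default,
# so the original row order of 'old' is lost.
sorted <- merge(old, new, by = "key")
print(sorted)
```

In merged the keys stay in the order 3, 1, 2, whereas in sorted they come back as 1, 2, 3.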

The variable referenced by key must be a valid unique key for both design and data: it must not contain duplicated values or NAs. Moreover, the values of key in design and data must be in 1:1 correspondence. These requirements ensure that the new survey data (that is, the merged data) will have exactly the same number of rows as the old survey data stored in design.
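The key requirements can be checked in advance with a few lines of base R. The helper check_keys below is hypothetical (it is not part of the package) and is only a minimal sketch of the three conditions: no NAs, no duplicates, and the same set of key values on both sides.

```r
# Hypothetical helper, not part of the package: returns TRUE only if both
# key vectors are valid unique keys in 1:1 correspondence.
check_keys <- function(design_key, data_key) {
  # A valid unique key has no missing values and no duplicated values.
  is_unique_key <- function(k) !anyNA(k) && anyDuplicated(k) == 0L
  is_unique_key(design_key) &&
    is_unique_key(data_key) &&
    # With unique keys on both sides, equal value sets imply a 1:1 match.
    setequal(design_key, data_key)
}

check_keys(c(1, 2, 3), c(3, 1, 2))    # TRUE: same keys, different order
check_keys(c(1, 2, 3), c(1, 2, 2))    # FALSE: duplicated key value
check_keys(c(1, 2, 3), c(1, 2, NA))   # FALSE: missing key value
check_keys(c(1, 2, 3), c(1, 2))       # FALSE: key 3 has no counterpart
```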

Should design and data contain further common variables besides the key, only their original design version will be retained. Thus, des.merge cannot modify any pre-existing design columns. This is an intentional feature, meant to safeguard the integrity of the relations between survey data and sampling design metadata stored in design.
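The handling of common variables can be pictured with the base R sketch below. It uses hypothetical toy data frames (old, new) and a hypothetical column name (bonus), and it only mimics the documented behavior; it should not be read as the function's actual code.

```r
# Toy stand-ins (hypothetical values): 'income' exists on both sides.
old <- data.frame(key = 1:3, income = c(100, 200, 300))
new <- data.frame(key = c(2L, 3L, 1L),
                  income = c(1200, 1300, 1100),  # clashes with old$income
                  bonus = c(20, 30, 10))

# Variables found in both data frames, besides the key.
common <- setdiff(intersect(names(old), names(new)), "key")

# Positions of the keys of 'old' inside 'new', to keep the original order.
pos <- match(old$key, new$key)

# Add only the genuinely new variables; common ones keep their 'old' values.
merged <- old
for (v in setdiff(names(new), c("key", common))) {
  merged[[v]] <- new[[v]][pos]
}
print(merged)
```

Here income keeps the values 100, 200, 300 from old, while bonus is appended in the row order of old.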

Practical Purpose

In Official Statistics, calibration weights must often be computed several months before the target variables of the survey become available for estimation. This time lag arises because target variables typically undergo much more thorough editing and imputation procedures than auxiliary variables.

In such production scenarios, des.merge makes it possible to compute estimates and sampling errors for the newly released target variables without repeating the calibration step. Using the function, one can join the data contained in an already calibrated design object with new data that became available only after calibration. The merge operation is easy and safe, and it preserves all the original calibration metadata (e.g. those needed for variance estimation).

Value

An object of the same class as design, containing additional survey data but carrying exactly the same metadata.

See also

e.svydesign to bind survey data and sampling design metadata, e.calibrate for calibrating weights, des.addvars to add new variables to design objects.

Examples

data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight)

# Create a calibrated design object as well (e.g. using population totals
# stored inside pop03p):
cal<-e.calibrate(design=des,df.population=pop03p,
                 calmodel=~marstat-1,partition=~sex,calfun="logit",
                 bounds=bounds)

# Lastly create a new data frame to be merged into des and cal:
set.seed(12345)  # RNG seed fixed for reproducibility
new.data<-example[,c("income","key")]
new.data$income <- 1000 + new.data$income  # altered income values
new.data$NEW.f<-factor(sample(c("A","B"),nrow(new.data),rep=TRUE))
new.data$NEW.n<-rnorm(nrow(new.data),10,2)
new.data <- new.data[sample(1:nrow(new.data)), ]  # rows ordering changed
head(new.data)
#>      income  key NEW.f     NEW.n
#> 724    1471  724     A  9.515012
#> 1161   1549 1161     B  7.222381
#> 664    1849  664     B 12.730018
#> 2872   1367 2872     A 11.567735
#> 1026   2310 1026     A 12.503473
#> 1499   2563 1499     B 10.144981

###########################################################
# Example 1: merge new data into a non calibrated design. #
###########################################################

# Merge new data inside des (note the warning on income):
des2<-des.merge(design=des,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

# Compare visually:
## before:
head(des$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income
#> 1  0     3      5   f unmarried 148.32432   1158
#> 2  0     2      4   f   married  88.57746   1268
#> 3  0     3      6   f   married 115.07377    108
#> 4  0     4      7   f   married  86.37647   1700
#> 5  0     2      4   f   married 110.52172    537
#> 6  0     3      5   f   married 134.40092   2143

## after:
head(des2$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268     A  8.658767
#> 3  0     3      6   f   married 115.07377    108     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700     B 12.494869
#> 5  0     2      4   f   married 110.52172    537     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143     B  6.130563

# New data can be used as usual:
svystatTM(des2,~NEW.n,~NEW.f,vartype="cvpct")
#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4754340        2.776665
#> B     B     4466486        2.701605

# Old data are unaffected, as it must be:
svystatTM(des,~income,estimator="Mean",vartype="cvpct")
#>            Mean       CV%
#> income 1256.166 0.6808451
svystatTM(des2,~income,estimator="Mean",vartype="cvpct")
#>            Mean       CV%
#> income 1256.166 0.6808451

#######################################################
# Example 2: merge new data into a calibrated design. #
#######################################################

# Merge new data inside cal (note the warning on income):
cal2<-des.merge(design=cal,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

# Compare visually:
## before:
head(cal$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240
#> 2  0     2      4   f   married  88.57746   1268   483.3182
#> 3  0     3      6   f   married 115.07377    108   483.3182
#> 4  0     4      7   f   married  86.37647   1700   483.3182
#> 5  0     2      4   f   married 110.52172    537   483.3182
#> 6  0     3      5   f   married 134.40092   2143   483.3182

## after:
head(cal2$variables)
#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268   483.3182     A  8.658767
#> 3  0     3      6   f   married 115.07377    108   483.3182     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700   483.3182     B 12.494869
#> 5  0     2      4   f   married 110.52172    537   483.3182     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143   483.3182     B  6.130563

# New data can be used as usual:
svystatTM(cal2,~NEW.n,~NEW.f,vartype="cvpct")
#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4725583        1.979358
#> B     B     4437172        2.050854

# Old data are unaffected, as it must be:
svystatTM(cal,~income,estimator="Mean",vartype="cvpct")
#>            Mean      CV%
#> income 1255.989 0.681657
svystatTM(cal2,~income,estimator="Mean",vartype="cvpct")
#>            Mean      CV%
#> income 1255.989 0.681657