des.merge.Rd
Modifies an analytic object by joining the original survey data with a new data frame via a common key.
des.merge(design, data, key)
Argument | Description |
---|---|
design | Object of class |
data | Data frame containing a key variable, plus new variables to be merged to |
key | Formula identifying the common key variable to be used for merging. |
This function updates the survey variables contained in design (i.e. design$variables), by merging the original data with those contained in the data frame passed as data. The merge operation relies on a single variable key, which must be common to both design and data.
The function preserves both the original ordering of the survey data stored in design and all the original sampling design metadata.
The variable referenced by key must be a valid unique key for both design and data: it must contain neither duplicated values nor NAs. Moreover, the values of key in design and data must be in 1:1 correspondence. These requirements ensure that the new (i.e. merged) survey data have exactly the same number of rows as the old survey data stored in design.
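The key requirements above can be sketched in base R as follows. This is only an illustration on plain data frames: the helper name is_valid_merge_key and the toy data are assumptions made for the example, not part of the package, and the checks actually performed by des.merge may differ.

```r
# Hypothetical helper (not part of the package): checks that `key` is a
# valid unique key for two data frames and that its values match 1:1.
is_valid_merge_key <- function(old, new, key) {
  k_old <- old[[key]]
  k_new <- new[[key]]
  if (is.null(k_old) || is.null(k_new)) return(FALSE)  # key must exist in both
  no_na   <- !anyNA(k_old) && !anyNA(k_new)
  no_dups <- !anyDuplicated(k_old) && !anyDuplicated(k_new)
  # With unique keys, 1:1 correspondence means both sets of values coincide
  one_to_one <- setequal(k_old, k_new)
  no_na && no_dups && one_to_one
}

old <- data.frame(id = 1:4, x = c(10, 20, 30, 40))
ok  <- data.frame(id = c(3, 1, 4, 2), y = letters[1:4])  # same keys, shuffled
dup <- data.frame(id = c(1, 1, 2, 3), y = letters[1:4])  # duplicated key

is_valid_merge_key(old, ok, "id")   # TRUE
is_valid_merge_key(old, dup, "id")  # FALSE
```

When the check passes, the two tables necessarily have the same number of rows, which is the property the requirements are meant to guarantee.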
Should design and data contain further common variables besides the key, only their original design version will be retained. Thus, des.merge cannot modify any pre-existing design columns. This is an intentional feature, meant to safeguard the integrity of the relations between the survey data and the sampling design metadata stored in design.
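The merge semantics described so far (original row order preserved, pre-existing columns never overwritten) can be mimicked on plain data frames with base R. The sketch below is an assumption-laden illustration: merge_keep_old is a hypothetical helper, it ignores sampling design metadata entirely, and it is not the package's implementation.

```r
# Hypothetical helper (not the package's implementation): merges `new`
# into `old` by `key`, keeping the row order and the columns of `old`.
merge_keep_old <- function(old, new, key) {
  # Keep only the columns of `new` that `old` does not already have
  extra <- setdiff(names(new), names(old))
  # Align the rows of `new` to the row order of `old` through the key
  idx <- match(old[[key]], new[[key]])
  out <- cbind(old, new[idx, extra, drop = FALSE])
  rownames(out) <- rownames(old)
  out
}

old <- data.frame(id = 1:4, income = c(100, 200, 300, 400))
new <- data.frame(id = c(3, 1, 4, 2),
                  income = c(1300, 1100, 1400, 1200),  # altered copy, ignored
                  grp = c("c", "a", "d", "b"))

merged <- merge_keep_old(old, new, "id")
merged$income  # 100 200 300 400: the original values survive
merged$grp     # "a" "b" "c" "d": new column aligned to the old row order
```

Using match() rather than base merge() is a deliberate choice here, since merge() sorts the result on the key by default and would not, in general, return the rows in their original order.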
In the field of Official Statistics, calibration weights must often be computed several months before the target variables of the survey become available for estimation. This time lag arises because target variables typically undergo much more thorough editing and imputation procedures than auxiliary variables.
In such production scenarios, des.merge makes it possible to compute estimates and errors for the newly released target variables without repeating the calibration step. Indeed, the function can join the data contained in an already calibrated design object with new data made available only after the calibration step. The merge operation is easy and safe, and preserves all the original calibration metadata (e.g. those needed for variance estimation).
An object of the same class as design, containing additional survey data but supplied with exactly the same metadata.
e.svydesign to bind survey data and sampling design metadata, e.calibrate for calibrating weights, des.addvars to add new variables to design objects.
data(data.examples)

# Create a design object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
     weights=~weight)

# Create a calibrated design object as well (e.g. using population totals
# stored inside pop03p):
cal<-e.calibrate(design=des,df.population=pop03p,
     calmodel=~marstat-1,partition=~sex,calfun="logit",
     bounds=bounds)

# Lastly create a new data frame to be merged into des and cal:
set.seed(12345)  # RNG seed fixed for reproducibility
new.data<-example[,c("income","key")]
new.data$income <- 1000 + new.data$income  # altered income values
new.data$NEW.f<-factor(sample(c("A","B"),nrow(new.data),rep=TRUE))
new.data$NEW.n<-rnorm(nrow(new.data),10,2)
new.data <- new.data[sample(1:nrow(new.data)), ]  # rows ordering changed
head(new.data)
#>      income  key NEW.f     NEW.n
#> 724    1471  724     A  9.515012
#> 1161   1549 1161     B  7.222381
#> 664    1849  664     B 12.730018
#> 2872   1367 2872     A 11.567735
#> 1026   2310 1026     A 12.503473
#> 1499   2563 1499     B 10.144981

###########################################################
# Example 1: merge new data into a non calibrated design. #
###########################################################

# Merge new data inside des (note the warning on income):
des2<-des.merge(design=des,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income
#> 1  0     3      5   f unmarried 148.32432   1158
#> 2  0     2      4   f   married  88.57746   1268
#> 3  0     3      6   f   married 115.07377    108
#> 4  0     4      7   f   married  86.37647   1700
#> 5  0     2      4   f   married 110.52172    537
#> 6  0     3      5   f   married 134.40092   2143

#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268     A  8.658767
#> 3  0     3      6   f   married 115.07377    108     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700     B 12.494869
#> 5  0     2      4   f   married 110.52172    537     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143     B  6.130563

#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4754340        2.776665
#> B     B     4466486        2.701605

#>            Mean       CV%
#> income 1256.166 0.6808451

#>            Mean       CV%
#> income 1256.166 0.6808451

#######################################################
# Example 2: merge new data into a calibrated design. #
#######################################################

# Merge new data inside cal (note the warning on income):
cal2<-des.merge(design=cal,data=new.data,key=~key)
#> Warning: Common variables found in 'design' and 'data' (besides the 'key'): income.
#> Only their 'design' version will be retained

#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240
#> 2  0     2      4   f   married  88.57746   1268   483.3182
#> 3  0     3      6   f   married 115.07377    108   483.3182
#> 4  0     4      7   f   married  86.37647   1700   483.3182
#> 5  0     2      4   f   married 110.52172    537   483.3182
#> 6  0     3      5   f   married 134.40092   2143   483.3182

#>   towcod famcod key weight stratum SUPERSTRATUM sr regcod procod x1 x2 x3 y1 y2
#> 1    147   3103   1  485.8     803           26  0      7      8  0  0  0  0  0
#> 2    147   3103   2  485.8     803           26  0      7      8  0  0  0  1  1
#> 3    147   3109   3  485.8     803           26  0      7      8  0  0  0  1  1
#> 4    147   3111   4  485.8     803           26  0      7      8  0  0  0  0  0
#> 5    147   3120   5  485.8     803           26  0      7      8  0  0  1  1  1
#> 6    147   3121   6  485.8     803           26  0      7      8  0  0  0  0  0
#>   y3 age5c age10c sex   marstat         z income weight.cal NEW.f     NEW.n
#> 1  0     3      5   f unmarried 148.32432   1158   486.2240     B  9.650755
#> 2  0     2      4   f   married  88.57746   1268   483.3182     A  8.658767
#> 3  0     3      6   f   married 115.07377    108   483.3182     B 11.014852
#> 4  0     4      7   f   married  86.37647   1700   483.3182     B 12.494869
#> 5  0     2      4   f   married 110.52172    537   483.3182     B  7.503449
#> 6  0     3      5   f   married 134.40092   2143   483.3182     B  6.130563

#>   NEW.f Total.NEW.n CV%.Total.NEW.n
#> A     A     4725583        1.979358
#> B     B     4437172        2.050854

#>            Mean      CV%
#> income 1255.989 0.681657

#>            Mean      CV%
#> income 1255.989 0.681657