Modifies an analytic object by adding new variables to it.
des.addvars(design, ...)
| design | Object of class |
|---|---|
| ... | tag = expr arguments defining the variables to be added to design. |
This function adds to the data frame contained in design the new variables defined by the tag = expr arguments. A tag can be specified either as an identifier or as a character string; expr can be any expression that can be meaningfully evaluated in the design environment.
For each argument tag = expr bound to the formal argument ... the added column will have name given by the tag value and values obtained by evaluating the expr expression on design. Any input expression not supplied with a tag will be ignored and will therefore have no effect on the des.addvars return value.
Variables to be added to the input object have to be new: it is not possible to use des.addvars to modify the values in a pre-existing design column. This is an intentional feature meant to safeguard the integrity of the relations between survey data and sampling design metadata stored in design.
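The rules described above (tagged expressions become columns, untagged expressions are ignored, existing columns cannot be overwritten) can be illustrated on a plain data frame. The following is only a minimal sketch under stated assumptions: add_vars_sketch is a hypothetical helper written for this illustration, it is not part of ReGenesees, and it does not handle any sampling design metadata.

```r
# Hypothetical helper (NOT part of ReGenesees): mimics, on a plain data
# frame, the argument rules documented for des.addvars.
add_vars_sketch <- function(df, ...) {
  env <- parent.frame()
  # Capture the unevaluated tag = expr arguments
  exprs <- as.list(substitute(list(...)))[-1L]
  tags <- names(exprs)
  # No tagged argument at all: nothing to add
  if (is.null(tags)) return(df)
  # Untagged expressions are silently ignored
  exprs <- exprs[tags != ""]
  # Refuse to modify pre-existing columns
  clash <- intersect(names(exprs), names(df))
  if (length(clash) > 0L) {
    stop("Cannot modify pre-existing variables: ",
         paste(clash, collapse = ", "))
  }
  # Evaluate every expression on the input data, falling back to the
  # calling environment for names that are not columns
  vals <- lapply(exprs, eval, envir = df, enclos = env)
  for (tag in names(vals)) df[[tag]] <- vals[[tag]]
  df
}

df <- data.frame(income = c(100, 200, 300))
# Identifier tag, character-string tag, and one untagged expression
out <- add_vars_sketch(df, ones = 1, "inc2" = income^2, income * 2)
names(out)   # "income" "ones" "inc2"
# Attempting to overwrite an existing column raises an error
res <- try(add_vars_sketch(df, income = 0), silent = TRUE)
inherits(res, "try-error")   # TRUE
```

Evaluating all expressions against the input data, rather than sequentially, is a design choice of this sketch; whether des.addvars lets one new variable refer to another defined in the same call is not stated here.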
An object of the same class as design, containing the new variables but supplied with exactly the same metadata.
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi:10.1515/jos-2015-0013.
e.svydesign to bind survey data and sampling design metadata, e.calibrate for calibrating weights.
data(data.examples)

# Creation of an analytic object:
des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight)

# Adding the new 'ones' variable to estimate the number
# of final units in the population:
des<-des.addvars(des,ones=1)
svystatTM(des,~ones)
#>         Total       SE
#> ones 924101.3 17172.68

# Recoding a qualitative variable:
des<-des.addvars(des,agerange=factor(ifelse(age5c==1,
                                            "young","not-young")))
svystatTM(des,~agerange,estimator="Mean")
#>                        Mean          SE
#> agerangenot-young 0.8604824 0.006775833
#> agerangeyoung     0.1395176 0.006775833

#>            agerange Mean.income SE.Mean.income CI.l(95%).Mean.income
#> not-young not-young   1303.7618       9.081213             1285.9630
#> young         young    962.6162      17.684967              927.9543
#>           CI.u(95%).Mean.income
#> not-young             1321.5607
#> young                  997.2781

# Algebraic operations on numeric variables:
des<-des.addvars(des,z2=z^2)
svystatTM(des,~z2,estimator="Mean")
#>        Mean       SE
#> z2 20623.27 356.5924

# A more interesting example: estimating the
# percentage of population with income below
# the poverty threshold (defined as 0.6 times
# the median income for the whole population):
Median.Income <- coef(svystatQ(des, ~income,probs=0.5))
Median.Income
#> income
#>   1244
des <- des.addvars(des,
                   status = factor(
                     ifelse(income < (0.6 * Median.Income),
                            "poor", "non-poor")
                   )
)
svystatTM(des,~status,estimator="Mean")
#>                     Mean          SE
#> statusnon-poor 0.8842155 0.006131183
#> statuspoor     0.1157845 0.006131183

#>            status Mean.income SE.Mean.income
#> non-poor non-poor   1349.3443       7.855904
#> poor         poor    544.5881      10.161308

### NOTE: Procedure above yields *correct point estimates* of the share of poor
###       population and their average income, while *variance estimation is
###       approximated* since we neglected the sampling variability of the
###       estimated poverty threshold.