des.addvars.Rd
Modifies an analytic object by adding new variables to it.
des.addvars(design, ...)
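Argument | Description |
---|---|
design | Object of class analytic (or inheriting from it), containing survey data and sampling design metadata. |
... | Arguments of the form tag = expr, each defining a new variable (see below). |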
This function adds to the data frame contained in design the new variables defined by the tag = expr arguments. A tag can be specified either as an identifier or as a character string; expr can be any expression that makes sense to evaluate in the design environment.
For each argument tag = expr bound to the formal argument ..., the added column is named after the tag value, and its values are obtained by evaluating the expr expression on design. Any input expression not supplied with a tag is ignored and therefore has no effect on the des.addvars return value.
Variables to be added to the input object have to be new: it is not possible to use des.addvars to modify the values of a pre-existing design column. This is an intentional feature, meant to safeguard the integrity of the relations between survey data and sampling design metadata stored in design.
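The rules above can be mimicked on a plain data frame. The following R sketch is hypothetical: the function add_vars_sketch is invented here for illustration and is not the ReGenesees implementation (which, among other things, must carry the sampling design metadata of design along unchanged). It captures the ... arguments unevaluated, drops untagged ones, refuses tags that clash with existing columns, and evaluates each remaining expression with the columns of the input data in scope.

```r
# Hypothetical sketch, NOT the ReGenesees implementation: it works on a
# plain data frame and only mimics the rules described for des.addvars.
add_vars_sketch <- function(data, ...) {
  # Capture the ... arguments unevaluated, e.g. list(ones = 1, z2 = z^2)
  exprs <- as.list(substitute(list(...)))[-1]
  tags <- names(exprs)
  # Untagged arguments are ignored
  if (is.null(tags)) return(data)
  exprs <- exprs[tags != ""]
  if (length(exprs) == 0) return(data)
  # Only new variables are allowed: refuse tags naming existing columns
  clash <- intersect(names(exprs), names(data))
  if (length(clash) > 0) {
    stop("variables already exist: ", paste(clash, collapse = ", "))
  }
  orig <- data
  env <- parent.frame()
  for (tag in names(exprs)) {
    # Evaluate expr with the columns of the input data in scope
    value <- eval(exprs[[tag]], orig, env)
    # Recycle length-one results such as ones = 1
    if (length(value) == 1L) value <- rep(value, nrow(orig))
    data[[tag]] <- value
  }
  data
}

df <- data.frame(age5c = c(1, 2, 3), z = c(2, 3, 4))
# 'ones' uses an identifier tag, "z2" a character-string tag,
# and the untagged age5c + 1 is silently dropped.
out <- add_vars_sketch(df, ones = 1, "z2" = z^2, age5c + 1)
names(out)  # "age5c" "z" "ones" "z2"
```

Calling add_vars_sketch(df, z = 0) stops with an error, mirroring the restriction that only new variables may be added.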
An object of the same class as design, containing the new variables but carrying exactly the same metadata.
Zardetto, D. (2015) “ReGenesees: an Advanced R System for Calibration, Estimation and Sampling Error Assessment in Complex Sample Surveys”. Journal of Official Statistics, 31(2), 177-203. doi: https://doi.org/10.1515/jos-2015-0013.
e.svydesign to bind survey data and sampling design metadata, e.calibrate for calibrating weights.
    data(data.examples)

    # Creation of an analytic object:
    des<-e.svydesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
        weights=~weight)

    # Adding the new 'ones' variable to estimate the number
    # of final units in the population:
    des<-des.addvars(des,ones=1)
    svystatTM(des,~ones)
    #>         Total       SE
    #> ones 924101.3 17172.68

    # Recoding a qualitative variable:
    des<-des.addvars(des,agerange=factor(ifelse(age5c==1,
        "young","not-young")))
    svystatTM(des,~agerange,estimator="Mean")
    #>                        Mean          SE
    #> agerangenot-young 0.8604824 0.006775833
    #> agerangeyoung     0.1395176 0.006775833
    #>            agerange Mean.income SE.Mean.income CI.l(95%).Mean.income
    #> not-young not-young   1303.7618       9.081213             1285.9630
    #> young         young    962.6162      17.684967              927.9543
    #>           CI.u(95%).Mean.income
    #> not-young             1321.5607
    #> young                  997.2781

    # Algebraic operations on numeric variables:
    des<-des.addvars(des,z2=z^2)
    svystatTM(des,~z2,estimator="Mean")
    #>        Mean       SE
    #> z2 20623.27 356.5924

    # A more interesting example: estimating the
    # percentage of population with income below
    # the poverty threshold (defined as 0.6 times
    # the median income for the whole population):
    Median.Income <- coef(svystatQ(des, ~income,probs=0.5))
    Median.Income
    #> income
    #>   1244
    des <- des.addvars(des,
               status = factor(
                   ifelse(income < (0.6 * Median.Income),
                          "poor", "non-poor")
                   )
               )
    svystatTM(des,~status,estimator="Mean")
    #>                     Mean          SE
    #> statusnon-poor 0.8842155 0.006131183
    #> statuspoor     0.1157845 0.006131183
    #>            status Mean.income SE.Mean.income
    #> non-poor non-poor   1349.3443       7.855904
    #> poor         poor    544.5881      10.161308

    ### NOTE: Procedure above yields *correct point estimates* of the share of poor
    ###       population and their average income, while *variance estimation is
    ###       approximated* since we neglected the sampling variability of the
    ###       estimated poverty threshold.
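The arithmetic behind the poverty example can be reproduced on toy data in base R. In the sketch below the helper names weighted_median_sketch and poor_share_sketch and the data are made up for illustration. The weighted median uses a simple step-function definition (smallest value whose cumulative weight reaches half the total weight), which is an assumption of this sketch and may differ from the quantile definition used by svystatQ. Only point estimates are computed: as the note above says, the sampling variability of the estimated threshold is not accounted for.

```r
# Hypothetical helpers, for illustration only (not part of ReGenesees).

# Step-function weighted median (an assumption of this sketch):
# smallest x whose cumulative weight reaches half of the total weight.
weighted_median_sketch <- function(x, w) {
  o <- order(x)
  x <- x[o]
  cw <- cumsum(w[o])
  x[which(cw >= sum(w) / 2)[1]]
}

# Weighted share of units whose income falls below 0.6 times the median.
poor_share_sketch <- function(income, w) {
  threshold <- 0.6 * weighted_median_sketch(income, w)
  sum(w[income < threshold]) / sum(w)
}

# Toy data: six units with made-up sampling weights.
income <- c(500, 800, 1000, 1200, 1500, 2000)
w      <- c(2, 1, 1, 1, 1, 2)
weighted_median_sketch(income, w)  # 1000
poor_share_sketch(income, w)       # 0.25
```

Here the threshold is 0.6 * 1000 = 600, so only the first unit (weight 2 out of a total of 8) is classified as poor.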