kottby.Rd
Calculates estimates, standard errors and confidence intervals for totals and means in subpopulations.
kottby(deskott, y, by = NULL, estimator = c("total", "mean"), vartype = c("se", "cv", "cvpct", "var"), conf.int = FALSE, conf.lev = 0.95)
deskott | Object of class |
---|---|
y | Formula defining the variables of interest. |
by | Formula specifying the variables that define the "estimation domains". If |
estimator |
|
vartype |
|
conf.int | Boolean ( |
conf.lev | Probability specifying the desired confidence level: the default value is |
This function calculates weighted estimates for totals and means using suitable weights depending on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise. Standard errors are calculated using the extended DAGJK method [Kott 99-01].
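To make the role of replicate weights more concrete, here is a minimal base-R sketch of a plain delete-a-group jackknife for a weighted total in a single stratum. It is a simplification for illustration only and is not the extended DAGJK procedure that kottby applies to stratified multistage designs; the toy data and the names y, w, grp and rep_est are assumptions.

```r
# Simplified delete-a-group jackknife (single stratum, illustrative only).
set.seed(1)
n   <- 60                    # number of sampled PSUs (hypothetical)
nrg <- 15                    # number of random groups
y   <- rnorm(n, mean = 10)   # toy variable of interest
w   <- rep(100, n)           # toy direct weights

# Randomly assign the PSUs to nrg groups of equal size
grp <- sample(rep(seq_len(nrg), length.out = n))

# Full-sample weighted total
est <- sum(w * y)

# Replicate estimates: drop group r and rescale the remaining weights
rep_est <- vapply(seq_len(nrg), function(r) {
  keep <- grp != r
  w_r  <- ifelse(keep, w * n / sum(keep), 0)
  sum(w_r * y)
}, numeric(1))

# Jackknife variance estimate and standard error
v  <- (nrg - 1) / nrg * sum((rep_est - est)^2)
se <- sqrt(v)
```

Only nrg replicate estimates are needed, however many PSUs the sample contains, which is what keeps the approach cheap for large surveys.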
The mandatory argument y identifies the variables of interest, that is, the variables for which estimates are to be calculated. The corresponding formula must be of the type y=~var1+…+varn. The deskott variables referenced by y must be numeric or factor and must not contain any missing value (NA). It is admissible to specify for y "mixed" formulas that contain both quantitative (numeric) and qualitative (factor) variables.
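The requirements on y can be checked before calling kottby. The sketch below uses a hypothetical helper, check_vars, which is not part of the package; it only shows one way to verify that the variables named in a formula are numeric or factor and free of NA values.

```r
# Hypothetical helper (not part of the package): validates the variables
# referenced by a one-sided formula and reports their types.
check_vars <- function(formula, data) {
  vars <- all.vars(formula)
  missing_vars <- setdiff(vars, names(data))
  if (length(missing_vars) > 0)
    stop("Variables not found: ", paste(missing_vars, collapse = ", "))
  for (v in vars) {
    x <- data[[v]]
    if (!is.numeric(x) && !is.factor(x))
      stop("Variable ", v, " must be numeric or factor")
    if (anyNA(x))
      stop("Variable ", v, " contains missing values")
  }
  vapply(vars,
         function(v) if (is.numeric(data[[v]])) "numeric" else "factor",
         character(1))
}

# Toy data with one quantitative and one qualitative variable
toy <- data.frame(
  y3      = c(1.5, 0, 2),
  marstat = factor(c("married", "unmarried", "married"))
)

# A "mixed" formula passes the check
types <- check_vars(~ y3 + marstat, toy)
```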
The optional argument by specifies the variables that define the "estimation domains", that is, the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kottby refer to the whole population. Estimation domains must be defined by a formula: for example, the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the modalities (levels) of variables B1 and B2. The deskott variables referenced by by (if any) must be factor and must not contain any missing value (NA).
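As an illustration of what crossing two factors means, the following base-R sketch builds the domains for by=~B1:B2 on toy data and computes a weighted total in each of them. The data frame and the names dom and dom_tot are assumptions made for the example; kottby performs the equivalent step internally on the design object.

```r
# Toy data: two factors, a numeric variable and a weight (all hypothetical)
toy <- data.frame(
  B1 = factor(c("a", "a", "b", "b", "a")),
  B2 = factor(c("x", "y", "x", "y", "x")),
  y  = c(1, 2, 3, 4, 5),
  w  = c(10, 10, 20, 20, 10)
)

# Domains obtained by crossing B1 and B2
dom <- interaction(toy$B1, toy$B2)

# Weighted total of y within each domain
dom_tot <- tapply(toy$w * toy$y, dom, sum)
```

The domain labels produced by interaction() (for example a.x) follow the same pattern as the column names in the kottby output, such as married.f.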
The optional argument estimator makes it possible to select the desired estimator. If estimator="total" (the default option), kottby calculates, for a given variable of interest vark, the estimate of the total (when vark is numeric) or the estimate of the absolute frequency distribution (when vark is factor). Similarly, if estimator="mean", the function calculates the estimate of the mean (when vark is numeric) or the estimate of the relative frequency distribution (when vark is factor).
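The four cases can be sketched in base R on toy data. The example assumes that the mean is estimated as the weighted total divided by the sum of the weights; the data frame and variable names are hypothetical, and no standard errors are computed here.

```r
# Toy data: one numeric variable, one factor and a weight (all hypothetical)
toy <- data.frame(
  y3      = c(2, 0, 1, 3),
  marstat = factor(c("married", "unmarried", "married", "widowed")),
  w       = c(100, 150, 100, 50)
)

# Numeric variable: weighted total and weighted mean
tot_y3  <- sum(toy$w * toy$y3)
mean_y3 <- tot_y3 / sum(toy$w)

# Factor variable: absolute and relative frequency distributions
abs_freq <- tapply(toy$w, toy$marstat, sum)
rel_freq <- abs_freq / sum(toy$w)
```

For a factor, the absolute frequencies are the weighted counts of each level, and the relative frequencies sum to one.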
The conf.int argument makes it possible to request confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.
Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1); its default is 0.95.
The return value depends on the value of the input parameters. In the most general case, the function returns an object of class list (typically a list made up of data frames).
The advantage of the DAGJK method over the traditional jackknife is that, unlike the latter, it remains computationally manageable even when dealing with "complex and big" surveys (tens of thousands of PSUs arranged in a large number of strata with widely varying sizes). In fact, the DAGJK method is known to provide, for a broad range of sampling designs and estimators, (near) unbiased standard error estimates even with a "small" number (e.g. a few tens) of replicate weights. On the other hand, if the number of replicates is not large, it seems defensible to use a t distribution (rather than a normal distribution) for calculating the confidence intervals. In line with what was proposed in [Kott 99-01], given an input kott.design object with nrg random groups, kottby builds the confidence intervals making use of a t distribution with nrg-1 degrees of freedom.
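The interval construction described above can be reproduced by hand in base R. The sketch below takes one estimate and its standard error from the Examples (relative frequency of marstat.married for sex f, with nrg=15 and conf.lev=0.9) and builds the interval from a t quantile with nrg-1 degrees of freedom; the variable names are assumptions made for the example.

```r
# Values taken from the Examples: marstat.married, sex = f
est      <- 0.58198769
se       <- 0.014765985
nrg      <- 15
conf.lev <- 0.9

# t quantile with nrg - 1 degrees of freedom
tq <- qt(1 - (1 - conf.lev) / 2, df = nrg - 1)
ci <- c(lower = est - tq * se, upper = est + tq * se)

# Normal quantile, for comparison: it gives a narrower interval
zq <- qnorm(1 - (1 - conf.lev) / 2)
```

With few random groups the t quantile is noticeably larger than the normal one, so the resulting intervals are more conservative.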
Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.
Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol.17, No.4, pp. 521-526.
See also kott.ratio for estimating ratios between totals, kott.quantile for estimating quantiles, kott.regcoef for estimating regression coefficients and kottby.user for calculating estimates based on user-defined estimators.
data(data.examples)

# Creation of a kott.design object:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
                 weights=~weight,nrg=15)

# Estimate of the total of 3 quantitative variables for the whole
# population:
kottby(kdes,~y1+y2+y3)
#> $y1
#>       total       SE
#> y1 381111.8 16251.66
#>
#> $y2
#>       total       SE
#> y2 356633.5 16291.13
#>
#> $y3
#>      total       SE
#> y3 24478.3 2439.861
#>

# Estimate of the total of the same 3 variables by sex:
kottby(kdes,~y1+y2+y3,~sex)
#> $y1
#>              f        m
#> total 191585.8   189526
#> SE    11199.24 8314.135
#>
#> $y2
#>              f        m
#> total 181179.5   175454
#> SE    11663.96 7591.745
#>
#> $y3
#>              f      m
#> total  10406.3  14072
#> SE    1257.641 2156.7
#>

# Estimate of the mean of the same 3 variables by marstat and sex:
kottby(kdes,~y1+y2+y3,~marstat:sex,estimator="mean")
#> $y1
#>       married.f unmarried.f  widowed.f  married.m unmarried.m  widowed.m
#> mean  0.3907492   0.4235675  0.4626949  0.4067734   0.4394581  0.4018789
#> SE   0.02016379  0.02838279 0.04654907 0.01356993  0.01807067 0.04700886
#>
#> $y2
#>       married.f unmarried.f widowed.f  married.m unmarried.m  widowed.m
#> mean  0.3669468   0.4042538  0.440711  0.3826893   0.3906832  0.3952614
#> SE   0.02073544  0.02765722 0.0452096 0.01490486   0.0140276 0.04720339
#>
#> $y3
#>        married.f unmarried.f  widowed.f   married.m unmarried.m   widowed.m
#> mean  0.02380235  0.01931368 0.02198388  0.02408417   0.0487749 0.006617401
#> SE   0.005086532 0.006118989 0.01280707 0.006835304 0.008965009 0.006673789
#>

# Estimate of the absolute frequency distribution of the qualitative
# variable age5c for the whole population:
kottby(kdes,~age5c)
#>            total        SE
#> age5c.1 128928.4  6872.904
#> age5c.2 294575.1 11047.742
#> age5c.3 356463.9 10033.262
#> age5c.4 122897.6  9700.259
#> age5c.5  21236.3  2459.820

# Estimate of the relative frequency distribution of the qualitative
# variable marstat by sex:
kottby(kdes,~marstat,~sex,estimator="mean")
#> $f
#>                         mean          SE
#> marstat.married   0.58198769 0.014765985
#> marstat.unmarried 0.33855429 0.012208442
#> marstat.widowed   0.07945802 0.007813969
#>
#> $m
#>                         mean          SE
#> marstat.married   0.57948707 0.014615787
#> marstat.unmarried 0.33810508 0.012797112
#> marstat.widowed   0.08240785 0.007581132
#>

# The same with confidence intervals at a confidence level of 0.9:
kottby(kdes,~marstat,~sex,estimator="mean",conf.int=TRUE,conf.lev=0.9)
#> $f
#>                         mean          SE l.conf(90%) u.conf(90%)
#> marstat.married   0.58198769 0.014765985   0.5559802  0.60799517
#> marstat.unmarried 0.33855429 0.012208442   0.3170514  0.36005714
#> marstat.widowed   0.07945802 0.007813969   0.0656952  0.09322084
#>
#> $m
#>                         mean          SE l.conf(90%) u.conf(90%)
#> marstat.married   0.57948707 0.014615787  0.55374414  0.60523001
#> marstat.unmarried 0.33810508 0.012797112  0.31556540  0.36064476
#> marstat.widowed   0.08240785 0.007581132  0.06905512  0.09576057
#>

# Quantitative and qualitative variables together: estimate of the
# total for y3 and of the absolute frequency distribution of marstat,
# by sex:
kottby(kdes,~y3+marstat,~sex)
#> $y3
#>              f      m
#> total  10406.3  14072
#> SE    1257.641 2156.7
#>
#> $marstat
#> $marstat$f
#>                      total        SE
#> marstat.married   273569.6 10364.019
#> marstat.unmarried 159141.1  7800.659
#> marstat.widowed    37350.1  3747.368
#>
#> $marstat$m
#>                      total       SE
#> marstat.married   263110.6 6942.904
#> marstat.unmarried 153513.4 9042.946
#> marstat.widowed    37416.5 3807.603
#>
#>

# Lonely PSUs do not give rise to NaNs in the standard errors:
kdes.lpsu<-kottdesign(data=example,ids=~towcod+famcod,strata=~stratum,
                      weights=~weight,nrg=15)
#> Warning: Lonely PSUs in strata: 902, 903, 904, 905, 906, 907, 1004, 1005, 1006, 1007, 1008, 1102, 1103, 1104, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009, 3010, 3105, 3106, 3107, 5407, 5408, 5409, 5410, 5411, 5412, 5413, 5415, 5416, 5502, 5503, 9304, 9305, 9306, 9307, 9308, 9309, 9310, 9312
kottby(kdes.lpsu,~x1+x2+x3)
#> $x1
#>    total       SE
#> x1 57412 3383.581
#>
#> $x2
#>      total       SE
#> x2 19635.2 2777.078
#>
#> $x3
#>      total       SE
#> x3 20552.6 1848.088
#>