kott.quantile.Rd
Calculates estimates, standard errors and confidence intervals for quantiles in subpopulations.
kott.quantile(deskott, y, probs = c(0.25,0.50,0.75), by = NULL, vartype = c("se", "cv", "cvpct", "var"), conf.int = FALSE, conf.lev = 0.95)
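| Argument | Description |
|---|---|
| deskott | Object of class kott.design (e.g. as created by kottdesign) or kott.cal.design. |
| y | Formula defining the variable of interest. |
| probs | Vector of probability values for which the quantile estimates are calculated. The default value selects the quartiles. |
| by | Formula specifying the variables that define the "estimation domains". If NULL (the default option), the estimates refer to the whole population. |
| vartype | Character vector specifying the desired variability estimators: "se" (standard error, the default), "cv" (coefficient of variation), "cvpct" (percent coefficient of variation) or "var" (variance). |
| conf.int | Boolean (TRUE/FALSE) value indicating whether confidence intervals for the estimates must be calculated. |
| conf.lev | Probability specifying the desired confidence level: the default value is 0.95. |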
This function calculates weighted estimates for the quantiles of a quantitative variable, using weights that depend on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise. Standard errors are calculated using the extended DAGJK method [Kott 99-01].
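As a reminder of how a delete-a-group jackknife standard error is assembled, here is a minimal R sketch. It assumes the standard DAGJK variance formula, in which the squared deviations of the nrg replicate estimates from the full-sample estimate are summed and multiplied by (nrg-1)/nrg; the helper name dagjk_se is hypothetical (not part of the package) and the replicate estimates in the usage line are made-up numbers.

```r
# Hypothetical helper (not part of the package): DAGJK standard error from
# a full-sample estimate and G replicate estimates, assuming
#   v = (G - 1) / G * sum_g (theta_g - theta)^2
dagjk_se <- function(theta_full, theta_reps) {
  G <- length(theta_reps)                # number of random groups (nrg)
  stopifnot(G >= 2)
  v <- (G - 1) / G * sum((theta_reps - theta_full)^2)
  sqrt(v)                                # a NaN replicate makes the SE NaN
}

# Toy usage with made-up replicate estimates
dagjk_se(theta_full = 10, theta_reps = c(9, 11, 10, 10))
```

Here the squared deviations sum to 2 and G = 4, so the variance is 1.5 and the standard error is its square root.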
The mandatory argument y identifies the variable of interest, that is, the variable for which quantile estimates are to be calculated. The deskott variable referenced by y must be numeric and must not contain any missing value (NA).
The optional argument probs specifies the probability values (0<=probs[i]<=1) for which quantile estimates must be calculated; the default option selects quartile estimates. If probs[i] is equal to 0 (1), the corresponding "estimate" produced by kott.quantile coincides with the smallest (largest) observed value of the y variable.
The optional argument by specifies the variables that define the "estimation domains", that is, the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kott.quantile refer to the whole population. Estimation domains must be defined by a formula: for example, the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. The deskott variables referenced by by (if any) must be factor and must not contain any missing value (NA).
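The way a formula such as by=~B1:B2 partitions the sample can be mimicked in base R with interaction(). This is only an illustration of the crossing, under the assumption that level combinations with no sample units are not treated as domains; the data below are made up, and kott.quantile also has to carry the replicate weights of each domain, which this sketch ignores.

```r
# Illustrative data (not from the package): two domain factors and a
# variable of interest for five sample units
B1 <- factor(c("a", "a", "b", "b", "b"))
B2 <- factor(c("x", "y", "x", "x", "y"))
y  <- c(10, 20, 30, 40, 50)

# One domain per observed combination of levels, labelled "B1:B2";
# drop = TRUE discards combinations with no units
domain <- interaction(B1, B2, sep = ":", drop = TRUE)
table(domain)

# Values of y in each domain, e.g. units with B1 == "b" and B2 == "x"
split(y, domain)
```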
The conf.int argument allows requesting confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided. Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and its default is 0.95. Given an input kott.design object with nrg random groups, kott.quantile builds the confidence intervals using a t distribution with nrg-1 degrees of freedom.
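The interval rule can be checked against the example output. The sketch below assumes the usual symmetric interval, estimate plus or minus the t quantile times the standard error; the helper name t_conf_int is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): confidence interval based
# on a t distribution with nrg - 1 degrees of freedom
t_conf_int <- function(estimate, se, nrg, conf.lev = 0.95) {
  stopifnot(conf.lev >= 0, conf.lev <= 1, nrg >= 2)
  q <- qt(1 - (1 - conf.lev) / 2, df = nrg - 1)  # two-sided critical value
  c(lower = estimate - q * se, upper = estimate + q * se)
}

# Median of income for the first age5c domain in the examples (nrg = 15)
t_conf_int(estimate = 970, se = 10.01053, nrg = 15)
```

With nrg = 15 the critical value is qt(0.975, 14), about 2.1448, and the call returns roughly 948.5295 and 991.4705, the l.conf(95%) and u.conf(95%) values printed for domain 1 in the examples.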
The return value depends on the input parameters. In the most general case (e.g. when estimation domains are specified through by, as in the examples), the function returns an object of class list, typically a list of data frames with one element per domain.
It may happen that, in certain subpopulations, some of the nrg sets of replicate weights turn out to be all zero: for these replicates it is not possible to compute quantile estimates. In these cases, kott.quantile (i) returns NaN for the corresponding standard errors and (ii) prints a warning message.
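The following R sketch shows where the NaN comes from. It rests on several loudly stated assumptions: the data are made up, the replicate weights use a simplified unstratified delete-a-group rule (zero for the deleted group, the others inflated by nrg/(nrg-1)) rather than the extended DAGJK construction, and a weighted mean stands in for the quantile estimator, since both divide by the total replicate weight.

```r
# Toy domain (illustrative data only): values, direct weights and the
# random group (1..nrg) each unit was assigned to
y     <- c(5, 7, 9, 11)
w     <- c(2, 2, 3, 3)
group <- c(1, 1, 1, 1)   # every domain unit falls in random group 1
nrg   <- 3

# Simplified replicate weights (assumption): replicate g zeroes the weights
# of group g and inflates the others by nrg / (nrg - 1)
rep_w <- sapply(seq_len(nrg), function(g)
  ifelse(group == g, 0, w * nrg / (nrg - 1)))

empty <- colSums(rep_w) == 0        # replicates with all-zero weights
which(empty)                        # replicate 1 in this toy domain

# Stand-in estimator (weighted mean): 0/0 gives NaN for the empty replicate
theta_full <- sum(w * y) / sum(w)
theta_reps <- colSums(rep_w * y) / colSums(rep_w)

# DAGJK standard error: the NaN replicate propagates to the result
# (kott.quantile additionally prints a warning in this situation)
se <- sqrt((nrg - 1) / nrg * sum((theta_reps - theta_full)^2))
se
```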
Let \(\hat{F}_y\) be the estimate of the cumulative distribution function of the \(y\) variable. If an observed value \(y^*\) exists such that \(\hat{F}_y(y^*)=\texttt{probs}[i]\), then the i-th quantile estimate provided by kott.quantile equals \(y^*\). Otherwise, kott.quantile (i) finds the two observed values \(y^-\) and \(y^+\) (\(y^- < y^+\)) whose values \(\hat{F}_y(y^-)\) and \(\hat{F}_y(y^+)\) are the closest to \(\texttt{probs}[i]\) from below and from above, (ii) linearly interpolates \(\hat{F}_y\) between \(\hat{F}_y(y^-)\) and \(\hat{F}_y(y^+)\) and (iii) estimates the i-th quantile by inverting the linear approximation at the point \(\texttt{probs}[i]\).
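The interpolation rule above can be sketched in base R as follows. This is not the package source: the function name weighted_quantile is hypothetical, units with zero weight are dropped, and probabilities below \(\hat{F}_y\) at the smallest observed value are assumed to return that smallest value (consistent with the probs[i]=0 case). Between \(y^-\) and \(y^+\) the estimate is \(y^- + (y^+ - y^-)\,(p - \hat{F}_y(y^-)) / (\hat{F}_y(y^+) - \hat{F}_y(y^-))\), which is what approx() computes.

```r
# Hypothetical sketch, not the package source: weighted quantiles obtained
# by inverting a piecewise-linear version of the estimated CDF Fhat
weighted_quantile <- function(y, w, probs = c(0.25, 0.50, 0.75)) {
  stopifnot(is.numeric(y), !anyNA(y), length(w) == length(y),
            all(probs >= 0 & probs <= 1))
  keep <- w > 0                       # zero weights do not move Fhat
  y <- y[keep]
  w <- w[keep]
  vals <- sort(unique(y))             # distinct observed values
  if (length(vals) == 0) return(rep(NaN, length(probs)))  # no estimate
  if (length(vals) == 1) return(rep(vals, length(probs)))
  tot  <- as.vector(rowsum(w, y))     # total weight per value, sorted
  Fhat <- cumsum(tot) / sum(tot)      # Fhat at each observed value
  # approx() returns vals[j] where probs equals Fhat[j], interpolates
  # linearly between neighbouring values otherwise, and (rule = 2)
  # returns the smallest value for probs below Fhat[1]: probs = 0 gives
  # the minimum and probs = 1 the maximum
  approx(x = Fhat, y = vals, xout = probs, rule = 2)$y
}

weighted_quantile(y = 1:4, w = rep(1, 4), probs = c(0, 0.5, 0.6, 1))
# [1] 1.0 2.0 2.4 4.0
```

With equal weights, \(\hat{F}_y\) is 0.25, 0.5, 0.75 and 1 at the values 1 to 4: probs = 0.5 hits the observed value 2 exactly, while probs = 0.6 falls between 2 and 3 and is interpolated to 2.4.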
The rigorous results of [Kott 99-01] show that the DAGJK variance estimator for a given estimator \(\hat{\theta}\) is correct provided that PSUs are sampled with replacement and that \(\hat{\theta}\) is a smooth function of total estimators. Since quantile estimators are not smooth functions of totals, it is not possible to guarantee that the DAGJK quantile variance estimator provided by kott.quantile is unbiased.
Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute, 52nd Session, Contributed Papers, Book 2, pp. 167-168.
Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol. 17, No. 4, pp. 521-526.
kottby for estimating totals and means, kott.ratio for estimating ratios between totals, kott.regcoef for estimating regression coefficients and kottby.user for calculating estimates based on user-defined estimators.
```r
data(data.examples)

# Creation of a kott.design object:
kdes <- kottdesign(data = example, ids = ~towcod+famcod, strata = ~SUPERSTRATUM,
                   weights = ~weight, nrg = 15)

# Estimate of the deciles of the income variable for
# the whole population:
kott.quantile(kdes, ~income, probs = seq(0.1, 0.9, 0.1))
#>      estimate        SE
#> 10%  713.1813  9.734157
#> 20%  888.0928 11.247590
#> 30% 1022.0000 11.986940
#> 40% 1135.0000  9.682285
#> 50% 1244.0000 14.891480
#> 60% 1360.0000 10.021758
#> 70% 1468.1979 14.481402
#> 80% 1607.0000 17.775812
#> 90% 1826.9735 12.074536

# Estimate of the median of income by age5c:
kott.quantile(kdes, ~income, probs = 0.5, by = ~age5c, conf.int = TRUE)
#> $`1`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%      970 10.01053    948.5295    991.4705
#>
#> $`2`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50% 1142.566 22.01423     1095.35    1189.781
#>
#> $`3`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%     1341 24.75453    1287.907    1394.093
#>
#> $`4`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%     1512 26.39081    1455.397    1568.603
#>
#> $`5`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50% 1719.681 80.49817     1547.03    1892.332
#>

# "Estimate" of the minimum and maximum of income by sex
# (notice the value of SE):
kott.quantile(kdes, ~income, probs = c(0, 1), by = ~sex)
#> $f
#>      estimate       SE
#> 0%          0  0.00000
#> 100%     2764 94.67699
#>
#> $m
#>      estimate       SE
#> 0%          0  0.00000
#> 100%     2743 75.35516
#>
```