Estimation of quantiles

Calculates estimates, standard errors and confidence intervals for quantiles in subpopulations.

kott.quantile(deskott, y, probs = c(0.25,0.50,0.75), by = NULL,
              vartype = c("se", "cv", "cvpct", "var"),
              conf.int = FALSE, conf.lev = 0.95)

Arguments

deskott	Object of class `kott.design` containing the replicated survey data.
y	Formula defining the variable of interest.
probs	Vector of probability values for which quantile estimates are calculated. The default selects the quartiles.
by	Formula specifying the variables that define the "estimation domains". If `NULL` (the default option) estimates refer to the whole population.
vartype	`character` vector specifying the desired variability estimators. It is possible to choose one or more of: standard error (the default), coefficient of variation, percent coefficient of variation, or variance.
conf.int	Boolean (`logical`) value to request confidence intervals for the estimates: the default is `FALSE`.
conf.lev	Probability specifying the desired confidence level: the default value is `0.95`.

Details

This function calculates weighted estimates for the quantiles of a quantitative variable using suitable weights depending on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise. Standard errors are calculated using the extended DAGJK method [Kott 99-01].
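
As a rough illustration of how a delete-a-group jackknife turns replicate estimates into a variance, the sketch below assumes the standard DAGJK form, in which the squared deviations of the `nrg` replicate estimates from the full-sample estimate are summed and scaled by `(nrg-1)/nrg`. The helper name `dagjk_var_sketch` and its arguments are hypothetical and are not part of the package; the actual implementation in `kott.quantile` may differ in detail.

```r
# Hypothetical helper (not a package function): DAGJK-style variance,
# assuming v = ((R - 1) / R) * sum((theta_r - theta)^2).
dagjk_var_sketch <- function(theta_hat, theta_rep) {
  R <- length(theta_rep)              # number of random groups (nrg)
  (R - 1) / R * sum((theta_rep - theta_hat)^2)
}

# Toy values, invented for illustration only
theta_hat <- 10
theta_rep <- c(9, 11, 10, 10)
v  <- dagjk_var_sketch(theta_hat, theta_rep)
se <- sqrt(v)                         # standard error is the square root of the variance
```

If a replicate estimate cannot be computed and is `NaN`, the sum, and hence the standard error, is `NaN` as well.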

The mandatory argument y identifies the variable of interest, that is the variable for which quantile estimates are to be calculated. The deskott variable referenced by y must be numeric and must not contain any missing values (NA).

The optional argument probs specifies the probability values (0<=probs[i]<=1) for which quantile estimates must be calculated; the default selects the quartiles. If probs[i] equals 0 (respectively 1), the corresponding "estimate" produced by kott.quantile coincides with the smallest (respectively largest) observed value of the y variable.

The optional argument by specifies the variables that define the "estimation domains", that is the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kott.quantile refer to the whole population. Estimation domains must be defined by a formula: for example, the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. The deskott variables referenced by by (if any) must be factors and must not contain any missing values (NA).

The conf.int argument requests confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified by means of the conf.lev argument. The conf.lev value must represent a probability (0<=conf.lev<=1) and defaults to 0.95. Given an input kott.design object with nrg random groups, kott.quantile builds the confidence intervals using a t distribution with nrg-1 degrees of freedom.
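
The interval construction can be checked by hand. The sketch below assumes a symmetric interval of the form estimate ± t × SE (the symmetric form is an assumption, not stated above) and uses the estimate and standard error reported for the first age5c domain in the Examples section, where the design was built with nrg=15.

```r
# Values taken from the Examples section (first age5c domain)
estimate <- 970
se       <- 10.01053
nrg      <- 15       # number of random groups used to build the design
conf.lev <- 0.95

# Two-sided critical value of the t distribution with nrg - 1 degrees of freedom
t_crit <- qt(1 - (1 - conf.lev) / 2, df = nrg - 1)

# Assumed symmetric interval: estimate +/- t_crit * SE
ci <- c(lower = estimate - t_crit * se, upper = estimate + t_crit * se)
round(ci, 4)
```

Up to rounding, this reproduces the bounds 948.5295 and 991.4705 shown for that domain in the Examples.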

Value

The return value depends on the value of the input parameters. In the most general case, the function returns an object of class list (typically a list made up of data frames).

Warning

It may happen that, in certain subpopulations, some of the nrg replicate weights turn out to be all zero: for these replicates it is not possible to provide quantile estimates. In these cases, kott.quantile (i) returns NaN for the corresponding standard errors and (ii) prints a warning message.

Note

Let \(\hat{F}_y\) be the estimate of the cumulative distribution of the \(y\) variable. If an observed value \(y^*\) exists such that \(\hat{F}_y(y^*)=probs[i]\), then the i-th quantile estimate provided by kott.quantile equals \(y^*\). If this is not the case, the kott.quantile function (i) finds the two observed values \(y^-\) and \(y^+\) (\(y^- < y^+\)) such that the corresponding values \(\hat{F}_y(y^-)\) and \(\hat{F}_y(y^+)\) are the closest to \(probs[i]\), (ii) linearly interpolates \(\hat{F}_y\) between \(\hat{F}_y(y^-)\) and \(\hat{F}_y(y^+)\) and (iii) estimates the i-th quantile by inverting the linear approximation at the point \(probs[i]\).
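
The interpolation rule can be sketched in a few lines of R. The helper `wquantile_sketch` below is hypothetical and is not the package's code: it assumes that \(y^-\) and \(y^+\) are the adjacent observed values whose estimated cumulative distribution brackets the requested probability, and that probabilities below the first step of \(\hat{F}_y\) map to the smallest observed value.

```r
# Hypothetical helper (not a package function): weighted quantile obtained
# by linearly interpolating the estimated cumulative distribution.
wquantile_sketch <- function(y, w, p) {
  ord <- order(y)
  y <- y[ord]
  w <- w[ord]
  Fhat <- cumsum(w) / sum(w)
  # Keep one point per distinct observed value (the top of each CDF step)
  last <- !duplicated(y, fromLast = TRUE)
  ys <- y[last]
  Fh <- Fhat[last]

  # Case 1: an observed value y* with F(y*) = p exists
  hit <- which(abs(Fh - p) < 1e-12)
  if (length(hit) > 0) return(ys[hit[1]])

  # Assumption: below the first step, return the smallest observed value
  if (p < Fh[1]) return(ys[1])

  # Case 2: bracket p between F(y-) and F(y+), then invert the linear segment
  lo <- max(which(Fh < p))
  hi <- lo + 1
  ys[lo] + (p - Fh[lo]) / (Fh[hi] - Fh[lo]) * (ys[hi] - ys[lo])
}

# Toy data, invented for illustration: four units with equal weights
y <- c(10, 20, 30, 40)
w <- c(1, 1, 1, 1)
q_exact  <- wquantile_sketch(y, w, 0.5)  # F(20) = 0.5, so y* = 20
q_interp <- wquantile_sketch(y, w, 0.6)  # interpolated between 20 and 30
```

In the toy data, the 0.6 quantile lies between \(y^-=20\) (with \(\hat{F}_y=0.5\)) and \(y^+=30\) (with \(\hat{F}_y=0.75\)), so the interpolated value is 20 + (0.1/0.25) × 10 = 24.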

The rigorous results of [Kott 99-01] show that the DAGJK variance estimator for a given estimator \(\hat{\theta}\) is correct provided that PSUs are sampled with replacement and that \(\hat{\theta}\) is a smooth function of total estimators. Quantiles are not smooth functions of totals, so it is not possible to guarantee that the DAGJK quantile variance estimator provided by kott.quantile is unbiased.

References

Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.

Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol.17, No.4, pp. 521-526.

Examples

data(data.examples)

# Creation of a kott.design object:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
      weights=~weight,nrg=15)

# Estimate of the deciles of the income variable for
# the whole population:
kott.quantile(kdes,~income,probs=seq(0.1,0.9,0.1))
#>      estimate        SE
#> 10%  713.1813  9.734157
#> 20%  888.0928 11.247590
#> 30% 1022.0000 11.986940
#> 40% 1135.0000  9.682285
#> 50% 1244.0000 14.891480
#> 60% 1360.0000 10.021758
#> 70% 1468.1979 14.481402
#> 80% 1607.0000 17.775812
#> 90% 1826.9735 12.074536

# Estimate of the median of income by age5c:
kott.quantile(kdes,~income,probs=0.5,by=~age5c,conf.int=TRUE)
#> $`1`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%      970 10.01053    948.5295    991.4705
#> 
#> $`2`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50% 1142.566 22.01423     1095.35    1189.781
#> 
#> $`3`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%     1341 24.75453    1287.907    1394.093
#> 
#> $`4`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50%     1512 26.39081    1455.397    1568.603
#> 
#> $`5`
#>     estimate       SE l.conf(95%) u.conf(95%)
#> 50% 1719.681 80.49817     1547.03    1892.332
#> 

# "Estimate" of the minimum and maximum of income by sex
# (notice the value of SE): 
kott.quantile(kdes,~income,probs=c(0,1),by=~sex)
#> $f
#>      estimate       SE
#> 0%          0  0.00000
#> 100%     2764 94.67699
#> 
#> $m
#>      estimate       SE
#> 0%          0  0.00000
#> 100%     2743 75.35516
#>