Estimation of totals and means

Calculates estimates, standard errors and confidence intervals for totals and means in subpopulations.

kottby(deskott, y, by = NULL, estimator = c("total", "mean"),
       vartype = c("se", "cv", "cvpct", "var"),
       conf.int = FALSE, conf.lev = 0.95)

Arguments

deskott	Object of class `kott.design` containing the replicated survey data.
y	Formula defining the variables of interest.
by	Formula specifying the variables that define the "estimation domains". If `NULL` (the default option) estimates refer to the whole population.
estimator	`character` specifying the desired estimator: it may be `"total"` (the default) or `"mean"`.
vartype	`character` vector specifying the desired variability estimators. It is possible to choose one or more of: standard error (the default), coefficient of variation, percent coefficient of variation, or variance.
conf.int	Boolean (`logical`) value to request confidence intervals for the estimates: the default is `FALSE`.
conf.lev	Probability specifying the desired confidence level: the default value is `0.95`.
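
The four variability measures selectable through `vartype` are simple transformations of one another. The base R sketch below assumes that `kottby` follows the usual definitions (variance as the squared standard error, coefficient of variation as the standard error divided by the absolute value of the estimate); the object names are illustrative only, and the input figures are the total and SE of `y1` shown in the Examples section.

```r
# Estimate and standard error of the total of y1 (taken from the Examples)
est <- 381111.8
se  <- 16251.66

# Usual definitions of the other variability measures (assumed, not
# taken from the package source)
variance <- se^2            # "var"
cv       <- se / abs(est)   # "cv"
cvpct    <- 100 * cv        # "cvpct"

print(c(se = se, cv = cv, cvpct = cvpct, var = variance))
```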

Details

This function calculates weighted estimates for totals and means using suitable weights depending on the class of deskott: calibrated weights for class kott.cal.design and direct weights otherwise. Standard errors are calculated using the extended DAGJK method [Kott 99-01].
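
As a rough illustration of the replication idea, the base R sketch below applies the basic delete-a-group jackknife to an unstratified, single-stage toy sample. It is not the extended DAGJK implemented by the package, which adjusts the replicate weights differently for stratified designs; the data and all object names are hypothetical.

```r
set.seed(1)                          # reproducible toy example
n   <- 60                            # number of sampled PSUs (hypothetical)
nrg <- 15                            # number of random groups
y   <- rnorm(n, mean = 50, sd = 10)  # variable of interest
w   <- rep(100, n)                   # direct weights (hypothetical)

# Randomly assign each PSU to one of the nrg groups (equal group sizes)
group <- sample(rep(seq_len(nrg), length.out = n))

# Full-sample estimate of the total
t_full <- sum(w * y)

# Replicate r: drop group r and inflate the other weights by nrg/(nrg-1)
t_rep <- vapply(seq_len(nrg), function(r) {
  w_r <- ifelse(group == r, 0, w * nrg / (nrg - 1))
  sum(w_r * y)
}, numeric(1))

# Delete-a-group jackknife variance and standard error
v_dagjk  <- (nrg - 1) / nrg * sum((t_rep - t_full)^2)
se_dagjk <- sqrt(v_dagjk)
```

Only `nrg` replicate estimates are needed, whatever the number of PSUs, which is what keeps the method cheap for large surveys.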

The mandatory argument y identifies the variables of interest, that is, the variables for which estimates are to be calculated. The corresponding formula must be of the type y=~var1+…+varn. The deskott variables referenced by y must be numeric or factor and must not contain any missing values (NA). "Mixed" formulas, which contain both quantitative (numeric) and qualitative (factor) variables, are allowed.

The optional argument by specifies the variables that define the "estimation domains", that is, the subpopulations for which the estimates are to be calculated. If by=NULL (the default option), the estimates produced by kottby refer to the whole population. Estimation domains must be defined by a formula: for example, the statement by=~B1:B2 selects as estimation domains the subpopulations determined by crossing the levels of variables B1 and B2. The deskott variables referenced by by (if any) must be factors and must not contain any missing values (NA).
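
To make the notion of crossed domains concrete, the following base R sketch computes weighted totals and means within the domains obtained by crossing two factors. It only mimics the point estimates on hypothetical toy data; it does not call `kottby` and produces no standard errors.

```r
# Toy data (hypothetical): two factors, one numeric variable, weights
d <- data.frame(
  B1 = factor(c("a", "a", "b", "b", "a", "b")),
  B2 = factor(c("x", "y", "x", "y", "x", "y")),
  y  = c(1, 2, 3, 4, 5, 6),
  w  = c(10, 10, 20, 20, 10, 20)
)

# Domains obtained by crossing the levels of B1 and B2
dom <- interaction(d$B1, d$B2)

# Weighted total and weighted mean of y within each domain
tot <- tapply(d$w * d$y, dom, sum)
avg <- tot / tapply(d$w, dom, sum)

print(rbind(total = tot, mean = avg))
```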

The optional argument estimator makes it possible to select the desired estimator. If estimator="total" (the default option), kottby calculates, for a given variable of interest vark, the estimate of the total (when vark is numeric) or the estimate of the absolute frequency distribution (when vark is factor). Similarly, if estimator="mean", the function calculates the estimate of the mean (when vark is numeric) or the estimate of the relative frequency distribution (when vark is factor).
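
For a factor variable, the two estimators reduce to weighted counts and weighted shares per level. The base R sketch below shows this on hypothetical toy data; the object names are illustrative and the code does not use `kottby`.

```r
# Toy data (hypothetical): one factor and the survey weights
d <- data.frame(
  marstat = factor(c("married", "unmarried", "married", "widowed", "married")),
  w       = c(100, 50, 150, 40, 60)
)

# Absolute frequency distribution: sum of the weights within each level
abs_freq <- tapply(d$w, d$marstat, sum)

# Relative frequency distribution: shares of the total weight
rel_freq <- abs_freq / sum(d$w)

print(cbind(total = abs_freq, mean = rel_freq))
```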

The conf.int argument makes it possible to request confidence intervals for the estimates. By default conf.int=FALSE, that is, confidence intervals are not provided.

Whenever confidence intervals are requested (i.e. conf.int=TRUE), the desired confidence level can be specified through the conf.lev argument. The conf.lev value must be a probability (0<=conf.lev<=1); its default is 0.95.

Value

The return value depends on the value of the input parameters. In the most general case, the function returns an object of class list (typically a list made up of data frames).

Note

The advantage of the DAGJK method over the traditional jackknife is that, unlike the latter, it remains computationally manageable even when dealing with "complex and big" surveys (tens of thousands of PSUs arranged in a large number of strata with widely varying sizes). In fact, the DAGJK method is known to provide, for a broad range of sampling designs and estimators, (near) unbiased standard error estimates even with a "small" number (e.g. a few tens) of replicate weights. On the other hand, if the number of replicates is not large, it seems defensible to use a t distribution (rather than a normal distribution) for calculating the confidence intervals. In line with what was proposed in [Kott 99-01], given an input kott.design object with nrg random groups, kottby builds the confidence intervals making use of a t distribution with nrg-1 degrees of freedom.
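
The t-based interval can be checked by hand. The base R sketch below assumes a symmetric interval of the form estimate ± t × SE and uses the estimate and SE reported in the Examples section for `marstat.married` among females (`nrg = 15`, `conf.lev = 0.9`); the object names are illustrative.

```r
est      <- 0.58198769    # estimated relative frequency (from the Examples)
se       <- 0.014765985   # its standard error
nrg      <- 15            # number of random groups
conf.lev <- 0.9

# Critical value of a t distribution with nrg - 1 degrees of freedom
t_crit <- qt(1 - (1 - conf.lev) / 2, df = nrg - 1)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)

# Normal critical value, for comparison: it yields a narrower interval
z_crit <- qnorm(1 - (1 - conf.lev) / 2)

print(ci)
print(c(t = t_crit, z = z_crit))
```

The resulting bounds agree with the `l.conf(90%)` and `u.conf(90%)` columns shown for that row in the Examples.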

References

Kott, Phillip S. (1999) "The Extended Delete-A-Group Jackknife". Bulletin of the International Statistical Institute. 52nd Session. Contributed Papers. Book 2, pp. 167-168.

Kott, Phillip S. (2001) "The Delete-A-Group Jackknife". Journal of Official Statistics, Vol.17, No.4, pp. 521-526.

Examples

data(data.examples)

# Creation of a kott.design object:
kdes<-kottdesign(data=example,ids=~towcod+famcod,strata=~SUPERSTRATUM,
      weights=~weight,nrg=15)


# Estimate of the total of 3 quantitative variables for the whole
# population:
kottby(kdes,~y1+y2+y3)
#> $y1
#>       total       SE
#> y1 381111.8 16251.66
#> 
#> $y2
#>       total       SE
#> y2 356633.5 16291.13
#> 
#> $y3
#>      total       SE
#> y3 24478.3 2439.861
#> 


# Estimate of the total of the same 3 variables by sex: 
kottby(kdes,~y1+y2+y3,~sex)
#> $y1
#>              f        m
#> total 191585.8   189526
#> SE    11199.24 8314.135
#> 
#> $y2
#>              f        m
#> total 181179.5   175454
#> SE    11663.96 7591.745
#> 
#> $y3
#>              f      m
#> total  10406.3  14072
#> SE    1257.641 2156.7
#> 


# Estimate of the mean of the same 3 variables by marstat and sex:
kottby(kdes,~y1+y2+y3,~marstat:sex,estimator="mean")
#> $y1
#>       married.f unmarried.f  widowed.f  married.m unmarried.m  widowed.m
#> mean  0.3907492   0.4235675  0.4626949  0.4067734   0.4394581  0.4018789
#> SE   0.02016379  0.02838279 0.04654907 0.01356993  0.01807067 0.04700886
#> 
#> $y2
#>       married.f unmarried.f widowed.f  married.m unmarried.m  widowed.m
#> mean  0.3669468   0.4042538  0.440711  0.3826893   0.3906832  0.3952614
#> SE   0.02073544  0.02765722 0.0452096 0.01490486   0.0140276 0.04720339
#> 
#> $y3
#>        married.f unmarried.f  widowed.f   married.m unmarried.m   widowed.m
#> mean  0.02380235  0.01931368 0.02198388  0.02408417   0.0487749 0.006617401
#> SE   0.005086532 0.006118989 0.01280707 0.006835304 0.008965009 0.006673789
#> 


# Estimate of the absolute frequency distribution of the qualitative
# variable age5c for the whole population:
kottby(kdes,~age5c)
#>            total        SE
#> age5c.1 128928.4  6872.904
#> age5c.2 294575.1 11047.742
#> age5c.3 356463.9 10033.262
#> age5c.4 122897.6  9700.259
#> age5c.5  21236.3  2459.820


# Estimate of the relative frequency distribution of the qualitative
# variable marstat by sex:
kottby(kdes,~marstat,~sex,estimator="mean")
#> $f
#>                         mean          SE
#> marstat.married   0.58198769 0.014765985
#> marstat.unmarried 0.33855429 0.012208442
#> marstat.widowed   0.07945802 0.007813969
#> 
#> $m
#>                         mean          SE
#> marstat.married   0.57948707 0.014615787
#> marstat.unmarried 0.33810508 0.012797112
#> marstat.widowed   0.08240785 0.007581132
#> 


# The same with confidence intervals at a confidence level of 0.9:
kottby(kdes,~marstat,~sex,estimator="mean",conf.int=TRUE,conf.lev=0.9)
#> $f
#>                         mean          SE l.conf(90%) u.conf(90%)
#> marstat.married   0.58198769 0.014765985   0.5559802  0.60799517
#> marstat.unmarried 0.33855429 0.012208442   0.3170514  0.36005714
#> marstat.widowed   0.07945802 0.007813969   0.0656952  0.09322084
#> 
#> $m
#>                         mean          SE l.conf(90%) u.conf(90%)
#> marstat.married   0.57948707 0.014615787  0.55374414  0.60523001
#> marstat.unmarried 0.33810508 0.012797112  0.31556540  0.36064476
#> marstat.widowed   0.08240785 0.007581132  0.06905512  0.09576057
#> 


# Quantitative and qualitative variables together: estimate of the
# total for y3 and of the absolute frequency distribution of marstat,
# by sex:
kottby(kdes,~y3+marstat,~sex)
#> $y3
#>              f      m
#> total  10406.3  14072
#> SE    1257.641 2156.7
#> 
#> $marstat
#> $marstat$f
#>                      total        SE
#> marstat.married   273569.6 10364.019
#> marstat.unmarried 159141.1  7800.659
#> marstat.widowed    37350.1  3747.368
#> 
#> $marstat$m
#>                      total       SE
#> marstat.married   263110.6 6942.904
#> marstat.unmarried 153513.4 9042.946
#> marstat.widowed    37416.5 3807.603
#> 
#> 


# Lonely PSUs do not give rise to NaNs in the standard errors:
kdes.lpsu<-kottdesign(data=example,ids=~towcod+famcod,strata=~stratum,
           weights=~weight,nrg=15)
#> Warning: Lonely PSUs in strata: 902, 903, 904, 905, 906, 907, 1004, 1005, 1006, 1007, 1008, 1102, 1103, 1104, 3002, 3003, 3004, 3005, 3006, 3007, 3008, 3009, 3010, 3105, 3106, 3107, 5407, 5408, 5409, 5410, 5411, 5412, 5413, 5415, 5416, 5502, 5503, 9304, 9305, 9306, 9307, 9308, 9309, 9310, 9312
kottby(kdes.lpsu,~x1+x2+x3)
#> $x1
#>    total       SE
#> x1 57412 3383.581
#> 
#> $x2
#>      total       SE
#> x2 19635.2 2777.078
#> 
#> $x3
#>      total       SE
#> x3 20552.6 1848.088
#>