GVF.db.Rd
GVF.db is the archive of registered (i.e. built-in and/or user-defined) Generalized Variance Functions (GVF) models supported by ReGenesees. Special accessor functions make it possible to customize, maintain, extend, update, save, and reset this archive.
GVF.db
`GVF.db$insert`(GVF.model, Estimator.kind = NA, Resp.to.CV = NA, verbose = TRUE)
`GVF.db$delete`(Model.id, verbose = TRUE)
`GVF.db$get`(verbose = TRUE)
`GVF.db$assign`(value, verbose = TRUE)
`GVF.db$reset`(verbose = TRUE)
Argument | Description |
---|---|
GVF.model | A GVF model, expressed as a formula object or as a character string (see ‘Details’). |
Estimator.kind | Character string identifying the kind of estimators for which the GVF model is deemed to be appropriate (see ‘Details’). |
Resp.to.CV | Character string representing the function which maps the response of the GVF model (namely: variable 'resp') to the coefficient of variation (namely: variable 'CV'), see ‘Details’. |
Model.id | Unique integer key identifying the GVF model. |
value | An exported copy of GVF.db (see ‘Details’). |
verbose | Enables printing of a summary description of the result (the default is TRUE). |
Each row of the GVF.db data frame represents a registered GVF model, with relevant information on the following 4 variables:
Model.id
A unique integer key identifying the GVF model, integer.
GVF.model
A character string specifying the GVF model formula, character. See also ‘Details’.
Estimator.kind
A character string identifying the kind of estimators for which the GVF model is deemed to be appropriate, character. See also ‘Details’.
Resp.to.CV
A character string which represents the function mapping the response of the GVF model (namely: variable 'resp') to the coefficient of variation (namely: variable 'CV'), character. See also ‘Details’.
GVF.db stores information about Generalized Variance Functions models supported by ReGenesees. When starting a new work session with ReGenesees, GVF.db contains a few built-in GVF models (currently 5, see sections ‘Source’ and ‘Examples’). The content of GVF.db can be customized by means of special accessor functions:
ACCESSOR FUNCTION | PURPOSE |
---|---|
GVF.db$insert | Register a new GVF model by adding a new row to the GVF.db archive |
GVF.db$delete | Unregister a GVF model by deleting the corresponding row from GVF.db |
GVF.db$get | Get the current version of GVF.db (e.g. to copy/save a customized archive for later usage) |
GVF.db$assign | Overwrite the current version of GVF.db (e.g. to use a customized archive which was exported in a previous ReGenesees session) |
GVF.db$reset | Reset GVF.db to its default version (i.e. the one with built-in GVF models only) |
Information about registered GVF models stored inside GVF.db will be accessed and used by ReGenesees Generalized Variance Functions facilities, e.g. functions fit.gvf or predictCV.
GVF.db$insert()
Function GVF.db$insert has just a single mandatory argument: GVF.model. This can be either a two-sided formula or a character string that function as.formula would transform into a (well-formed) two-sided formula.
The GVF.model formula to be inserted into GVF.db must be new (i.e. not already present in the archive) and can involve only variables contained inside gvf.input objects, namely:
(1) 'Y'
(2) 'SE'
(3) 'CV'
(4) 'VAR'
(5) 'DEFF'
Moreover, since GVF models are intended to model variances in terms of estimates, the response term of GVF.model must involve some of 'SE', 'CV', 'VAR', and the linear predictor must involve 'Y'.
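The constraints above can be illustrated with a minimal base-R sketch. The helper check_gvf_formula is hypothetical: it is not a ReGenesees function, and it only mimics the documented rules (it does not check whether a model is already in the archive).

```r
# Hypothetical helper, NOT part of ReGenesees: it mirrors the documented
# constraints on GVF.model using base R only.
check_gvf_formula <- function(GVF.model) {
  f <- as.formula(GVF.model)      # accepts a formula or a character string
  if (length(f) != 3L) stop("GVF model must have a response term")
  allowed   <- c("Y", "SE", "CV", "VAR", "DEFF")
  resp.vars <- all.vars(f[[2L]])  # variables appearing in the response term
  pred.vars <- all.vars(f[[3L]])  # variables appearing in the linear predictor
  unknown   <- setdiff(c(resp.vars, pred.vars), allowed)
  if (length(unknown) > 0L)
    stop("variables not available in gvf.input objects: ",
         paste(unknown, collapse = ", "))
  if (!any(c("SE", "CV", "VAR") %in% resp.vars))
    stop("response must involve some of 'SE', 'CV', 'VAR'")
  if (!("Y" %in% pred.vars))
    stop("linear predictor must involve 'Y'")
  invisible(TRUE)
}

check_gvf_formula("CV^2 ~ I(1/Y) + I(1/Y^2)")   # passes silently
# check_gvf_formula(DEFF ~ log(Y))              # would stop: response
# check_gvf_formula(VAR ~ SE)                   # would stop: predictor
```

Splitting the formula with f[[2L]] and f[[3L]] keeps the response and predictor checks independent, which matches how the two requirements are stated.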
Optional argument Estimator.kind can be used to specify the kind of estimators for which the GVF.model is deemed to be appropriate. There are currently 11 valid values for Estimator.kind, namely:
(1) 'Total'
(2) 'Mean'
(3) 'Frequency'
(4) 'Absolute Frequency'
(5) 'Relative Frequency'
(6) 'Ratio'
(7) 'Share'
(8) 'Share Ratio'
(9) 'Regression Coefficient'
(10) 'Quantile'
(11) 'Complex Estimator'
Note that category 'Frequency' has to be understood as an aggregation of categories 'Absolute Frequency' and 'Relative Frequency', and is therefore appropriate for GVF models that are deemed to work well for estimators of both kinds of frequencies.
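One way to read the aggregation rule for 'Frequency' is sketched below in base R. The function kind_covers is hypothetical and only encodes an interpretation of the text; how ReGenesees actually matches models to estimator kinds is not shown in this page.

```r
# The 11 documented values of Estimator.kind
valid.kinds <- c("Total", "Mean", "Frequency", "Absolute Frequency",
                 "Relative Frequency", "Ratio", "Share", "Share Ratio",
                 "Regression Coefficient", "Quantile", "Complex Estimator")

# Hypothetical helper (an assumption, not ReGenesees code): does a model
# registered for 'model.kind' cover estimates of kind 'estimator.kind'?
kind_covers <- function(model.kind, estimator.kind) {
  stopifnot(model.kind %in% valid.kinds, estimator.kind %in% valid.kinds)
  if (model.kind == "Frequency") {
    # 'Frequency' aggregates absolute and relative frequencies
    return(estimator.kind %in% c("Frequency", "Absolute Frequency",
                                 "Relative Frequency"))
  }
  model.kind == estimator.kind
}

kind_covers("Frequency", "Relative Frequency")  # TRUE
kind_covers("Total", "Mean")                    # FALSE
```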
One of the primary motivations for building and fitting a GVF model is to use the fitted model to predict the sampling error associated with a given estimate, instead of having to compute an estimate of that sampling error directly. Optional argument Resp.to.CV serves this purpose.
Indeed, different GVF models can specify as response term (call it 'resp' for definiteness) different functions of variables 'SE', 'CV', and 'VAR', but ReGenesees will always adopt variable 'CV' as a pivot. Thus, when registering a new GVF model, the user can provide via argument Resp.to.CV the function which transforms the response of the model, 'resp', into the pivot measure of variability, 'CV'. A look at the default content of GVF.db should make the latter statement clear (see ‘Examples’).
Note that while Resp.to.CV is passed as a character string, that string is expected to represent a well-formed mathematical expression (otherwise function predictCV would not work). Moreover, only variables 'resp' and 'Y' are allowed to appear inside Resp.to.CV (which is enough, since 'VAR' and 'SE' can be expressed in terms of 'CV' and 'Y').
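To make the role of Resp.to.CV concrete, here is a minimal base-R sketch that evaluates such a string on given values of 'resp' and 'Y'. The helper resp_to_cv is hypothetical and is not how predictCV is implemented; the two expressions used are those of built-in models 4 and 1.

```r
# Hypothetical helper (an assumption, not the predictCV implementation):
# evaluate a Resp.to.CV string where only 'resp' and 'Y' are visible as data.
resp_to_cv <- function(Resp.to.CV, resp, Y) {
  eval(parse(text = Resp.to.CV), envir = list(resp = resp, Y = Y))
}

# Model 'SE ~ Y + I(Y^2)': the response is SE, and CV = SE / Y
resp_to_cv("resp/Y", resp = 50, Y = 1000)                  # 0.05

# Model 'log(CV^2) ~ log(Y)': undo the log, then take the square root
resp_to_cv("sqrt(exp(resp))", resp = log(0.04), Y = 1000)  # 0.2 (up to rounding)
```

An expression that refers to 'VAR' or 'SE' directly, such as "sqrt(VAR)/Y", would fail here unless those objects happen to exist elsewhere, which illustrates why the string is restricted to 'resp' and 'Y'.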
If the user does not specify Resp.to.CV when registering a new GVF model, it will not be possible to use function predictCV to predict CV values based on the fitted GVF model.
Lastly, note that the Model.id of a newly inserted GVF model will be set automatically, by adding 1 to the previous maximum of Model.id.
GVF.db$delete()
Function GVF.db$delete has just a single mandatory argument: Model.id. It must match the integer key of the (already existing) GVF model you want to drop from GVF.db.
Note that, after deleting a GVF model from GVF.db, values of column Model.id will be automatically renumbered, so as to always range from 1 to nrow(GVF.db).
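The Model.id bookkeeping described for GVF.db$insert and GVF.db$delete can be mimicked on a plain data frame. This is only a sketch: the object db is a stand-in with two of the archive's columns, not the real GVF.db, and the steps are an assumption about the visible behaviour rather than the package's code.

```r
# Stand-in for the archive (assumption: a plain data frame with the same
# Model.id and GVF.model columns as GVF.db)
db <- data.frame(
  Model.id  = 1:5,
  GVF.model = c("log(CV^2) ~ log(Y)", "CV^2 ~ I(1/Y)",
                "CV^2 ~ I(1/Y) + I(1/Y^2)", "SE ~ Y + I(Y^2)",
                "CV ~ I(1/Y) + Y"),
  stringsAsFactors = FALSE
)

# Delete model 3, then renumber so that Model.id runs from 1 to nrow(db)
db <- db[db$Model.id != 3L, , drop = FALSE]
db$Model.id  <- seq_len(nrow(db))
rownames(db) <- NULL

# Insert a new model: its key is the previous maximum of Model.id plus 1
new.row <- data.frame(Model.id  = max(db$Model.id) + 1L,
                      GVF.model = "CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2)",
                      stringsAsFactors = FALSE)
db <- rbind(db, new.row)
db
```

Because of the renumbering, a Model.id is only a stable reference until the next deletion; code that stores keys across sessions may prefer to look models up by their GVF.model string.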
GVF.db$get()
Function GVF.db$get has no mandatory arguments. When invoked, the function returns the current content of GVF.db, so that it can be assigned and saved/exported for later usage (see ‘Examples’). Should the current content of GVF.db happen to be empty, the function would inform the user and return NULL. The return value of GVF.db$get has class "GVF.db_exported", and inherits from class "data.frame".
GVF.db$assign()
Function GVF.db$assign has just a single mandatory argument: value. The object passed to argument value can only be a previously exported copy of GVF.db, i.e. an object of class GVF.db_exported. The function overwrites the current version of GVF.db with value. As a result, after invoking GVF.db$assign, the content of GVF.db is value.
GVF.db$reset()
Function GVF.db$reset has no mandatory arguments and simply restores the default version of GVF.db (i.e. the one containing built-in GVF models only).
Built-in GVF models for frequencies (i.e. those with Model.id 1, 2, and 3) are discussed in Chapter 7 of [Wolter 07], along with their theoretical justification. Built-in GVF models for totals (i.e. those with Model.id 4 and 5) lack a rigorous justification, but have sometimes been used successfully on a purely empirical basis. For instance, Istat surveys on structural business statistics adopted models of that kind to summarize standard errors in publications and to allow users to evaluate them approximately for custom estimates.
Wolter, K.M. (2007) “Introduction to Variance Estimation”, Second Edition, Springer-Verlag, New York.
estimator.kind to assess what kind of estimates are stored inside a survey statistic object, gvf.input and svystat to prepare the input for GVF model fitting, fit.gvf to fit GVF models, plot.gvf.fit to get diagnostic plots for fitted GVF models, drop.gvf.points to drop alleged outliers from a fitted GVF model and simultaneously refit it, and predictCV to predict CV values via fitted GVF models.
# Print the current content of GVF.db (invoking
# print(GVF.db) would do the same):
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                GVF.model Estimator.kind      Resp.to.CV
#> 1        1       log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2            CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3 CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4          SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5          CV ~ I(1/Y) + Y          Total            resp
#>
#> [1] "GVF.db"
str(GVF.db)
#> 'data.frame': 5 obs. of 4 variables:
#>  $ Model.id      : int 1 2 3 4 5
#>  $ GVF.model     : chr "log(CV^2) ~ log(Y)" "CV^2 ~ I(1/Y)" "CV^2 ~ I(1/Y) + I(1/Y^2)" "SE ~ Y + I(Y^2)" ...
#>  $ Estimator.kind: chr "Frequency" "Frequency" "Frequency" "Total" ...
#>  $ Resp.to.CV    : chr "sqrt(exp(resp))" "sqrt(resp)" "sqrt(resp)" "resp/Y" ...
dim(GVF.db)
#> [1] 5 4
nrow(GVF.db)
#> [1] 5

######################
# Accessor functions #
######################

# Delete the 3rd model:
GVF.db$delete(3)
#>
#> # GVF model has been deleted
#>

# Print GVF.db (note that Model.id has been renumbered,
# so as to range always from 1 to nrow(GVF.db))
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id          GVF.model Estimator.kind      Resp.to.CV
#> 1        1 log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2      CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3    SE ~ Y + I(Y^2)          Total          resp/Y
#> 4        4    CV ~ I(1/Y) + Y          Total            resp
#>

# Now delete the 1st model:
GVF.db$delete(1)
#>
#> # GVF model has been deleted
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id       GVF.model Estimator.kind Resp.to.CV
#> 1        1   CV^2 ~ I(1/Y)      Frequency sqrt(resp)
#> 2        2 SE ~ Y + I(Y^2)          Total     resp/Y
#> 3        3 CV ~ I(1/Y) + Y          Total       resp
#>

# Reset GVF.db to its default values:
GVF.db$reset()
#>
#> # Default GVF models db restored
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                GVF.model Estimator.kind      Resp.to.CV
#> 1        1       log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2            CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3 CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4          SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5          CV ~ I(1/Y) + Y          Total            resp
#>

# Insert a new tentative GVF model for Totals:
GVF.db$insert(CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2), "Total", "resp")
#>
#> # New GVF model has been registered
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                           GVF.model Estimator.kind      Resp.to.CV
#> 1        1                  log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2                       CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3            CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4                     SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5                     CV ~ I(1/Y) + Y          Total            resp
#> 6        6 CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2)          Total            resp
#>

# (notice that invoking GVF.db$insert() with first argument of type character,
# i.e. GVF.model="CV~I(1/Y^2)+I(1/Y)+Y+I(Y^2)", would have obtained exactly the
# same result)

# Now suppose you have somehow validated your newly added model,
# and you want to save your current, enhanced GVF.db in order to
# be able to use it later in a subsequent ReGenesees session.

### This can be achieved as follows:
### START

# 1. You must first get a copy of it, by using accessor function
#    GVF.db$get:
myGVF.db <- GVF.db$get()
myGVF.db
#>   Model.id                           GVF.model Estimator.kind      Resp.to.CV
#> 1        1                  log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2                       CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3            CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4                     SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5                     CV ~ I(1/Y) + Y          Total            resp
#> 6        6 CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2)          Total            resp
data.class(myGVF.db)
#> [1] "GVF.db_exported"

# 2. Then, you must save the copy to a .RData workspace, in order
#    to be able to load it later when needed, e.g.:
if (FALSE) {
save(myGVF.db, file="custom.GVF.Archive.RData")
}

# 3. Starting a new ReGenesees session will set the default GVF.db,
#    which we can simulate in this example as follows:
GVF.db$reset()
#>
#> # Default GVF models db restored
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                GVF.model Estimator.kind      Resp.to.CV
#> 1        1       log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2            CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3 CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4          SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5          CV ~ I(1/Y) + Y          Total            resp
#>

# 4. Now you can load your previously saved customized GVF.db...
if (FALSE) {
load("custom.GVF.Archive.RData")
}
#    ...so that myGVF.db is back into your .GlobalEnv:
myGVF.db
#>   Model.id                           GVF.model Estimator.kind      Resp.to.CV
#> 1        1                  log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2                       CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3            CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4                     SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5                     CV ~ I(1/Y) + Y          Total            resp
#> 6        6 CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2)          Total            resp

# 5. Lastly, you must overwrite GVF.db with your custom
#    GVF archive myGVF.db via function GVF.db$assign:
GVF.db$assign(myGVF.db)
#>
#> # GVF models db overwritten
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                           GVF.model Estimator.kind      Resp.to.CV
#> 1        1                  log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2                       CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3            CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4                     SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5                     CV ~ I(1/Y) + Y          Total            resp
#> 6        6 CV ~ I(1/Y^2) + I(1/Y) + Y + I(Y^2)          Total            resp
#>

### Now your custom GVF archive is ready to be used by ReGenesees.
### STOP

# Illustrate some GVF.db$insert checks by trying crazy models
# or ill-specified attributes

# Examples start: reset GVF.db to its default values
GVF.db$reset()
#>
#> # Default GVF models db restored
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                GVF.model Estimator.kind      Resp.to.CV
#> 1        1       log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2            CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3 CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4          SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5          CV ~ I(1/Y) + Y          Total            resp
#>

# GVF model must be "syntactically new"...
if (FALSE) {
GVF.db$insert(log(CV^2) ~ log(Y))
}

# ...if this is the case, it can even be "equivalent" to old ones: e.g.
# the following is identical to model number 5 and will produce identical
# estimates and predictions (as you may want to check):
GVF.db$insert(I(sqrt(VAR)/Y) ~ I(1/Y) + Y, "Total", Resp.to.CV = "resp")
#>
#> # New GVF model has been registered
#>
GVF.db
#>
#> # Registered GVF models currently available:
#>
#>   Model.id                   GVF.model Estimator.kind      Resp.to.CV
#> 1        1          log(CV^2) ~ log(Y)      Frequency sqrt(exp(resp))
#> 2        2               CV^2 ~ I(1/Y)      Frequency      sqrt(resp)
#> 3        3    CV^2 ~ I(1/Y) + I(1/Y^2)      Frequency      sqrt(resp)
#> 4        4             SE ~ Y + I(Y^2)          Total          resp/Y
#> 5        5             CV ~ I(1/Y) + Y          Total            resp
#> 6        6 I(sqrt(VAR)/Y) ~ I(1/Y) + Y          Total            resp
#>

# GVF model must have a response term
if (FALSE) {
GVF.db$insert(~ log(Y))
}

# GVF model response must involve some of 'SE', 'CV', 'VAR'
if (FALSE) {
GVF.db$insert(DEFF ~ log(Y))
}

# GVF model predictor must involve 'Y'
if (FALSE) {
GVF.db$insert(VAR ~ SE)
}

# If passed, Resp.to.CV can only involve 'resp' and 'Y'
if (FALSE) {
GVF.db$insert(I(sqrt(VAR)/Y) ~ I(1/Y) + Y + I(Y^2), Resp.to.CV = "sqrt(VAR)/Y")
}

# Examples end: reset GVF.db to its default values:
GVF.db$reset()
#>
#> # Default GVF models db restored
#>