The sbs data frame stores artificial sample data resembling those of a Structural Business Statistics (SBS) survey, while sbs.frame is the artificial sampling frame from which the sbs units were drawn. Together they make it possible to run the R code in the ‘Examples’ sections of the ReGenesees package help pages.

data(sbs)

Format

The sbs data frame mimics data observed in a Structural Business Statistics survey, under a one-stage stratified unit sampling design. The sample is made up of 6909 units, for which the following 22 variables were observed:

id

Identifier of the sampling units (enterprises), numeric

public

Does the enterprise belong to the Public Sector? factor with levels 0 (No) and 1 (Yes)

emp.num

Number of employees, numeric

emp.cl

Number of employees classified into 5 categories, factor with levels [6,9] (9,19] (19,49] (49,99] (99,Inf] (note that small enterprises with fewer than 6 employees fell outside the scope of the survey)

nace5

Economic Activity code with 5 digits, factor with 504 levels in sbs (596 levels in sbs.frame, as reported by str in the ‘Examples’ section)

nace2

Economic Activity code with 2 digits, factor with 57 levels

area

Territorial Division, factor with 24 levels

cens

Flag identifying statistical units to be censused (hence defining take-all strata), factor with levels 0 (No) and 1 (Yes)

region

Macroregion, factor with levels North Center South

va.cl

Class of Value Added, factor with 27 levels

va

Value Added, numeric (contains NAs)

dom1

A planned estimation domain, factor with 261 levels (dom1 crosses nace2 and emp.cl)

nace.macro

Economic Activity Macrosector, factor with levels Agriculture Industry Commerce Services

dom2

A planned estimation domain, factor with 12 levels (dom2 crosses nace.macro and region)

strata

Stratification Variable, a factor with 664 levels (obtained by crossing variables region, nace2, emp.cl and cens)

va.imp1

Value Added Imputed1, numeric (NAs were replaced with average values computed inside imputation strata obtained by crossing region, nace.macro, emp.cl)

va.imp2

Value Added Imputed2, numeric (NAs were replaced with median values computed inside imputation strata obtained by crossing region, nace.macro, emp.cl)

y

A numeric variable correlated with va

weight

Direct weights, numeric

fpc

Finite Population Corrections (given as sampling fractions inside strata), numeric

ent

Convenience numeric variable identically equal to 1 (sometimes useful, e.g. to estimate the total number of enterprises)

dom3

An unplanned estimation domain, factor with 4 levels
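The derived variables listed above (emp.cl, strata, va.imp1, va.imp2) can be illustrated with a minimal base R sketch. It runs on a small made-up data frame called toy, not on sbs itself; the helper impute() and all values are hypothetical, and this is not necessarily how the package authors built the actual variables.

```r
# Toy data with made-up values (NOT taken from sbs)
toy <- data.frame(
  region     = factor(rep(c("North", "Center"), each = 4),
                      levels = c("North", "Center", "South")),
  nace2      = factor(rep(c(1, 2), each = 4)),
  nace.macro = factor(rep("Industry", 8)),
  cens       = factor(rep(0, 8), levels = c(0, 1)),
  emp.num    = c(6, 7, 8, 9, 25, 30, 45, 20),
  va         = c(10, 20, 60, NA, 100, 300, NA, 800)
)

# Size classes with the same interval labels as emp.cl
toy$emp.cl <- cut(toy$emp.num, breaks = c(6, 9, 19, 49, 99, Inf),
                  include.lowest = TRUE)

# Crossing factors gives dot-separated labels such as "Center.2.(19,49].0"
toy$strata <- interaction(toy$region, toy$nace2, toy$emp.cl, toy$cens,
                          drop = TRUE)

# Imputation cells: region x nace.macro x emp.cl
cell <- interaction(toy$region, toy$nace.macro, toy$emp.cl, drop = TRUE)

# Hypothetical helper: replace NAs with a within-cell summary (mean or median)
impute <- function(x, g, fun) {
  fill <- ave(x, g, FUN = function(v) fun(v, na.rm = TRUE))
  ifelse(is.na(x), fill, x)
}

toy$va.imp1 <- impute(toy$va, cell, mean)    # mean imputation
toy$va.imp2 <- impute(toy$va, cell, median)  # median imputation

toy[, c("emp.cl", "strata", "va", "va.imp1", "va.imp2")]
```

In this toy example the two imputed columns differ only in the rows where va is missing, which is also the only place where va.imp1 and va.imp2 can differ in sbs.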

The sbs.frame sampling frame (from which the sbs units were drawn) contains 17318 units and 20 variables: the same variables as sbs, except weight and fpc.
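The roles of weight, fpc and ent can be sketched in base R on a made-up frame. The sketch assumes simple random sampling without replacement inside each stratum, with direct weights equal to the inverse of the sampling fractions; the objects frame, smp, N_h and n_h are hypothetical and do not come from the package.

```r
set.seed(1)  # only makes the toy draw reproducible

# Hypothetical frame: two strata with 50 and 20 units
frame <- data.frame(id = 1:70,
                    strata = rep(c("A", "B"), times = c(50, 20)),
                    stringsAsFactors = FALSE)
N_h <- sapply(split(frame$id, frame$strata), length)  # stratum sizes
n_h <- c(A = 10, B = 20)                              # stratum B is take-all

# Simple random sampling without replacement inside each stratum
picked <- unlist(lapply(names(n_h), function(h) {
  ids <- frame$id[frame$strata == h]
  ids[sample.int(length(ids), n_h[[h]])]
}))
smp <- frame[frame$id %in% picked, ]

h <- smp$strata
smp$fpc    <- unname(n_h[h] / N_h[h])  # sampling fraction inside the stratum
smp$weight <- unname(N_h[h] / n_h[h])  # direct weight = 1 / sampling fraction
smp$ent    <- 1                        # constant used to count units

# The weighted total of ent estimates the number of units in the frame
sum(smp$weight * smp$ent)
```

Under these assumptions the weighted total of ent equals the frame size exactly, which is why a constant column such as ent is convenient for estimating the total number of enterprises.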

Examples

data(sbs)
head(sbs)
#>      id public emp.num  emp.cl nace5 nace2 area cens region va.cl     va
#> 1  1268      0      38 (19,49]  1210     1   32    0 Center    22 5500.0
#> 2  1358      0      30 (19,49]  1240     1   32    0 Center    19 1500.0
#> 3 13819      0      25 (19,49]  1131     1   41    0 Center    16  400.0
#> 4 15749      0      22 (19,49]  1111     1   43    0 Center     1    0.0
#> 5  8431      0      29 (19,49]  1121     1   31    0 Center     2    0.5
#> 6  7572      0      50 (49,99]  1132     1   41    0 Center    11   60.0
#>        dom1  nace.macro               dom2             strata va.imp1 va.imp2
#> 1 1.(19,49] Agriculture Agriculture.Center Center.1.(19,49].0  5500.0  5500.0
#> 2 1.(19,49] Agriculture Agriculture.Center Center.1.(19,49].0  1500.0  1500.0
#> 3 1.(19,49] Agriculture Agriculture.Center Center.1.(19,49].0   400.0   400.0
#> 4 1.(19,49] Agriculture Agriculture.Center Center.1.(19,49].0     0.0     0.0
#> 5 1.(19,49] Agriculture Agriculture.Center Center.1.(19,49].0     0.5     0.5
#> 6 1.(49,99] Agriculture Agriculture.Center Center.1.(49,99].0    60.0    60.0
#>           y weight       fpc ent dom3
#> 1 1636.6075   1.40 0.7142857   1    C
#> 2 1002.4378   1.40 0.7142857   1    C
#> 3  444.4637   1.40 0.7142857   1    D
#> 4  252.1287   1.40 0.7142857   1    D
#> 5  466.5918   1.40 0.7142857   1    D
#> 6  742.9053   1.25 0.8000000   1    B
str(sbs)
#> 'data.frame': 6909 obs. of 22 variables:
#>  $ id        : int  1268 1358 13819 15749 8431 7572 9701 9661 11899 15136 ...
#>  $ public    : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ emp.num   : int  38 30 25 22 29 50 67 55 52 12 ...
#>  $ emp.cl    : Factor w/ 5 levels "[6,9]","(9,19]",..: 3 3 3 3 3 4 4 4 4 2 ...
#>  $ nace5     : Factor w/ 504 levels "1000","1100",..: 13 17 8 3 5 9 17 6 8 3 ...
#>  $ nace2     : Factor w/ 57 levels "1","2","5","11",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ area      : Factor w/ 24 levels "11","12","13",..: 13 13 16 18 12 16 14 13 16 18 ...
#>  $ cens      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ region    : Factor w/ 3 levels "North","Center",..: 2 2 2 2 2 2 2 2 2 2 ...
#>  $ va.cl     : Factor w/ 27 levels "1","2","3","4",..: 22 19 16 1 2 11 23 16 16 1 ...
#>  $ va        : num  5500 1500 400 0 0.5 60 7000 400 400 0 ...
#>  $ dom1      : Factor w/ 261 levels "1.(19,49]","1.(49,99]",..: 1 1 1 1 1 2 2 2 2 3 ...
#>  $ nace.macro: Factor w/ 4 levels "Agriculture",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ dom2      : Factor w/ 12 levels "Agriculture.Center",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ strata    : Factor w/ 664 levels "Center.1.(19,49].0",..: 1 1 1 1 1 2 2 2 2 3 ...
#>  $ va.imp1   : num  5500 1500 400 0 0.5 60 7000 400 400 0 ...
#>  $ va.imp2   : num  5500 1500 400 0 0.5 60 7000 400 400 0 ...
#>  $ y         : num  1637 1002 444 252 467 ...
#>  $ weight    : num  1.4 1.4 1.4 1.4 1.4 1.25 1.25 1.25 1.25 1.5 ...
#>  $ fpc       : num  0.714 0.714 0.714 0.714 0.714 ...
#>  $ ent       : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ dom3      : Factor w/ 4 levels "A","B","C","D": 3 3 4 4 4 2 3 2 2 4 ...
str(sbs.frame)
#> 'data.frame': 17318 obs. of 20 variables:
#>  $ id        : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ public    : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 2 1 1 1 ...
#>  $ emp.num   : int  21 35 20 18 689 12 172 51 14 9 ...
#>  $ emp.cl    : Factor w/ 5 levels "[6,9]","(9,19]",..: 3 3 3 2 5 2 5 4 2 1 ...
#>  $ nace5     : Factor w/ 596 levels "1000","1100",..: 388 51 127 226 497 480 497 478 346 480 ...
#>  $ nace2     : Factor w/ 57 levels "1","2","5","11",..: 34 7 11 20 45 40 45 40 33 40 ...
#>  $ area      : Factor w/ 24 levels "11","12","13",..: 1 1 3 1 2 1 1 1 1 1 ...
#>  $ cens      : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 2 1 1 1 ...
#>  $ region    : Factor w/ 3 levels "North","Center",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ va.cl     : Factor w/ 27 levels "1","2","3","4",..: 21 23 19 17 27 NA 22 NA NA NA ...
#>  $ va        : num  3500 7000 1500 600 70000 NA 5500 NA NA NA ...
#>  $ dom1      : Factor w/ 261 levels "1.(19,49]","1.(49,99]",..: 154 18 37 85 208 182 208 181 151 184 ...
#>  $ nace.macro: Factor w/ 4 levels "Agriculture",..: 3 2 2 2 4 4 4 4 3 4 ...
#>  $ dom2      : Factor w/ 12 levels "Agriculture.Center",..: 5 8 8 8 11 11 11 11 5 11 ...
#>  $ strata    : Factor w/ 664 levels "Center.1.(19,49].0",..: 344 210 228 277 406 373 406 372 341 375 ...
#>  $ va.imp1   : num  3500 7000 1500 600 70000 ...
#>  $ va.imp2   : num  3500 7000 1500 600 70000 3500 5500 750 750 400 ...
#>  $ y         : num  1374 2074 457 455 13584 ...
#>  $ ent       : num  1 1 1 1 1 1 1 1 1 1 ...
#>  $ dom3      : Factor w/ 4 levels "A","B","C","D": 4 3 1 4 1 2 1 2 3 3 ...