joint_calib
allows joint calibration of totals and quantiles. It provides a user-friendly interface that includes the specification of variables in formula notation, a vector of population totals, a list of quantiles, and a variety of backends and methods.
Usage
joint_calib(
formula_totals = NULL,
formula_quantiles = NULL,
data = NULL,
dweights = NULL,
N = NULL,
pop_totals = NULL,
pop_quantiles = NULL,
subset = NULL,
backend = c("sampling", "laeken", "survey", "ebal", "base"),
method = c("raking", "linear", "logit", "sinh", "truncated", "el", "eb"),
bounds = c(0, 10),
maxit = 50,
tol = 1e-08,
eps = .Machine$double.eps,
control = control_calib(),
...
)
Arguments
- formula_totals
a formula with variables to calibrate the totals,
- formula_quantiles
a formula with variables for quantile calibration,
- data
a data.frame with variables,
- dweights
initial d-weights for calibration (e.g. design weights),
- N
population size for calibration of quantiles,
- pop_totals
a named vector of population totals for
formula_totals
. Should be provided exactly as insurvey
package (seesurvey::calibrate
),- pop_quantiles
a named list of population quantiles for
formula_quantiles
or annewsvyquantile
class object (fromsurvey::svyquantile
function),- subset
a formula for subset of data,
- backend
specify an R package to perform the calibration. Only
sampling
,laeken
,survey
,ebal
orbase
are allowed,- method
specify method (i.e. distance function) for the calibration. Only
raking
,linear
,logit
,sinh
,truncated
,el
(empirical likelihood),eb
(entropy balancing) are allowed,- bounds
a numeric vector of length two giving bounds for the g-weights,
- maxit
a numeric value representing the maximum number of iterations,
- tol
the desired accuracy for the iterative procedure (for
sampling
,laeken
,ebal
,el
) or tolerance in matching population total forsurvey::grake
(see help for survey::grake)- eps
the desired accuracy for computing the Moore-Penrose generalized inverse (see
MASS::ginv()
)- control
a list of control parameters (currently only for
joint_calib_create_matrix
)- ...
arguments passed either to
sampling::calib
,laeken::calibWeights
,survey::calibrate
oroptim::constrOptim
Value
Returns a list with containing:
g
-- g-weight that sums up to sample size,Xs
-- matrix used for calibration (i.e. Intercept, X and X_q transformed for calibration of quantiles),totals
-- a vector of totals (i.e.N
,pop_totals
andpop_quantiles
),method
-- selected method,backend
-- selected backend.
References
Beręsewicz, M., and Szymkowiak, M. (2023). A note on joint calibration estimators for totals and quantiles Arxiv preprint https://arxiv.org/abs/2308.13281
Deville, J. C., and Särndal, C. E. (1992). Calibration estimators in survey sampling. Journal of the American statistical Association, 87(418), 376-382.
Harms, T. and Duchesne, P. (2006). On calibration estimation for quantiles. Survey Methodology, 32(1), 37.
Wu, C. (2005) Algorithms and R codes for the pseudo empirical likelihood method in survey sampling, Survey Methodology, 31(2), 239.
Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference, International Statistical Review 91, 165–192.
Haziza, D., and Lesage, É. (2016). A discussion of weighting procedures for unit nonresponse. Journal of Official Statistics, 32(1), 129-145.
See also
sampling::calib()
-- for standard calibration.
laeken::calibWeights()
-- for standard calibration.
survey::calibrate()
-- for standard and more advanced calibration.
ebal::ebalance()
-- for standard entropy balancing.
Examples
## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
probs <- seq(0.1, 0.9, 0.1)
quants_known <- list(x=quantile(x, probs))
totals_known <- c(x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
y_quant_true <- quantile(y, probs)
## standard calibration for comparison
result0 <- sampling::calib(Xs = cbind(1, df_resp$x),
d = df_resp$d,
total = c(N, totals_known),
method = "linear")
y_quant_hat0 <- laeken::weightedQuantile(x = df_resp$y,
probs = probs,
weights = result0*df_resp$d)
x_quant_hat0 <- laeken::weightedQuantile(x = df_resp$x,
probs = probs,
weights = result0*df_resp$d)
## example 1: calibrate only quantiles (deciles)
result1 <- joint_calib(formula_quantiles = ~x,
data = df_resp,
dweights = df_resp$d,
N = N,
pop_quantiles = quants_known,
method = "linear",
backend = "sampling")
## estimate quantiles
y_quant_hat1 <- laeken::weightedQuantile(x = df_resp$y,
probs = probs,
weights = result1$g*df_resp$d)
x_quant_hat1 <- laeken::weightedQuantile(x = df_resp$x,
probs = probs,
weights = result1$g*df_resp$d)
## compare with known
data.frame(standard = y_quant_hat0, est=y_quant_hat1, true=y_quant_true)
#> standard est true
#> 10% -284.3574 -285.34675 -292.97255
#> 20% -131.7079 -131.70792 -128.19010
#> 30% -25.2815 -19.50460 -10.07312
#> 40% 80.5919 84.23786 84.64057
#> 50% 175.5490 178.96015 184.87445
#> 60% 274.0404 279.73343 294.76788
#> 70% 412.2826 426.98679 453.35435
#> 80% 592.0840 606.73082 669.36570
#> 90% 1105.6883 1172.38891 1163.92646
## example 2: calibrate with quantiles (deciles) and totals
result2 <- joint_calib(formula_totals = ~x,
formula_quantiles = ~x,
data = df_resp,
dweights = df_resp$d,
N = N,
pop_quantiles = quants_known,
pop_totals = totals_known,
method = "linear",
backend = "sampling")
## estimate quantiles
y_quant_hat2 <- laeken::weightedQuantile(x = df_resp$y,
probs = probs,
weights = result2$g*df_resp$d)
x_quant_hat2 <- laeken::weightedQuantile(x = df_resp$x,
probs = probs,
weights = result2$g*df_resp$d)
## compare with known
data.frame(standard = y_quant_hat0, est1=y_quant_hat1,
est2=y_quant_hat2, true=y_quant_true)
#> standard est1 est2 true
#> 10% -284.3574 -285.34675 -285.34675 -292.97255
#> 20% -131.7079 -131.70792 -131.70792 -128.19010
#> 30% -25.2815 -19.50460 -21.94192 -10.07312
#> 40% 80.5919 84.23786 84.23786 84.64057
#> 50% 175.5490 178.96015 178.96015 184.87445
#> 60% 274.0404 279.73343 279.73343 294.76788
#> 70% 412.2826 426.98679 426.98679 453.35435
#> 80% 592.0840 606.73082 606.73082 669.36570
#> 90% 1105.6883 1172.38891 1172.38891 1163.92646
## example 3: calibrate wigh quantiles (deciles) and totals with
## hyperbolic sinus (sinh) and survey package
result3 <- joint_calib(formula_totals = ~x,
formula_quantiles = ~x,
data = df_resp,
dweights = df_resp$d,
N = N,
pop_quantiles = quants_known,
pop_totals = totals_known,
method = "sinh",
backend = "survey")
## estimate quantiles
y_quant_hat3 <- laeken::weightedQuantile(x = df_resp$y,
probs = probs,
weights = result3$g*df_resp$d)
x_quant_hat3 <- laeken::weightedQuantile(x = df_resp$x,
probs = probs,
weights = result3$g*df_resp$d)
## example 4: calibrate wigh quantiles (deciles) and totals with ebal package
result4 <- joint_calib(formula_totals = ~x,
formula_quantiles = ~x,
data = df_resp,
dweights = df_resp$d,
N = N,
pop_quantiles = quants_known,
pop_totals = totals_known,
method = "eb",
backend = "ebal")
## estimate quantiles
y_quant_hat4 <- laeken::weightedQuantile(x = df_resp$y,
probs = probs,
weights = result4$g*df_resp$d)
x_quant_hat4 <- laeken::weightedQuantile(x = df_resp$x,
probs = probs,
weights = result4$g*df_resp$d)
## compare with known
data.frame(standard = y_quant_hat0,
est1=y_quant_hat1,
est2=y_quant_hat2,
est3=y_quant_hat3,
est4=y_quant_hat4,
true=y_quant_true)
#> standard est1 est2 est3 est4 true
#> 10% -284.3574 -285.34675 -285.34675 -285.34675 -285.34675 -292.97255
#> 20% -131.7079 -131.70792 -131.70792 -131.70792 -131.70792 -128.19010
#> 30% -25.2815 -19.50460 -21.94192 -21.94192 -21.94192 -10.07312
#> 40% 80.5919 84.23786 84.23786 84.23786 84.23786 84.64057
#> 50% 175.5490 178.96015 178.96015 178.96015 178.96015 184.87445
#> 60% 274.0404 279.73343 279.73343 279.73343 279.73343 294.76788
#> 70% 412.2826 426.98679 426.98679 426.98679 426.98679 453.35435
#> 80% 592.0840 606.73082 606.73082 606.73082 606.73082 669.36570
#> 90% 1105.6883 1172.38891 1172.38891 1172.38891 1172.38891 1163.92646
## compare with known X
data.frame(standard = x_quant_hat0,
est1=x_quant_hat1,
est2=x_quant_hat2,
est3=x_quant_hat3,
est4=x_quant_hat4,
true = quants_known$x)
#> standard est1 est2 est3 est4 true
#> 10% 8.43342 8.08468 8.08468 8.08468 8.08468 8.081984
#> 20% 17.37948 16.27337 16.27337 16.27337 16.15962 16.250618
#> 30% 24.62111 24.32420 24.32420 24.09831 24.09831 24.231978
#> 40% 32.71552 31.67142 31.67142 31.67142 31.65285 31.663990
#> 50% 39.54339 39.18696 39.18696 39.18696 39.18696 39.196020
#> 60% 47.19213 47.53136 47.53136 47.44365 47.44365 47.469023
#> 70% 54.91923 55.60166 55.60166 55.45591 55.45591 55.499635
#> 80% 60.96227 63.90653 63.90653 63.90653 63.90653 63.906013
#> 90% 70.81061 71.60363 71.60363 71.12172 71.12172 71.338558