An internal function for calibration of weights using empirical likelihood method
Source:R/calib_el.R
calib_el.Rd
calib_el
performs calibration using empirical likelihood (EL) method. The function is taken from Wu (2005). If algorithm has problem with convergence constrOptim
is used instead (as in Zhang, Han and Wu (2022)).
In (pseudo) EL the following (pseudo) EL function is maximized
\[\sum_{i \in r} d_i\log(p_i),\]
under the following constraint
\[\sum_{i \in r} p_i = 1,\]
with constraints on quantiles (with notation as in Harms and Duchesne (2006))
\[\sum_{i \in r} p_i(a_{i} - \alpha/N) = 0,\]
where \(a_{i}\) is created using joint_calib_create_matrix
function, and possibly means
\[\sum_{i \in r} p_i(x_{i} - \mu_{x}) = 0,\]
where \(\mu_{x}\) is known population mean of X. For simplicity of notation we assume only one quantile and one mean is known. This can be generalized to multiple quantiles and means.
Usage
calib_el(
X,
d,
totals,
maxit = 50,
tol = 1e-08,
eps = .Machine$double.eps,
att = FALSE,
...
)
Arguments
- X
matrix of variables for calibration of quantiles and totals (first column should be intercept),
- d
initial d-weights for calibration (e.g. design-weights),
- totals
vector of totals (where 1 element is the population size),
- maxit
a numeric value giving the maximum number of iterations,
- tol
the desired accuracy for the iterative procedure,
- eps
the desired accuracy for computing the Moore-Penrose generalized inverse (see
MASS::ginv()
),- att
indicating whether the weights should sum up treatment group (for
joint_calib_att
function),- ...
arguments passed to stats::optim via stats::constrOptim.
References
Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239 (code is taken from https://sas.uwaterloo.ca/~cbwu/Rcodes/LagrangeM2.txt).
Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference. International Statistical Review, 91: 165–192. https://doi.org/10.1111/insr.12518 (code is taken from Supplementary Materials).
Examples
## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
totals_known <- c(N=N, x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
res <- calib_el(X = model.matrix(~x, df_resp),
d = df_resp$d,
totals = totals_known)
data.frame(known = totals_known, estimated=colSums(res*df_resp$d*model.matrix(~x, df_resp)))
#> known estimated
#> N 1000.00 1000.00
#> x 39782.22 39782.22