An internal function for calibration of weights using empirical likelihood method

calib_el performs calibration using empirical likelihood (EL) method. The function is taken from Wu (2005). If algorithm has problem with convergence constrOptim is used instead (as in Zhang, Han and Wu (2022)).

In (pseudo) EL the following (pseudo) EL function is maximized

\[\sum_{i \in r} d_i\log(p_i),\]

under the following constraint

\[\sum_{i \in r} p_i = 1,\]

with constraints on quantiles (with notation as in Harms and Duchesne (2006))

\[\sum_{i \in r} p_i(a_{i} - \alpha/N) = 0,\]

where \(a_{i}\) is created using joint_calib_create_matrix function, and possibly means

\[\sum_{i \in r} p_i(x_{i} - \mu_{x}) = 0,\]

where \(\mu_{x}\) is known population mean of X. For simplicity of notation we assume only one quantile and one mean is known. This can be generalized to multiple quantiles and means.

Usage

calib_el(
  X,
  d,
  totals,
  maxit = 50,
  tol = 1e-08,
  eps = .Machine$double.eps,
  att = FALSE,
  ...
)

Arguments

X: matrix of variables for calibration of quantiles and totals (first column should be intercept),
d: initial d-weights for calibration (e.g. design-weights),
totals: vector of totals (where 1 element is the population size),
maxit: a numeric value giving the maximum number of iterations,
tol: the desired accuracy for the iterative procedure,
eps: the desired accuracy for computing the Moore-Penrose generalized inverse (see MASS::ginv()),
att: indicating whether the weights should sum up treatment group (for joint_calib_att function),
...: arguments passed to stats::optim via stats::constrOptim.

Value

Returns a vector of empirical likelihood g-weights

References

Wu, C. (2005). Algorithms and R codes for the pseudo empirical likelihood method in survey sampling. Survey Methodology, 31(2), 239 (code is taken from https://sas.uwaterloo.ca/~cbwu/Rcodes/LagrangeM2.txt).

Zhang, S., Han, P., and Wu, C. (2023) Calibration Techniques Encompassing Survey Sampling, Missing Data Analysis and Causal Inference. International Statistical Review, 91: 165–192. https://doi.org/10.1111/insr.12518 (code is taken from Supplementary Materials).

Author

Maciej Beręsewicz based on Wu (2005) and Zhang, Han and Wu (2022)

Examples

## generate data based on Haziza and Lesage (2016)
set.seed(123)
N <- 1000
x <- runif(N, 0, 80)
y <- exp(-0.1 + 0.1*x) + rnorm(N, 0, 300)
p <- rbinom(N, 1, prob = exp(-0.2 - 0.014*x))
totals_known <- c(N=N, x=sum(x))
df <- data.frame(x, y, p)
df_resp <- df[df$p == 1, ]
df_resp$d <- N/nrow(df_resp)
res <- calib_el(X = model.matrix(~x, df_resp),
                d = df_resp$d,
                totals = totals_known)
data.frame(known = totals_known, estimated=colSums(res*df_resp$d*model.matrix(~x, df_resp)))
#>      known estimated
#> N  1000.00   1000.00
#> x 39782.22  39782.22