nonprobsvy 0.2.0
Breaking changes
- functions `pop.size`, `controlSel`, `controlOut` and `controlInf` were renamed to `pop_size`, `control_sel`, `control_out` and `control_inf`, respectively.
- function `genSimData` removed completely, as it was not used anywhere in the package.
- argument `maxLik_method` renamed to `maxlik_method` in the `control_sel` function.
- `control_out` function:
  - `predictive_match` renamed to `pmm_match_type` to align with the PMM (Predictive Mean Matching) estimator naming convention, where all related parameters start with `pmm_`.
- `control_sel` function:
  - argument `method` removed, as it was not used.
  - argument `est_method_sel` renamed to `est_method`.
  - argument `h` renamed to `gee_h_fun` to make it more readable to the user.
  - `start_type` now accepts only `zero` and `mle` (for `gee` models only).
- `control_inf` function:
  - `bias_inf` renamed to `vars_combine` and its type changed to `logical`: `TRUE` if variables (their levels) should be combined after the variable selection algorithm for the doubly robust approach.
  - argument `pi_ij` removed, as it was not used.
- `nonprobsvy` class renamed to `nonprob` and all related methods adjusted to this change.
- functions `logit_model_nonprobsvy`, `probit_model_nonprobsvy` and `cloglog_model_nonprobsvy` removed in favour of the more readable `method_ps` function, which specifies the propensity score model.
- new option `control_inference=control_inf(vars_combine=TRUE)`, which allows the doubly robust estimator to combine variables prior to estimation, i.e. if `selection=~x1+x2` and `y~x1+x3`, then the following models are fitted: `selection=~x1+x2+x3` and `y~x1+x2+x3`. By default we set `control_inference=control_inf(vars_combine=FALSE)`. Note that this behaviour is independent of variable selection.
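The effect of `vars_combine = TRUE` on the two formulas can be sketched in base R. This is only an illustration of the variable union described above, not the package's internal code; the object names (`combined_vars`, `selection_combined`, `outcome_combined`) are hypothetical.

```r
# Formulas as in the example above
selection <- ~ x1 + x2
outcome <- y ~ x1 + x3

# Union of the right-hand-side variables of both formulas
combined_vars <- union(all.vars(selection), all.vars(outcome[[3]]))

# Rebuild both formulas on the combined set of variables
selection_combined <- reformulate(combined_vars)
outcome_combined <- reformulate(combined_vars, response = "y")

print(selection_combined)
print(outcome_combined)
```

Both rebuilt formulas use `x1`, `x2` and `x3`, which mirrors the pair of models listed in the entry above.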
Features
- two additional datasets have been included: `jvs` (Job Vacancy Survey; a probability sample survey) and `admin` (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
- a `check_balance` function was added to check the balance of the weighted totals of the variables between the non-probability and probability samples.
- citation file added.
- new generic methods added:
  - `weights` – returns IPW weights
  - `update` – allows updating the `nonprob` class object
- new functions added and exported:
  - `method_ps` – for modelling the propensity score
  - `method_glm` – for modelling y using the `glm` function
  - `method_nn` – for the NN method
  - `method_pmm` – for the PMM method
  - `method_npar` – for the non-parametric method
- new `print.nonprob`, `summary.nonprob` and `print.nonprob_summary` methods, for example:
> result_mi
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.1817
- variable y2: 1.8087
- selected estimators:
- variable y1: 2.9498 (se=0.0420, ci=(2.8674, 3.0322))
- variable y2: 1.5760 (se=0.0326, ci=(1.5122, 1.6399))
The number of digits can be changed using `print(x, digits)`, as shown below:
> print(result_mi,2)
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.18
- variable y2: 1.81
- selected estimators:
- variable y1: 2.95 (se=0.04, ci=(2.87, 3.03))
- variable y2: 1.58 (se=0.03, ci=(1.51, 1.64))
> summary(result_mi) |> print(digits=2)
A nonprob_summary object
- call: nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 +
y2 ~ x1 + x2, svydesign = sample_prob)
- estimator type: mass imputation
- nonprob sample size: 693011 (69.3%)
- prob sample size: 1000 (0.1%)
- population size: 1000000 (fixed: false)
- detailed information about models are stored in list element(s): "outcome"
----------------------------------------------------------------
- distribution of outcome residuals:
- y1: min: -4.79; mean: 0.00; median: 0.00; max: 4.54
- y2: min: -4.96; mean: -0.00; median: -0.07; max: 12.25
- distribution of outcome predictions (nonprob sample):
- y1: min: -2.72; mean: 3.18; median: 3.04; max: 16.28
- y2: min: -1.55; mean: 1.81; median: 1.58; max: 13.92
- distribution of outcome predictions (prob sample):
- y1: min: -0.46; mean: 2.95; median: 2.84; max: 10.31
- y2: min: -0.58; mean: 1.58; median: 1.39; max: 7.87
----------------------------------------------------------------
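The idea behind a balance check such as the one `check_balance` is described as performing can be sketched in base R. The sketch below does not call the package: the simulated population, the sample sizes and all object names are assumptions made for illustration, and the true inclusion propensities stand in for weights that would be estimated in practice.

```r
set.seed(2024)

# Hypothetical finite population with two auxiliary variables
N <- 10000
pop <- data.frame(x1 = rnorm(N, mean = 2), x2 = rbinom(N, 1, 0.4))

# Non-probability sample: inclusion depends on x1
p_np <- plogis(-2 + 0.5 * pop$x1)
in_np <- rbinom(N, 1, p_np) == 1
x_np <- as.matrix(pop[in_np, ])
w_np <- 1 / p_np[in_np]  # true propensities used here; in practice they are estimated

# Probability sample: simple random sample with design weights N / n
n_p <- 2000
idx_p <- sample(N, n_p)
x_p <- as.matrix(pop[idx_p, ])
w_p <- rep(N / n_p, n_p)

# Weighted totals in both samples and their difference
totals_np <- colSums(w_np * x_np)
totals_p <- colSums(w_p * x_p)
balance <- totals_np - totals_p

print(round(cbind(totals_np, totals_p, balance), 1))
```

Differences close to zero relative to the totals suggest that the weights reproduce the auxiliary totals reasonably well.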
Bugfixes
- basic methods and functions related to variance estimation, weights and probability linking methods have been rewritten in a more efficient and readable way.
Other
- more informative error messages added.
- documentation improved.
- switching completely to snake_case.
- extensive cleaning of the code.
- more unit-tests added.
- new dependencies: `formula.tools`
Documentation
- a note has been added that arguments such as `strata`, `subset` and `na_action` are not supported for the time being.
Replication materials
- to verify the quality of the software please refer to the replication materials available here: https://github.com/ncn-foreigners/software-tutorials
nonprobsvy 0.1.1
CRAN release: 2024-11-14
Bugfixes
- fixed a bug occurring when estimation was based on an auxiliary variable, which led to compression of the data from a data frame to a vector.
- fixed a bug where the `maxit` argument was not passed from the `controlSel` function to the internally used `nleqslv` function.
- fixed a bug related to storing a `vector` in `model_frame` when predicting `y_hat` in the mass imputation `glm` model when X is based on one auxiliary variable only; the fix converts it to a `data.frame` object.
Features
- added information to `summary` about the quality of estimation, based on the difference between estimated and known total values of auxiliary variables.
- added estimation of the exact standard error for the k-nearest neighbor estimator.
- added a breaking change to the `controlOut` function by switching the values of the `predictive_match` argument. From now on, `predictive_match = 1` means $\hat{y}-\hat{y}$ matching in predictive mean matching imputation and `predictive_match = 2` corresponds to $\hat{y}-y$ matching.
- implemented the `div` option for variable selection (more in the documentation) for doubly robust estimation.
- added more insights to the `nonprob` output, such as the gradient, hessian and jacobian derived from IPW estimation for the `mle` and `gee` methods, when an `IPW` or `DR` model is executed.
- added estimated inclusion probabilities and their derivatives for probability and non-probability samples to the `nonprob` output when an `IPW` or `DR` model is executed.
- added `model_frame` matrix data from the probability sample used for mass imputation to the `nonprob` output when an `MI` or `DR` model is executed.
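The two matching types mentioned for `predictive_match` can be illustrated with a minimal base R sketch. It is a simplified single-nearest-donor version on simulated data, not the package's implementation; the helper `nearest()` and all object names are hypothetical.

```r
set.seed(123)

# Hypothetical non-probability sample (y observed) and probability sample (y missing)
n_np <- 200
n_p <- 50
x_np <- rnorm(n_np)
y_np <- 1 + 2 * x_np + rnorm(n_np)
x_p <- rnorm(n_p)

# Outcome model fitted on the non-probability sample
fit <- lm(y ~ x, data = data.frame(x = x_np, y = y_np))
yhat_np <- unname(fitted(fit))
yhat_p <- unname(predict(fit, newdata = data.frame(x = x_p)))

# Index of the closest value in `pool` for each element of `target`
nearest <- function(target, pool) {
  vapply(target, function(t) which.min(abs(pool - t)), integer(1))
}

# yhat - yhat matching: compare predictions in both samples
donor_1 <- nearest(yhat_p, yhat_np)
# yhat - y matching: compare predictions with observed donor values
donor_2 <- nearest(yhat_p, y_np)

# Imputed values are always observed y values of the matched donors
y_imp_1 <- y_np[donor_1]
y_imp_2 <- y_np[donor_2]

c(mean_type_1 = mean(y_imp_1), mean_type_2 = mean(y_imp_2))
```

In both variants the imputed value is an observed `y` from the donor; only the distance used to pick the donor differs.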
nonprobsvy 0.1.0
CRAN release: 2024-04-04
Features
- implemented population mean estimation using doubly robust, inverse probability weighting and mass imputation methods.
- implemented inverse probability weighting models with Maximum Likelihood Estimation and Generalized Estimating Equations methods with `logit`, `complementary log-log` and `probit` link functions.
- implemented `generalized linear models`, `nearest neighbours` and `predictive mean matching` methods for Mass Imputation.
- implemented bias correction estimators for the doubly robust approach.
- implemented estimation methods for when a vector of population means/totals is available.
- implemented variable selection with `SCAD`, `LASSO` and `MCP` penalization equations.
- implemented `analytic` and `bootstrap` (with parallel computation via the `doSNOW` package) variance for the described estimators.
- added control parameters for models.
- added S3 methods for objects of the `nonprob` class, such as:
  - `nobs` for sample sizes
  - `pop.size` for population size estimation
  - `residuals` for residuals of the inverse probability weighting model
  - `cooks.distance` for identifying influential observations that have a significant impact on the parameter estimates
  - `hatvalues` for measuring the leverage of individual observations
  - `logLik` for computing the log-likelihood of the model
  - `AIC` (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
  - `BIC` (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
  - `confint` for calculating confidence intervals around parameter estimates
  - `vcov` for obtaining the variance-covariance matrix of the parameter estimates
  - `deviance` for assessing the goodness of fit of the model
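For readers unfamiliar with these generics, the sketch below shows what several of them return for a plain base R logistic `glm`, used here as a stand-in for a propensity score model. It does not use a `nonprob` object; the simulated data and object names are assumptions for illustration only.

```r
set.seed(1)

# Hypothetical inclusion indicator depending on one covariate
n <- 500
x <- rnorm(n)
r <- rbinom(n, 1, plogis(-0.5 + x))
fit <- glm(r ~ x, family = binomial(link = "logit"))

k <- length(coef(fit))         # number of estimated parameters
ll <- as.numeric(logLik(fit))  # maximised log-likelihood

# Sample size, log-likelihood and information criteria
print(c(nobs = nobs(fit), logLik = ll, AIC = AIC(fit), BIC = BIC(fit)))

# The criteria differ only in the complexity penalty: 2 * k versus log(n) * k
aic_manual <- 2 * k - 2 * ll
bic_manual <- log(nobs(fit)) * k - 2 * ll

# Leverage of the first observations and the variance-covariance matrix
print(head(hatvalues(fit)))
print(vcov(fit))
```

Because `log(n)` exceeds 2 for any sample of more than 7 observations, BIC penalises additional parameters more heavily than AIC, which is the "stronger penalty" mentioned in the list above.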