A comparison of approaches to implementing propensity score methods following multiple imputation

, or

missing data and confounding.
Simply 'ignoring' missing data, which typically means that a complete case analysis is performed, often is inappropriate, because the conditions under which it is unbiased are very restrictive [1][2][3][4].Even when these conditions are met, for example because the complete cases are truly a random subset of the study sample, discarding incomplete records may render the estimator unnecessarily inefficient [1][2][3].An alternative to complete case analysis is multiple imputation (MI), in which missing data are filled in with random draws from their predictive distributions based on the observed data, thereby producing multiple plausible datasets.Inferences are typically made using a simple set of rules known as Rubin's Rules [1].MI is a popular method for dealing with missing data, because it is flexible, relatively easy to implement with readily available statistical packages, and often provides valid estimates of effect and standard errors in situations where simpler techniques, including complete case analysis, fail [1][2][3].
To address the problem of confounding, researchers have traditionally used multivariable regression for data analysis.More recently, the use of propensity score methods has gained increasing interest [5].A subject's propensity score is the conditional probability of being assigned to treatment given their measured covariates [6][7].Among those subjects with the same propensity score, the distribution of measured covariates is expected to be the same between treated and untreated individuals [6].Thus, by conditioning on the propensity score treatment status becomes independent of covariates.Several propensity score methods have been described: stratification, matching on the propensity score, inverse probability of treatment weighting (IPW), and covariate adjustment in multivariable regression [6][7].However, despite increasing popularity, it is largely unclear how these perform in the presence of missing data.
Few have investigated approaches that combine missing data techniques with methods for confounding.Mitra and Reiter studied two approaches that combine multiple imputation with propensity score matching [8].In both, missing covariate data are imputed m times through multiple imputation.For each of the completed datasets, a propensity score is then estimated for each subject.In the so-called Within approach, propensity score analysis is performed within each of m imputed datasets, and the resulting m effect estimates are averaged.In the Across approach, for each subject the m estimated propensity scores are averaged first, after which the propensity score method is implemented based on each subject's average propensity score.While both approaches were shown to be superior to complete case analysis in terms of bias, it was found that the Across method was less biased than the Within method, especially in the presence of missing covariate data [8].However, as with any simulation study, these results may not extend beyond the settings that were considered.For example, while it was assumed that the treatment and outcome variables were fully observed, none have compared the approaches in settings with incomplete data for one or both of these variables.Furthermore, although it has been argued that often the outcome should be included in the imputation model [3,9,10] it was excluded from the imputation model in the previous study.With the outcome included, subsequent simulations have found the Within approach to be preferred [11,12].Nevertheless, in applied research, the Across approach appears to have gained interest since its introduction [13][14][15][16][17][18][19][20][21].
Our aim was therefore to provide further insight into how propensity scores analysis should be applied in combination with multiple imputation.Specifically, we compared the Within and Across approaches in settings with missing covariate data, missing treatment indicators, and missing outcomes.Because of their common use, complete case analyses were also studied.The remainder of this article is structured as follows.The notation and setup for the simulations are detailed in Sections 2 and 3. Section 4 details anticipated sources of bias.Results are presented in Section 5 and discussed in Section 6.Finally, we conclude with a summary in Section 7.

Notation
Suppose the random vector Z = (X 1 ,X 2 ,…,X g ,T,Y) is observed on n subjects.The first g variables of Z represent covariates, whereas T and Y refer to a binary treatment indicator variable and a continuous outcome, respectively.Realisations are printed in lower case letters.We denote an n×(g+2) matrix by Z, whose ith row Z i = (X i1 ,X i2 ,…,X ig ,T i ,Y i ) represents the ith (i = 1,2,...,n) subject's record.For each i,j element in Z, define a missing indicator variable M ij that takes the value of 1 if it is observed and 0 otherwise.Further, we write z = (z obs ,z mis ) to indicate that z can be partitioned into an observed part z obs and a missing part z mis .In multiple imputation, values of z mis are imputed m times by drawing from posterior predictive distributions, resulting in m completed datasets z (k) , k = 1,2,...,m that may be subjected to propensity score analysis.A detailed description of the Across and Within approaches are given in the Supplementary Material.

Simulation methods
We used a series of Monte Carlo simulations to examine the performance of the Across, the Within, and complete case approaches under various missing data mechanisms.The simulations were carried out in several stages.In the first stage, complete data are generated following one of the data generating mechanisms detailed Epidemiology Biostatistics and Public Health -2017, Volume 14, Number 4 PS methods and multiple imputation below.These were chosen for comparability with Mitra and Reiter [8].Second, missing data are introduced into one of the variables.Third, a number of approaches are applied to estimate the treatment-outcome effect.For each scenario (combination of complete data generating mechanism and missing data mechanism), this process was repeated 1000 times.A full factorial design was used.All simulations were conducted with R Statistical Software version 3.1.1[22].For multiple imputation we used the mice package [23].Continuous and binary variables were imputed using the norm and logreg options, respectively.The number of imputations was set to m = 5 for efficiency.For any incomplete variable, all other variables, including the outcome, were included in the imputation model.In all simulations, we used correctly specified propensity score and imputation models.
The value of T i was assigned by drawing from a Bernoulli distribution with parameter (i.e. the probability of treatment assignment) defined as a function of the ith subject's covariate data.In particular, we let where expit(η) is the inverse logit function exp(η)/ (1+exp(η)).As such, the log odds of treatment increases with 0.255 for every unit increase in either X 1 or X 2 .This mechanism assigns approximately 100 subjects to treatment (T=1) and 1000 subjects to the control group (T=0).
We defined the outcome Y i such that, for all i, where ε i is independent of (T i X i1 ,X i2 ).The interest lies in estimating treatment effect β TY = 2, that is, the conditional treatment effect, which-because of homogeneity and the collapsibility of the causal difference in means-equal the marginal treatment effect.Clearly, both covariates serve as a confounder for the association between T and Y.We varied σ 2 = 1,9 to show that larger residual variances (σ 2 =9) correspond with larger discrepancies between Across and Within estimates in the case of missing covariate data.

Missing data mechanisms
To aid understanding, we initially restricted ourselves to simple missing data mechanisms, namely univariate missing completely at random (MCAR) mechanisms, and finally considered univariate missing at random (MAR) settings.The mechanisms for generating missing data were as follows: 1. MCAR covariate values.Any subject's X 2 value was allowed to be missing with probability Pr(M i2 = 1|Z) = p, p = 0.2,0.
These mechanisms set approximately 40% of the subjects' X 2 values to missing.

Effect estimators
For all simulated datasets, the Within and Across estimates were obtained as described in the Supplementary Material.Because of their common use, complete case analyses were also performed for comparison.A number of propensity score methods were investigated.The regression estimates of the treatment effect were obtained by linearly regressing the outcome on treatment and the logit of the estimated propensity score.The term matching is used to refer to pair matching performed by selecting for each treated subject a single untreated control without replacement using a greedy nearest neighbour matching algorithm.No restrictions were placed on the maximum acceptable difference between the propensity scores of any two matched subjects.We also performed matching on the logit of the propensity score using a calliper distance of 0.05.A fourth effect estimator was obtained using inverse probability weighting (IPW) where treated subjects are weighted by the inverse of their propensity score and untreated subjects by the inverse of its complement.Finally, we applied iterative inverse probability weighting (IIPW) using a convergence threshold of 10 -4 and a maximum of 100 iterations per dataset [24].Calliper matching and iterative IPW were used because matching and IPW are sensitive to practical non-positivity [24,25].More details on practical nonpositivity and IIPW are given in the Supplementary Material.

Variance estimation
An appealing property of the standard multiple imputation approach is that it facilitates estimation of standard errors that reflect both the variability in the data and the uncertainty in the imputations.1However, for the Across approach, the between-imputation variance component of Rubin's multiple imputation variance estimator cannot fully capture the uncertainty of the imputations.For example, when there are only missing covariates, the between-imputation variance would be zero, because the same set of propensity scores is used for each dataset.
As an alternative to Rubin's rules for variance estimation, analysts can implement a bootstrapping procedure that is akin to the full mechanism bootstrapping approach described by Efron [26].Here, bootstrapping is implemented as follows: 1 Across approach) to the m imputed datasets to obtain a single effect estimate for the bootstrapped dataset.4. Repeat steps 1-3 B times to obtain B bootstrap replicates.The bootstrap variance and confidence interval for the effect estimate can be obtained from the bootstrap replicates using standard formulae.
For the scenarios with MAR missingness, we estimated variances and confidence intervals using Rubin's rules and the bootstrapping procedure outline above.As discussed, the former can be expected to yield too narrow standard errors and therefore suboptimal coverage.To illustrate this, we applied Rubin's rules for the regression estimators using the modified degrees of freedom formula detailed elsewhere to obtain 95% confidence intervals [27,28].As for bootstrapping, we calculated bootstrap sample variances and 95% percentile confidence intervals, using the 2.5 th and 97.5 th percentiles as the lower and upper bounds, based on 1000 bootstrap samples.

Performance measures
The primary performance measure of interest is bias, estimated by the mean deviation of the estimated effect from the true effect of treatment on the outcome (β TY ) across all 1000 simulations, but we also provide empirical variances and mean squared errors (MSE).
For the MAR scenarios, coverage probabilities and the mean estimated variances relative to the corresponding empirical variances are also provided.Based on 1000 simulations, the Monte Carlo standard error for the true coverage probability of 0.95 is √(0.95(1-0.95)/1000)≈0.0069, implying that the estimated coverage probability is expected to lie with 95% probability between 0.936 and 0.964.29 Empirical coverage rates outside this interval provide evidence against the true coverage probabilities being equivalent to the nominal level of 0.95.

Potential sources of bias
To see how the Within approach following multiple imputation might avoid bias due to missing data, it is instructive to consider large samples, so that uncertainty in imputation model parameters may be ignored.Under correct model specification and missingness that is at random (strictly, 'ignorable' in the sense defined by Rubin1), multiple imputation allows for the information lost because of missingness to be restored in such a way that the imputed datasets follow closely the distribution of the full data.Therefore, any analysis procedure may be anticipated to give similar results when applied to the imputed data and to the unobserved full data.
Note that only the Within approach fully adheres to Rubin's original MI algorithm, where averaging across imputations is deferred until the last step.In the context of propensity score matching, this may seem unsatisfactory.Untreated subjects who would be considered unsuitable matches based on their 'true' propensity scores, may be included in the matched set because by random variability their estimated propensity scores, based on the imputed data, better resemble the treated subjects' propensity scores.Untreated subjects whose propensity scores are overestimated tend to be included in the matched set; conversely, untreated subjects whose propensity scores are underestimated tend to be left out.This may then lead to bias by a systematic lack of exchangeability between treatment groups of the matched pseudopopulations.Intuitively, the Across approach may be preferable because of the lesser reliance on random variability.This problem of random variability is due to the nature of propensity score matchingbased estimators, and is not expected to introduce bias when for example regression adjustment is used.
The Across approach would appear more robust against the aforementioned source of bias, because matched pseudopopulations are formed after the pooling of propensity scores across imputed datasets.However, for large m the Across approach is comparable to conditional mean imputation, which, as illustrated in the Supplementary Material, may also introduce bias.Given its resemblance, the Across approach is also expected to be biased in the case of missing covariate data.
When treatment or outcome data are missing and covariate data are fully observed, the Within and Across approaches should yield similar results.Consider again a setting with MCAR missingness, now affecting treatment indicator values only.With large samples, propensity score model estimates would be similar across imputations.Since all covariate values are observed, this implies that propensity score estimates too would exhibit little to no variation across the complete datasets, rendering the Within and Across approaches effectively indiscernible.With complete treatment and covariate data, the propensity score estimates would be exactly the same across imputed datasets, because the outcome does not enter the propensity score model fitting.Therefore, Within and Across estimates would be identical under missing outcome values only.
Daniel et al. showed how causal diagrams can be used to infer that in nearly all scenarios considered here, conditioning on the complete cases (i.e.prior to matching, IPW, or IIPW) does not itself induce bias.4It follows that when other sources of bias, here practical non-positivity and confounding, are adequately addressed, the treatment effect can be validly estimated.Among the missing data mechanisms considered, it is only MAR2 that biases complete case analysis, namely by inducing a relation between treatment status and the (unmeasured) error on the outcome through collider stratification.
MAR1 is an example of a mechanism that accentuates practical non-positivity.Under this mechanism, untreated subjects with large X 1 values are more likely to be have missing X 2 values than others.When untreated subjects have systematically lower X 2 values even before the introduction of missingness, the consequence of this mechanism is that the propensity score distributions of groups of treated and untreated subjects become more distinct.As a result, estimators that are sensitive to practical non-positivity (e.g.matching and IPW) become more biased when incomplete records are discarded.Note that the matching and IPW methods described in Section 3.3 are estimators of the average effect on the treated (ATT) and the average effect (ATE) on all subjects, respectively [30].A sufficient condition for these measures of effect to coincide is that of collapsibility and homogeneity of the treatment effect.This joint condition is met in our simulations.The assumptions of ATT and ATE estimators with respect to positivity are, however, not the same.ATE estimators require the covariate distributions of the treated and untreated to have common support, whereas ATT estimators require only the support of the treated to be shared by that of the untreated but not vice versa [31].This may in part explain possible differences between matching versus IPW estimates.

Bias
In this section, we present graphically the estimated biases for the effect estimators of interest.Results on these and other performance measures are presented in tabular form in the Supplementary Material.

Missing (MCAR) covariate values
Figure 1 depicts the estimated biases for the scenarios with MCAR covariate data.Apart from those based on matching or IPW, the complete case and Within estimators were not identifiably biased.The Across approach, however, showed substantial bias, especially when either the missingness probability, the residual variance σ 2 or both were large.The regression-, matching-, calliper matching-, and IIPW-based estimators were all negatively biased for the Across approach.In contrast, Across IPW estimates were on average overestimated.Complete case matching and IPW estimates were also systematically overestimated, with the extent of bias increasing with the extent of missingness.

Missing (MCAR) treatment indicator values
Figure 2 depicts the estimated biases for the scenarios with MCAR treatment indicator values.The Across and Within estimates were on average highly similar.Apparent from the figure is also the trend that as the percentage of incomplete cases increases, the treatment effect becomes on average progressively more underestimated by both the Across and Within estimators.Conversely, the complete case matching and IPW estimators systematically overestimated the treatment effect, particularly for large missingness probabilities.

Missing (MCAR) outcome values
Figure 3 depicts the estimated biases for the scenarios with MCAR outcomes.For all propensity score methods, the Across and Within estimators yielded identical results.Again, the complete case matching and IPW estimators showed bias, particularly when the extent of missingness was large.The corresponding Within and Across estimators were less biased.The regression-, calliper matching-, and IIPW-based estimators resulted in minimal bias.

Missing (MAR) covariate values
Figure 4 depicts the estimated biases for the scenarios with MAR covariate data.The complete case matching and IPW estimators generally showed more bias than in the corresponding MCAR covariate settings with a comparable proportion of incomplete records (40%).The regression-, calliper matching-, and IIPW-based estimators showed minimal bias for both the complete case analysis and Within approach when the missingness of X 2 depended on X 1 and T (mechanism MAR1).As for the scenarios where the missingness dependend on the outcome Y (MAR2), the Within but not the complete case approach yielded estimates close to the true treatment effect.As before, Across estimates were systematically too low for the regression-, matching-, calliper matching-, and IIPW-based estimators.

Other performance measures
In general, the Within estimators were associated with the smallest empirical variances and MSEs.The simulations also illustrate the implications of using Rubin's rules in estimating the variance.The variances of the Across regression estimators were underestimated and the coverage probability too low (see Supplementary Material).Conversely, when applying the bootstrapping procedure, the estimated variances were on average close to the respective empirical variances.Despite generally adequate coverage probabilities for the Within approach, the variances for calliper matching-, and IIPW-based estimators were on average overestimated.

DISCUSSION
Our primary focus was on examining the relative performance of two approaches to implementing propensity score methods following multiple imputation.Although the Across approach has been applied in practice, our simulations show that, as expected, it fails in settings with missing confounder data, even when the missingness is completely at random and complete case estimators are unbiased.
As stated, untreated subjects with propensity scores that are by random variability underestimated are more likely to be selected as matches than subjects whose propensity score is overestimated.This problem of random variability is inherent to propensity score methods, and is not expected to introduce bias when for example regression adjustment is used.However, its impact was negligible in our simulations, because the calliper matching estimates were highly similar to the regression estimates.The second explanation for the discrepancy in bias between the approaches rests on the resemblance of the Across approach to conditional mean imputation in the context of missing covariate data.This explanation is consistent with our observations that the Across approach PS methods and multiple imputation showed more bias for larger missingness probabilities and larger residual variances.
Conversely, with complete confounder data, the Across approach is not comparable to conditional mean imputation.Instead, the bias observed in settings with missing treatment indicator values probably is largely attributable to a phenomenon, known as separation or 'perfect prediction', that is associated with regression models for categorical responses.Separation occurs if the responses, here treatment status, can be perfectly separated by a single predictor or a linear combination of predictors.The problem lies with the Normal approximation to the posterior distribution of the parameters of the logistic regression model that is used by the software to predict missing treatment indicator values.When in the presence of separation, logistic regression is applied to the complete cases, modelling the probability of being assigned to treatment as Pr(T i = 1|X i1 ,X i2 ,Y i ) = expit(α0 + α 1 X i 1 + α 2 X i 2 + α 3 Y i ), then we can find an infinite sequence of parameter specifications with monotonically increasing likelihood converging to unity, such that for at least one parameter α the estimate a tends to infinity [32,33].Hence, the maximum likelihood estimate does not exist.Nevertheless, given the near-flat nature of the likelihood, typically very large values for the maximum likelihood estimate a and its variance are returned by standard software.If the Normal approximation to the posterior distribution of the parameters is applied, then it is not unlikely that values are drawn such that in the imputation step subjects with incomplete data are assigned to the treatment group whilst the observed data clearly suggests that these subjects should be assigned to the control group [32].In other words, the Normal approximation to the posterior distribution is poor.One way to prevent these implausible imputations is to add to the dataset a few observations such that separation is no longer present and with such small weights that the impact on the imputation model is limited [34].mice implements such a data augmentation method to deal with this phenomenon [3,34].but we suspect that in our simulations the impact of the weights was large enough to produce bias.
Although unbiasedness is arguably more crucial than valid variance estimation, sufficiently large variances, even if they can be estimated validly, may render unbiased estimators of little practical use [29].In our simulations, the Within approach was superior or comparable to the Across in terms of either criterion.Another drawback of the latter approach is the difficulty in making inferences as to the precision of effect estimates.Bootstrapping may provide correct standard errors, but we acknowledge that this approach is computationally intensive.Further, because coverage is affected by the bias in both variance and effect estimation, it is likely to be poor in general for Across estimators.Bootstrapping for the (calliper) matching estimators here yielded slightly overestimated variances and conservative empirical coverage rates.A similar phenomenon was observed by Austin and Small [35].The bootstrapping procedure defined in Section 3.4 resembles the 'complex bootstrap' of Austin and Small.The rather large discrepancies between the mean estimated variances and the empirical variances for the IIPW estimators are possibly attributable to the unstable nature of inverse probability weighting.Further investigation and development of bootstrapping approaches to variance estimation for the (calliper) matching and (iterative) IPW estimators represent an interesting direction for future research.
Our findings contrast with those of Mitra and Reiter [8].A crucial difference between the simulations is the inclusion of the outcome in the model used to impute missing covariate values.Failing to include the outcome leads to imputed datasets that do not reflect the association between covariate and outcome that would have been observed had there been no missing values.The consequence of this is that if one adjusts for the imputed covariate values to estimate the treatment effect, the variation in outcomes between the treatment groups that is due to the partially unobserved covariate would in part be attributed to the differences in treatment status.
As with any simulation study, an important limitation of this study is the potentially limited generalisability.The scenarios considered here represent only a small and simplified subset of those likely to be encountered in applied research.Some of the missingness probabilities that were studied are probably unrealistic, and only a single sample size was considered.Practical non-positivity and separation are perhaps less relevant in settings with larger samples and fewer incomplete cases.Furthermore, we considered only two covariates and assumed that the propensity score and imputation models could be correctly specified.Applied researchers do not have the luxury of knowing the data generating and missing data mechanisms and often need to assess and account for

Abbreviations: C. matching, calliper matching; IPW, inverse probability weighting; IIPW, iterative inverse probability weighting; CCA, complete case analysis. Error bars represent 95% confidence intervals for the simulation estimates of bias.
Epidemiology Biostatistics and Public Health -2017, Volume 14, Number 4 PS methods and multiple imputation multiple sources of bias.However, rather than scrutinising methods for these issues in isolation only, it may be interesting to additionally study how they perform in combination.Conducting simulations for specific scenarios of interest may be particularly desirable given the limited generalisability of our results.If these are not possible, we advise researchers not to use the Across approach as the default method, because it appears to offer no advantage over the Within method.

CONCLUSION
In medical research, confounding and missing data are common problems that often occur simultaneously.Complete case analysis, although valid under various circumstances, is discouraged as the default procedure, because it leads to a loss of valuable information and it is typically unknown whether the conditions under which complete case analyses are valid are satisfied.When multiple imputation is to be followed by the implementation of a propensity score method, researchers could apply the Across and Within approaches.The present study highlights a number of aspects of the Across approach that render it suboptimal.Our simulations indicate that the Within approach is superior to the Across approach in terms of both bias and variance in settings with missing confounder data.For incomplete treatment and/or outcome data, the approaches yield similar estimates.We advise researchers not to use the Across approach as the default method, because even in MCAR settings, this may yield biased effect estimates.Finally, when matching or IPW are the propensity methods of choice, we recommend practical non-positivity to be adequately addressed, e.g. by using a narrow calliper or an iterative reweighting algorithm.One should be aware, however, that trimming away or down-weighting observations may direct the focus of inference to a narrower population that may not reflect that of primary interest.

SUPPLEMENTARY MATERIAL
This Supplementary Material has three parts.The first contains a discussion of the Within and Across approaches; the second part reviews the positivity assumption and provides sample R code for the iterative inverse probability weighting estimators for the complete case, Across and Within approaches; and the third part provides the results of the simulation studies on all performance measures.

A Within Across approaches
Two approaches to implementing propensity score methods following multiple imputation to address missing data are described: the and Across approaches [1].The first step in the analysis procedure is to estimate within each of m imputed datasets a vector of propensity scores e ) typically using logistic regression.In the Within approach, any propensity score method is implemented within each completed dataset using e (k) , yielding an estimate of the relation between treatment and outcome, β w (k) . The Within estimate is defined as the average of β w ,β A (m) are averaged, yielding the single estimate β A .This procedure deviates slightly from the Across approach described by Mitra and Reiter [1].The modified Across approach described here is equivalent to the original procedure when T and Y are fully observed, but can additionally accommodate missings on T and/ or Y. Henceforth, we will refer to this modified procedure simply as the Across approach.
In what follows, we first give a heuristic explanation for the bias due to the Across approach in a simple setting with missing covariate data and then offer more technical arguments.Consider for simplicity the case with only a single continuous covariate X, and suppose that the treatment-outcome effect can be parameterised through the linear regression model where β 1 is the parameter of interest.Further, assume that the relationship between the probability of being assigned to treatment (T = 1) given X can be modelled by a logistic function In this case, we may rewrite the treatment-outcome model in terms of a linear transformation logit e(X)=α 0 +α 1 X of the covariate values, the logit of the propensity score e(X), The ordinary least squares estimator of the true treatment effect is unbiased if the regressors of the linear model are T and X, or T and β 1 .A similar observation can be made in the case of multiple covariates [2].However, if we impute missing X values using conditional mean imputation, and regress Y on T and the (linearly transformed) imputed covariates to estimate the treatment effect, then, as illustrated in Figure 1, the estimator will be biased-provided that the conditional variance of Y given T and X is greater than zero and treatment assignment depends on X.
Likewise, in the case of missing (e.g.MCAR) X values, averaging (transformed) imputed values across many multiply imputed datasets (i.e.effectively conditional mean imputation) also renders the effect estimator biased.The crux of the matter lies in that the default imputation model is the linear regression with X as the dependent variable and T and Y as the independent variables, whereas the analysis model regards Y as the dependent variable.Switching dependent and independent variables results in best fit equations that are not in general equivalent (unless orthogonal regression is used).Bias can therefore also be expected for the Across approach, because in the context of missing covariate data it is comparable to conditional mean imputation, except that taking the logit of the average propensity score is not the same as taking the average of the logit of propensity scores.
We shall now describe the asymptotic behaviour of the Across approach in a simple setting with a binary confounder Let ε be a uniformly distributed random variable over the interval (0, 1).Define Y to be equal to ε < f(T,X) 1 if for some deterministic mapping f to the interval (0, 1), and let it be 0 otherwise.For any subject with realisations u of ε and x of X, let Y 0 =I(u<f(0,x)) and Y 1 = I(u<f (1,x)) being the indicator function) denote the outcomes if treatment were set, possibly contrary to to 0 and 1, respectively.Suppose further that ε is independent of T given X, so that there is no unmeasured confounding, i.e., (Y 0 ,Y 1 )⊥T|X.We also assume consistency throughout, i.e., that for any treated subject is observed and for any untreated subject is observed.In addition, it is assumed that positivity (i.e., 0<Pr(T=1|X=x)<1 for x=0,1) holds.
The marginal odds ratio OR for the causal effect of T on Y among the treated (ATT) is  Thus, the right-hand side of ( 4) becomes E X {Pr(Y=1|T=0,X) |T=1}, and, therefore, ( 3) is equal to the right-hand side of (2).Now suppose that X is unobserved with constant nonzero probability.Clearly, since the missingness is completely at random, an application of the above inverse probability weighting estimator to the complete records would result in correct inference as to the OR.However, the missing data mechanism is typically unknown to the researcher, so that one might opt for multiple imputation and proceed under the less restrictive ignorability assumption.Since all variables under consideration are binary, misspecification may easily be averted by selecting a fully saturated imputation model, e.g., an additive logistic regression of X on T and Y, allowing for interaction between T and Y, so that Pr(X=1|T,Y) is consistently estimated from the observed data.Following imputation, the propensity score model parameterised by Pr(T=1|X) can also be consistently estimated from each imputed sample.
With the number of imputations m and sample size tending to infinity, the Across approach produces estimated propensity scores that equal their true equivalents for subjects with observed covariate values; however, for an incomplete record with realisations t and y of T and Y, the estimated propensity score equals (5) The Across propensity score estimate (5) is not generally equal to the respective true propensity score.Sufficient conditions for equality (and therefore asymptotic unbiasedness of the Across approach) are (S1) the independence between T and X and (S2) the deterministic relation between X and Y given T.
We may compute the quantity estimated by the Across approach in the context of inverse probability weighting, with weights defined with the ATT in mind, as follows.As above, the weights are defined as a function of the (estimated) propensity scores; specifically, redefine W* such that with A denotinig ∑ x Pr(T=1|X=x) Pr(X=x|T=0,Y) and M being the indicator variable that takes the value of 1 if X is missing and 0 otherwise.Next, observe that E[WY|T=1]=E[Y|T=1],, as before, and The numerator of ( 6) may be evaluated using Similarly, the denominator may be evaluated using Substituting E[WY|T=1] and E[WY|T=0] in (3) with ( 6) and E[WY|T=1], respectively, yields the quantity asymptotically estimated by the Across approach.
To fix ideas, suppose that X, T, Y, and M are distributed such that The discrepancies between the true OR of 1 (consistently estimated by the Within approach) and the asymptotic results of the Across approach for the above distribution and one-way deviations are depicted in Figure 2.

PS methods and multiple imputation
# Iterative Inverse Probability of Treatment Weighting # Returns a list with weights and number of iterations data, # a dataframe containing columns T, X1, X2 and Y formula =`T~X1+X2', T =`T'

B Practical non-positivity and iterative inverse probability weighting
Given the propensity score, treated and untreated subjects tend to be comparable or 'exchangeable' with respect to measured covariates.In 'counterfactual outcomes parlance, the distribution of potential outcomes is expected to be the same for subjects with the same propensity score.Estimators typically compare outcomes between groups of subjects with 'exchangeable' observations.However, these comparisons become problematic if a treatment group has observations that are not 'exchangeable' with any of those in the other group.This problem represents a violation of the practical positivity assumption, i.e. that for each level of confounders there are records of both treated and untreated individuals in the dataset.Regression adjustment responds to this problem by extrapolating over covariate/propensity score regions of non-positivity through modelling the outcome, which may be appropriate under homogeneity assumptions, yet less attractive when these are not tenable.Indeed, one might question the ability of the model to accurately predict outside the region of positivity.Semi-or nonparametric methods, such as matching and IPW, do not involve explicit modelling of the outcome with respect to the PS, but are sensitive to practical non-positivity.
The consequences in terms of bias due to practical non-positivity are perhaps most intuitive in the setting where propensity score matching is used to estimate the ATT.When the upper tail of the propensity score distribution of the treated group shares no support with the propensity score distribution of the untreated group, it may be that for those subjects in this propensity score region the most suitable (closest in terms of the propensity score) matches are those with a substantially lower propensity score.As a result, unless corrections for non-positivity are made, treatment is still associated with the covariates in the matched set, potentially leading to bias.
IPW adjusts for confounding by weighting observations such that the association between treatment and confounders is removed.Suppose that at some (possibly multidimensional) covariate level, there are only treated subjects in the sample.With mere categorical confounders, this would preclude the fitting of a propensity score model.With continuous covariates, random zeros are inevitable because of the infinite number of confounder levels.In such settings, parametric models can be used to obtain estimated propensity scores by 'borrowing' information from individuals with similar covariate values to those that have zero probability of occurring.However, although weights may be defined at every level of the covariate, reweighting observations that have zero frequency of occurring is impossible.As such, if zero frequencies of being assigned to treatment (but not to the control group) tend to occur in, say, the lower or upper end of the covariate distribution, treatment will still be associated with the covariate in the pseudopopulation, so that the IPW estimator may be biased by residual confounding.
To improve covariate balance in the presence of practical non-positivity, Van der Wal proposed an algorithm in which the dataset is iteratively reweighted [3].The idea underpinning this algorithm is as follows.By fitting a propensity model on the weighted dataset, new weights can be estimated that (partially) adjust for the residual confounding.Multiplying these weights with the original yields weights that, when applied to the dataset, correct for confounding more than the original weights.As the covariate balance improves, the probability of being assigned to treatment becomes less dependent on the covariate values, and so the variance of the log-transformed new weights reduces.The above process is therefore repeated until the variance drops below a convergence threshold.
The iterative inverse probability weighting (IIPW) algorithm was defined in the context of complete data.With multiply imputed data, one can apply IIPW within each imputed dataset, in a way consistent with the Within approach, until the algorithm converges within each dataset or until a maximum number of iterations is reached.Alternatively, at each iteration one can average the estimated propensity scores across the imputed datasets, as per the Across approach, before reweighting the imputed datasets.Sample R code for these algorithms is given below.

FIGURE 1 .
FIGURE 1. Biases of treatment effect estimators for various degrees of missing (MCAR) covariate data and residual variances σ 2 .

FIGURE 2 .
FIGURE 2. Biases of treatment effect estimators for various degrees of missing (MCAR) treatment indicator values and residual variances σ 2 .

FIGURE 3 .
FIGURE 3. Biases of treatment effect estimators for various degrees of missing (MCAR) outcomes and residual variances σ 2 .

. 1 A
In the Across approach, the propensity score method is implemented within each completed dataset, now using e A =(e

FIGURE 1 .
FIGURE 1.A single random sample (left) with missing completely at random (MCAR) covariate data imputed using conditional mean imputation (right).A valid treatment effect (A) is obtained by the regression of Y on T and X (the analysis model) applied to the complete cases.Applying the analysis model to the subset with covariate values imputed through conditional mean imputation yields a biased treatment effect estimate (B).

FIGURE 2 .
FIGURE 2. Asymptotic results (solid lines) of the Across inverse probability weighting estimator of the odds ratio for the marginal causal treatment-outcome effect in a hypothetical setting with a binary confounder, binary treatment variable and binary outcome (default parameter values given in text).Dashed lines indicate the true OR of 1. S1 and S2 represent sufficient conditions for asymptotic unbiasedness; see text.

TABLE 1 .
Performance of treatment effect estimators for various degrees p of missing (MCAR) covariate data and residual variances σ 2 .

TABLE 2 .
Performance of treatment effect estimators for various degrees p of missing (MCAR) treatment indicator values and residual variances σ 2 .

TABLE 3 .
Performance of treatment effect estimators for various degrees p of missing (MCAR) outcomes and residual variances σ 2 .