Handling missing continuous outcome data in a Bayesian network meta-analysis

Background: A Bayesian network meta-analysis (NMA) model is a statistical method aimed at estimating the relative effects of multiple interventions against the same disease. The method has recently gained prominence, leading to the synthesis of the evidence regarding rank probabilities for each treatment. In several cases, an NMA is performed excluding incomplete data of studies retrieved through a systematic review, resulting in a loss of precision and power. Methods: There are several methods for handling missing or incomplete data in an NMA framework, especially for continuous outcomes. In certain cases, only baseline and follow-up measurements are available; in this framework, to obtain data regarding mean changes, it is necessary to consider the pre-post study correlation. In this context, in a Bayesian setting, several authors suggest imputation strategies for pre-post correlation. In other cases, a variability measure associated with a mean change score might be unavailable. Different imputation methods have been suggested, such as those based on maximum standard deviation imputation. The purpose of this study is to verify the robustness of Bayesian NMA models concerning different imputation strategies through simulations. Results: Simulation results show that the bias is notably small for every scenario, confirming that rankings provided by models are robust concerning different imputation methods in several heterogeneity-correlation settings. Conclusions: This NMA method seems to be more robust to missing data imputation when data reported in different studies are generated in a low-heterogeneity scenario. The NMA method seems to be more robust to missing value imputation if the expectation of the prior distribution, defined on the heterogeneity parameter, approaches the true value of the variability across studies.


INTRODUCTION
Network meta-analysis (NMA) is an increasingly popular statistical method used to perform a simultaneous comparison in the effectiveness of treatments against the same disease [1,2].The possibility to summarize evidence from studies reporting direct treatment comparisons and to infer indirect effects between interventions that have not been directly compared in head-to-head studies makes this method very attractive for clinical researchers [3].NMA's advantage over classical pairwise metaanalysis lies in gathering direct and indirect evidence to provide comparisons between the relative effectiveness of all treatments included in the network [4].In this framework, calculations of the probability of one treatment being the best or worst for a specific outcome through the final ranking of available treatments against the same disease are usually employed to facilitate the interpretation of the results [5,6].NMA models are available for a broad range of underlying data and summary effect measures, being implemented in both frequentist and Bayesian frameworks [7].
Among studies identified through a systematic literature review, only those reporting complete data on a treatment effect and its uncertainty usually contribute to an NMA model.In the literature, it is assessed that, for quantitative synthesis of evidence, the continuous outcome is more difficult to synthesize in an NMA than a binary endpoint for several reasons.For example, the research articles may not report all the information required to perform a meta-analysis [8].The most common method to analyze this kind of outcome in meta-analysis is to consider the mean change from baseline, including also an associated variability measure (standard deviation SD) [9].However, in a different situation [10], no or incomplete variability measures have been reported in clinical trial publications [11].In several cases, an NMA performed on a continuous endpoint involves the exclusion of the trial from the NMA due to the incompleteness of information [11].
The omission of studies with incomplete information may affect the results of an NMA through a loss of statistical power and bias in the final estimates [11].By far the most common approach to the missing data is to omit those studies with the missing data from the analysis.This approach is known as the complete-case analysis or list-wise deletion.Some authors suggest that, implicitly, in this case, all missing data are considered as missing completely at random (MCAR) [12].If the missing data are MCAR, they are entirely independent of observed and unobserved data [13].This assumption is not verifiable for data about candidate studies to be included in an NMA model; the missing information may be considered a function of the missing data itself and may be influenced by other variables related to study characteristics, for example, sample size [14] or the effect size reported in the study [15].In fact, significant results are more likely to be reported entirely in clinical trial publications [10,16].
This study investigates the problem of missing data imputation in an NMA on continuous outcomes where the treatment effect is expressed as delta (i.e., mean variation) between baseline and follow-up for each treatment included in the network.In this framework, different imputation strategies may be performed depending on the missing information.Some studies may report an average effect related to baseline and follow-up assessment and omit the mean treatment effect (with corresponding variability measure) in terms of delta variation.In this case, different strategies are provided by the literature, mostly based on the imputation of the pre-post study correlation [17] that is used to obtain the delta variability measure [18].The correlation may be imputed considering other comparable studies [19] or using a conservative estimate of 0.5 [19].Among Bayesian alternatives, Abrams and colleagues suggest the exploitation of complete available evidence composed by baseline, follow-up and delta data to obtain a posterior distribution for the correlation [17].
If a study just reports the delta without related variability and mean effect at baseline and follow-up, the imputation strategy is reduced to the imputation of standard deviation (SD).Several approaches have been proposed, among them maximum SD imputation [20], arithmetic mean [21], linear regression [22], the coefficient of variation [23], bootstrap method [24,25] and multiple imputations [13,26].
The purpose of this study is to verify the robustness of a Bayesian NMA model concerning some common imputation strategies on data characterized by different levels of between-trial heterogeneity.A simulation study has been performed to assess the performance of the method in ranking treatments accurately when the network includes imputed data.Data have been simulated referring to the effects of treatments, as reported in the literature, against knee osteoarthritis.

Imputation methods
Let x ij and y ij denote the i th outcome in the j th group of the study at baseline and follow-up, respectively.Let d ij the mean difference in change from baseline to follow up i th in the group of the j th study, and σ 2 dij (ρ) its associated variance.The following approaches are evaluated in this study:

Maximum SD imputation
This method is based on a direct SD imputation from studies included in the meta-analysis [20].Although other more sophisticated methods are available [27], this approach is commonly used in practice to obtain conservative estimates [8].

Correlation approach
The variability of mean change score d ij defined V(d ij ) can be calculated as: In this setting, using complete evidence composed by Handling missing continuous outcome data in a Bayesian network meta-analysis baseline, follow-up and delta data, a meta-analysis of the Fisher transformations, S ij is considered.Therefore, data derived from J studies having complete data are used to derive the posterior distribution for correlation.
The Fisher transformation is [17] The mean of the transformation is a normally distributed random variable with a normal distribution on the mean and a Gamma distribution on the variance: The correlation is strictly related to the Fisher transformation via the δ parameter, as shown below: The use of a fully Bayesian approach for the imputation of correlation is suitable not only because it leads the researcher to include in the computation pertinent study information but also because it takes into account a source of uncertainty in parameter estimation [17].
All these factors eliminate the need for sensitivity analyses with respect to an arbitrary choice of the withinsubject correlations [17].

Simulation framework
Seven hundred full databases, including study-level baseline (x ij ), follow-up (y ij ), and delta variation (d ij ), including 50 two-arm trials of size n ij , are simulated (j=2 pairwise comparisons).Trial data are simulated considering as an outcome the delta variation in the Western Ontario and McMaster Universities index (WOMAC) [28], widely used to assess the severity of symptoms in knee osteoarthritis.Baseline data are obtained by sampling from bounded 0-100 normal distributions with mean 41.8 and SD 21.5, to mimic the support of WOMAC score [29].Delta variation data are simulated from normal distributions with parameters provided by a literature review about six non-steroidal anti-inflammatory drugs (NSAIDs) against symptoms of knee osteoarthritis, as shown in Table 1.
For each simulated trial, treatments involved have been randomly drawn from the list of treatments shown in Table 1.Follow-up variability data (standard error of the mean) are derived from the generated delta and baseline variability measures, using the inverse formula, setting hypotheses on pre-post correlation (ρ) and considering, in each scenario, a sequence of ρ values from 0.3 to 0.95 by in increments of 0.05.The sample size n i is obtained by sampling from a uniform distribution bounded 50-100, reflecting the general framework of the sample size in rheumatological trials [30].
Between-trial heterogeneity τ has been included as a variability measure by following, for each simulation setting, a wide range of heterogeneity values consisting in a sequence from 0.1 to 5 by 0.1.Each simulation scenario provides different combinations of between-trial heterogeneity and pre-post correlation, totalling 700 scenarios.
More details about the simulation plan are shown in Figure 1.

Imputed data generation process
Imputed database I.In the first case, information about delta variation is randomly removed from the full database, leaving only baseline and follow-up data.Out of 50 studies, the variability information has been randomly removed in 10 studies.Variability of mean change has been imputed using the correlation method.
Imputed database II.In this case, information at baseline and follow-up are also removed, and then delta variability is imputed with maximum SD method.

Bayesian NMA analysis
A multiple-treatment Bayesian NMA, with random effects and a uniform (0,5) prior distribution on the heterogeneity parameter [31], has been performed on each simulated dataset consisting of 50 studies, using an arm-based approach [19].An arm-based approach models the arm-level data referring to a specific treatment effect within each trial.For comparison, a contrast-based approach would be based on the parametrization of the contrast trial level summary (for example, the mean difference in treatment effect within the trial).The results provided by both approaches are very similar [32].An arm-level approach, compared to contrast-based ones, leads to the incorporation of multi-arm trials without the necessity to employ multivariate distribution to determine specific treatment effects [32].
An MCMC approach was used to sample from the posterior distribution estimated by the NMA model.The sampling processes involved 200,000 iterations, including a burn in of 20,000 steps, from 4 chains.The results are compared in term of ranking probability [33].
For each scenario, the bias and the standard deviation of the first rank probability estimates, between full and imputed databases, have been computed to investigate the robustness of NMA results with respect to the considered imputation methods.

RESULTS
The network plot (Figure 2) is a graphical representation of the overall evidence leading to a concise description of its characteristics.Each node of the graph represents the interventions, and edges represent the available direct evidence between treatments.The size of the node is proportional to the sample size in each treatment, and line thickness is proportional to the number of pairwise comparisons between treatments.The network structure is the same in every scenario.Observing the structure of the network, it is possible to see that no indirect evidence is included in the meta-analysis.In fact, all comparisons report direct evidence in the network structure shown in Figure 2.
Results are synthesized in term of first rank probabilities for each treatment reported in Table 2.The simulation results are coherent for each scenario for all the imputation methods.The first rank probabilities are higher for etoricoxib followed by diclofenac and rofecoxib and they are always equal to zero for licofelone, naproxen, and placebo.The ranking provided by the models is robust  1, the sample size, instead are drawn from a uniform 0-100 distribution.The delta data are sampled from information provided in Table 1, and the baseline data are drawn from a 0-100 Normal distribution having mean 41.8 and SD 21.5.The imputed databases (I) have been obtained randomly removing, from the full database, information about delta data.The Imputed databases (II) have been obtained randomly removing baseline and follow up information.Handling missing continuous outcome data in a Bayesian network meta-analysis with respect to different imputation methods in several heterogeneity/correlation settings.The robustness of the final inference to missing data imputation has been evaluated considering bias and standard deviation within scenario (Table 3).The results are similar across different correlation settings.Smallest values of bias and SD are obtained from smaller values of heterogeneity.The bias increases as heterogeneity increases.This pattern is evident only in the presence of the lowest and highest values of heterogeneity, as in Figure 3. Additionally, it may be observed that the lowest bias and SDs are obtained, not only in the case of the lowest heterogeneity setting but also for heterogeneity values of 2.5 to 3.5, near the expectation of the uniform prior distribution chosen on heterogeneity parameters (Figure 3).

DISCUSSION
Imputation methods are generally viewed as the preferred analytical approach to the missing data problem in NMA [11,34].When no particular imputation method is preferable to others, a sensitivity analysis of NMA results concerning different imputation strategies is advocated [35].It should be noted that few NMA studies report the   Handling missing continuous outcome data in a Bayesian network meta-analysis results of sensitivity analysis after imputation [21,36,37], especially for variance imputation on continuous outcomes.Some authors have investigated the performance of the missing variance imputation strategies on continuous data considering a pairwise meta-analysis setting [38], showing the substantial robustness of results in several scenarios, especially when a random-effect meta-analysis has been performed.This study, instead, verifies the robustness of a Bayesian NMA model concerning different imputation strategies through simulations and reaches similar conclusions which are known to hold for pairwise meta-analysis in other research settings [39].Our findings show that the first rank probability bias is small over a broad range of imputation scenarios, showing the substantial robustness of final inference.
The results of our study are in line with previous findings that suggest, in the presence of missing variance data, we should consider an imputation method rather than exclude the study from the analysis [40].In fact, we have to consider that it is difficult to suppose that the missingness in variance study-level data may be caused by MAR mechanism [14,15].
Another noteworthy aspect of this paper is the sensitivity of NMA after imputation in different heterogeneity scenarios across studies.Heterogeneity is a crucial element in a meta-analysis.We have to take into account that heterogeneity is a source of variability, related to differences across the trial populations and considered interventions, which can occur in many studies that are candidates for inclusion in a meta-analysis [41].In addition to missing data, heterogeneity between trials is a component that may affect final inference.Heterogeneity between trials indicates a source of variability in treatment effects that, if misspecified (i.e.poorly identified by the prior distribution) in an NMA model, results in a limitation in the external validity of the results [33], downing the possibilities of generalizing inferential results to other situations.
This additional source of variability may occur when studies differing in trial characteristics, outcome definition, inclusion criteria or follow-up duration are included in the same meta-analysis [42].Opportune statistical methods are available to handle this additional source of variability, for example, a random effect NMA may be implemented in a Frequentist or a Bayesian setting [43].In practical approaches, the Bayesian methods for mixed-treatmentcomparison meta-analyses are widely used compared to frequentist NMA [43].
The literature on the relationship between continuous missing data imputation methods and heterogeneity focuses on pairwise meta-analysis [34].Specifically, scholars suggest that random-effect meta-analysis, even in cases of missing data, might be replaced with fixedeffect meta-regression to handle a source of heterogeneity that is not influenced by the known source of variability across studies.Instead, in the setting of a Bayesian NMA, different authors suggest some good practices to deal with heterogeneity, especially to define priors on heterogeneity across studies [31,44].In this work, in line with the literature, we confirm that this source of variability is an important issue to consider.In fact, the correct specification of the prior defined on the heterogeneity leads to more robust results to imputation, especially if the prior on heterogeneity value has an expectation near the true value of the variability across studies.

Study limitations
In further research, it may be useful to assess the performance of the NMA with respect to missing imputation, considering a smaller number of trials in the network, and including a different proportion of direct and indirect comparison.

CONCLUSIONS
Inference provided by NMA models is robust with respect to different imputation strategies or considering different combinations of heterogeneity and correlation when the proportion of missing data is 20%.The Bayesian NMA leads to a better performance also when data are imputed, especially when the extent of heterogeneity across studies is limited.The definition of the prior on The Bias has been computed as the mean of the differences between first rank probability estimate on the imputed database and the same estimate obtained on the full database.The Standard deviation has been computed on the first rank estimates computed by NMA in each scenario.

FIGURE 1 .
FIGURE 1. Simulation plan: the treatments are sampled from the drugs reported in Table1, the sample size, instead are drawn from a uniform 0-100 distribution.The delta data are sampled from information provided in Table1, and the baseline data are drawn from a 0-100 Normal distribution having mean 41.8 and SD 21.5.The imputed databases (I) have been obtained randomly removing, from the full database, information about delta data.The Imputed databases (II) have been obtained randomly removing baseline and follow up information.

FIGURE 2 .
FIGURE 2. Network plot of the simulated evidence.Each node represents a treatment included in the network.The line width is proportional to the number of studies reporting the specific pairwise comparison.

FIGURE 3 .
FIGURE 3. Bias and Standard deviation within scenarios for different correlation settings.In Bayesian NMA prior to heterogeneity τ, has been chosen as a Uniform (0,5) distribution.

TABLE 1 .
Delta Mean and Standard deviation of NSAIDs treatment derived by the literature.The parameters are used to generate data for delta variation treatment effect.

TABLE 2 .
First Rank probabilities for each treatment in each scenario.
A simulation scenario consists of different combination of heterogeneity () and correlation (ρ = 0.3, 0.5, 0.7, 0.9).First Rank probability estimates are reported for an NMA computed on Full Databases and imputed databases (Maximum SD and Abrams method).

TABLE 2 (
CONTINUED).First Rank probabilities for each treatment in each scenario.

TABLE 3 .
Mean of Bias (Absolute Values) and StandardDeviation (*100) between treatments in each simulation scenario.