Integrating clinicians’ opinion in the Bayesian meta-analysis of observational studies: the case of risk factors for falls in community-dwelling older people



Introduction
Meta-analysis is a widely used method to synthesize evidence from multiple studies. The Bayesian approach provides a natural structure for many subtle issues that arise in meta-analyses, in particular for the use of available information through prior distributions [1] and the additional flexibility that derives from the adoption of Markov chain Monte Carlo methods. However, the prior distributions used in meta-analysis are generally non-informative and, even when informative priors are used, they frequently derive from the literature (evidence-based priors) [2]. Conversely, the use of informative priors derived from expert opinion is more common in other study designs (e.g. clinical trials) [3][4][5] or in non-biomedical contexts (e.g. chemistry, veterinary science, nuclear engineering) [5][6][7].
The aim of this work is to investigate the feasibility and the effect of enhancing observational findings with expert opinions from clinicians, through the development of an ad hoc questionnaire including a simple training section, and of a method to transform the elicited opinions into hyperparameters of prior distributions. Using the findings from a previously conducted meta-analysis on risk factors for falls [8], we compared the results obtained with a) a frequentist approach, b) a non-informative Bayesian approach and c) an informative Bayesian approach with various priors.

Bayesian approach to meta-analysis
A meta-analysis can be considered as a two-level hierarchical model which relates the observed measure $Y_k$ to the underlying effect in the kth study, $\theta_k$ [9]. At the second level of the model, the $\theta_k$s are related to the overall effect $\mu$ in the population from which all the studies are assumed to be sampled, whereas $\sigma^2$ is the between-study variance. $Y_k$ and $\theta_k$ are assumed to follow a normal distribution:
$$Y_k \sim N(\theta_k, s_k^2), \qquad \theta_k \sim N(\mu, \sigma^2)$$
From a Bayesian perspective, $s_k^2$, $\mu$ and $\sigma^2$ are hyperparameters for which a prior distribution must be specified, as discussed later. For $\mu$, a fully Bayesian approach was implemented, eliciting prior belief from experts.

Participants
A convenience sample of geriatricians and general practitioners (GPs) in Lombardy Region in Italy and Ticino Canton in Switzerland was contacted. Experts were recruited from the network of external collaborators of the Mario Negri Institute of Pharmacological Research. The search resulted in a list of 20 candidates, with the condition that no more than 2 were from the same group or institution. The following characteristics of the study participants were collected: age, sex, specialty, graduation year, years in practice treating older people, type of practice, history of statistical training, frequency of reading papers on fall risk factors, and frequency of attendance at conferences about falls in older people.

Belief elicitation procedure
A questionnaire to elicit a relative risk (RR) was created, after a review of the methods used in the literature [10][11].
After the finalization of the tool, a standardized script was sent by e-mail to the participants (see Supplementary Materials 1). Participants were asked to consider a pair of community-dwelling older persons of the same sex and age who differed with respect to the fall risk factor investigated, and to specify which one (i.e. affected or not by the risk factor) was more prone to fall in the subsequent year and the size of the risk increase. They were asked to indicate their response by placing an 'X' on a line with risk increase ranging from 0.0 to 5.0. They were also told that 1.0 meant no difference and that values below 1.0 meant lower risk. Afterwards, they were asked to express the uncertainty around their estimate by placing an 'X' at the lower and upper limits of their belief.
The elicitation procedure for the first risk factor was supplied with instructions and examples, whilst for the subsequent risk factors the experts were asked to replicate the same procedure. At the end of the questionnaire, participants were told to compare the estimates and revise or correct them if needed. The questionnaire included five risk factors in the following order: benzodiazepine use, female sex, history of falls, urinary incontinence, antiepileptic use. The RRs were elicited separately for subjects aged 75 years and for subjects older than 80 years.

Data from previous studies
The basis for our research was the set of papers included in a previously published meta-analysis [8]. Briefly, from the results of a Medline and Embase search, we had selected original studies on risk factors for falls with the following inclusion criteria:
• At least 80% of the sample aged 65 years or older
Additionally, the reference lists of the previous reviews had been searched to identify studies that met the inclusion criteria. The outcome could be defined as:
• any faller: subjects who fell at least once during the follow-up
• recurrent fallers: subjects who fell at least twice during the follow-up (within six months, or within one year, depending on the information given in the study)
From the 74 studies included in the previous meta-analysis, 31 articles reporting odds ratios or RRs for the five selected risk factors and considering the outcome "any faller" were used in the current analysis. They were grouped according to the median or mean age of the sample (<80 vs. >80 years old); studies where information on mean or median age was not available were excluded.

Statistical analysis
Descriptive statistics were used to summarize the participants' characteristics and the elicited beliefs.
The experts were classified into 5 categories according to their characteristics:
• geriatricians
• GPs
• researchers
• informed
• experienced
Only the first two categories were mutually exclusive.
The frequentist pooled RRs were estimated using random-effects models [43]. The statistical package used was RevMan, version 4.3.2 for Windows, by the Cochrane Collaboration (http://ims.cochrane.org).
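As an illustration of the frequentist step, the sketch below pools log(RR)s with the DerSimonian-Laird moment estimator, a common choice for random-effects models. Whether this estimator corresponds exactly to the model cited in [43] is an assumption; the function names and the input values are hypothetical and are not data from the included studies.

```python
import math

def dersimonian_laird(log_rr, variances):
    """Pool study log(RR)s with a DerSimonian-Laird random-effects model."""
    k = len(log_rr)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rr)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rr))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, log_rr)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2

def to_rr(mu, se, z=1.96):
    """Back-transform a pooled log(RR) and its standard error to RR with 95% CI."""
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)

# Hypothetical inputs, for illustration only (not data from the included studies).
log_rr = [math.log(1.2), math.log(1.5), math.log(2.4), math.log(0.95)]
variances = [0.04, 0.09, 0.16, 0.06]
mu, se, tau2 = dersimonian_laird(log_rr, variances)
rr, lower, upper = to_rr(mu, se)
print(f"pooled RR = {rr:.2f} ({lower:.2f}-{upper:.2f}), tau^2 = {tau2:.3f}")
```

When the studies agree closely, the estimated between-study variance is truncated at zero and the result coincides with a fixed-effect analysis.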
The Bayesian estimates of pooled RRs were obtained, for risk factors for which more than 4 studies were available, using different prior distributions for the hyperparameters. Our observations are the log(RR)s from the studies, whereas the posterior mean of the overall effect $\mu$ is used to obtain the log(RR) of the population. We assumed $s_k^2$ known and replaced it with the observed within-study variances. The between-study precision $\tau$ (equal to $1/\sigma^2$) was modeled using a vague prior, testing three different gamma distributions, i.e. (0.001, 0.001), (0.01, 0.01) and (0.1, 0.1), or a uniform (0, 1) distribution, to reflect the high heterogeneity noticed in the classical random-effects analysis. We used the vaguest prior for which convergence was obtained in a reasonable time, in order to balance computational efficiency and prior vagueness.
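The log(RR) observations and within-study variances mentioned above are usually not reported directly. A minimal sketch of one common way to derive them from a reported RR and its 95% CI follows, assuming the interval is symmetric on the log scale. The function name is hypothetical, and the pooled estimate quoted later in the text, 1.88 (1.02-3.49), is used only as a convenient numeric example.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_rr_and_variance(rr, ci_lower, ci_upper, z=Z_95):
    """Return (log RR, variance of log RR) from a reported RR and its 95% CI."""
    if not (0 < ci_lower <= rr <= ci_upper):
        raise ValueError("expected 0 < lower <= RR <= upper")
    y = math.log(rr)
    # On the log scale the CI spans 2 * z standard errors.
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return y, se ** 2

y, s2 = log_rr_and_variance(1.88, 1.02, 3.49)
print(f"log(RR) = {y:.3f}, variance = {s2:.3f}")
```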
For $\mu$ we tested different priors:
• a vague non-informative prior, using a normal distribution centered at 0 with a large variance (10 000);
• two different clinical priors (i.e. one including all the elicited beliefs, the other including only the geriatricians' opinions), using a normal distribution with mean equal to the mean of the experts' point estimates and variance equal to the variance of those estimates, following the method of moments proposed by Gaioni, Dey and Ruggeri [44].
The choice of a normal distribution, albeit very practical, did not take into account the information provided on upper and lower limits and the consequent asymmetry; therefore we also considered a skew normal distribution [45]. The skew normal density function is:
$$f(x) = \frac{2}{\omega}\,\varphi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right)$$
where $\xi$, $\omega$ and $\alpha$ are the location, scale and shape parameters respectively, $\varphi$ is the standard normal density function and $\Phi$ is the standard normal cumulative distribution function.
• an enthusiastic prior, as the highest RR elicited from a single expert, using a skew normal distribution representing both the point estimate and the upper and lower limits;
• a skeptical prior, as the lowest RR elicited from a single expert, also using a skew normal distribution.
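The method-of-moments step for the clinical priors can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: we assume that the point estimates are log-transformed before the moments are taken (since $\mu$ is on the log(RR) scale) and that the sample variance with an n - 1 denominator is used. The function name and the elicited values are hypothetical.

```python
import math
import statistics

def method_of_moments_prior(point_estimates_rr):
    """Normal prior for mu on the log(RR) scale from experts' point estimates.

    Returns (mean, variance, precision). The precision is included because
    BUGS parameterizes dnorm by mean and precision (1 / variance).
    """
    if len(point_estimates_rr) < 2:
        raise ValueError("at least two point estimates are needed")
    logs = [math.log(rr) for rr in point_estimates_rr]  # assumption: work on the log scale
    mean = statistics.mean(logs)
    variance = statistics.variance(logs)                 # assumption: n - 1 denominator
    if variance == 0.0:
        raise ValueError("identical estimates give a degenerate prior")
    return mean, variance, 1.0 / variance

# Hypothetical elicited point estimates (RR scale), for illustration only.
elicited = [1.25, 1.5, 2.0, 2.5, 3.25]
m0, v0, precision = method_of_moments_prior(elicited)
print(f"prior mean = {m0:.3f}, variance = {v0:.3f}, precision = {precision:.2f}")
```

A prior built this way reflects only the spread of the point estimates across experts; the individual upper and lower limits do not enter it.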
WinBUGS version 1.4 (Medical Research Council Biostatistics Unit, Cambridge, England) (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml) and OpenBUGS version 3.1.2 (http://www.openbugs.net) were used to perform the Bayesian analyses, with 2 separate chains and 20 000 Markov chain Monte Carlo iterations completed after a 5 000-iteration 'burn-in' period for each pooled RR computed. We carried out graphical diagnostics for the convergence of the chains, as available in WinBUGS and OpenBUGS [46].
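For readers without access to BUGS, the sampling scheme can be approximated with a short Gibbs sampler. The sketch below is not the code used in the study: it assumes a normal prior on $\mu$ and a gamma(shape, rate) prior on $\tau$, for which all full conditionals are available in closed form, and it interprets the 20 000 iterations as those retained after the burn-in. A skew normal prior on $\mu$ would need a non-conjugate update for that step. Function names and data are hypothetical.

```python
import math
import random
import statistics

def gibbs_mu(y, s2, m0, v0, a=0.1, b=0.1, n_iter=20_000, burn_in=5_000, seed=1):
    """Draw mu from y_k ~ N(theta_k, s2_k), theta_k ~ N(mu, 1/tau),
    mu ~ N(m0, v0), tau ~ Gamma(shape=a, rate=b)."""
    rng = random.Random(seed)
    k = len(y)
    theta, mu, tau = list(y), statistics.mean(y), 1.0
    kept = []
    for it in range(burn_in + n_iter):
        # theta_k given the rest: precision-weighted compromise between y_k and mu
        for i in range(k):
            prec = 1.0 / s2[i] + tau
            theta[i] = rng.gauss((y[i] / s2[i] + tau * mu) / prec, prec ** -0.5)
        # mu given the rest: normal prior combined with the k study effects
        prec = 1.0 / v0 + k * tau
        mu = rng.gauss((m0 / v0 + tau * sum(theta)) / prec, prec ** -0.5)
        # tau given the rest: conjugate gamma update (gammavariate takes a scale)
        rate = b + 0.5 * sum((t - mu) ** 2 for t in theta)
        tau = rng.gammavariate(a + k / 2.0, 1.0 / rate)
        if it >= burn_in:
            kept.append(mu)
    return kept

def summarize(draws):
    """Posterior mean and central 95% interval of mu, back-transformed to RR."""
    s = sorted(draws)
    n = len(s)
    stats = (statistics.mean(s), s[int(0.025 * n)], s[int(0.975 * n) - 1])
    return tuple(math.exp(v) for v in stats)

# Hypothetical log(RR)s and within-study variances, for illustration only.
y = [0.18, 0.41, 0.88, -0.05]
s2 = [0.04, 0.09, 0.16, 0.06]
for seed in (1, 2):  # two separate chains, vague normal prior on mu
    rr, lo, hi = summarize(gibbs_mu(y, s2, m0=0.0, v0=10_000.0, seed=seed))
    print(f"chain {seed}: RR = {rr:.2f} ({lo:.2f}-{hi:.2f})")
```

Close agreement between the two chains is only a rough check; it does not replace the graphical convergence diagnostics available in WinBUGS and OpenBUGS.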
The reporting of the analysis and results is consistent with the ROBUST criteria [47].

Results
Twenty physicians were asked to participate and 15 completed the questionnaire. One subject returned an incomplete form, even after a telephone training session, and was excluded from the analysis. The characteristics of the 14 experts included are reported in Table 1. There were 6 GPs and 8 geriatricians. The mean age of the sample was 44 years, and 50% of the subjects included had more than 20 years of experience after graduation. More than 60% of the sample had some statistical training. Almost all geriatricians read papers and attended conferences on falls at least sometimes, while GPs tended to be less informed about fall risk factors. We included 8 subjects in the category 'researcher', 10 in the category 'informed' and 9 in the category 'experienced'.
All the experts provided a point estimate and upper and lower limits for the five risk factors considered, separately for the two age groups, except for one geriatrician who did not report estimates for antiepileptics in 75-year-old subjects. For three risk factors (benzodiazepines, incontinence, antiepileptics) in 80-year-old subjects, fewer than 4 studies were available in the previous meta-analysis, so we excluded them from this study. The beliefs provided by the experts and the RR estimates from published studies for the most significant factors are shown in Figure 1 and in Supplementary Materials 2. In most cases, the variability between published studies was greater than (e.g. history of falls, 75 years old) or similar to (e.g. history of falls, 80 years old) the variability between experts. In both published studies and experts' beliefs some outliers were found (e.g. GP nr 2). The range of admissible values and the estimate provided by experts were sometimes very skewed (e.g. geriatrician nr 5 for incontinence). Confidence intervals (CIs) reported in published papers could be wider (e.g. history of falls, 75 and 80 years old) or narrower (e.g. incontinence) than the range between the upper and lower limits reported by the experts. Figure 2 shows the distribution of RR estimates, according to experts' classification. For most risk factors, geriatricians provided more consistent estimates (e.g. female sex, 75 and 80 years old) with fewer outliers. Overall, the 'informed' subgroup did not provide estimates that were more consistent with published papers. The mean value of the central estimates was higher in informed, experienced and researchers than in geriatricians for almost all the risk factors considered.
The frequentist and Bayesian pooled RRs computed with non-informative and clinical priors are presented in Table 2 and Table 3. Enthusiastic and skeptical priors gave results that were driven too strongly by the prior distribution (data not shown). The application of a non-informative prior provided results that were very similar to the frequentist estimates (the difference was between -10% and +10% in all cases), with sometimes wider and sometimes narrower CIs; for risk factors with a higher number of published papers, such as female sex, the CI tended to be wider for the Bayesian estimate. The application of an informative prior affected both the RR estimate and the CI. With the method of moments, results did not differ substantially from the frequentist ones. When we considered the prior provided by geriatricians only (with smaller variance), the CI was narrower for risk factors with a small number of studies. With the prior elicited from all the experts, in three cases there was a difference greater than ±20%. When we considered only the geriatricians' belief, the difference was smaller and in three cases there was a CI reduction >20%, that in two cases is accompanied by

Discussion
Our belief elicitation method presents an example of how expert opinion can be incorporated in research.We showed that this method of eliciting beliefs from clinicians is feasible and can be easily translated into a prior suitable for introduction in a random effects Bayesian meta-analysis.
This belief elicitation strategy included methods aimed at improving the validity and reliability of the elicited belief, as exemplified in the questionnaire proposed by Johnson et al. [11]. First, the questionnaire is easy to use, with simple instructions, clear questions and response options, reducing the potential for invalid responses based on insufficient understanding of the task. The only physician who returned an inconsistent questionnaire differed from the other experts in that he practiced in an isolated context. Second, we provided a guided elicitation procedure for the first risk factor; training and provision of examples have been shown to improve reliability [48]. Third, we explicitly asked the participants to review their answers, verify whether these accurately reflected their beliefs, and revise them at the end if needed. The questionnaire was sent by e-mail with clear instructions and a telephone number to call if clarification was needed. Provision of feedback, verification, and opportunity for revision have also been shown to improve validity and reliability [7], although we did not evaluate them directly [11].
There is no general agreement about the number of participants required for a belief-elicitation study [3,4,11]. A systematic review [11] of elicitation methods found the median sample size of participants in belief-elicitation studies to be 11 (range: 1-298). Even if groups of experts are thought to perform better than a solitary expert [49,50], it has been argued that experts are correlated sources of information because of their common education and exposure to similar literature [51], and even that the inclusion of an expert with beliefs identical to those of one already enrolled does not add to the range of beliefs collected in the study [52]. Some evidence suggests that the inclusion of participants with a greater degree of competence improves the validity and reliability of the elicited response [50].
As experts we chose geriatricians and GPs, because geriatricians have specific competence in the problems of older people, whereas GPs may have developed special competence on this topic because they directly take care of community-dwelling older people. We found that geriatricians gave consistent estimates with few outliers; they were also less overconfident than GPs, as shown by the narrower range of admissible values provided by GPs. This finding did not depend on the information about falls obtained by reading papers or attending conferences. Sampling experts from the population is a critical aspect, which can be addressed by proper stratification techniques and a larger sample size; the correct method of sampling experts is also a matter of concern. Although this was a convenience sample rather than a random one, with a limited sample size, we tried to represent different clinical settings by choosing experts representing a relatively broad range of opinions of the target population and by avoiding the inclusion of more than two experts from a single group or clinical unit.
We decided to elicit an RR instead of the frequency of falls in the two groups (i.e. exposed and not exposed) because this method allowed the experts to provide a measure of the precision of the estimate. This option was also followed by other authors [11,53]. However, as stated by Johnson et al. [11], there remains some ambiguity regarding the meaning of the range of probability. We asked participants to indicate boundaries beyond which there was very little probability that the true value would lie. We do not know whether participants interpreted these as plausible estimates, extreme values, or 95% credible intervals. We used this information in the skew normal model to determine its parameters, using the means of the lower and upper limits as quantiles of order 0.005 and 0.995 respectively. This model gave a better description of the experts' belief because it used all the information provided. The method of moments, conversely, focused on the consistency across experts, neglecting the individual expert's uncertainty. Johnson et al. [54] recently incorporated experts' beliefs on a survival probability by means of a multivariate Bayesian model. To our knowledge, however, this is the first method proposed for translating experts' opinion into a prior for a meta-analysis. Sutton and Abrams [9], in their discussion of the application of Bayesian methods in meta-analysis, stated that "to-date, many of the applications of Bayesian methods in meta-analysis have been to mirror classical random effects models". The inclusion of experts' opinions in the prior definition is a further step towards the full application of a Bayesian view in the meta-analytic context. Although this has already been done in other study settings, for instance clinical trials, it is a new approach with regard to meta-analyses of observational studies, where non-informative priors have usually been used.
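The quantile-matching step for the skew normal prior can be explored numerically. The sketch below evaluates the skew normal density, integrates it to obtain the cumulative distribution, and measures how far the 0.005 and 0.995 quantiles of a candidate parameter set lie from elicited limits. It is a sketch under assumptions: the limits are taken on the log(RR) scale, the treatment of the point estimate is left open, and all function names, limits and parameter values are hypothetical. In practice the mismatch would be minimized with a general-purpose optimizer.

```python
import math

def skew_normal_pdf(x, xi, omega, alpha):
    """Density (2/omega) * phi(z) * Phi(alpha * z), with z = (x - xi) / omega."""
    z = (x - xi) / omega
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / omega * phi * big_phi

def skew_normal_cdf(x, xi, omega, alpha, n=1000):
    """Composite Simpson integration of the density from xi - 10*omega to x."""
    lo = xi - 10.0 * omega
    if x <= lo:
        return 0.0
    h = (x - lo) / n  # n must be even for Simpson's rule
    total = skew_normal_pdf(lo, xi, omega, alpha) + skew_normal_pdf(x, xi, omega, alpha)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * skew_normal_pdf(lo + i * h, xi, omega, alpha)
    return total * h / 3.0

def skew_normal_quantile(p, xi, omega, alpha):
    """Bisection on the CDF within xi +/- 10*omega."""
    lo, hi = xi - 10.0 * omega, xi + 10.0 * omega
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if skew_normal_cdf(mid, xi, omega, alpha) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def limit_mismatch(xi, omega, alpha, log_lower, log_upper):
    """Squared distance between model quantiles (0.005, 0.995) and elicited limits."""
    q_lo = skew_normal_quantile(0.005, xi, omega, alpha)
    q_hi = skew_normal_quantile(0.995, xi, omega, alpha)
    return (q_lo - log_lower) ** 2 + (q_hi - log_upper) ** 2

# Hypothetical mean limits (RR 1.0 and 4.0) and candidate parameters.
print(limit_mismatch(xi=0.3, omega=0.5, alpha=2.0,
                     log_lower=math.log(1.0), log_upper=math.log(4.0)))
```

With a shape parameter of 0 the skew normal reduces to the normal distribution, which gives a simple check of the quantile routine.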
We decided to elicit information from experts directly, using questionnaires, and not to derive it from the literature, because the level of detail we needed (i.e. the interval of values of the relative risk that experts considered plausible) is seldom available in published reviews not based on meta-analysis.Moreover, in terms of epidemiological studies, the published literature was already included in the likelihood.
The use of a non-informative prior, corresponding to a lack of external information, had no notable impact on the RR estimates as compared to the frequentist approach, as expected. The introduction of a subjective informative prior resulted in a marked difference in the RR estimates only when the skew normal distribution was applied. The geriatricians' opinions were more consistent and resulted in a more precise prior distribution for the method of moments. Both methods resulted in credibility interval shrinkage when the number of available studies was low. For antiepileptics, for example, only 4 studies were available from the literature and the RRs were heterogeneous, from 0.95 [32] to 6.20 [18]. The geriatricians' belief, conversely, varied from 1.25 to 3.25. The frequentist pooled RR was 1.88 (1.02-3.49). When the lack of external information was expressed with a non-informative prior, the credibility interval included 1.00. The addition  ). Also when a skeptical prior was used, the RR estimate was lower, but still significant: 1.30 (1.05-1.66) (data not shown). It can be objected that the experts had previous information from the same papers included in the meta-analysis: this would result in a double counting of the same data, biasing the results toward whatever the data show and creating overconfidence [55]. In the case of antiepileptics, the data from the literature were too conflicting to create a shared background opinion about the strength of the association, and the 'informed' experts (those reading papers or attending conferences on falls) were not closer to the frequentist pooled RR than the other subgroups. This example shows how this method could be applied in decision-making contexts, when a decision must be made in a short time, even when evidence is scarce and contradictory (e.g. in health technology assessment).
One of our aims was to compare the results of our approach to the frequentist one. In order to be able to perform a meaningful frequentist meta-analysis, we selected risk factors for which more than 4 studies were available. However, our method is even more useful when relevant and valid external frequency data are unavailable [56]. In this case, every source of information should be taken into account and a Bayesian meta-analysis can provide a framework for information synthesis. However, even when considerable literature data are available, skepticism derived from clinical judgment can be a reasonable position that should be formally examined [57]. Frequentist and non-informative Bayesian approaches tend to be closer, but this closeness will harm inference when these observations are biased and better external information is available [57]. Our analysis is enhanced by presenting results from different priors to reflect different opinions about the parameter and by performing a sensitivity analysis to see how results vary as the prior is varied; in this process, data representations can help one judge which priors are credible enough to examine [55].
In conclusion, this is, to our knowledge, the first example of the use of a clinical informative prior based on experts' opinions, assessed through a questionnaire, for a random-effects meta-analysis of observational studies. Our study should be viewed as an exploratory analysis of the potential of this approach, also considering the inclusion of a small convenience sample of experts. Nonetheless, our results show that the elicitation method is feasible and that the experts' opinion can be successfully translated into a probability distribution that can be introduced in a Bayesian random-effects model. The addition of external information to a frequentist meta-analysis proved useful when the studies from the literature were few and/or heterogeneous.
Public Health - 2014, Volume 11, Number 1
For benzodiazepines (8 studies), the RR for the frequentist model was 1.36 (1.10-1.70) and the RR with the geriatricians' prior was 1.34 (1.21-1.50); for antiepileptics (4 studies), the RR for the frequentist model was 1.88 (1.02-3.49) and the RR with the geriatricians' prior was 1.94 (1.18-3.11). For other risk factors no appreciable change in the CI was noticed.
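The size of the changes quoted above can be checked with simple arithmetic. The sketch below computes the percentage change in the RR and in the interval width between the frequentist estimate and the estimate obtained with the geriatricians' prior. It assumes that widths are compared on the RR scale and expressed relative to the frequentist interval; the source does not state how the reported percentages were computed, and the function names are hypothetical.

```python
def pct_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old

def compare(frequentist, bayesian):
    """Each argument is (RR, lower, upper).

    Returns the % change in RR and the % change in interval width,
    both relative to the frequentist estimate.
    """
    rr_f, lo_f, hi_f = frequentist
    rr_b, lo_b, hi_b = bayesian
    return pct_change(rr_b, rr_f), pct_change(hi_b - lo_b, hi_f - lo_f)

# Estimates quoted in the text (frequentist vs. geriatricians' prior).
reported = {
    "benzodiazepines": ((1.36, 1.10, 1.70), (1.34, 1.21, 1.50)),
    "antiepileptics": ((1.88, 1.02, 3.49), (1.94, 1.18, 3.11)),
}
for factor, (freq, bayes) in reported.items():
    d_rr, d_width = compare(freq, bayes)
    print(f"{factor}: RR change {d_rr:+.1f}%, interval width change {d_width:+.1f}%")
```

Under these assumptions, both interval widths shrink by more than 20% while the point estimates change by only a few percent.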
Figure 2. Distribution of relative risk estimates, according to published studies and experts' classification, for female sex (75 and 80 years old) and antiepileptics (75 years old)

Acknowledgements: we wish to thank the participating physicians, as well as the statisticians who helped in the quest for an OpenBUGS code for the skew normal distribution, in particular Dana Kelly. We would like to thank Ersilia Lucenteforte for her support and help. We also wish to thank Ivana Garimoldi for her editorial assistance. Part of this work was presented with the title: "Using Bayesian informative priors for decision making: the case of the risk/benefit assessment of benzodiazepines and antiepileptics prescription in older people" during the Second Symposium on games and decisions in reliability and risk, Belgirate (VB), 19-21 May 2011.

Funding: this study was partially supported by the Directorate General for Health and Consumers (DGSANCO) of the European Union 'Strategies and best practices for the reduction of Injuries' (APOLLO) program [Grant Agreement 2004119].