Longitudinal Joint Modelling of Binary and Continuous Outcomes: A Comparison of Bridge and Normal Distributions

Background: Longitudinal joint models consider the variation caused by repeated measurements over time as well as the association between the response variables. When binary and continuous response variables are combined using generalized linear mixed models, integrating over a normally distributed random intercept in the binary logistic regression sub-model does not yield a closed form. In this paper, we assessed the impact of assuming a Bridge distribution for the random intercept in the binary logistic regression sub-model and compared the results with those obtained under a normal distribution. Method: The response variables are combined through correlated random intercepts. The random intercept in the continuous outcome sub-model follows a normal distribution. The random intercept in the binary outcome sub-model follows a normal or Bridge distribution. Estimation was carried out using a likelihood-based approach in the direct and conditional joint modelling approaches. To illustrate the performance of the models, a simulation study was conducted. Results: Based on the simulation results and regardless of the joint modelling approach, the models with a Bridge distribution for the random intercept of the binary outcome gave slightly more accurate estimates and better performance. Conclusion: Our study revealed that even if the random intercept of the binary logistic regression is normally distributed, assuming a Bridge distribution in the model leads to more accurate results.


INTRODUCTION
Multivariate response variables are widely recorded longitudinally in many medical areas. Longitudinal joint models assess the effect of covariates on two or more correlated responses while also accounting for the association between the response variables. Generalized linear mixed-effects models (GLMMs) are probably the most widely used methods for analyzing longitudinal data. These models combine generalized linear models (GLMs) and mixed-effects regression models (MRMs). A variety of responses (such as continuous, binary and count) can be analyzed using the GLM component. The variation caused by repeated measurements is taken into account by the MRM component [1].
Joint modelling of response variables is more complex than univariate modelling. The selection of a joint modelling approach depends on the nature of the outcomes. The most common responses in many medical studies are continuous and binary. There are two main approaches to modelling such responses jointly. The first, which was proposed by Tate [2] and used by many others [3][4][5][6][7], is based on the product of a marginal model for one of the response variables and a conditional model for the other outcome (conditioned on the former outcome). The second approach builds a joint model for the two response variables directly [8,9]. Hereafter, the first and second approaches are referred to as the conditional and direct approaches, respectively.
Catalano [10] and Fitzmaurice [11] used the marginal generalized linear model to combine continuous and discrete responses. A covariance pattern model with a separate correlation coefficient for each outcome was used to allow for the variation caused by the repeated measurements over time. The model was then extended by Catalano and Ryan. Regan and Catalano [12] provided a broad survey of longitudinal joint modelling of continuous and discrete outcomes such as bivariate GLMMs [13]. The random effects in GLMMs usually follow normal distributions with zero mean. In joint modelling of longitudinal responses, the random effects of the several sub-models follow a multivariate normal distribution with a variance-covariance matrix that accounts for the between-response associations.
Wang and Louis showed that assuming a Bridge distribution for the random intercept in a logistic mixed model forces the fixed effects to have the same odds ratio interpretation in the marginal form (i.e., integrated over the random intercepts) and the conditional form (conditional on the random intercepts) [14]. However, assuming distributions other than normal for random trends can result in complexities [14]. This idea was then applied by Lin et al. to evaluate the association between binary and continuous clustered data [15].
The current study does not compare the direct and conditional approaches; it aims to determine whether assuming a Bridge distribution for the random intercept of the binary outcome can benefit the performance of the direct and conditional joint modelling approaches. We restrict our study to random-intercept models only. A simulation study was conducted to assess the accuracy of the estimates. The models were also applied to a real dataset from a clinical trial investigating the effect of coriander fruit syrup on the duration (as the continuous response) and the severity (as the binary response) of migraine attacks.
Migraine is described as a chronic and debilitating neurological disorder. It has many adverse consequences for patients and society [16]. The World Health Organization recommends the use of traditional medicine in unresolved diseases such as migraine [17]. Coriander fruit is commonly used in alternative medical treatments. It is believed to relieve headaches, anxiety and depression and to potentially affect the frequency, duration and severity of migraine attacks [18,19]. This fruit is one of the most commonly prescribed herbs in Persian medicine [20]. Given the strong association between characteristics of migraine attacks such as severity and duration [18], longitudinal joint models were fitted.

Let $Y_{1ij}$ and $Y_{2ij}$ represent the continuous and binary responses, respectively, for subject $i$ at occasion $j$. The binary response takes the values 0 and 1, while the continuous response can take any value between $-\infty$ and $+\infty$. The two associated response variables follow the general form

$$h_k(\mu_{kij}) = x_{kij}^{\top}\beta_k + u_{ki}, \qquad k = 1, 2. \tag{1}$$

In this formula, $\mu_{kij} = E(Y_{kij} \mid u_{ki})$ is the expected outcome, $h_k$ is a proper link function for the type of response variable (e.g. the identity for the continuous response), and the expected response is assumed to differ from the systematic component ($x_{kij}^{\top}\beta_k$) by a subject-specific effect ($u_{ki}$).

Associations and distributions
Bridge and normal distributions are assumed for the random intercept in the binary outcome sub-model with a logit link function. A normally distributed random intercept is postulated for the continuous outcome sub-model. When both random intercepts are normal, they follow a bivariate normal distribution; when their distributions differ, a copula approach is used. A correlation parameter $\rho$ accounts for the association between the random intercepts and hence between the response variables. The bivariate responses, one continuous and one binary, are assumed to follow normal and binomial distributions, respectively. The continuous response variable is linked to a linear function of covariates and a normally distributed random intercept (mean zero and variance $\sigma_1^2$) via an identity link. The binary response is linked to the covariates through a logit link, assuming a Bridge or a normal distribution for the random intercept (mean zero and variance $\sigma_2^2$). As noted earlier, the Bridge distribution proposed by Wang and Louis [14] allows both the conditional and marginal probabilities of the binary response to follow a logistic structure.
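The copula construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the seed handling and the clamping constant are assumptions, and the quantile function is the standard closed form of the Bridge distribution for the logit link with attenuation parameter phi (0 < phi < 1). Two correlated standard normal variates are drawn; the first is scaled to give the normal random intercept, and the second is pushed through the normal CDF and then the Bridge quantile function.

```python
import math
import random
from statistics import NormalDist


def bridge_quantile(p, phi):
    """Quantile function of the Bridge distribution for the logit link."""
    return math.log(math.sin(phi * math.pi * p)
                    / math.sin(phi * math.pi * (1.0 - p))) / phi


def sample_intercepts(n, rho, sigma1, phi, seed=0):
    """Draw n pairs (u1, u2): u1 is normal, u2 is Bridge, joined by a normal copula."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        # Correlated standard normal variates with correlation rho
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1 = sigma1 * z1
        # Probability integral transform; clamp to avoid log(0) for extreme draws
        p = min(max(std_normal.cdf(z2), 1e-12), 1.0 - 1e-12)
        u2 = bridge_quantile(p, phi)
        pairs.append((u1, u2))
    return pairs
```

Note that in this construction rho is the correlation on the latent normal scale; the Pearson correlation between the two intercepts themselves is generally somewhat smaller.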

Model specification and Likelihood
The two associated response variables can be predicted by different covariates (not necessarily the same ones). To combine the associated response variables, the direct or the conditional approach can be applied, depending on the nature of the association between the responses. The conditional approach is appropriate when the joint distribution can be factorized as the product of a marginal and a conditional density. This approach reduces the modelling task to the separate specification of models. The conditional approach requires a reliable type of association between the response variables, such that one of the variables plays the role of a time-varying covariate for the other. In addition to the complexity of marginalizing one response by integrating over the conditional density, problems such as the asymmetric treatment of the responses make the modelling more difficult [9].
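The model and the likelihood function for the direct approach can be specified as follows:

$$Y_{1ij} = x_{1ij}^{\top}\beta_1 + u_{1i} + \varepsilon_{ij}, \qquad \operatorname{logit} P(Y_{2ij} = 1 \mid u_{2i}) = x_{2ij}^{\top}\beta_2 + u_{2i} \tag{2}$$

$$L = \prod_{i} \int\!\!\int \prod_{j} f_1(y_{1ij} \mid u_{1i})\, f_2(y_{2ij} \mid u_{2i})\, f(u_{1i}, u_{2i})\, du_{1i}\, du_{2i} \tag{3}$$

The model and the likelihood function for the conditional approach can be written as follows:

$$Y_{1ij} = x_{1ij}^{\top}\beta_1 + u_{1i} + \varepsilon_{ij}, \qquad \operatorname{logit} P(Y_{2ij} = 1 \mid u_{2i}, \varepsilon_{ij}) = x_{2ij}^{\top}\beta_2 + u_{2i} + \gamma\,\varepsilon_{ij} \tag{4}$$

$$L = \prod_{i} \int\!\!\int \prod_{j} f_1(y_{1ij} \mid u_{1i})\, f_2(y_{2ij} \mid y_{1ij}, u_{1i}, u_{2i})\, f(u_{1i}, u_{2i})\, du_{1i}\, du_{2i} \tag{5}$$

Here $u_{1i}$ and $u_{2i}$ are the random intercepts of the continuous and binary sub-models, $\varepsilon_{ij} = y_{1ij} - x_{1ij}^{\top}\beta_1 - u_{1i}$ is the residual of the continuous response, and $\gamma$ is the dependence parameter. Since $Y_{1ij}$ and $Y_{2ij}$ are continuous and binary variables, respectively, $f_1$ and $f_2$ are normal and binomial density functions. In the current study, two different assumptions about the distribution of the binary outcome random intercept were compared. Thus, $f(u_{1i}, u_{2i})$ is a multivariate normal or a normal copula function of the random intercepts. In other words, when the distributional assumptions for the two random intercepts differ, a normal copula distribution was used to combine them.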
In the conditional approach, in addition to the correlation between the response variables for the same subject, the association at the same time point is also modelled. In other words, a dependence parameter ($\gamma$) captures the association at the same time point by conditioning one response on the residual of the other (equations 4, 5) [15]. Maximum likelihood estimation was carried out by taking the expectation with respect to the joint density of the random intercepts. Non-adaptive Gaussian quadrature was used to evaluate the integrals, and the Newton-Raphson technique was used for the optimization.
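The quadrature step can be illustrated with a minimal sketch of one subject's marginal likelihood under the direct approach. Several things here are assumptions made for illustration rather than the authors' implementation: both random intercepts are taken to be bivariate normal, a fixed 5-point Gauss-Hermite rule is hard-coded (far more nodes would normally be used), and all function and argument names are hypothetical.

```python
import math

# 5-point Gauss-Hermite rule for the weight function exp(-x^2)
GH_NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
            0.9585724646138185, 2.0201828704560856]
GH_WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
              0.3936193231522412, 0.01995324205904591]


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bernoulli_pmf(y, eta):
    p = 1.0 / (1.0 + math.exp(-eta))
    return p if y == 1 else 1.0 - p


def subject_likelihood(y1, y2, eta1, eta2, sigma_e, sigma1, sigma2, rho):
    """Marginal likelihood of one subject, integrating over (u1, u2).

    y1, y2: continuous and binary responses over the occasions;
    eta1, eta2: fixed-effect linear predictors for the two sub-models.
    """
    total = 0.0
    for a, wa in zip(GH_NODES, GH_WEIGHTS):
        for b, wb in zip(GH_NODES, GH_WEIGHTS):
            # Change of variables from the Hermite nodes to standard normal draws
            z1 = math.sqrt(2.0) * a
            z2 = math.sqrt(2.0) * b
            u1 = sigma1 * z1
            u2 = sigma2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
            cond = 1.0
            for y, eta in zip(y1, eta1):
                cond *= normal_pdf(y, eta + u1, sigma_e)
            for y, eta in zip(y2, eta2):
                cond *= bernoulli_pmf(y, eta + u2)
            total += wa * wb * cond
    # The product of the two one-dimensional weight sums equals pi
    return total / math.pi
```

The full log-likelihood would be the sum of the logarithms of these contributions over subjects, which an optimizer such as Newton-Raphson would then maximize. Because the nodes are fixed rather than recentred for each subject, this corresponds to the non-adaptive variant of the rule.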

Bridge distribution
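Let $G(\cdot)$ be an inverse link function that is monotone increasing and twice differentiable, and let $f_B(u)$ be the Bridge density for the subject-specific random effect $u$. This distribution has the feature that the marginal and conditional link functions have the same form, as in (6), where $\eta$ is the linear predictor and $\phi$ and $c$ are the attenuation and unknown constant parameters, respectively.

$$\int_{-\infty}^{\infty} G(\eta + u)\, f_B(u)\, du = G(c + \phi\eta) \tag{6}$$

After differentiating (6) with respect to $\eta$ and applying the Fourier transformation, one can determine the density function of the Bridge distribution as

$$f_B(u) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\mathcal{F}_g(\xi/\phi)}{\mathcal{F}_g(\xi)} \exp\{ i\xi (u - c/\phi) \}\, d\xi \tag{7}$$

where $\mathcal{F}_g$ is the Fourier transformation

$$\mathcal{F}_g(\xi) = \int_{-\infty}^{\infty} g(x) \exp(i\xi x)\, dx \tag{8}$$

and $g(x) = G'(x)$.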
The density and cumulative distribution function of the random effect ($u$) can be derived as in (9) and (10), respectively. Finally, after integrating the conditional binary logistic model over the random effect, the logit interpretation is preserved with one additional parameter ($\phi$). The mean and variance of the Bridge distribution are zero and $\pi^2(\phi^{-2} - 1)/3$, respectively. The intraclass correlation (ICC) can be determined by $1 - \phi^2$.
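$$f_B(u \mid \phi) = \frac{1}{2\pi}\, \frac{\sin(\phi\pi)}{\cosh(\phi u) + \cos(\phi\pi)}, \qquad -\infty < u < \infty,\ 0 < \phi < 1 \tag{9}$$

$$F_B(u \mid \phi) = 1 - \frac{1}{\pi\phi} \left[ \frac{\pi}{2} - \arctan\left\{ \frac{\exp(\phi u) + \cos(\phi\pi)}{\sin(\phi\pi)} \right\} \right] \tag{10}$$

The variance matrix of the two response variables for the $i$th subject can be derived using the following matrices:

$$V_i = \Delta_i Z_i D Z_i^{\top} \Delta_i^{\top} + \Xi_i^{1/2} A_i^{1/2} R_i(\alpha) A_i^{1/2} \Xi_i^{1/2} \tag{11}$$

where $Z_i$ is the design matrix of the random intercepts, $D$ is their variance-covariance matrix, $\Xi_i$ is the diagonal overdispersion matrix, $A_i$ is the diagonal variance matrix of the response variables assuming zero random effects, and $R_i(\alpha)$ (here an identity matrix) is the matrix denoting the correlation between residual errors. Moreover, let

$$\Delta_i = \left. \frac{\partial \mu_i}{\partial \eta_i} \right|_{u_i = 0} \tag{12}$$

To compare the proposed models, we used the Akaike information criterion (AIC).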
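The defining property of the Bridge distribution can be checked numerically. The sketch below uses the standard closed-form density and distribution function of the Bridge distribution for the logit link and a plain trapezoidal rule; the function names, the integration range and the step size are assumptions chosen for illustration. Integrating the conditional logistic probability over the Bridge density should reproduce a logistic probability whose linear predictor is scaled by phi.

```python
import math


def bridge_pdf(u, phi):
    """Bridge density for the logit link, 0 < phi < 1."""
    return math.sin(phi * math.pi) / (
        2.0 * math.pi * (math.cosh(phi * u) + math.cos(phi * math.pi)))


def bridge_cdf(u, phi):
    """Bridge cumulative distribution function for the logit link."""
    z = (math.exp(phi * u) + math.cos(phi * math.pi)) / math.sin(phi * math.pi)
    return 1.0 - (math.pi / 2.0 - math.atan(z)) / (math.pi * phi)


def bridge_variance(phi):
    return (math.pi ** 2 / 3.0) * (phi ** -2 - 1.0)


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def marginal_probability(eta, phi, half_width=80.0, step=0.01):
    """Trapezoidal approximation of the integral of expit(eta + u) * bridge_pdf(u)."""
    n = int(round(2.0 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        u = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * expit(eta + u) * bridge_pdf(u, phi)
    return total * step
```

For example, `marginal_probability(1.5, 0.6)` should agree closely with `expit(0.6 * 1.5)`, which is the sense in which the conditional and marginal models share the logit form, with the marginal coefficients attenuated by phi.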

Computational Support
R software version 3.3.1 with packages such as "copula", "bridgedist" and "MASS", as well as the "nlmixed" procedure of SAS version 9.2, were used to simulate and prepare the data and to fit the proposed joint models. The SAS codes are available in the Appendix.

Simulation study
A simulation study was conducted to assess the impact of a Bridge random intercept on the estimates. The following settings were considered. At each step, a continuous variable (time) at 10 different occasions and a binary variable (group) were generated from uniform and binomial distributions, respectively. The continuous response variable was generated from a normal distribution with mean equal to its systematic component. The binary response was generated from a binomial distribution using the probabilities implied by the logit link function. The two random intercepts were generated from a bivariate normal distribution. The correlations between the random intercepts were set to zero, 0.4 and 0.8. The models were fitted for three sample sizes: 50, 200 and 500. Two joint modelling approaches (direct and conditional) were used. The resulting 18 scenarios were modelled under two different assumptions for the distribution of the random intercept of the binary logistic regression (normal, Bridge). The models were specified as follows: The direct approach: The conditional approach: The true values:
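The data-generating steps listed above can be sketched as follows for the direct approach. This is an illustration only: the true parameter values used in the study are not reproduced here, so every coefficient, variance and the range of the uniform distribution for time are hypothetical placeholders, as are the function and variable names.

```python
import math
import random


def simulate_dataset(n_subjects, rho, seed=0, n_occasions=10):
    """Generate one longitudinal dataset with a continuous and a binary response.

    Returns rows of (subject, occasion, time, group, y_continuous, y_binary).
    """
    rng = random.Random(seed)
    # Hypothetical values chosen only for illustration (intercept, time, group)
    beta1 = (1.0, 0.5, -0.5)
    beta2 = (-0.5, 0.3, 0.8)
    sigma1, sigma2, sigma_e = 1.0, 1.0, 1.0
    rows = []
    for i in range(n_subjects):
        group = 1 if rng.random() < 0.5 else 0
        # Correlated bivariate normal random intercepts
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u1, u2 = sigma1 * z1, sigma2 * z2
        for j in range(n_occasions):
            time = rng.uniform(0.0, 1.0)  # assumed range
            mean1 = beta1[0] + beta1[1] * time + beta1[2] * group + u1
            y1 = rng.gauss(mean1, sigma_e)
            eta2 = beta2[0] + beta2[1] * time + beta2[2] * group + u2
            prob = 1.0 / (1.0 + math.exp(-eta2))
            y2 = 1 if rng.random() < prob else 0
            rows.append((i, j, time, group, y1, y2))
    return rows
```

Calling `simulate_dataset(50, 0.4)` gives 500 rows (50 subjects at 10 occasions); repeating the call over the three sample sizes and three correlations, for each modelling approach, would cover the scenario grid described above.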

Migraine Data
We analyzed data from a prospective, two-arm, randomized, triple-blind, placebo-controlled trial conducted in the neurology clinic of Shohadaye-Tajrish hospital, Tehran, Iran [18]. The patients were randomly divided into two equal groups, a control group and a treatment group. In addition to 500 mg of sodium valproate per day, the patients received either 15 mL of coriander fruit syrup or 15 mL of placebo syrup, three times a day, for a month. The allocation was organized according to the code provided by the department of traditional pharmacy at the Tehran University of Medical Sciences, Tehran, Iran. The subjects were followed at weeks 1, 2, 3 and 4. The mean severity of pain was evaluated by a ten-point visual analog scale (VAS). Moreover, the patients were asked to write down the duration (in hours) of their migraine attacks. At the end of each week, patients were referred to the neurology clinic to report the requested items. Severity was categorized into two levels (0-0.40 and 0.41-1) as the binary response [21]. The duration of migraine attacks was taken as the continuous response.

A Real Data Example (Migraine Data)
Descriptive statistics of the continuous and categorical characteristics of the two groups are described in detail elsewhere [18]. Table 1 presents the distribution of migraine attack severity and duration during the 4 weeks in the intervention and control groups. The two response variables were strongly associated across the time points.

Longitudinal Joint Modelling
The results of the models are shown in Table 2. Although the results were almost the same for the four fitted models, the lowest AIC was seen in the models assuming a Bridge distribution for the random intercept of severity, in both the direct and conditional approaches. Significant variances of the random intercepts for the duration and the severity of migraine attacks showed a high level of heterogeneity among patients at baseline. The duration and severity of migraine attacks decreased significantly over time during the intervention. According to the results from the direct approach with the assumption of a Bridge distribution, the coriander fruit syrup intervention decreased the duration of migraine attacks. The slope of decrease in migraine attack duration was 2.92 more than in the control group. In contrast to the baseline, the duration of the attacks decreased significantly over time in the intervention group. One additional week of coriander fruit syrup was associated with a 93% reduction in the odds of severe migraine attacks compared with placebo (OR = exp(-2.63) = 0.07). Because a Bridge distribution was used for the random intercept of the binary outcome, the odds ratios have the same interpretation in both the population-average and subject-specific frameworks. Almost the same results were found with the conditional joint modelling approach as with the direct approach.
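The odds-ratio arithmetic quoted for the direct approach can be checked with a short sketch. The coefficient -2.63 is the value reported in the text; the function name is hypothetical.

```python
import math


def odds_ratio_summary(log_odds_ratio):
    """Return the odds ratio and the percentage reduction in odds it implies."""
    odds_ratio = math.exp(log_odds_ratio)
    percent_reduction = (1.0 - odds_ratio) * 100.0
    return odds_ratio, percent_reduction


odds_ratio, percent_reduction = odds_ratio_summary(-2.63)
print(f"OR = {odds_ratio:.2f}, reduction in odds = {percent_reduction:.0f}%")
```

An odds ratio of about 0.07 corresponds to a reduction of roughly 93% in the odds, not in the probability, of a severe attack.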

Simulation results
Tables 3 to 5 show the simulation results. Using the absolute value of biases (AVB $= |E(\hat{\theta}) - \theta|$), one can see that the estimated values are close to the true values. Comparing the mean AICs as well as the AVB in both the direct and conditional approaches, the models with a Bridge distribution for the random intercept of the binary outcome performed better. The larger the sample size, the better the estimates. Based on the lowest AIC, this simulation study showed that, regardless of the strength of association between the random intercepts and of the sample size, assuming a Bridge distribution benefits the models and makes the same population-average and subject-specific interpretations possible in terms of odds ratios.

DISCUSSION
Researchers have discussed conditional models extensively as a major approach to joint modelling. The second approach combines the responses directly and was extended by Catalano [10] and Molenberghs et al. [23]. In addition to GLMMs, the probit-normal approach and the Plackett-Dale model have been proposed in the literature. The extension of the Plackett-Dale model to other mixed responses is straightforward and is well described by Faes et al. [22]. In contrast to the direct model, the conditional approach adds a parameter to the likelihood function that assesses the direct association between the binary and continuous outcomes at the same time point. However, using either of the joint modelling approaches requires a thorough understanding of the association between the response variables.
The generality of GLMMs makes extensions possible to other settings of combined discrete and continuous outcomes. Building on the characteristics of GLMMs, more complex models have been presented for dealing with particular aspects of such problems. For example, a logit link function is frequently used in binary logistic regression because of its ease of interpretation. However, integrating over a normally distributed random intercept does not result in a closed form [14,15]. To allow the same subject-specific and population-average interpretations in terms of odds ratios, the Bridge distribution was introduced by Wang and Louis [14].
Previous studies have shown that regression effects in random-intercept logistic models are estimated almost identically under different distributional assumptions for the random effect [24,25]. Our simulation study assessed the impact of assuming a Bridge distribution in the direct and conditional joint modelling approaches on the performance of the models and the accuracy of the estimates. Correlated random intercepts were used to combine the response variables. The random intercepts were generated from a bivariate normal distribution. In the models, we assumed that the random intercept of the binary logistic regression follows either a Bridge or a normal distribution. The simulation study showed that the models assuming a Bridge distribution perform better than those assuming a normal distribution. Although the accuracy of the estimates was similar under both assumptions, the models with Bridge-distributed random intercepts had smaller absolute bias.
In the current study, we used the coriander fruit syrup data, in which the duration and pain severity of the attacks were combined and assessed among migraine patients. We showed that the intervention significantly reduces the adverse outcomes of migraine. It has been demonstrated that linalool is the main component of coriander [26,27]. The results of univariate analysis have shown that the duration and frequency of migraine attacks, as well as the degree of pain, decrease over time with the use of coriander [18].

CONCLUSION
Assuming a Bridge distribution for the random intercept of the binary outcome gives the parameter estimates the same interpretation whether or not the random effects are integrated out. In addition, our study revealed that even if the random intercept of the binary logistic regression follows a normal distribution, assuming a Bridge distribution for this random effect in the model leads to slightly more accurate results. This result was observed in both the direct and conditional joint modelling approaches.

TABLE 1.
Mean (SD) of duration (continuous response) and frequency (percentage) of severity (binary response) across the 4 time points

TABLE 2.
The results of Direct and MC approaches

TABLE 3.
Simulation study with zero correlation between the random intercepts
*Sample size; **absolute value of biases; ***mean square error

TABLE 4.
Simulation study with 0.4 correlation between the random intercepts
*Sample size; **absolute value of biases; ***mean square error
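The two summary measures named in the table notes, the absolute value of bias and the mean square error, can be computed from the estimates obtained over simulation replicates along the following lines. This is a minimal sketch with hypothetical function names; the replicate estimates in the usage line are made-up numbers, not results from the study.

```python
from statistics import fmean


def absolute_bias(estimates, true_value):
    """Absolute difference between the mean estimate and the true value."""
    return abs(fmean(estimates) - true_value)


def mean_square_error(estimates, true_value):
    """Average squared distance of the estimates from the true value."""
    return fmean((e - true_value) ** 2 for e in estimates)


# Made-up replicate estimates of a parameter whose true value is 1.0
example = [0.9, 1.1, 1.3]
avb = absolute_bias(example, 1.0)
mse = mean_square_error(example, 1.0)
```

The mean square error combines the squared bias and the sampling variance of the estimator, so two models with similar bias can still differ in mean square error.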