Should we use logistic mixed model analysis for the effect estimation in a longitudinal RCT with a dichotomous outcome variable?

Background: Within epidemiology both mixed model analysis and GEE analysis are frequently used to analyse longitudinal RCT data. With a continuous outcome, both methods lead to more or less the same results, but with a dichotomous outcome the results differ markedly. The purpose of the present study is to evaluate the performance of a logistic mixed model analysis and a logistic GEE analysis and to advise which of the two methods should be used. Methods: Two real life RCT datasets with and without missing data were used to perform this evaluation. For the logistic mixed model analysis, two different estimation procedures were also compared to each other. Results: The regression coefficients obtained from the two logistic mixed model analyses were different from each other, but were always higher than the regression coefficients derived from a logistic GEE analysis. Because this also holds for the standard errors, the corresponding p-values were more or less the same. It was further shown that the effect estimates derived from a logistic mixed model analysis were an overestimation of the ‘real’ effect estimates. Conclusion: Although logistic mixed model analysis is widely used for the analysis of longitudinal RCT data, this article shows that logistic mixed model analysis should not be used when one is interested in the magnitude of the regression coefficients (i.e. effect estimates).


INTRODUCTION
Within epidemiology, the two most frequently used methods to analyse longitudinal data from a randomised controlled trial (RCT) are Generalised Estimating Equations (GEE analysis) and mixed model analysis. The latter is also known as multilevel analysis, random coefficient analysis or hierarchical linear modeling. The general idea of both methods is that an adjustment is made for the dependency of the observations within an individual over time. In GEE analysis this adjustment is performed by modeling the within subject correlation matrix [1,2], while in mixed model analysis this adjustment is performed by modeling the difference between the subjects (i.e. the between subject variance) [3,4]. Because the correlation within the subject is essentially the same as the difference between the subjects, the estimated regression coefficients may be expected to be the same in both methods.
However, there is also another difference between the two methods. GEE analysis is known as a 'population average' approach, while mixed model analysis is known as a 'subject specific' approach [5]. This does not influence the values of the estimated regression coefficients obtained from a linear GEE analysis and a linear mixed model analysis, but it does influence the values of the estimated regression coefficients obtained from a logistic GEE analysis and a logistic mixed model analysis. The difference in regression coefficients is a theoretical one and always runs in the same direction: the regression coefficients obtained from a logistic mixed model analysis will always be higher (i.e. further away from zero) than the regression coefficients obtained from a logistic GEE analysis. This difference is based on a mathematical relationship and depends on the magnitude of the between subject variance (see equation 1) [6,7]. When there is more between subject variance, the difference between the regression coefficients will be larger.
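The display of equation 1 is not preserved in this text. A commonly cited approximation for a logistic random intercept model, which matches the symbols defined next and is therefore presumably the relationship meant, is sketched here as an assumption rather than as the authors' exact formula:

```latex
\beta^{(pa)} \approx \frac{\beta^{(ss)}}{\sqrt{1 + c^{2}\,\sigma_b^{2}}},
\qquad c = \frac{16\sqrt{3}}{15\pi}, \quad c^{2} \approx 0.346
\tag{1}
```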
Where β(pa) is the population average regression coefficient obtained from a logistic GEE analysis, σ_b² is the between subject variance and β(ss) is the subject specific regression coefficient obtained from a logistic mixed model analysis.
Both GEE analysis and mixed model analysis are used for the analysis of longitudinal data with a dichotomous outcome variable, but from the literature it is not clear which of the two methods should be used and which regression coefficients should be reported [7-10]. It is sometimes argued that mixed model analysis should be preferred over GEE analysis because mixed model analysis is more suitable to deal with missing data [11-13].
In this paper we will illustrate the differences between the regression coefficients obtained from a longitudinal logistic GEE analysis and a longitudinal logistic mixed model analysis by using examples from two RCTs with and without missing data. The aim of the study was to evaluate the performance of both methods and to advise which of the two methods should be used and which of the results should be reported.

Datasets
The differences between results from a logistic GEE analysis and a logistic mixed model analysis are illustrated in datasets from two RCTs. The first example dataset is derived from an RCT that assessed the effectiveness of a classification-based treatment approach compared to usual physical therapy care in patients with subacute or chronic low back pain [14]. The outcome variable of interest was functional status, which was measured with the 10-item Oswestry Disability Index (ODI) [15], with higher scores indicating lower functional status. The maximum score on the ODI is 50 and in the present study a cut-off value of 12 was used to distinguish between good (< 12) and bad (≥ 12) functional status [16]. The outcome variable was assessed at 8, 26, and 52 weeks after the start of treatment.
The second example dataset is derived from a 3-arm RCT regarding an internet-based treatment for adults with depressive symptoms [17]. Besides a waiting list (WL) group, two interventions were evaluated, i.e. an internet-based cognitive behavioral therapy (CBT) and an internet-based problem solving therapy (PST). The outcome variable, self-reported depression (measured with the Center for Epidemiological Studies Depression scale (CES-D)), was assessed at 5, 8 and 12 weeks. The CES-D is widely used for identifying individuals with depression and a score of 16 or higher is considered to represent clinical depression.
The two datasets differ from each other in the number of groups to be compared and in the percentage of missing data (see table 1). Both studies were approved by the Medical Ethics Committee of the VU University Medical Center in Amsterdam.
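The two cut-off rules described above can be written down as a short Python sketch. The cut-off values (12 for the ODI, 16 for the CES-D) come from the text; the function names and the 0/1 coding direction are illustrative assumptions, since the coding used in the original analyses is not stated.

```python
# Cut-off values quoted in the text; the 0/1 coding direction is an assumption.
ODI_CUTOFF = 12    # ODI < 12: good functional status
CESD_CUTOFF = 16   # CES-D >= 16: clinical depression


def good_functional_status(odi_score):
    """Return 1 for good functional status (ODI < 12), 0 for bad (ODI >= 12)."""
    if not 0 <= odi_score <= 50:
        raise ValueError("ODI score must lie between 0 and 50")
    return 1 if odi_score < ODI_CUTOFF else 0


def depressed(cesd_score):
    """Return 1 when the CES-D score indicates clinical depression (>= 16), else 0."""
    if cesd_score < 0:
        raise ValueError("CES-D score cannot be negative")
    return 1 if cesd_score >= CESD_CUTOFF else 0
```

Applied per subject and per time point, these functions would yield the dichotomous outcomes that enter the logistic models.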

Analysis
For both example datasets a logistic GEE analysis and a logistic mixed model analysis were performed. For all logistic GEE analyses, an exchangeable correlation structure was used and for all logistic mixed model analyses only a random intercept was modeled. Regarding the logistic mixed model analyses, two estimation procedures were used: a maximum likelihood procedure performed with the xtmelogit procedure in STATA [18] and a (2nd order) penalized quasi likelihood procedure performed with MLwiN [19,20].
For both datasets, the differences between the groups at the different time points were estimated simultaneously, by treating time as a categorical variable represented by dummy variables and by adding interactions between the group variable(s) and the time dummy variables to the model.
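The model structure described above (time dummies plus group-by-time interactions) can be sketched in Python for the two-arm case. The function names, the choice of the first time point as reference category, and the coefficient values in the usage note are hypothetical and only illustrate how the group difference at each time point follows from the coefficients.

```python
def design_row(group, week, weeks=(8, 26, 52)):
    """Build one design row: intercept, group, time dummies, group-by-time interactions.

    The first entry of `weeks` is taken as the reference time point (an assumption).
    """
    if week not in weeks:
        raise ValueError("unknown time point")
    time_dummies = [1 if week == w else 0 for w in weeks[1:]]
    interactions = [group * d for d in time_dummies]
    return [1, group] + time_dummies + interactions


def group_effect(beta, week, weeks=(8, 26, 52)):
    """Log odds ratio (intervention versus control) at one time point."""
    treated = design_row(1, week, weeks)
    control = design_row(0, week, weeks)
    return sum(b * (x1 - x0) for b, x1, x0 in zip(beta, treated, control))
```

With a hypothetical coefficient vector `[0.2, -0.5, 0.1, 0.3, -0.2, 0.4]`, `group_effect` returns the group coefficient itself at week 8, and the group coefficient plus the matching interaction at weeks 26 and 52, which is how a single model yields all time-specific effects simultaneously.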
To illustrate the influence of missing data on the results of the logistic GEE analysis and the logistic mixed model analysis, in both datasets, one analysis was performed on the total dataset including missing values, and one analysis was performed on a dataset with only complete cases. To evaluate the performance of the different methods, the estimated probabilities of the outcome variable were compared to the observed percentages at the different time points.

First example dataset
Table 2 shows the results of the logistic GEE analysis and the two logistic mixed model analyses performed on the first example dataset regarding the physical therapy intervention on patients with low back pain. As expected, the regression coefficients obtained from the logistic GEE analysis were much lower than the ones obtained from the logistic mixed model analyses. The magnitude of the difference between the methods was more or less expected given the estimated between subject variance and the mathematical relationship shown in equation 1. Note that the results obtained from the two logistic mixed model analyses were also quite different.
To evaluate the performance of the different methods, the predicted probabilities were compared to the observed percentage of good functional status (table 3).It can be seen that most of the predicted probabilities were different from the observed percentages.However, the predicted probabilities based on the results of the logistic GEE analysis were much closer to the observed percentages compared to the predicted probabilities based on the results of the logistic mixed model analyses.
When only the complete cases were analysed (tables 4 and 5), the difference in regression coefficients between the methods was comparable to the differences observed in the analyses regarding the total dataset (i.e. including cases with missing observations). However, in the complete data the predicted probabilities obtained from the logistic GEE analysis were exactly the same as the observed percentages, while the predicted probabilities obtained from the logistic mixed model analyses were (again) too high for probabilities above 50% and too low for probabilities below 50%.
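The direction of this pattern can be illustrated numerically. The sketch below assumes, hypothetically, that a mixed model predicted probability is computed from the fixed effects alone (random intercept set to zero); the text does not state how the predictions were derived. It compares that value with the population average probability, obtained by averaging over a normally distributed random intercept. The function names and the input values in the usage note are illustrative.

```python
import math


def expit(x):
    """Inverse logit: convert a log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_probability(eta, sigma_b, n_points=2001, width=8.0):
    """Average expit(eta + b) over b ~ Normal(0, sigma_b^2) with the trapezoidal rule."""
    if sigma_b == 0:
        return expit(eta)
    lo, hi = -width * sigma_b, width * sigma_b
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for i in range(n_points):
        b = lo + i * step
        density = math.exp(-0.5 * (b / sigma_b) ** 2) / (sigma_b * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        total += weight * expit(eta + b) * density * step
    return total
```

For a hypothetical linear predictor of 1.0 and a between subject standard deviation of 2.0, `expit(1.0)` lies further from 50% than `marginal_probability(1.0, 2.0)`; the same holds in mirror image for a linear predictor of -1.0. Under the stated assumption, this matches the too high above 50% and too low below 50% pattern.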

Second example dataset
Table 6 shows the results of both the logistic GEE analysis and the (two) logistic mixed model analyses performed on the second example dataset, i.e. the 3-arm RCT regarding the internet-based treatment of depressive symptoms. Table 7 shows the corresponding observed percentages of depressed subjects and the predicted probabilities.
The differences between the results obtained from the different methods were comparable to the ones observed in the first example dataset, i.e. the regression coefficients obtained from the logistic mixed model analyses were much higher (i.e. further away from zero) than the regression coefficients obtained from the logistic GEE analysis. Again, the predicted probabilities obtained from the logistic GEE analysis were much closer to the observed percentages than the predicted probabilities obtained from the logistic mixed model analyses.
The results of the analyses on a complete dataset (tables 8 and 9) also show the same picture as for the first example dataset. The predicted probabilities from the logistic GEE analysis were exactly the same as the observed percentages, while the predicted probabilities derived from the logistic mixed model analyses were (mostly) too high.

DISCUSSION
In this paper we compared the performance of a logistic GEE analysis with the performance of a logistic mixed model analysis applied to two longitudinal RCT datasets. Based on the results (i.e. the comparison between observed and predicted probabilities), we can conclude that the regression coefficients obtained from a logistic mixed model analysis are too high and should therefore not be used as effect measure.
There are several papers in which a logistic GEE analysis (a population average approach) is compared to a logistic mixed model analysis (a subject specific approach). Most of these comparisons were made on cross-sectional data with clustering of data on for instance neighborhood level, school level, etc. Although the directions of the differences were comparable to the ones observed in the present study, the magnitude of the differences was, in general, much lower [21-23]. This is due to the fact that the between cluster differences in these cross-sectional studies are much lower than the between cluster (i.e. subject) differences within a longitudinal study. It was already mentioned that the magnitude of the differences between the results of the two methods depends on the magnitude of the between cluster/subject variance (see equation 1). Surprisingly, none of the papers comparing logistic GEE analysis with logistic mixed model analysis provides a recommendation on which of the two methods should be used. It is sometimes argued that preferring one method over the other depends on the question to be answered [8,24]. In general, if one is interested in the regression coefficient, i.e. the effect estimation, a population average approach should be used, and when one is interested in estimating the heterogeneity between subjects in a longitudinal study or between clusters in a cross-sectional study, a subject-specific approach should be used. In longitudinal RCTs, one is not interested in the heterogeneity between subjects, but in the effect estimation, while taking into account the dependency of the observations within the subjects and treating it as a nuisance. For this purpose, logistic GEE analysis provides a valid estimate of the coefficient, while logistic mixed model analysis does not.
One of the arguments against the use of a logistic GEE analysis is that the results of a logistic GEE analysis are biased when there are missing data, especially when the missing data are not missing completely at random, i.e. not MCAR [11-13]. In most longitudinal RCTs there are missing data, and in most longitudinal RCTs the missing data are not MCAR, so it is a common belief that a logistic GEE analysis should not be used in those situations. Although this argument is theoretically true, it should be realised that the percentage of missing data must be very high to have a detrimental influence on the validity of the results of a GEE analysis [5] and that a logistic mixed model analysis is only valid when the missing data are missing at random (MAR) and when the model is correctly specified (i.e. with a random intercept and with all necessary random slopes) [5]. In the analyses performed on the example datasets it is not clear what the impact of the missing data is on the estimation of the effect of the intervention. However, looking at the predicted probabilities from both the logistic GEE analysis and the logistic mixed model analyses, the influence of missing data is not very big. In all analyses the comparison between the predicted probabilities and the observed frequencies was in favor of the logistic GEE analysis. This is despite the fact that the missing data in both datasets were not completely at random [14,17] and that the percentage of missing data in the second example dataset was relatively high. There might be theoretical situations with larger amounts of MAR data in which logistic mixed model analysis might outperform logistic GEE analysis. However, longitudinal RCTs usually have less than 25% missing data.
It is sometimes argued that logistic GEE analysis and logistic mixed model analysis can be used interchangeably, because both the regression coefficients and the standard errors are higher in a more or less systematic manner when they are derived from a logistic mixed model analysis compared to a logistic GEE analysis. Consequently, the p-values and the answer to the question whether there is a significant difference between the intervention(s) and the control group are similar between the two statistical methods. When one is only interested in hypothesis testing, this is a valid argument, but nowadays, especially in epidemiology, the major interest is in the estimation of the magnitude of the effect of the intervention(s) (i.e. regression coefficients and confidence intervals) rather than in hypothesis testing. Because the effect estimates are highly different between the two methods, one should make a careful choice between the two methods irrespective of the level of significance.
The comparisons in this paper also show that the results obtained from a logistic mixed model analysis vary considerably depending on the estimation procedure used. There was a remarkable difference in the results obtained from a penalised quasi likelihood approach compared to the results obtained from a maximum likelihood approach.
From the literature there is some evidence that the penalised quasi likelihood approach is slightly better than the maximum likelihood approach [25], which is more or less confirmed by our results. Nevertheless, both methods are frequently used. The difference observed between the two estimation procedures is a further indication that the results of a logistic mixed model analysis should be interpreted with great caution.
The present study deals with longitudinal data. As mentioned before, mixed model analysis is also used in cross-sectional studies where individual data are clustered within for instance neighborhoods or schools. In those situations the same occurs, although the differences between the results obtained from a logistic GEE analysis and a logistic mixed model analysis are less pronounced, due to the lower between cluster variance. When a longitudinal multicenter trial is performed, besides the clustering of the repeated measurements within the subjects, there is also clustering on the center level. When the number of centers is relatively large, a logistic GEE analysis cannot be used anymore because within a (logistic) GEE analysis it is not possible to take into account clustering on more than one level. When the number of centers is relatively small, the center could be added as a covariate to the model. Mixed model analysis is capable of dealing with clustering on more than one level, so when the clustering on the center level must also be taken into account, a (logistic) mixed model analysis should be used, with the same 'problems' as have been shown in the present paper. The simplest solution to this 'problem' is to ignore the clustering on the center level and to use a logistic GEE analysis. The effect of this ignoring approach depends, of course, on the magnitude of the between center variance. An alternative solution is to use a logistic mixed model analysis taking into account both the clustering on the subject level and on the center level and to transform the obtained subject specific regression coefficients into population average regression coefficients by using equation 1. However, in the latter the estimated regression coefficients will still highly depend on the estimation procedure used.
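Such a transformation can be sketched in Python. The sketch assumes that the relationship referred to as equation 1 is the widely used approximation in which the subject specific coefficient is divided by the square root of 1 plus roughly 0.346 times the random intercept variance; that form, the function name and the example values in the usage note are assumptions, not taken from the paper.

```python
import math

# Scaling constant of the commonly cited approximation: c = 16 * sqrt(3) / (15 * pi),
# so c squared is about 0.346. Assumed here to correspond to equation 1.
C_SQUARED = (16 * math.sqrt(3) / (15 * math.pi)) ** 2


def subject_specific_to_population_average(beta_ss, random_intercept_variance):
    """Approximate the population average coefficient from a subject specific one."""
    if random_intercept_variance < 0:
        raise ValueError("variance cannot be negative")
    return beta_ss / math.sqrt(1 + C_SQUARED * random_intercept_variance)
```

For example, a hypothetical subject specific coefficient of 2.0 with a random intercept variance of 4.0 shrinks to roughly 1.3. With random intercepts on both the subject and the center level, one natural input would be the sum of the two variances, assuming independent normal random intercepts; the text does not specify this, so it is an assumption as well.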

CONCLUSIONS
This paper shows that logistic GEE analysis outperforms logistic mixed model analysis for longitudinal RCT data regarding the estimated regression coefficients (i.e. the effect estimates). It is also shown that the regression coefficients obtained from a longitudinal logistic mixed model analysis are an overestimation of the actual regression coefficients. It is therefore advised to use a longitudinal logistic GEE analysis for the effect estimation in longitudinal RCTs.

TABLE 1.
Number of subjects measured at the different time-points in the two example datasets
1 between subject variance obtained from the mixed model analyses

TABLE 2.
Regression coefficients and standard errors (between brackets) of different longitudinal logistic regression analyses regarding the low back pain intervention

TABLE 3.
Observed percentages of good functional status and predicted probabilities derived from different longitudinal logistic regression analyses regarding the low back pain intervention

TABLE 4.
Regression coefficients and standard errors (between brackets) of different longitudinal logistic regression analyses regarding the low back pain intervention from a complete case analysis

TABLE 5.
Observed percentages of good functional status and predicted probabilities derived from different longitudinal logistic regression analyses regarding the low back pain intervention from a complete case analysis

TABLE 6.
Regression coefficients and standard errors (between brackets) obtained from different logistic longitudinal data analyses performed on the 3-arm RCT regarding the internet-based treatment of depressive symptoms

TABLE 7.
Observed percentages of depressed subjects and predicted probabilities obtained from different logistic longitudinal data analyses performed on the 3-arm RCT regarding the internet-based treatment of depressive symptoms

TABLE 8.
Regression coefficients and standard errors (between brackets) obtained from different logistic longitudinal data analyses performed on the 3-arm RCT regarding the internet-based treatment of depressive symptoms from a complete case analysis

TABLE 9.
Observed percentages of depressed subjects and predicted probabilities obtained from different logistic longitudinal data analyses performed on the 3-arm RCT regarding the internet-based treatment of depressive symptoms from a complete case analysis

Epidemiology Biostatistics and Public Health - 2017, Volume 14, Number 3