Analysing outcome variables with floor effects due to censoring: a simulation study with longitudinal trial data

Analysing outcome variables with floor effects due to censoring: a simulation study with longitudinal trial data


INTRODUCTION
Randomized controlled trials (RCTs) are assumed to be the gold standard to estimate treatment effects [1].Because RCTs often include multiple follow-up measurements, longitudinal statistical methods have to be used to estimate the treatment effects [2].When patients receive effective treatment over time it may result in reaching the limit of a certain scale.When many patients reach this limit, it results in a skewed distribution of the outcome with an excess of either the lowest (floor effect) or the highest score (ceiling effect) on the scale of measurement.In some instances the true outcome follows a normal distribution, but values below a certain threshold are not detected, a phenomenon which is also known as censoring [3].In epidemiological and clinical studies, there are many examples of outcome measures that show floor or ceiling effects over time due to censoring.Functional ability measures, such as the Disability Index of the Health Assessment Questionnaire (HAQ-DI) (floor effect) and the Barthel index (ceiling effect), are especially prone to this phenomenon [4].The problem with, for instance, the floor effect of HAQ-DI is that patients that score zero should not be regarded as patients that truly all have the same score.For some of the patients the true score may fall beyond the scale of the measurement instrument.In other words, there is a certain variance between patients at the limit that cannot be observed.
Reviewing the medical literature, it is striking to see how these kind of data (i.e.skewed to the right longitudinal RCT outcomes with floor effects due to censoring) are analysed.Non-parametric testing is mostly used which ignores the longitudinal nature of the data and does not lead to an effect estimate [5].When the longitudinal nature of the data is not ignored, traditional longitudinal methods, such as linear mixed models or linear generalised estimation equations (GEE-analysis) are used [6,7].However, these linear methods assume a normal distribution of the outcome.Previous research has shown that when a normal distribution is assumed while a (strong) floor effect is present, regression parameters become biased and artefactual non-linearity can occur [8][9][10][11].To deal with either floor or ceiling effects, a tobit mixed model analysis can be used.The tobit mixed model assumes that there is a certain variation in the upper or lower limit of the scale [12].The tobit mixed model [13,14] has rarely been used to analyse RCT data with floor effects in epidemiological and clinical studies.On the other hand, it is not clear to what extent longitudinal linear mixed models are able to handle floor effects due to censoring, and when it is necessary to use a longitudinal tobit model.Therefore, the objective of this study was to compare the performance of a longitudinal linear mixed model with a longitudinal tobit mixed model in different data situations of floor effects in a longitudinal RCT.We simulated data in which we varied the number of follow-up measurements, the sample size, the magnitude of the treatment effect and the number of zeroes in the follow-up measurements of the outcome variable.Furthermore, both methods were used to analyse data from an empirical RCT.

Simulation
With Stata [15], we simulated 'true' longitudinal RCT datasets with a normally distributed outcome variable.Table 1 shows the Stata syntax used to set up the simulations.
The simulation set up shown in Table 1 included 3 measurements (one baseline and two follow-up measurements) on 100 patients.A dichotomous treatment variable was created that randomly and evenly assigned the simulated patients to either the control group or the intervention group.The parameters used for the simulation set up were derived from the empirical dataset.Both the random intercept variance and the residual variance were about 0.5 (i.e. the square root of 0.7).The model which was used for the simulation included -besides the random intercept variance and the residual variance -a fixed intercept of 1.5, a regression coefficient for time of -0.1 and a regression coefficient for the time*treatment interaction of -0.2, leading to an outcome variable which was normally distributed.The inverse sign of the two regression coefficients indicate a linear decrease in outcome over time for the control group and a stronger linear decrease over time in outcome for the intervention group.In order to generate floor effects, all negative outcome values were set to zero.Table 2 shows the different conditions that were used in the simulations.A total of 12 conditions were compared and for each condition 1000 samples were generated.
Figure 1 shows the percentage of patients with floor effects for the different conditions, separately for the control and treatment group over time.The magnitude of the treatment effect and the number of follow-up measurements were related to the amount of zeroes.A stronger treatment effect and more follow-up measurements resulted in more patients reaching scores below zero on the scale of the outcome measurement.

Modelling simulated data with and without floor effects
For all analyses, we used mixed models with a random intercept to take into account the dependency of the observations within the patient in the simulated longitudinal data [6,16].The following analyses were performed: 1) a linear mixed model analysis on the simulated RCT data without floor effects.2) a linear mixed model analysis on the simulated RCT data with floor effects and 3) a tobit mixed model analysis on the simulated RCT data with floor effects.In all models, treatment, time as well as the interaction between treatment and time were added (see Table 1, Table 2 and Equation 1).(1) where y 1ij is the outcome; (β 0 +μ oj ) is the patient specific intercept, x 1ij is the covariate time, x 2ij is the covariate treatment, x 3ij is the time treatment interaction covariate, and ε ij is the time specific error component.The tobit mixed model is specified in a similar fashion, only the y ij is replaced by y ij , which indicates the difference between the linear mixed model and the tobit mixed model, i.e. for the latter the outcome is estimated on a latent (unobservable) scale, allowing some variation in the lower limit of the scale.
For both models the interaction between treatment and time is the most important parameter, because this reflects the difference in change over time between the intervention and control group, which can be considered as the treatment effect.
In a second analysis, from the same simulations with the same three methods, the overall difference between the treatment groups over time was estimated (see Equation 2). ( where y 1ij is the outcome; (β 0 +μ oj ) is the patient specific intercept, x 1ij is the covariate treatment, and ε ij is the time specific error component.
Where a linear mixed model analysis assumes a normal distribution for all patients, even though there might be a skewed distribution due to the high amount of zeroes, the tobit mixed model assumes a normal distribution for the patients without the floor effects and combines this with the assumption that the patients with floor effects are not all equal with respect to their observed outcome scores.Actually, the tobit model assumes that the observed outcomes result from a latent normally distributed variable, where observations are censored if they fall below a certain threshold.This enables the estimation of values outside the range of the observed patient scores, which may better match the 'true' variation in outcome scores of the patients [17].
Both linear mixed model analyses and tobit mixed model analyses were performed in Stata [15].

Performance of the models
Based on Burton et al. [18], we examined bias, standardized bias and coverage of regression coefficients to compare the performance of the linear mixed model and the tobit mixed model.Bias was determined by comparing the difference between the true regression coefficients of the interaction between treatment and time (i.e.-0.2 for the moderate treatment effect and -0.5 for the strong treatment effect) to the regression coefficients of the linear mixed model analysis and tobit mixed model analysis obtained from the data with floor effects.For illustrative purposes we also calculated the regression coefficients for the simulated datasets without censoring.
Standardized bias was calculated by expressing the bias as a percentage of the standard error of the estimate.A value above 40 percent, both negative as well as positive, was considered poor [19].The percentage of times that the confidence interval of the calculated regression coefficients included the 'true' regression coefficient was used to assess the coverage.Since we use a 95% confidence interval, the ideal coverage should have a score of 95%.Values above 95% (over-coverage) reflect a too high type II error rate in the parameter estimates, values under 95% will lead to an increase of type I errors in the parameter estimates [19].All performance parameters were calculated in Stata [15].

RESULTS
Table 3 shows the regression coefficients and standard errors (SE) of the treatment and time interaction for all simulated datasets.It can be seen that the results of the tobit mixed model analysis were the same as the true values as well as the results obtained from the mixed model analysis on the datasets without floor effects.The results of the linear mixed model analysis on the datasets with floor effects showed high underestimations of the regression coefficients, which became stronger when the treatment effect and the number of follow-up measurements increased.All differences were independent of the sample size, The differences between the methods are also reflected in the standardised bias and coverage (see Table 4).
When the overall treatment effect was considered, the regression coefficients of the tobit mixed model analysis were slightly lower than the ones obtained from the linear mixed model analysis on the datasets without floor effects.Because the simulated datasets were not set up to estimate the overall treatment effect, we could not compare the results with the true values.For the linear mixed model analyses on the datasets with floor effects, the underestimation of the regression coefficients were much stronger than for the tobit mixed model analyses (see Table 5).Again the differences between the methods were stronger when percentage of zeroes became higher (i.e.stronger treatment effect and more follow-up measurements) and were also reflected in the standardised bias and coverage of the different analyses (see Table 6).According to the standardized bias and the coverage, in the dataset with the highest percentage of zeroes, the tobit mixed model analyses also did not perform well in estimating the overall treatment effect.

Empirical example
To illustrate the differences between a linear mixed Analysing outcome variables with floor effects due to censoring: a simulation study with longitudinal trial data model analysis and a tobit mixed model analysis in a real life example, we used data from the tREACH (Treatment in the Rotterdam Early Arthritis Cohort) trial (ISRCTN26791028) [20,21].This trial contained HAQ-DI scores of 281 patients with recent-onset arthritis that had a high probability of progressing to persistent arthritis and compared three groups with each other: Group A (91 patients) received a combination therapy (methotrexate (MTX) + sulfasalazine (SSZ) + hydroxychloroquine (HCQ)) with intramuscular glucocorticoids (IM GCs), group B (93 patients) received a combination therapy (MTX + SSZ + HCQ) with an oral glucocorticoids (oral GCs) and group C (97 patients) received MTX + oral GCs.Measurements were conducted at baseline, after 3, 6, 9, and after 12 months and because there was no linear development over time, time was treated as a categorical variable represented by dummy variables.
Table 7 shows the percentages of zeroes at every time point for the three medication groups.It can be seen that the percentages of zeroes were comparable to the ones simulated in the datasets with a moderate treatment effect, although the differences between the groups were less strong.
Figure 2 shows the predicted HAQ-DI score over time for the three medication groups.Both models showed a similar development over time.However, the tobit mixed model analysis showed a larger difference between the medication groups at 6 months (time 3) and at 9 months (time 4) in comparison to the linear mixed model analysis.The largest difference was observed at 9 months, where the tobit mixed model showed a p-value close to the .05significance limit (i.e.0.055) compared to the linear mixed model (i.e.0.133).

DISCUSSION
The results of the present simulation study showed that the tobit mixed model performed much better than the linear mixed model in handling floor effects (an excess of zeroes) due to censoring in all conditions.When the interaction between treatment and time was estimated, the results of the tobit mixed model were the same as the true values, while the linear mixed model analysis showed underestimated regression coefficients.When the overall treatment effect was estimated, all regression coefficients were lower than the ones obtained from the linear mixed model analysis on the dataset without zeroes, but the tobit mixed model performed much better than the linear mixed model.However, when the percentage of zeroes increased   According to the standardized bias and the coverage, the linear mixed model only performed sufficiently for the situation with a moderate treatment effect and only 2 follow-up measurements.However, even in this situation the treatment effects were highly underestimated.Furthermore, the empirical example showed that relevant treatment effects could be missed when analysed with a linear mixed model where a tobit mixed model would be more appropriate.It should be noted that the differences between the two methods in the TReach data were relatively small, which was due to the relatively small intervention effects.
The objective of this simulation study was to compare a longitudinal linear mixed model analysis with a longitudinal tobit mixed model analysis for different situations with floor effects due to censoring in longitudinal RCT data.The comparison was performed because in most real life situations a linear mixed model analysis is used to analyse these kind of data, ignoring the specific nonnormal distribution of the data and the excess of zeroes.Not much research has been done on this topic regarding longitudinal RCT data.Twisk and Rijmen [3] conducted an empirical study and looked at the development of rehabilitation over time in stroke patients by using the Barthel index.They concluded, by examining the model fit, that the tobit mixed model was better suited to analyse outcome variables that contained ceiling effects.Wang et al. [22] have examined the performance of a linear and a tobit growth curve model in simulated data with and without ceiling effects.They showed that the use of a longitudinal linear growth model led to incorrect model selection and bias in the estimation of the magnitude of the changes over time, and that the longitudinal tobit model performed very well.In an additional empirical data analysis they found that comparing age groups with different proportions of ceiling data could lead to wrong conclusions when traditional methods were applied.We found similar results in our simulation study regarding floor effects and showed that the use of linear mixed model analyses led to a high underestimation of the interaction between treatment and time.
the overall treatment effect was considered, the linear mixed model analysis again showed a strong underestimation of the treatment effect.However, also the tobit mixed model analysis showed a slight underestimation of the treatment effect, which became stronger when the percentage of zeros increased.It should be realised that the overall treatment effect was estimated from the same simulation models as the interaction between treatment and time and that the simulation models were set up to estimate the interaction between treatment and time.In an additional simulation study, we set up the simulation models to estimate the overall treatment effect, without adding an interaction between treatment and time.In these simulations, the regression coefficients for the overall treatment effect of the tobit mixed model analyses were the same as the true values, while the regression coefficients of the linear mixed model analyses were (again) much lower (see Table 8).
It should be realized that the tobit mixed model analysis is one of the options to deal with the excess of zeros in RCTs.There are other possibilities to deal with this problem.Specifically for RCT's an adjustment for the baseline value of the outcome often leads to a normal distribution of the residuals even in situations where the outcome variable is hampered by an excess of zeros.Adjustment for the baseline value of the outcome is often used to take into account differences in baseline values between the treatment and control group [2,23].However, when the outcome variable at baseline is not hampered by an excess of zeros, while the follow-up measurements are, it is doubtful whether the residuals of the analyses will be normally distributed.Besides the tobit model, other regression models also can be considered to deal with the excess of zeros [24].An extensive discussion of these (complicated) methods is beyond the scope of this paper.
In the present simulation study, we analysed two possible treatment effects.The moderate treatment effect of 0.2 would be the most realistic to occur in real life data.We included the stronger treatment effect to investigate the model performance in more extreme situations.For both the moderate and the strong treatment effect, the use of a tobit mixed model analysis led to more valid results than a linear mixed model analysis.Moreover, we simulated a 'true' outcome without floor effects that was normally distributed, after which floor effects were created by setting all negative values to zero.Thus, we assumed that the true outcome is normally distributed.It should be noted that a stronger treatment effect that remains for a longer period over time influences the shape of the distribution.It is important to keep in mind that real life outcomes with floor effects may not always follow an underlying normal distribution.It is not clear how the tobit mixed model will behave in those situations.
In this paper we assumed that the excess of zeroes in the outcome was due to censoring.However, an excess of zeroes does not always occur because of censoring, zeroes in the data may reflect 'real' zeroes.Examples of such data are the number of cigarettes smoked to analyse smoking behaviour, or the number of hypoglycaemic events in diabetic patients.The tobit model is not suitable to analyse these kinds of data.Outcomes that contain 'real' zeroes can better be analysed with other methods [25,26].

CONCLUSION
The tobit mixed model performed much better than a linear mixed model and should therefore be used in longitudinal RCTs with floor effects due to censoring.
FIGURE 1. Percentage of patients with floor effects in the censored simulated data for the different conditions in the control and treatment group at baseline and the follow-up measurements.

FIGURE 2 .
FIGURE 2. Predicted HAQ-DI score over time for the three treatment groups

TABLE 1 .
Syntax to simulate RCT datasets with an excess of zeros *In the original simulations, negative values were not replaced by zero

TABLE 2 .
Different conditions used in the simulations

TABLE 4 .
Coverage* and standardized bias** of the treatment and time interaction estimated in different simulated datasets***

TABLE 3 .
Regression coefficients and standard errors (SE) of the treatment and time interaction estimated in different simulated datasets * Estimated treatment and time interaction in the model without zeros.

TABLE 6 .
Coverage* and standardized bias** of the overall treatment effect estimated in different simulated datasets*** * The proportion of times the 95% confidence interval include the average of the true model (values < 90 percent are in bold) ** Considered to be biased if standardized bias > 40 percent (in bold italic) *** Values are compared to the estimated values from a model without zeros

TABLE 5 .
Regression coefficients and standard errors (SE) of the overall treatment effect estimated in different simulated datasets** * Simulation models were set up to estimate the treatment and time interaction.Epidemiology Biostatistics and Public Health -2018, Volume 15, Number 2 Analysing outcome variables with floor effects due to censoring: a simulation study with longitudinal trial data strongly over time the tobit mixed model also did not result in accurate estimates of the overall treatment effect.
* Estimated overall treatment effect in the model without zeros.*