A Support Vector Regression Approach for Three – Level Longitudinal Data

Background: Longitudinal data structure is frequently observed in health science. This introduces correlation to the data that needs to be handled in modelling process. Recently, machine learning approaches have been introduced in the context of longitudinal data for prediction of the response variable purpose. In this paper a mixed-effects least squares support vector regression model is presented for three-level longitudinal data. In the proposed model, multiple random-effect terms are used for considering the existing correlation structures in longitudinal data. The proposed model is flexible in modelling (non-)linear and complex relationships between predictors and response, while it takes into account the hierarchical structure of data and is computationally efficient. Methods Both random intercept and random trend models with a special correlation structure of errors are illustrated. A real data example on human Brucellosis rate is analysed and two simulation studies are performed to illustrate the proposed model. The fitting and generalisation performance of the proposed model are investigated and compared with the ordinary least squares support vector regression and linear mixed-effects models. Results: Based on the human Brucellosis rate example and two simulation studies, the proposed models had the best performance in generalisation. Also, the fitting performances of the proposed models were better than that of the classic models. Conclusion: Our study revealed that in the presence of nonlinear relationship between covariates and outcome, the proposed MLS-SVR model has the best fitting and generalisation performance and can capture correlation of the data.


INTRODUCTION
Longitudinal data analysis are found frequently in clinical research where a subject has multiple correlated observations over time and can be assumed as a two-level data set.This correlated structure needs to be accounted for in the analysis [1,2].In some cases, the two-level longitudinal data may be nested in the centres or clusters which adds an additional level to the data and forms a three-level longitudinal data structure.The intracluster correlation is then introduced in the three-level longitudinal data, and ignoring this correlation in the analysis procedure can lead to incorrect estimations of the standard errors [2].
Variants of classical models have been developed and introduced for modelling the two-level and three-level longitudinal data under a variety of names: hierarchical linear models, random-effects models, two-stage models and random regression models [1][2][3].In most of these models, the random-effects terms have been considered to account for the existing correlations of the longitudinal data.Linear mixed-effects models (LMM) [2], generalized linear mixed-models (GLMM) [2] and generalized estimating equations (GEE) [4] models are the most commonly used methods in the longitudinal data analysis.However, fitting these models is not possible in high-dimensional data sets (number of observations < number of variables) by taking into account all covariates in the model at the same time [5].Also, the relationships between covariates and response variable are considered as linear or limited nonlinear forms and need to specify the functional form of covariates and response for the appropriate transformation of covariates [6].Therefore, it is not possible to capture unspecific complex nonlinear relationships between variables using these parametric methods.
In recent years, the use of machine learning methods has become very interesting.One promising and useful method of the machine learning is support vector machines (SVMs) introduced by Vapnik [7] for classification and function estimation problems.There are some properties which make the SVM interesting versus other machine learning techniques such as artificial neural network (ANN) and random forest (RF) methods.Unlike ANN, the SVM does not lead to local minima and is less prone to overfitting [8].RF provides discrete models for prediction and the range of predictions are limited to the range of response variable over training set [9,10].Also the RF technique tends to underestimate and overestimate higher and lower values, respectively.Moreover, theoretical analysis of the RF is difficult [9].ANN and RF require long training time [11].The SVMs can be fitted in high-dimensional data sets and can consider complex nonlinear relations between covariates and outcome by using kernel functions [12,13].Suykens [18] presented an extended model of the SVM called least-squares support vector machine (LS-SVM) which uses the least-squares loss function.The optimisation problem in LS-SVM is solved linearly with a unique solution which makes it computationally time-saving [19].The regression or function estimation form of LS-SVM is called least-squares support vector regression (LS-SVR) [19].
Kernel technique and SVM methods have been used in various studies for both classification and regression problems in longitudinal data.Luts [20] and Chen [21] used LS-SVM and kernel methods for classification of the longitudinal data.Also, a mixed-effects LS-SVR model was proposed by Seok [22] for continuous longitudinal outcomes and developed by Shim [23] for time-varying coefficients.There have been some studies which have used other machine learning techniques in longitudinal or clustered data [24][25][26][27].All these studies have been done in two-level longitudinal data.
In this study, we propose a novel mixed-effects LS-SVR (MLS-SVR) model for predicting continuous outcomes in three-level longitudinal data which can be considered by a random intercept for level-3, both random intercept and trend for level-2, and a correlation structure for errors.Also our proposed method can capture the unspecific non-linear relationships between features and outcome by using a kernel function.To the best of our knowledge, this is the first SVM or kernel method study that considers the centre or cluster (third level) effect in three-level longitudinal data setting.In a recently published study, Moqaddasi et al. developed the LS-SVR method into the three-level setting.However, they considered only the random intercept term in each level [28].We compare the generalisation performance of the proposed model with that of the ordinary LS-SVR and LMM in a real life data example and two simulation studies with various scenarios.
The rest of this paper is structured as follows.In materials and methods section, we describe the ordinary LS-SVR and present the proposed model with estimation procedure.Also, the numerical studies through a real data analysis and two simulation studies are described in this section.Then we provide the results of the real data example and simulation studies.Finally, discussion and conclusion sections are presented, respectively.

MATERIALS AND METHODS
The special property of the SVM is the nonlinear fitting using the kernel approach.In kernel-based techniques, a nonlinear problem are solved linearly by mapping the feature space into a higher dimension space [14,15].Nevertheless, the SVM is computationally intensive because of using numerical optimisation algorithms and solving a quadratic optimisation problem.Hinge and ε-insensitive loss functions are two common losses used in the SVM in classification and regression problems, respectively [16,17].Because of the expensive computation of the SVM we used LS-SVR with least squares loss function.In this section, we introduce the ordinary LS-SVR with the estimation procedure.Then, we propose the three-level random intercept and trend, i.e.MLS-SVR, with an estimation technique for three-level longitudinal data.A generalized cross-validation (GCV) function presented in [23] is used to obtain the optimal regularization parameters.

Least-squares support vector regression
In this section, we explain the ordinary LS-SVR.Suppose the training data set is denoted by with feature vector and the response .The relation of the response and feature variables is shown in regression setting as follows Three-level SVR Method where b is the bias term.φ(.) is the feature mapping function which maps the input space into a higher dimensional feature space and ω is a weight vector of the same dimension as φ(.) .The optimisation equation (J) of the nonlinear LS-SVR is as follows subject to the equality constraints of here λ>0 is a regularization parameter, is the w e i g h t vector and is the error vector.The primal Lagrange function can be constructed as where α i ≥0 are the Lagrange multipliers.The optimality conditions are given by After eliminating e i and ω , the optimal values of α i and bare obtained b y solving a linear equation as follows: where , , I n is the identity matrix of dimension n and K is the n×n matrix with elements, is the vector of ones of dimension n.After solving the linear equation, we can find the optimal values of the bias , b, and Lagrange multipliers α̂i.Then the optimal regression function for the given 0 x is obtained as:

Three-level random intercept and trend LS-SVR
Now, we propose the MLS-SVR with random intercept and trend for a three-level longitudinal data.Let the training data set be , where ijk y is the k-th observation of the response variable of the j-thcentre of the level-2 in i-thcentre of the level-3 corresponding to p×1 fixed-effects covariates ijk x .We assume that ijk y is related to in ijk x a nonlinear regression form as where φ(X ijk ) is a nonlinear feature mapping function, b is the bias term, ijk z is q×1random effect covariate vector with the random effect parameter Ꮩ ij for level-2 from , is a random effect parameter for level-3 from and nij×1 error vector .
For known ∑ and Rij the optimisation problem of the nonlinear three-level mixed effects LS-SVR can be define as subject to equality constraints Here are tuning or regularization parameters and is the (k,l)th element of the inverse matrix of Rij, .The Lagrange function can be written as where α ijk are Lagrange multipliers.The conditions for optimality are given by After eliminating , , ω , ijk e , we can obtain the optimal values of α ijk and b by solving the following linear system equation as where K is the kernel function in the form of, , 0 and 1 are the (Nt×1) vector of zeros and ones respectively, where i G is a square matrix of one by ni dimension, is the Nt×Nq block diagonal matrix with , where and , where and .The optimal regression function for the given (x0,z0) is as follows The solution of the linear system equation depends on the S and R ij .For this problem, we use an iterative procedure presented in [23] for the estimation of (α,b) and for given regularization parameters as follows: 2. Calculate based on equation (13) using the Ŝ and ˆij R obtained in the previous step.

Iterate steps until
where k is the kth iteration and ε denotes the tolerance level.We consider ε=0.1 in this paper.

Regularization parameters
The functional structure of the MLS-SVR is characterised by tuning parameters, the regularization parameters and parameters of the kernel function.To obtain optimal values of these tuning parameters, we use the GCV function.The inverse of the leftmost matrix in ( 13) can be partitioned into follow submatrices as where 11  S is a scalar, 12   S is a (1×Nt) vector, 21 S is a (Nt×1) vector and 22  S is a (Nt×Nt) matrix.Then, y ˆ can be expressed as y Sy = ˆ, where The GCV function can be obtained by applying the leaveone-out method and the first order Taylor expansion as follows where is a set of tuning parameters.The optimal values of the tuning parameters are those that minimise the GCV function.

Human Brucellosis Rate Example
The Brucellosis data contains monthly frequency of subjects diagnosed with human Brucellosis.The data was related to cities of five provinces in the west of Iran (Hamadan, Kordestan, Ilam, Lorestan and Kermanshah) in 2016-2017.We transformed the monthly frequency of Brucellosis (a count variable) to the logarithm of the rate (a continuous response variable) as follows The constant 0.04 was added to brucellosis count in equation ( 15) for obtaining the non-zero rates.The boxplot of the logarithm of brucellosis rate in each month was plotted in figure 1.
The ratio of rural to urban population, livestock (sheep, cattle, and goat) population (frequency), area of croplands, forest, and grassland (hectare) and some climatic variables such as monthly average of temperature, monthly sum of sunshine hours, rainfall, humidity and wind speed were considered as covariates.These covariates had extracted from Statistical Yearbook of Iran in 2016-2017 and Iran Meteorological Organisation website [29,30].
There were five provinces (level-3), 54 cities nested in these provinces (level-2) and 12 observations in each city (level-1).So, we had 648 observations in total.We used the first 10 observations of each city as the training and the last two observations as the testing data sets.So, the sizes of the training and testing data sets were 540 and 108 for this example respectively.We fitted MLS-SVR with random intercept and trend (MLS-SVR.it),MLS-SVR with random intercept (MLS-SVR.i),ordinary LS-SVR, LMM with random intercept and trend (LMM.it),LMM with random Three-level SVR Method intercept (LMM.i) on training data set.The testing data set was used to assess the generalisation performance of each model.Mean square error (MSE) and mean absolute error (MAE) were considered for comparison of fitted models generalisation capabilities.

Simulation study1
We performed a Monte Carlo simulation study to investigate the performance of different methods in prediction of observations.Let ijk y be the response of k th subject within j thcentre of the level-2 within ith centre of the level-3 related to covariates ijk The sample size of level-1 was 12 (number of the month) for each centreof level-2 in all scenarios.The distribution of the random effect was considered as normal with zero mean and standard deviation equal1.2for level-3.The random effect of level-2 simulated from where .We considered two levels for ρ as (-0.15, 0.7).Also the standard deviation of the error variable with standard normal distribution was considered in two levels (1.3, 0.3).There were 16 settings in this simulation study.In all conditions, we used the first ten observations of each centre of the second level as training data set and the last two observations were placed in the testing data set.We fitted the MLS-SVR.it,M LS-SVR.i,ordinary LS-SVR, LMM.it, and LMM.i on the training data and estimated the parameters of each model.In the MLS-SVR and LS-SVR methods, the radial basis function (RBF) was used as the kernel function.The fitting and generalisation performance of each model were evaluated using two measures of MSE and MAE in training and testing data sets, respectively.This procedure was repeated 1000 times in each scenario.

Simulation Study2
Another simulation study was performed to assess the performance of the proposed MLS-SVR in the presence of a correlation structure in the error variables.We compared our proposed model with the LS-SVR and the LMM.The data generated from Here the Beta, τijk, b,γ0i, and the number of centres in each level are the same as simulation example 1.Also , simulated from, and with for k,l=1,2,…12.Also, were the same as those of simulation study 1.In this simulation study, the number of observations in each centre of the level-2 was 12 which the first ten observations were used as the training data and the last two observations were considered as the testing data.We fitted MLS-SVR with random intercept using an autoregressive structure for the errors (MLS-SVR.i.cor), MLS-SVR with random intercept (MLS-SVR.i),ordinary LS-SVR, LMM with random intercept and autoregressive structure for the errors (LMM.i.cor), LMM with random intercept (LMM.i) on the training data set.Then we investigated the generalisation performance of each model using the testing data set.Like the simulation study 1, we used the RBF as the kernel function in the LS-SVR methods.Each scenario of this simulation study was repeated 1000 times.After dividing the generated data sets into training and testing sets, we had 1000 training data set and 1000 testing data set for each scenario.

Computational Support
We used the R software version 3.5.1 [31].The package "nlme" [32] was used for fitting LMMs.Also some packages such as "kernlab" [33], "Matrix" [34], and "MASS" [35] were used for coding ordinary LS-SVR and proposed MLS-SVR methods in simulation studies and real data example.The R codes of MLS-SVR are available upon the request from the first author.

Human Brucellosis rate example results
There were five provinces which Kermanshah had more cities than the others (14 cities).Lorestan and Ilam had the most and the least rate of brucellosis (per 100,000) with 80.7 and 18.8, respectively.The most livestock and lands belonged to Lorestan province with 1,899,697 people and 2,829,591 hectare, respectively.The descriptive statistics of the used variables in the human brucellosis data were shown in Table 1.
The results of fitting various models (MLS-SVR.it,MLS-SVR.i,LS-SVR, LMM.it, and LMM.i) in human Brucellosis rate example were presented in Table 2.According to the Table 2, the ordinary LS-SVR had smallest MSE (0.97) and MAE (0.72) in training data, but the generalisation performance of the proposed MLS-SVR.itmodel was better than the others based on both MSE (1.59) and MAE(0,98) in testing set.The MSE and MAE criteria for MLS-SVR.it and MLS-SVR.i were smaller than their classical counterparts (LMM.it and LMM.i) for both training and testing sets.Also, the generalisation performances of the three LS-SVR methods in Table 2 were better than the LMMs.Furthermore, by fitting the ordinary LS-SVR and ignoring the random effect terms, the generalisation performance criteria got worse in testing set.Also, the fitting and generalisation performances of the models with random intercept and trend (MLS-SVR.itand LMM.it) were better than those of the methods which only considered the random intercept term (MLS-SVR.iand LMM.i).
We plotted the observed values of training and testing sets versus the prediction values obtained by the MLS-SVR.it and LMM.it in Figure 2. Based on Figure 2, the MLS-SVR. it had a better performance than the LMM.it in prediction of response variable over both training and testing sets (the points were closer to the bisector line).

Simulation study 1 results
In this simulation study, both random intercept and trend effects were considered in data generation process.The results of the simulation study 1 were given in Table 3, 4 (MSE and MAE along with their standard errors (SE) over 1000 replications) for the training and testing data, respectively.These tables describe the performance of the proposed MLS-SVR.italong with that of the MLS-SVR.i,LS-SVR, LMM.it, and LMM.i.According to the results, the proposed method outperformed other models in terms of the used criteria.In this regard, the MLS-SVR.ityielded the smallest mean of MSEs and MAEs in both fitting and generalisation performance.The results of Table 3 and Table 4 showed that the fitting and generalisation performance of the MLS-SVR.it and MLS-SVR.i were better than their LMM counterparts (LMM.it and LMM.i) based on MSE and MAE criteria for all scenarios.Also, ignoring the random effect terms had a great influence on the prediction performance over both training and testing sets (see the columns for LS-SVR method in Table 3 and Table 4).Moreover, the prediction performance of the models which considered both random intercept and random trend (MLS-SVR.it,LMM.it) were better than those which only considered the random intercept (MLS-SVR.i,LMM.i) (Table 3 and Table 4).

Simulation study 2 results
We considered the random intercept effect and an autoregressive correlation structure for the error term in the data generating process of this simulation study.Table 5 shows the results of the simulation study 2 (MSE and MAE along with their standard errors (SE) over 1000 replications) for the training and testing data sets, respectively.The Epidemiology Biostatistics and Public Health -2019, Volume 16, Number 3 Three-level SVR Method   Three-level SVR Method prediction performance of various models (MLS-SVR.i.cor, MLS-SVR.i,LS-SVR, LMM.i.cor, and LMM.i) had been provided in this table.According to the MES and MAE measures reported in Table 5, the proposed MLS-SVR models had the best fitting performance among the fitted models.Also, the generalisation performance of the proposed MLS-SVR.i.cor was better than that of the other methods based on criteria shown in Table 5.The prediction performances of the MLS-SVR.i.cor and MLS.SVR.i were better than their LMM counterparts based on the MSE and MAE in both training and testing data sets.Furthermore, the MSE and MAE measures increased after eliminating the random effect term and the autoregressive structure of the error terms (see the columns for LS-SVR method in table 5).Although, the fitting performance of the LMM.i.cor was better than the LMM.i in training set, but its generalisation performance was worse over testing set.Nevertheless, both fitting and generalisation performances of the MLS-SVR.i.cor were better compared to the MLS-SVR.i.

Computational time
Using a computer with CPU: corei3 and 2.13 GHz, RAM: 4.00 GB, for known regularization parameters of machine learning techniques, the computation times of fitting different methods in Brucellosis rate data were 1.87, 1.20, 0.39, 2.53, and 0.26 seconds for the MLS-SVR.it,MLS-SVR.i,LS-SVR, LMM.it, and LMM.i, respectively.The computational time of the machine learning methods were different when the regularization parameters were unknown.The GCV function was used in this case which changed the computation time of the MLS-SVR.it,MLS-SVR.i, and LS-SVR to 88.00, 41.22, and 37.14 seconds, respectively.

DISCUSSION
A novel multi-level LS-SVR model was proposed in this paper for three-level longitudinal data.The proposed model used multiple random-effect terms to account for the correlated structures of the levels in the longitudinal data.Our proposed model considered both random intercept and random trends.Also, a correlation structure can be considered in the proposed MLS-SVR model for the error terms.Like the LS-SVR, the proposed MLS-SVR is computationally efficient due to a need to solve a linear system and it is useful for modelling the highly unbalanced data sets (non-fixed time points and unequal number of observations) with (non-)linear complex relationships between covariates and response.In the real data example related to the human Brucellosis rate, the performance of the proposed MLS-SVR.it and ordinary LS-SVR were the best in terms of generalisation and fitting, respectively.Moreover, the MSE and MAE of the MLS-SVR.itwere smaller than those of the LMM.it.The performances of the proposed models (MLS-SVR.itand MLS-SVR.i) were better compared with the LMMs in both training and testing data sets of this example.There might be nonlinear relationships between covariates and response which the classical approaches (LMM.it and LMM.i) could not capture them well as machine learning techniques (MLS-SVR.itand MLS-SVR.i).Also, the generalisation performance decreased after eliminating the random effect terms and fitting the ordinary LS-SVR showing the influence of random effect terms in prediction performance.Both MLS-SVR.it and LMM.it, which considered the random intercept and trend, had a better performance in prediction than the MLS-SVR.i and LMM.i, that considered only the random intercept, in training and testing data sets.It shows the importance of taking into account the random trend term which should be considered in the models for a better predictions.
Based on the simulation study 1, the generalisation and fitting performance of the proposed MLS-SVR models were the best among the fitted methods in all scenarios.According to the results of the first simulation study, the MSE and MAE measures of the MLS-SVR.it and MLS-SVR.imodels were smaller than their LMM counterparts in both training and testing sets.The reason of the better performance of the MLS-SVR method might be due to the created nonlinear relationships (using the sine function in data generation process) between the features and outcome.Also, it seems that the random intercept and random trend effects have important roles in prediction performance in the first simulation study and must be considered in the modelling.
The proposed MLS-SVR models had the best performance in fitting and generalisation among the fitted models in the simulation study 2.The best generalisation performance belonged to the MLS-SVR.i.cor in all scenarios.Also, the prediction performances of the MLS-SVR.i.cor and MLS-SVR.i were better than their LMM counterparts in training and testing sets.Like the first simulation study, the created nonlinear relationships between covariates and outcome might be the reason of this better performance.Therefore, in presence of nonlinear or complex relationships between covariates and response, the proposed MLS-SVR method could be a better choice for prediction problems.The random intercept term and autoregressive structure of the error terms were influential in prediction performance and ignoring them (fitting ordinary LS-SVR) increased the MSE and MAE of the models.Therefore, they should be considered in the modelling process.Moreover, although the performance of the LMM.it was better than that of the LMM.i in the training set, but the LMM.i had a better prediction performance than LMM.it in the testing set.Also, by increasing the sample size, the MSE and MAE measures of the MLS-SVR.i.cor, MLS-SVR.i,LMM.i.cor, and LMM.i tended to be decreased.
Several studies have been done which used the LS-SVR technique in longitudinal or clustered data.Seok et al. proposed a mixed-effects LS-SVR and used it for pharmacokinetic (PK) and pharmacodynamic (PD) longitudinal data sets.The performance of their proposed LS-SVR method was better than that of the standard method for the analysis of population PK and PD data in both training and testing data [22].In another study, the prediction performance of the proposed mixed-effects LS-SVR was better compared to the LMM in the two real data examples and two simulation studies for both training and testing sets [23].The LS-SVR methods in the above two studies were proposed for two-level longitudinal data.Moqaddsi et al. extended the LS-SVR technique and proposed three-level mixed-effects LS-SVR for count data.The prediction performance of their proposed model was better compared to the ordinary LS-SVR, LMM, (zeroinflated) poisson mixed-effects model, and (zero-inflated) negative binomial mixed-effects model in a simulation study.Also, they used their proposed model in a real data example which outperformed other techniques [28].
We used the RBF in MLS-SSVR and LS-SVR models as the kernel function.Other kernels like polynomial can be used in the proposed model.Although the numerical studies were performed on the balanced data sets, the proposed MLS-SVR can be used for the very unbalanced data sets with high-dimensionality.An autoregressive structure for the error term was used in simulation study1.Other correlation structures for error terms can be considered in the proposed MLS-SVR method.

CONCLUSIONS
It seems that, in presence of unspecific nonlinear relationships between covariates and outcome, the proposed MLS-SVR model can be more useful.Also, the    prediction performance decreases by ignoring the random effect terms in longitudinal or clustered data sets.So the multilevel structure must be considered in the analysis of these types of data.

1 .
Start with the initial values ˆq I

3 .
Calculate the estimates Ŝ and ˆij R using the obtained in the previous step as follows: and where and with .

FIGURE 1 .
FIGURE 1. Box-plot of the Brucellosis rate for each month in logarithmic scale (red points are the mean of the logarithm of brucellosis rate)

FIGURE 2 .
FIGURE 2. Comparison of MLS-SVR.it and LMM.it methods in prediction of the training and testing sets of brucellosis rate example.Top panel (a) presents observed versus prediction values of the training set.Bottom panel (b) presents observed versus prediction values of the testing set.

TABLE 1 .
Description of the Brucellosis data variables in each provience (Mean, Min and Max are based on the monthly observations in all cities of each province) *standard deviation; **minimum; ***maximum; # celcius scale; ## millimetre; ### metre per second

TABLE 2 .
Performance comparison of the proposed MLS-SVRs, ordinary LS-SVR, and LMMs for Brucellosis rate example *mean squares error; **mean absolute error

TABLE 3 .
Fitting performance comparison of the proposed MLS-SVRs, ordinary LS-SVR and LMMs on training data set for simulation study 1 (standard error in parenthesis)

TABLE 4 .
Generalization performance comparison of the proposed MLS-SVRs, ordinary LS-SVR and LMMs on testing data set for simulation study 1 (standard error in parenthesis)

TABLE 5 .
Prediction performance comparison of the proposed MLS-SVRs, ordinary LS-SVR, and LMMs for simulation study 2 on training and testing data sets (standard error in parenthesis) #meaan squares error; ## mean absolute error