Trend in BMI z-score among Private Schools’ Students in Delhi using Multiple Imputation for Growth Curve Model

Objective: The aim of the study is to assess the trend in mean BMI z-score among private schools’ students from their anthropometric records when there were missing values in the outcome. Methodology: The anthropometric measurements of student from class 1 to 12 were taken from the records of two private schools in Delhi, India from 2005 to 2010. These records comprise an unbalanced longitudinal data: not all the students had measurements recorded at each year. The trend in mean BMI z-score was estimated through growth curve model. Prior to that, missing values of BMI z-score were imputed through multiple imputation using the same model. A complete case analysis was also performed after excluding missing values to compare the results with those obtained from analysis of multiply imputed data. Results: The mean BMI z-score among school student significantly decreased over time in imputed data (β= -0.2030, se=0.0889, p=0.0232) after adjusting age, gender, class and school. Complete case analysis also shows a decrease in mean BMI z-score though it was not statistically significant (β= -0.2861, se=0.0987, p=0.065). Conclusions: The estimates obtained from multiple imputation analysis were better than those of complete data after excluding missing values in terms of lower standard errors. We showed that anthropometric measurements from schools records can be used to monitor the weight status of children and adolescents and multiple imputation using growth curve model can be useful while analysing such data.


INTRODUCTION
India is facing a nutritional transition phase where, along with the existing problem of undernourished population, the proportion of overweight people is increasing rapidly [1,2].
Children and adolescents in India are also a part of this double burden of malnutrition [3,4].While undernourished children and adolescents are more likely to have lower resistance to infections and a higher risk of morbidity and mortality [5], childhood obesity is associated with increased risk of non-communicable diseases like cardiovascular disease (CVD) and diabetes in adulthood [6,7].It is, therefore, important to monitor the BMI and weight status of children and adolescents at population level in order to make effective programs to handle this double burden among them [8].Among children and adolescents, BMI z-score and BMI percentile are used as relative measures of BMI adjusted for age and sex using some external standard reference [9] to classify weight status [10].It is suggested that BMI z-score is the optimal measure of thinness and adiposity among children [8,[11][12][13].Estimates of the trend in mean BMI or the trend in prevalence of thinness and obesity among children and adolescents are available in literature [14][15][16][17][18][19][20] but only few studies have tried to estimate the trend in mean BMI z-score among them [21,22].
In the present study, we used the anthropometric data recorded in schools' registers to estimate the trend in mean BMI z-score among students of age 5 to 20 years old from private schools in Delhi over a period of five years.The numbers of students were not equal at all the time points and that created an unbalanced longitudinal data.Also, for some students the data was not available at each year and it was considered as missing data.A two level growth curve model [23,24] in the form of linear mixed effects (LME) regression model [25] was used to estimate the trend as general linear models, like Analysis of Variance or Analysis of Covariance, are not appropriate for unbalanced longitudinal data [26].The missing data were handled by using multiple imputation (MI) technique [27] for data missing at random (MAR).We used a multiple imputation technique with LME regression model for multivariate longitudinal data provided by Schafer and Yucel [28,29].Despite recent developments in this area [30] studies have shown that MI did not add much when the data is analysed with LME regression models [31,32].We also compared the results obtained from multiple imputation analysis with those obtained from complete case analysis after excluding the missing values using the same regression model.The structure of the rest of the paper is as follows.
In the next section we provided the detail of BMI data and the structure of missing BMI z-scores and statistical analysis used to estimate the trend in mean BMI z-score.In this section, we tested the MAR assumption of missing BMI z-score values.In section 3 we presented the results, followed by discussion in section 4.

Body Mass Index Data
We collected anthropometric measurements of 4129 students, recorded in school registers of two private schools in Delhi, India, from 2005 to 2010.The consent was taken from schools authority to extract the data from their registers.These schools cater for affluent socio-economic group [33].The data consist of height, weight, class, gender and date of birth of students.We first calculated the BMI values for students and then BMI z-score was estimated using WHO criteria based on age and gender for children and adolescents of age group 5-20 years [34].Since the WHO criteria for BMI z-score among children and adolescents is based on age and gender, only those students were considered for the analysis for whom the date of birth was available and who were above five years old.The final number of students considered for the analysis is 3233 as 896 students were either younger than five years old or their date of birth was not available.For some of the students, data were not available in all the five years, thus the number of repeated measurements for each student was not equal during the study period and that generated an unbalanced longitudinal data.During the study period, these 3233 students formed 12614 data points for the analysis, out of which BMI z-scores were missing at 2452 (19.4%) occasions.If the data for a student was available at least in one year, he or she was considered for the whole study period, except lower than class one and upper than class twelve.The data of a student was considered as missing for the years in which it was not available.The missing values of age and class were filled by adding (or subtracting) one to (from) the values available at any of the years.For missing values of BMI z-scores, we use multiple imputation technique.Table 1 shows the year and class wise structure of data.

Testing the Assumption of Missing at Random for Missing BMI z-scores
We ran an analysis to test the missing at random assumption of missing BMI z-scores before performing the multiple imputation.A mixed effect logistic regression model was applied for the association of missing values with time and other characteristics of students.Missing indicator (1 if BMI z-score was missing; 0 otherwise) was considered as dependent variable and age, gender, class, school and time were taken as predictors.Interaction terms between year and age, year and gender, year and class, year and school, and gender and class were also included in the model.Students' identification number (ID) within the years was used as random effect in the model.Results show that age, class, school and interaction terms year x age, year x class were significantly associated with missing values of BMI z-score (Table 2).

Growth Curve Model
The linear growth curve model used to estimate the trend in BMI z-score of students is given as follows, Multiple imputation for BMI z-score where Y i represents the BMI values for i th student for five years, β is the vector of regression coefficients associated with students' characteristics X i , u i is the random effect associated with covariates Z i which can be treated as a subgroup of X i and e i is the residual component.Simplest growth curve model is when both X i and Z i are time points of measurements.

Multiple Imputation of missing BMI z-scores
Missing values in BMI z-scores were imputed through multiple imputation technique using growth curve model given by Schafer and Yucel [28].To impute missing BMI z-scores, all the significant predictors given in Table 2 were included in a growth curve model with BMI z-score as dependent variable.The interaction term between gender and class was also included in the model despite is was not significantly associated with missing values, considering its narrow confidence interval which is very close to one.To test any nonlinear association between age and BMI z-score, a growth curve analysis on complete data, after excluding missing observation, was performed with BMI z-score as dependent variable and age 2 along with time as independent variables.We found that age 2 was significantly associated with BMI z-score (result not shown), therefore it was also included in the imputation model.The final model used for multiple imputation of missing BMI z-score values is given as, The estimates and standard errors generated from the analysis of ten datasets were then combined using Rubin's rule.The same model 2.3 was applied on the data after excluding the missing BMI z-score values as complete case analysis and the estimates obtained from it were compared with those obtained from multiple imputation analysis.Statistical software R [35] was used for multiple imputation and also for the analysis of imputed data.Package "mice" with option "2l.norm" was applied and that imputes univariate missing data using a two-level normal model.It implements the Gibbs sampler technique for the linear multilevel model with heterogeneous with-class variance [36,37].

RESULTS
Table 3 shows the results of multiple imputation and complete case analysis.In both the analyses mean BMI z-score was found to be inversely associated with year, though this association was statistically significant in multiple imputation (β= -0.2030, se=0.0889,p=0.0232) but not in complete case analysis (β=-0.2861,se=0.0987,p=0.065).In multiple imputation analysis, mean BMI z-score significantly increased by 0.1641 unit with increase in each class (se=0.0485,p=0.0008) and the difference in mean BMI z-score between the two schools was also significant (β=0.2197,se=0.0778,p=0.0051).The associations mean BMI z-score with class (β=0.3871,se=0.0613,p<0.0001) and with school (β=0.6634,se=0.0981,p<0.0001) were significant in complete case analysis also.However, the standard errors obtained for these associations in complete case analysis were more than that of multiple imputation analysis.Age squared was significantly associated with mean BMI z-score in both the analyses.In complete case analysis, age, class and school were significantly interacted in the association between mean BMI z-score and year while in multiple imputation none of the variables included in the model was significantly interacted in the association between mean BMI z-score and year.
We performed cross validation [38,39] using random sub sample method to compare the predictive accuracy of estimates, in which the data is randomly divided in to two parts with 70% (training data) and 30% (test data) students, respectively.Both the analyses were applied on training data and the estimates obtained were fitted to the test data separately to get the predicted values of BMI z-score.The root mean squared error (rmse = ) was then calculated as the measure of predictive accuracy.Since, the missing values of BMI z-score in test data were not available to estimate rmse, it was calculated from the available data only.The rmse for the estimates from multiple imputation was 1.356 and the rmse for the estimates from complete case analysis was 1.362.
Figure 1 depicts year wise density plots of Students' BMI z-scores from complete cases and also from ten datasets obtained from multiple imputations.In each graph, the peak was highest for first year that is 2005-06 and lowest for fifth year that is 2009-10 which shows that the mean BMI z-score was highest in 2005-06 and lowest for 2009-10.

DISCUSSION
The novelty of the study is that we used growth curve model for the multiple imputation of missing BMI z-scores of students and showed that anthropometric records from school registers can be used to assess the trend in BMI or weight status in Indian context.We compared the results with those obtained from complete case analysis after excluding missing values using the same growth curve model.The results were better in terms of lower standard errors when the analysis was performed on multiply imputed data than when it was applied on complete case analysis which may be due to the larger sample size in multiple imputation.Also, the density plot shows that the distribution of BMI z-scores at each year in each of the imputed data was similar to those in complete data when missing BMI z-scores were excluded (Figure 1).However, the associations between some predictors and outcome variable are not similar in both the analyses.The estimates of the associations of BMI z-score with age and interaction terms, year x age, year x class and year x school were significant in complete case analysis while in multiple imputation the statistical significance of these variables disappeared.Though class and school retained their significance, their p values reduced.This could be due to the significant association of age, school and class with missing values.In multiple imputation the sample size increased with less variability within these variables.Association between year and BMI z-score became statistically significant in multiple imputation, though it was marginally significant in complete case analysis.The cross validation analysis shows that the rmse for predicted values of BMI z-scores from multiple imputation was slightly less than that from complete case analysis.Thus, the predictive effect of both the analyses was comparable.
The growth curve model used in this paper is a two level linear mixed effect model.We showed that multiple imputation works well when the data is further analysed through linear mixed effect model in the situation where missing pattern is assumed to be missing at random and all the covariates are complete.This is in contrast with Peters et al. [31] and Twisk et al. [32] where it was shown that multiple imputation of missing data is not important when the analysis is to be done with LME regression model.However, there are studies where multiple imputation was successfully implemented with LME models in different datasets [40][41][42], but unlike our study, in these studies the imputation models were different from LME model.
Studies that assessed trend in BMI z-scores among children and adolescents have shown mixed results in terms of changes in mean BMI z-score over time.Some studies reported an increase in mean BMI z-score [20,43] while others reported a decrease in mean BMI z-score [21,44].The present study shows a decrease in mean BMI z-score over the years among children and youth belong to affluent socio-economic group.In India, there is no nationally representative data available on BMI or weight status of children and adolescents.A national level data on trend in prevalence of thinness and obesity and trend in BMI z-score among children and adolescents may be helpful in making effective policies and designing better program to prevent them from the double burden of malnutrition.Though, Central Board of Secondary Education (CBSE) and some other State education boards mandate schools to assess height and weight of students, but the records are not collated to generate some concrete information.The present research work showed that the students' anthropometric measurements already recorded in schools can be used to estimate the trend in BMI z-score.
Since the two private schools were not chosen randomly, the results may not be generalised to all the children and adolescents belong to higher socio-economic group in India.Though, the estimates were adjusted for socio-demographic variables like age, gender, class and school, availability of more covariates like their other health indicators and family background may be helpful in imputing the missing values and estimating the trend in BMI z-score more effectively.

FIGURE 1 .
FIGURE 1. Density plots of BMI z-scores of students from complete data and from ten multiply imputed data.
1 year i +β 2 age i +β 3 class i +β 4 school i +β 5 age i 2 + β 6 year_age i +β 7 year_class i +β 8 gen_class i +u i year i +e i 2.2 where, year_age, year_class, and gen_class are the interaction terms between year and age, year and class and gender and class, respectively.Year is used as random effect that caters for the time of repeated measurements associated with students.Overall, ten datasets with imputed values of missing BMI z-scores were created using model 2.2.To estimate the trend in BMI z-score over the years model 2.3, similar to model 2.2, was applied on each imputed data.BMI i =β 0 +β 1 year i +β 2 age i +β 3 sex i +β 4 class i +β 5 school i + β 6 age 2 i +β 7 year_age i +β 8 year_class i +β 9 year_gen i + β 10 year_school i +β 11 gen_classu i +u i year i +e i 2.3

TABLE 1 .
Year and class wise data structure and mean values of age, BMI and BMI z-score.

TABLE 3 .
Association of BMI z-score with year, age, class, school and age from multiple imputation and complete case analysis.
a Estimates are the beta coefficients obtained from the growth curve model.SE is the standard error corresponding to the estimates.