Co-Plot Method: A Research on Tobacco Control in the European Region

BACKGROUND: The aim of this study is to introduce the rarely used Co-Plot method, a multivariate graphical analysis technique, and to apply it to a data set on tobacco control in the European region. METHODS: The study uses data from the World Health Organization database for European countries selected according to the Human Development Index. It considers variables such as smoking prevalence among young people and adults, the proportion of smoking-related deaths, and domestic legislation on tobacco products, and analyses the data using the Co-Plot method. RESULTS: Smoking prevalence and restrictions on the advertising of tobacco products were highly negatively correlated. The proportion of deaths associated with smoking-related diseases increased in parallel with smoking prevalence among young people and adults. Norway, France and Finland have enforced legal limitations on direct and indirect advertising, and thus smoking prevalence among young people and adults has declined. In some countries, including Ireland, Italy and Serbia, the prevalence of smoking among the young has decreased due to new or tightened legal restrictions on the sale and distribution of tobacco products. The governments of the Czech Republic, Kazakhstan, Estonia, Croatia, the Netherlands, Belgium and Poland have placed restrictions on direct and indirect advertising. In these countries, the proportions of deaths related to other causes and to lung cancer are high. CONCLUSION: The restrictions on tobacco products were tightened over time as the prevalence of smoking and the proportion of smoking-related deaths increased. The significant relationships identified in this study are even more pertinent in developed countries. Consequently, the Co-Plot method enabled deeper interpretation of the relationships between the countries and the variables in this study.


INTRODUCTION
In research today, methods that can simultaneously represent the structure among observations and variables, and support inferences from that structure, are critically important. When complex data sets are analysed, the use of multivariate methods is unavoidable. Various multivariate analyses, including principal component analysis, factor analysis, cluster analysis and multidimensional scaling, can be applied to large and complex data sets. These methods can serve as a final analysis and can also form a basis for further analyses. Because these methods do not facilitate simultaneous analysis of observations and variables, an adaptation of multidimensional scaling called the Co-Plot method was developed for graphical analysis [1,2].
This method enables the researcher to make richer interpretations because it shows both observations and variables on the same graph. When working with very large data sets, outliers can easily be identified on the graph, and variables that correlate poorly with the others can be excluded. With such features, the Co-Plot method is likely to become a method of choice for future analyses. In research with large data sets, it can be used as either a preliminary or a final analysis. According to the literature, the method has been applied in the social sciences and in medicine, but studies using it in these fields remain limited [1-7].
The aim of this study is to introduce the rarely used Co-Plot method, a multivariate graphical analysis technique, and to apply it to a data set on tobacco control in the European region. Reports on tobacco use and control from the last seven years, namely the WHO report on the global tobacco epidemic (2013) and the 2008 report of the Public Health and Risk Assessment subcommittee of the European Commission, indicated that smoking prevalence and the proportion of smoking-related deaths had increased, and that some countries had therefore introduced legislation for tobacco control. This study therefore uses the Co-Plot method to investigate how legal restrictions on tobacco consumption have affected smoking prevalence and the proportion of smoking-related deaths, and how effective those restrictions have been.

Co-Plot Method
The Co-Plot graphical display technique is useful for visual inspection of a data matrix such as $Y_{n \times p}$. The sample units are shown as n points and the variables as p arrows relative to the same axes and origin [6]. Arrows pointing in the same direction indicate high positive correlation, whereas arrows pointing in opposite directions (180°) indicate high negative correlation. If there is no correlation, the arrows are perpendicular [1]. The Co-Plot method treats observations as rows of a matrix and places similar observations close together on the map. The principles of the Co-Plot method are as follows.

Stages of building the Co-Plot map
Stage 1: The first step is to standardize the multivariate data matrix. Let $Y_{n \times p} = (y_{ij})$, $i = 1, \ldots, n$; $j = 1, \ldots, p$, be the data matrix of n observations and p variables. So that the variables are treated equally, $Y_{n \times p}$ is standardized into a new matrix $Z_{n \times p}$, whose elements are the deviations from the column means ($\bar{y}_j$) divided by the column standard deviations ($S_j$), as follows [6]:
$$z_{ij} = \frac{y_{ij} - \bar{y}_j}{S_j}$$
Stage 2: The distances among observations are calculated from the variables in the data set. For each pair of observations (rows of $Z_{n \times p}$), a measure of dissimilarity $S_{ik} \geq 0$ is computed, and a symmetric n×n matrix is produced from all pairs of observations. Although the Co-Plot method usually uses the city-block metric, other distance metrics could be used as well. In this study, the city-block distance, that is, the sum of the absolute deviations, was used as the measure of dissimilarity:
$$S_{ik} = \sum_{j=1}^{p} |z_{ij} - z_{kj}|$$
With these distances, the observations can be displayed on the graph by means of a multidimensional scaling (MDS) algorithm [6,10].
Stage 3: The matrix $(S_{ik})$ is mapped by MDS in this stage. The observations are then represented as n points $p_i$, $i = 1, \ldots, n$, in a Euclidean space [6], such that the map distances $d_{ik}$ preserve the order of the corresponding dissimilarities $S_{ik}$; that is, we require $S_{ik} < S_{lm}$ if and only if $d_{ik} < d_{lm}$ [6,10].
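Stages 1 and 2 can be illustrated with a minimal Python sketch. It assumes the sample standard deviation (divisor n - 1), which the text does not specify, and the small matrix Y is hypothetical illustration data, not data from this study.

```python
from statistics import mean, stdev

def standardize(Y):
    """Stage 1: column-wise z-scores, z_ij = (y_ij - mean_j) / S_j."""
    cols = list(zip(*Y))
    means = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]  # assumption: sample SD with divisor n - 1
    return [[(y - m) / s for y, m, s in zip(row, means, sds)] for row in Y]

def city_block(Z):
    """Stage 2: symmetric n x n matrix with S_ik = sum_j |z_ij - z_kj|."""
    n = len(Z)
    return [[sum(abs(a - b) for a, b in zip(Z[i], Z[k])) for k in range(n)]
            for i in range(n)]

# Hypothetical data: 3 observations (rows) and 2 variables (columns).
Y = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
Z = standardize(Y)
S = city_block(Z)
```

The matrix S is what a nonmetric MDS routine would take as input in Stage 3.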
In this study, Guttman's Smallest Space Analysis (SSA) was selected as a particular form of nonmetric MDS. The analysis provides a graphical representation of the pairwise interrelationships within a set of objects [6]. SSA is an iterative mapping technique, and the need for additional iterations is guided by the coefficient of alienation Θ [10,11]. The coefficient of alienation is defined as
$$\Theta = \sqrt{1 - \mu^2}$$
and varies between 0 and 1, where μ is a coefficient of monotonicity [10]. A value of 0 represents a perfect fit and a value of 1 the worst possible fit; intermediate values of the coefficient indicate intermediate degrees of goodness of fit [12]. The smaller Θ is, the better the mapping reflects the original dissimilarities, and values below 0.15 are regarded as good. To evaluate Θ, the metric μ, which directly measures the correlation between the dissimilarity measures and the map distances, is evaluated first [10]:
$$\mu = \frac{\sum (S_{ik} - S_{lm})(d_{ik} - d_{lm})}{\sum |S_{ik} - S_{lm}|\,|d_{ik} - d_{lm}|}$$
where the sums run over all pairs of observation pairs (i,k) and (l,m). Hence, μ can attain a maximal value of 1. As a function of μ, Θ does not decrease much for mediocre values of μ, but it decreases rapidly as μ approaches 1. In short, it is low only for very good correlations. Should the third stage end with a high coefficient of alienation, the data do not fit well into two dimensions and another technique ought to be used. Conversely, if the coefficient of alienation is low, the 2D map captures the main characteristics of the data [10]. In short, for a two-dimensional space this stage yields 2n coordinates $(y_{1i}, y_{2i})$, $i = 1, \ldots, n$, where each row $z_i = (z_{i1}, \ldots, z_{ip})$ is mapped to a point $(y_{1i}, y_{2i})$ in the two-dimensional space [6].
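The goodness-of-fit computation of Stage 3 can be sketched as follows. The sketch assumes the usual SSA definitions, in which μ compares every pair of dissimilarities with the corresponding pair of map distances and Θ = sqrt(1 − μ²); the matrices S and D are hypothetical illustration data, and the function names are not taken from any Co-Plot software.

```python
from itertools import combinations
from math import sqrt

def monotonicity(S, D):
    """Coefficient of monotonicity between dissimilarities S and map distances D."""
    pairs = list(combinations(range(len(S)), 2))
    num = den = 0.0
    for (i, k), (l, m) in combinations(pairs, 2):
        ds = S[i][k] - S[l][m]
        dd = D[i][k] - D[l][m]
        num += ds * dd
        den += abs(ds) * abs(dd)
    if den == 0:
        raise ValueError("all dissimilarities or all distances are tied")
    return num / den

def alienation(mu):
    """Coefficient of alienation, assuming Theta = sqrt(1 - mu^2)."""
    return sqrt(max(0.0, 1.0 - mu * mu))

# Hypothetical example: the map distances preserve the rank order of the
# dissimilarities exactly, so the fit is perfect.
S = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
D = [[0, 2, 9], [2, 0, 5], [9, 5, 0]]
mu = monotonicity(S, D)
theta = alienation(mu)
```

Because only the signs and relative sizes of the differences matter, any map that preserves the ordering of the dissimilarities gives μ = 1 and Θ = 0, whatever the actual distances are.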

Stage 4:
Since the Co-Plot method has an orientation feature, the variable vectors are obtained in this stage. In the Euclidean space obtained in the previous stage, p arrows are drawn. Each variable j is represented by an arrow j emerging from the centre of gravity of the n points. The direction of each arrow is chosen so that the correlation between the actual values of variable j and their projections on the arrow is maximal. In other words, observations with a high value on this variable should lie in the part of the space to which the arrow points, while observations with a low value lie on the opposite side of the map. Moreover, arrows of highly correlated variables point in approximately the same direction, and arrows of highly negatively correlated variables point in opposite directions. Thus, the cosines of the angles between these arrows are approximately proportional to the correlations between their associated variables [1,10,13,14].
These variable vectors have four useful features. First, vectors for highly correlated variables point in the same direction, vectors for highly negatively correlated variables are oriented along the same axis but in opposite directions, and vectors for uncorrelated variables are orthogonal to each other. Second, each vector emerges from the centre of gravity, which functions as the origin. An average observation, one with an average value on all variables, is located at or near the origin. Third, the length of each vector is proportional to the correlation (namely, the goodness of fit) between the original data for that variable and the projections of the observations onto the vector. Finally, the angle between the jth and kth vectors reflects the correlation between the jth and kth variables, because the cosine of the angle is proportional to this correlation [2].
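One way to find the arrow direction in Stage 4 is sketched below. The text does not state how the maximisation is carried out; this sketch assumes an ordinary least-squares regression of the variable on the two map coordinates, whose coefficient vector gives the direction of maximal correlation. The points and values are hypothetical, and the sketch assumes the points are not collinear and the variable is not constant.

```python
from math import atan2, degrees, sqrt

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    var_u = sum((a - mu_u) ** 2 for a in u)
    var_v = sum((b - mu_v) ** 2 for b in v)
    return cov / sqrt(var_u * var_v)

def variable_arrow(points, z):
    """Return (angle in degrees, maximal correlation) for one variable."""
    n = len(points)
    cx = sum(p[0] for p in points) / n  # centre of gravity of the map
    cy = sum(p[1] for p in points) / n
    zbar = sum(z) / n
    a = [p[0] - cx for p in points]
    b = [p[1] - cy for p in points]
    zc = [v - zbar for v in z]
    saa = sum(x * x for x in a)
    sbb = sum(y * y for y in b)
    sab = sum(x * y for x, y in zip(a, b))
    saz = sum(x * v for x, v in zip(a, zc))
    sbz = sum(y * v for y, v in zip(b, zc))
    det = saa * sbb - sab * sab  # zero if the points are collinear
    b1 = (saz * sbb - sbz * sab) / det
    b2 = (sbz * saa - saz * sab) / det
    projections = [b1 * x + b2 * y for x, y in zip(a, b)]
    return degrees(atan2(b2, b1)), pearson(zc, projections)

# Hypothetical map of four observations; the variable rises along the diagonal.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
z = [0.0, 1.0, 1.0, 2.0]
angle, r = variable_arrow(points, z)
```

In this example the arrow points at 45° from the horizontal axis and the correlation is 1, since the variable is an exact linear function of the two coordinates.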

Goodness-of-Fit Measures of the Co-Plot map
Two different measures were used to assess the quality of the graph generated by the Co-Plot technique, one for stage 3 and the other for stage 4. The quality of the two-dimensional map in stage 3 is assessed by the coefficient of alienation, whereas separate measures, one for each variable, are used in stage 4 [10]. Each of these measures is the correlation between the original data for a variable and the projections of the observations onto that variable's vector in the Co-Plot [2]. The magnitudes of these p maximal correlations measure the goodness of fit of the p regressions. The correlations are useful when deciding whether to eliminate or add variables: variables with low correlations do not fit the graphical display and ought to be removed, since they are not well related to the essential properties of the data captured by the 2D mapping. If a variable is removed, the earlier stages (SSA) must be recomputed, because the removal affects them. Because each variable's arrow is computed separately, it is not necessary to fit all $2^p$ subsets of variables, as in other methods that use a general coefficient of goodness of fit. The higher a variable's correlation, the better its arrow represents the common direction and order of the projections of the n points along its axis [10].
A correlation of 1 means that the vector fits the original variable data perfectly. In general, as poorly fitting variables are removed, the average correlation increases [2]. Experience suggests that maps with a coefficient of alienation below 0.15 and an average correlation of 0.70 or greater fit the data well [12]. Usually, the coefficient of alienation increases as the number of variables increases. A coefficient of alienation of 0.15 is not a perfect fit, but it is a good fit: it is equivalent to a coefficient of monotonicity of about 0.99, which is considered fairly high [2].
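The two rules of thumb above can be written as a short check. This is a sketch that assumes the relation Θ = sqrt(1 − μ²) between the coefficients of alienation and monotonicity; the function names and default thresholds simply restate the values quoted in the text.

```python
from math import sqrt

def monotonicity_from_alienation(theta):
    """Invert Theta = sqrt(1 - mu^2) for a non-negative mu (assumed relation)."""
    return sqrt(1.0 - theta ** 2)

def map_fits_well(theta, correlations, max_theta=0.15, min_mean_corr=0.70):
    """Apply the rule of thumb: alienation below 0.15, mean correlation >= 0.70."""
    mean_corr = sum(correlations) / len(correlations)
    return theta < max_theta and mean_corr >= min_mean_corr

# An alienation of 0.15 corresponds to a monotonicity of about 0.99.
mu_at_threshold = round(monotonicity_from_alienation(0.15), 2)
```

Under this relation, the 0.15 threshold and the 0.99 monotonicity quoted above are two statements of the same criterion.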

APPLICATION
Reports on tobacco use and control from the last seven years, namely the WHO report on the global tobacco epidemic (2013) and the 2008 report of the Public Health and Risk Assessment subcommittee of the European Commission, indicated that tobacco was the single largest cause of avoidable death in 2008, killing approximately 6 million people and causing more than half a trillion dollars of economic damage each year. Moreover, according to these reports, of the countries studied only Turkey, which imposed legal restrictions on tobacco, has implemented all of the tobacco control measures [8,9]. Therefore, records from the 2008 WHO database were used in this study.
The study was designed as a retrospective, cross-sectional study. First, the 42 countries that constituted the sample were selected according to the Human Development Index (HDI) criteria [15]. The most significant feature of the Co-Plot method is that it enables the researcher to analyse the data according to a specific categorical variable. Thus, the selected countries (in the high or medium development category according to the HDI criteria) were grouped into four levels by quantiles (Table 1).
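The grouping into four levels can be illustrated with a short sketch. The cut points used in the study are not reported here, so the sketch assumes quartiles of the HDI values with level 1 as the lowest group; the country labels and HDI values are hypothetical placeholders.

```python
from bisect import bisect_left
from statistics import quantiles

def quartile_levels(hdi):
    """Assign each country a level from 1 (lowest HDI) to 4 (highest HDI)."""
    cuts = quantiles(sorted(hdi.values()), n=4)  # three quartile cut points
    return {country: 1 + bisect_left(cuts, value)
            for country, value in hdi.items()}

# Hypothetical HDI values for eight placeholder countries.
hdi = {"A": 0.60, "B": 0.70, "C": 0.75, "D": 0.80,
       "E": 0.85, "F": 0.90, "G": 0.93, "H": 0.95}
levels = quartile_levels(hdi)
```

With these placeholder values, each level contains two countries.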
The data were then obtained from the World Health Organization (WHO) database [16]. The variables included in the study are shown in Table 2. Domestic legislation on tobacco products in each country was scored as follows: Free (0), Voluntary agreement (1), Restriction (2), Ban (3). Approximately 1% of the data set was missing because some country-level data could not be obtained from the WHO database. Because the Co-Plot method is a multivariate analysis and cannot be used when there are missing values, each missing value was replaced by the mean of its column (Table 2). The Co-Plot analysis was performed using Visual Co-Plot version 5.6.
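The scoring and the treatment of missing values can be sketched as follows. The score mapping follows the text; the rows are hypothetical examples rather than study data, and `None` is used here as an assumed marker for a missing value.

```python
# Legislation scores as described in the text.
SCORES = {"Free": 0, "Voluntary agreement": 1, "Restriction": 2, "Ban": 3}

def impute_column_means(rows):
    """Replace each missing value (None) with the mean of its column."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean = sum(observed) / len(observed)
        for r in filled:
            if r[j] is None:
                r[j] = col_mean
    return filled

# Hypothetical rows: (advertising legislation score, smoking prevalence in %).
rows = [
    [SCORES["Ban"], 30.0],
    [SCORES["Free"], None],  # prevalence missing for this country
    [SCORES["Restriction"], 20.0],
]
complete = impute_column_means(rows)
```

Mean imputation keeps every country in the analysis, at the cost of pulling the imputed observations toward the average on the affected variable.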

RESULTS
The Co-Plot method yielded a graph that allowed us to assess the situation regarding tobacco products and domestic legislation in the European countries. In the Co-Plot analysis, the coefficient of alienation was 0.145 and the average correlation was 0.771. The Co-Plot graph therefore meets the goodness-of-fit criteria.
The Co-Plot graph is given in Figure 1. In this graph, countries located near each other have similar features with respect to the variables studied (Figure 1). These results were obtained from the 2008 records.
The graph also shows that smoking prevalence among young people and adults is high in Ukraine, Russia, Latvia, Belarus, Austria, Germany and Romania. The proportion of smoking-related deaths, including those from cardiovascular diseases and other cancer types, is fairly high. On the other hand, the highest proportion of smoking-related deaths due to other causes and lung cancer is seen in Croatia, the Netherlands, Belgium, Poland and the United Kingdom. In the countries where the proportion of deaths attributed to smoking is high, some restrictions on smoking areas and smoke-free public transport have been imposed. Similar restrictions have been introduced in the countries where the proportions of deaths related to other causes and to lung cancer are high. According to the Co-Plot graph, these countries are the Czech Republic, Kazakhstan, Estonia, Croatia, the Netherlands, Belgium and Poland.
In some countries, including Ireland, Italy and Serbia, the prevalence of smoking among young people has decreased due to new or tightened legal restrictions on the sale and distribution of tobacco products. This interpretation rests on the negative correlation between the variable for legal restrictions on sale and distribution and smoking prevalence in the young population.
Legal limitations have been imposed on direct and indirect advertising in Norway, France and Finland. Thanks to those limitations, smoking prevalence among young people and adults has declined. Lithuania, Macedonia and Montenegro are located very close to the centre of gravity of the graph (34.10, 28.00). From the graph, it can be inferred that these countries have average values on the variables studied.
However, Tajikistan, Kyrgyzstan, Uzbekistan, Georgia, Moldova and Armenia show no clear relation with the variables when analysed according to the same criteria. These countries fall in the lowest category of the HDI classification, and they are clearly clustered apart from the other countries.
The lengths of the variable vectors provide information about the variables' correlations. The variable vectors in the Co-Plot graph show that the smoking-related death variable (%) has the highest correlation. Furthermore, lung cancer-related deaths and other cancer type-related deaths are highly correlated with the proportion of smoking-related deaths. Smoking prevalence among young people and legal arrangements regarding smoke-free public transport have the lowest correlations in this data set (Figure 2). As for smoking prevalence (among young people and adults), it can be concluded that legal restrictions on direct and indirect tobacco advertising lead to a decline in smoking prevalence.
There is a negative correlation between the "other cancer type-related death proportion" variable and the "legal restrictions on direct and indirect advertising" variable. It can therefore be argued that the proportion of (smoking-related) deaths from other cancer types will decrease when direct and indirect advertising is legally restricted. The correlations for each of the variables in the data set are given in Table 3.
When the angle between the variable vectors for smoking prevalence among the young and legal restrictions on smoking areas in Figure 2 is examined, the two variables appear to be only weakly related (the vectors are very close to orthogonal). The same holds for the variable vectors of smoking prevalence among young people and legal restrictions on smoke-free public transport (Figure 2).
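Reading a relationship from the angle between two arrows can be made concrete with a small sketch. The arrow coordinates below are hypothetical and are not read from Figure 2; the function only computes the angle between two direction vectors.

```python
from math import acos, degrees, hypot

def angle_between(u, v):
    """Angle in degrees between two 2D arrows drawn from the same origin."""
    cosine = (u[0] * v[0] + u[1] * v[1]) / (hypot(*u) * hypot(*v))
    return degrees(acos(max(-1.0, min(1.0, cosine))))  # clamp rounding error

# Hypothetical arrows: near 0 degrees suggests positive correlation, near 90
# degrees suggests little or no correlation, near 180 degrees suggests
# negative correlation.
perpendicular = angle_between((1.0, 0.0), (0.0, 1.0))
opposite = angle_between((1.0, 0.0), (-1.0, 0.0))
```

The cosine of the angle only approximates the correlation between the two variables, so such readings are best checked against the correlations reported for the individual variables.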

DISCUSSION AND CONCLUSION
Studies in public health, epidemiology and other medical fields often need to examine a large number of variables and individuals simultaneously, or to select the most important features from a large number of variables. In such studies, all regional, patient and disease features need to be examined together. Additional information may also be sought for each individual in the study. Examining the relationships among these features through univariate analyses leads to a loss of information. In such cases, a multivariate graphical method can respond more effectively.
With regard to the graph, extensive information about tobacco consumption and the related legal restrictions has been gathered for each country of the European region. The Co-Plot method enables rich interpretation of such data sets thanks to four main features [2]. First, Co-Plot enables the researcher to analyse a data set that has more variables than observations, which is especially useful for exploratory analyses. Second, unlike other multivariate methods such as factor analysis or principal components analysis, the researcher does not have to interpret the axes 'indirectly' as linear combinations of variables. On the Co-Plot map, the configuration of observations can be interpreted directly in terms of any one of the original variables. Third, unlike many other multivariate methods, the researcher can work with fewer data, because variables with low correlations are eliminated to improve the goodness of fit of the Co-Plot map to the data set. For instance, in principal components analysis with k variables, even if two principal components 'explain' a great deal of the variation in the data set, all k original variables are included in the analysis and there is no real reduction of dimensionality. Fourth and last, Co-Plot rapidly identifies duplicated or nearly duplicated variables and observations, as well as outliers; this is not unique to Co-Plot, but it is very useful when checking a large multivariate data set.
On the other hand, the method has some limitations. Most importantly, the Co-Plot method does not account for interaction terms between variables and does not provide a measure of the magnitude of effect or association among variables and observations. Moreover, it cannot map observations for which data are missing; therefore, observations with missing data must be removed from the data set, or the missing values must be filled in with an appropriate imputation method. In addition, the method is not easy to apply to a large number of observations and variables, because the graph becomes harder to read as their number increases [2,17].
To conclude, considering its advantages and limitations, this method is quite useful in health studies. In terms of application, this study has described in detail how legal restrictions on tobacco consumption have affected smoking prevalence and the proportion of smoking-related deaths in European countries, and how effective those restrictions have been, based on the 2008 WHO database. It has likewise shown that countries located near each other in the Co-Plot graph have similar features with respect to the relevant criteria. The significant relationships identified in this study are particularly apparent in developed countries. Consequently, the Co-Plot method allows richer interpretations of the relationships between countries and the related variables.