Choosing statistical models to assess biological interaction as a departure from additivity of effects

VanderWeele and Knol define biological interaction as an instance wherein "two exposures physically interact to bring about the outcome." A hallmark of biological interaction is that the total effect, produced when factors act together, differs from the sum of effects when the factors operate independently. Epidemiologists construct statistical models to assess biological interaction. The form of the statistical model determines whether it is suited to detecting departures from additivity of effects or departures from multiplicativity of effects. A consensus exists that biological interaction should be assessed as a departure from additivity of effects. This paper compares three statistical models' assessment of a data example that appears in several epidemiology textbooks to illustrate biological interaction in a binomial outcome. A linear binomial model quantifies departure from additivity in the data example in terms of differences in probabilities. It generates directly interpretable estimates and 95% confidence intervals for parameters including the interaction contrast (IC). Log binomial and logistic regression models detect no departure from multiplicativity in the data example. However, their estimates contribute to calculation of a "Relative Excess Risk Due to Interaction" (RERI), a measure of departure from additivity on a relative risk scale. The linear binomial model directly produces interpretable assessments of departures from additivity of effects and deserves wider use in research and in the teaching of epidemiology. Strategies exist to address the model's limitations.


INTRODUCTION
Biological interaction and statistical interaction
Hypotheses related to biological interaction are often of interest in studies of clinical or population health. VanderWeele and Knol [1] define biological interaction as an instance in which "two exposures physically interact to bring about the outcome." Rothman [2] states that "biologic interaction between two causes occurs whenever the effect of one is dependent on the presence of the other." Investigators construct statistical models to detect interaction and effect modification. Rothman [2] points out that "in statistics, the term 'interaction' is used to refer to departure from the underlying form of a statistical model." A model's form can suit it for detecting departures from additivity of effects or for detecting departures from multiplicativity of effects. Because a statistical model's form affects the interpretation of statistical interaction, Rothman [2] prefers the term "effect measure modification" to interaction.
Rothman links "biological independence" with additivity of effects and connects "biological interaction" with a departure from additivity of effects. "Why is it," Rothman asks, "that biological interaction should be evaluated as departures from additivity of effect" [2]? By 2007, the STROBE statement treated the answer to Rothman's rhetorical question as reflecting a "consensus that the additive scale, which uses absolute risks, is more appropriate [than the multiplicative scale] for public health and clinical decision making" [3]. The authors of the STROBE statement remind investigators that "in many circumstances, the absolute risk associated with an exposure is of greater interest than the relative risk" and ask them to "consider translating estimates of relative risk into absolute risk for a meaningful time period" [3]. VanderWeele and Knol [1] remark, more pointedly, that "one reason why additive interaction is important to assess (rather than only relying on multiplicative interaction measures) is that it is the more relevant public health measure."

Additivity and multiplicativity of effects
This paper aligns with this consensus but avoids using the term "additive interaction." Instead, it links the concept to statistical models that assess evidence of a departure from additivity of effects. One such model, the "binomial model for the risk difference" [4], directly quantifies departures from additivity of effects in terms of differences in probabilities, including the interaction contrast (IC). This model is also called the "binomial regression model" [5,6]. Richardson et al. [7], who employ it as a final step in a marginal structural model, call it the "linear binomial model," the term we will use.
In the linear binomial model, detection of statistical interaction constitutes direct evidence of a departure from additivity of effects. The log binomial and logistic regression models can also assess additivity indirectly, when their estimates of relative risks or odds ratios are recombined to calculate statistics like the "Relative Excess Risk due to Interaction" (RERI).
The paper also avoids using the term "multiplicative interaction" but links that concept to statistical models that assess evidence of departures from multiplicativity of effects. Log binomial models estimate effects in terms of relative risks, also called risk ratios, prevalence ratios [4,7] or prevalence proportion ratios. Logistic regression models estimate effects in terms of odds and odds ratios. In the log binomial and logistic models, which employ log transformations of probabilities or of their corresponding odds, detection of statistical interaction constitutes direct evidence of a departure from multiplicativity among effects.

METHODS
Statistical models for binomial outcomes
The linear binomial, log binomial and logistic regression models are all examples of generalized linear models. Each treats the outcome as arising from a binomial distribution. Each features a linear predictor structured as a sum of terms. In this regard, all generalized linear models might be considered "additive." Accordingly, this paper does not refer to "additive or multiplicative models" but refers instead to statistical models that assess additivity or multiplicativity of effects.
All three models link a binomial outcome to a linear predictor. They are distinguished by the link functions they employ. The linear binomial model uses the identity link, the log binomial model uses the log link, and the logistic regression model uses the logit link. Thus, the linear binomial model operates directly on probabilities, while the others apply log transformations of the probabilities or of their corresponding odds. Because each model estimates a different effect measure, they differ in their ability to detect statistical interaction in a collection of data.
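To make the role of the link function concrete, the sketch below applies each of the three links to a single set of four cell probabilities and computes the double difference g(p11) - g(p10) - g(p01) + g(p00). In a saturated model with two binary exposures, this double difference is the quantity that the product-term coefficient equals on the scale of the link. The probabilities are hypothetical (they are not taken from the paper) and were chosen so that the relative risks multiply exactly; the function names are illustrative.

```python
import math

# Link functions g(p) used by the three models.
def identity(p):
    return p

def log_link(p):
    return math.log(p)

def logit(p):
    return math.log(p / (1.0 - p))

def product_term(link, p11, p10, p01, p00):
    """Double difference of g(p): the product-term coefficient in a
    saturated model with two binary exposures, on the scale of `link`."""
    return link(p11) - link(p10) - link(p01) + link(p00)

# HYPOTHETICAL risks (not from the paper): RR_X = 2, RR_Z = 3, RR_XZ = 6.
risks = dict(p11=0.06, p10=0.02, p01=0.03, p00=0.01)

on_identity = product_term(identity, **risks)  # departure from additivity
on_log = product_term(log_link, **risks)       # departure from multiplicativity of risks
on_logit = product_term(logit, **risks)        # departure from multiplicativity of odds

print(f"identity: {on_identity:.4f}  log: {on_log:.4f}  logit: {on_logit:.4f}")
```

With these hypothetical risks the identity-scale contrast is 0.02 while the log-scale contrast is zero up to rounding error, so the same four probabilities show statistical interaction under one link and none under another.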
After reviewing the definition of additivity of effects, we compare the three statistical models using a widely cited example of biological interaction [8]. The linear binomial model detects statistical interaction in these data. The log binomial and logistic regression models, which assess multiplicativity of relative risks or of odds ratios, find no evidence of statistical interaction. The absence of statistical interaction in these models does not point to an absence of biological interaction, but to a lack of departure from multiplicativity of effects.
We conclude by summarizing the three models' advantages and limitations for assessing additivity of effects. The RERI is commonly used in epidemiologic research to quantify departures from additivity despite complications in its estimation, testing and interpretation. In comparison, the linear binomial model produces readily interpretable estimates of effects, including the interaction contrast.

Defining additivity of effects
Consider a comparison of the probability or "risk" of an outcome Y among individuals who are exposed or not exposed to one or both of two "risk factors," X and Z. Let $p_{xz}$ denote the probability, or risk, of the outcome Y at "levels" x of X and z of Z (Table 1).
Rothman [2 (p.178)] states that the following equation "establishes additivity as the definition of biological independence."
$$p_{11} - p_{00} = (p_{10} - p_{00}) + (p_{01} - p_{00}) \qquad \text{(Equation 1)}$$
According to Rothman's equation, two exposures (X and Z) are biologically independent, and their effects are additive, when their joint and simultaneous effect ($p_{11} - p_{00}$) is equal to the sum of the separate and independent effects of X ($p_{10} - p_{00}$) and of Z ($p_{01} - p_{00}$). A departure from additivity of effect, which Rothman considers evidence of biological interaction, is present when the exposures' joint and simultaneous effect differs from the sum of their separate effects.
Additivity can be defined equivalently as a homogeneity of effects. The terms of Equation 1 can be reordered to obtain
$$p_{11} - p_{01} = p_{10} - p_{00} \qquad \text{(Equation 2)}$$
$$p_{11} - p_{10} = p_{01} - p_{00} \qquad \text{(Equation 3)}$$
Equation 2 states that the effect of X on Y is the same whether Z = 1 ($p_{11} - p_{01}$) or Z = 0 ($p_{10} - p_{00}$). Homogeneity of effects is reciprocal. Equation 3 states that the effect of Z on Y is the same at all levels of X, that is, whether X = 1 ($p_{11} - p_{10}$) or X = 0 ($p_{01} - p_{00}$). When the effects of X and Z are additive, the association between Y and X is homogeneous at levels of Z, and the association between Y and Z is homogeneous at levels of X.
Assessing additivity of effects using probabilities (the interaction contrast) or ratios (the RERI)
Departures from an additivity of effects (or from biological independence), whether defined as an inequality between joint and independent effects, or as a heterogeneity among effects, can be formally assessed through the interaction contrast, whose terms are probabilities, and the RERI, whose terms are relative risks.
The terms in Equation 1 can be reordered so that the interaction contrast (IC) [9] appears on the left side:
$$p_{11} - p_{10} - p_{01} + p_{00} = 0 \qquad \text{(Equation 4)}$$
Dividing each term in Equation 4 by $p_{00}$ yields:
$$\frac{p_{11}}{p_{00}} - \frac{p_{10}}{p_{00}} - \frac{p_{01}}{p_{00}} + 1 = 0$$
Recognizing that these ratios of probabilities are relative risks (RR), we obtain:
$$RR_{11} - RR_{10} - RR_{01} + 1 = 0 \qquad \text{(Equation 5)}$$
Rothman [10] names the quantity on the left side of Equation 5 the "Relative Excess Risk due to Interaction" (RERI). Rothman and Greenland [9] call it the "interaction contrast ratio" (ICR). Hosmer and Lemeshow [11] define it as "the proportion of disease among those with both exposures that is attributable to their interaction." The algebraic equivalence between Equations 1 (for the IC) and 5 (for the RERI) validates the assessment of additivity of effects on either probability or relative risk scales. The IC and the RERI formally test the hypothesis that the effects on Y of X and Z are additive or, equivalently, that no interaction exists between X and Z. The STROBE statement [3 (p.825)] illustrates how to use the RERI to assess departures from additivity of effects.

Data example. Lung cancer mortality among workers with different exposures to asbestos and smoking
Hammond et al. [8] compared the risk of a dichotomous outcome, mortality from lung cancer, among 17,800 asbestos workers and among 73,763 workers who were not exposed to asbestos. They also recorded smoking status, so participants displayed combinations of exposure to cigarette smoking and to asbestos (Table 2). Hammond's study is widely used in epidemiology textbooks [2 (pp.168-180),12] to illustrate biological interaction.
Supplementary File 1 illustrates the creation of a dataset that closely approximates the properties of the published data. So that the dataset's risk probabilities (reported as lung cancer deaths per 100,000) reflect the published ones, we assumed a smoking prevalence of 0.28 both for the asbestos workers and for the comparison group of unexposed workers.
Table 1. Probabilities of an outcome (Y) at levels of two exposure or risk factors (X and Z)

| | Z=1 ("exposed to factor Z") | Z=0 ("not exposed to factor Z") |
|---|---|---|
| X=1 ("exposed to factor X") | $p_{11}$ | $p_{10}$ |
| X=0 ("not exposed to factor X") | $p_{01}$ | $p_{00}$ |

The data example illustrates a departure from additivity of effects
If the effects of asbestos exposure and cigarette smoking are additive, the expected effect of experiencing both exposures would equal the sum of the exposures' separate effects (Equation 1). Following the notation introduced in Table 1 to define $p_{xz}$, where X denotes cigarette smoking (1 = smokers and 0 = nonsmokers) and Z denotes asbestos exposure (1 = exposed and 0 = not exposed), the estimated excess risks are:
$p_{11} - p_{00} = 601.9 - 11.3 = 590.6$ excess deaths per 100,000 people, attributable to joint effects of both exposures.
$p_{10} - p_{00} = 121.0 - 11.3 = 109.7$ excess deaths per 100,000 people, attributable to smoking by itself.
$p_{01} - p_{00} = 54.6 - 11.3 = 43.3$ excess deaths per 100,000 people, attributable to asbestos exposure by itself.
The number of lung cancer deaths attributable to dual exposure appears to exceed the sum of the exposures' separate effects. The interaction contrast for the data example, $p_{11} - p_{10} - p_{01} + p_{00} = 601.9 - 121.0 - 54.6 + 11.3 = 437.6$, indicates that the risk of lung cancer death in those who experience both exposures exceeds, by about 437.6 deaths per 100,000, the risk expected if the separate effects of smoking and of asbestos exposure simply added. Calculated for the data example, the RERI, which quantifies departure from additivity on the relative risk scale, is $RR_{11} - RR_{10} - RR_{01} + 1 = [601.9/11.3] - [121.0/11.3] - [54.6/11.3] + 1 = 38.7$.
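The arithmetic behind the excess risks, the interaction contrast and the RERI can be checked with a few lines of code. The sketch below uses the four risks per 100,000 that appear in the RERI calculation reported in the text, with X = smoking and Z = asbestos; the variable names are illustrative.

```python
# Risks per 100,000 workers, taken from the RERI calculation in the text.
# X = smoking, Z = asbestos; the subscript order is p_xz.
p11, p10, p01, p00 = 601.9, 121.0, 54.6, 11.3

joint_excess = p11 - p00      # both exposures vs. neither
smoking_excess = p10 - p00    # smoking alone vs. neither
asbestos_excess = p01 - p00   # asbestos alone vs. neither

# Interaction contrast: joint excess minus the sum of the separate excesses.
ic = p11 - p10 - p01 + p00

# RERI: the same contrast after dividing every risk by p00.
reri = p11 / p00 - p10 / p00 - p01 / p00 + 1

print(f"excess risks: joint {joint_excess:.1f}, "
      f"smoking {smoking_excess:.1f}, asbestos {asbestos_excess:.1f}")
print(f"IC = {ic:.1f} per 100,000; RERI = {reri:.1f}; IC / p00 = {ic / p00:.1f}")
```

Because the RERI is the interaction contrast divided by the risk in the doubly unexposed group, the two measures always agree on the direction of a departure from additivity; they differ only in scale.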

The linear binomial model directly estimates the interaction contrast in the data example
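The linear binomial model [4,7] estimates the interaction contrast directly in terms of probabilities and differences in probabilities:
$$P(Y=1) = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ \qquad \text{(Equation 6)}$$
Recalling that X and Z take values of 1 for "exposure" and 0 for "no exposure," then
$$p_{00} = \beta_0, \quad p_{10} = \beta_0 + \beta_1, \quad p_{01} = \beta_0 + \beta_2, \quad p_{11} = \beta_0 + \beta_1 + \beta_2 + \beta_3$$
Substituting these expressions into Equation 1, which defines additivity of effects,
$$\beta_1 + \beta_2 + \beta_3 = \beta_1 + \beta_2$$
In the linear binomial model, effects are additive if $\beta_3$, the regression coefficient associated with the product or interaction term, is equal to zero. Substituting the expressions into Equation 4 illustrates that the model's estimate for $\beta_3$ directly estimates the interaction contrast:
$$p_{11} - p_{10} - p_{01} + p_{00} = (\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_1) - (\beta_0 + \beta_2) + \beta_0 = \beta_3$$
Thus, the linear binomial model's estimates for the interaction contrast and for the X*Z interaction are equivalent. Both provide direct tests of additivity; evidence against the hypothesis that $\beta_3 = 0$ is evidence of a departure from additivity.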
Supplementary File 2 illustrates the construction of the linear binomial model using SAS PROC GENMOD [4,7]. The model's point estimates for the number of deaths per 100,000 workers, which are presented in Table 3, are equal to those reported in Table 2. Table 3 also reports the model's estimates (and 95% CI) for regression coefficients. These coefficients include estimates for the effect on lung cancer mortality of smoking among those not exposed to asbestos ($\beta_1$), and of asbestos exposure in non-smokers ($\beta_2$). The linear binomial model produces identical inference for $\beta_3$, which estimates the statistical interaction between smoking and asbestos exposure, and for the IC (estimate: 437.6 deaths per 100,000; 95% CI: 213.8, 661.3; P = 0.00012702).
The consistency between the p values generated for these statistics verifies that they offer equivalent tests of the null hypothesis that the effects of smoking and asbestos exposure are additive.
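The SAS syntax in Supplementary File 2 is not reproduced here. As a rough check on the reported interval, note that a model with both main effects and the product term is saturated for four exposure cells, so its fitted risks equal the observed cell risks and a Wald interval for the IC can be sketched by hand. The cell sizes below are an assumption: they are implied by the stated totals (17,800 and 73,763 workers) and the assumed smoking prevalence of 0.28, and may differ slightly from the dataset built in Supplementary File 1.

```python
import math

# Risks per 100,000 (from the text), converted to probabilities.
# Keys are "xz" with x = smoking and z = asbestos.
risk = {"11": 601.9e-5, "10": 121.0e-5, "01": 54.6e-5, "00": 11.3e-5}

# ASSUMED cell sizes: 28% smokers in each asbestos group.
n = {
    "11": 0.28 * 17800, "01": 0.72 * 17800,   # asbestos workers
    "10": 0.28 * 73763, "00": 0.72 * 73763,   # unexposed workers
}

# Point estimate of the interaction contrast (beta_3 on the identity scale).
ic = risk["11"] - risk["10"] - risk["01"] + risk["00"]

# The four cells are independent binomials, so their variances add.
se = math.sqrt(sum(risk[c] * (1 - risk[c]) / n[c] for c in risk))

z = ic / se
lower, upper = ic - 1.96 * se, ic + 1.96 * se
p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal p-value

print(f"IC = {ic * 1e5:.1f} per 100,000 "
      f"(95% CI {lower * 1e5:.1f}, {upper * 1e5:.1f}), P = {p_value:.6f}")
```

Under these assumed cell sizes the interval should land close to the one reported from the fitted model; any small discrepancy would reflect the approximated cell counts rather than a different method.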
Figure 1, which depicts the estimates and confidence intervals generated by the linear binomial model, illustrates the heterogeneity of the effects of smoking on lung cancer mortality in groups defined by asbestos exposure. The syntax that produced Table 3 and Figure 1 is contained in Supplementary File 3.

Figure 1. Biological interaction, between asbestos exposure and smoking, illustrated as a non-additivity or heterogeneity of effects
Log binomial and logistic regression models detect no departure from multiplicativity of effects in the data example
In contrast to the linear binomial model, models that employ logarithmic transformations of probabilities (log binomial models) or their corresponding odds (logistic regression models) assess departures from multiplicativity of effects. Multiplicativity of effects is defined in a manner analogous to the definition of additivity of effects. The effects of two factors (X and Z) on an outcome (Y) are multiplicative if their joint effects are equal to the product of their separate and independent effects. When effects are multiplicative, relative risks will conform to the relationship $RR_{XZ} = RR_X \times RR_Z$, and odds ratios will conform to the relationship $OR_{XZ} = OR_X \times OR_Z$. A log binomial model estimates and tests the multiplicativity of relative risks:
$$\ln[P(Y=1)] = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ$$
Under this model, $RR_X = e^{\beta_1}$, $RR_Z = e^{\beta_2}$ and $RR_{XZ} = e^{\beta_1 + \beta_2 + \beta_3}$, so multiplicativity requires
$$e^{\beta_1 + \beta_2 + \beta_3} = e^{\beta_1} \times e^{\beta_2}, \quad \text{equivalently} \quad \beta_1 + \beta_2 + \beta_3 = \beta_1 + \beta_2$$
These equalities hold only if $\beta_3$, the regression coefficient associated with the product term XZ, is equal to zero. Similarly, the logistic regression model, $\ln[P(Y=1)/P(Y=0)] = \beta_0 + \beta_1 X + \beta_2 Z + \beta_3 XZ$, assesses multiplicativity of effects expressed as odds or odds ratios. In either model, estimates or hypothesis tests that suggest that $\beta_3$ does not equal zero constitute evidence of a departure from multiplicativity of effects.
Applied to the data example, the log binomial model finds no evidence of statistical interaction between smoking and asbestos exposure (P = 0.9637); measured as relative risks, the factors' effects are multiplicative and homogeneous. Similarly, a logistic regression model finds no statistical interaction between smoking and asbestos exposure (P = 0.9581) to suggest a departure from multiplicativity of effects measured as odds ratios. Figures 2 and 3 depict the estimates generated by the log binomial and logistic regression models. The models' construction, using SAS PROC GENMOD, is detailed in Supplementary File 4 along with the syntax that produced Figures 2 and 3.
The models' differences in detecting statistical interaction do not confound the question of whether the data exemplify biological interaction. Rather, they illustrate the importance of (1) identifying an effect measure (either a difference or a ratio between probabilities or risks) that reflects the hypothesized form of the interaction and then (2) constructing a statistical model that directly estimates that effect measure.
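The near-multiplicativity of the data example can be seen directly from the four risks, without fitting a model. The sketch below compares the joint relative risk (and odds ratio) with the product of the separate ones, and takes the log of their quotient, which is what the product-term coefficient equals when the model is saturated. It uses the risks reported in the text; it is not the SAS analysis of Supplementary File 4, and it does not reproduce the reported P values.

```python
import math

# Risks per 100,000 (from the text) as probabilities; x = smoking, z = asbestos.
p11, p10, p01, p00 = 601.9e-5, 121.0e-5, 54.6e-5, 11.3e-5

def odds(p):
    return p / (1 - p)

# Relative risks and odds ratios, each against the doubly unexposed group.
rr_x, rr_z, rr_xz = p10 / p00, p01 / p00, p11 / p00
or_x, or_z, or_xz = (odds(p) / odds(p00) for p in (p10, p01, p11))

# In a saturated model the product-term coefficient is the log of
# (joint ratio) / (product of the separate ratios).
beta3_log = math.log(rr_xz / (rr_x * rr_z))
beta3_logit = math.log(or_xz / (or_x * or_z))

print(f"RR: joint {rr_xz:.1f} vs product {rr_x * rr_z:.1f}; "
      f"beta3 (log) = {beta3_log:.3f}")
print(f"OR: joint {or_xz:.1f} vs product {or_x * or_z:.1f}; "
      f"beta3 (logit) = {beta3_logit:.3f}")
```

Both coefficients come out close to zero, which is consistent with the absence of statistical interaction on the ratio scales even though the same risks depart clearly from additivity.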

DISCUSSION
Choosing log binomial or logistic regression models that generate estimates of the RERI
Neither the log binomial model nor the logistic regression model detects statistical interaction in the data example. The models' form suits them for detecting departures from multiplicativity of effects. Nevertheless, they are widely used in epidemiology to assess departures from additivity of effects through ratio measures like the RERI [3].
Although widely used, the RERI has disadvantages. Because it is constructed from ratios, the RERI is not interpretable as the number of excess deaths attributable to exposure to both smoking and asbestos. The RERI of 38.7, calculated for the data example, lacks the ease of interpretation of the linear binomial model's estimate of the IC of 437.6 excess deaths per 100,000 (Table 3). A second disadvantage relates to difficulties in obtaining standard errors with which to construct confidence intervals for, or to test hypotheses about, the RERI. An influential approach, introduced by Hosmer and Lemeshow [11], estimates the RERI using logistic regression and obtains standard errors for its estimates using the delta method. SAS syntax for the approach is provided by Andersson et al. [13] and by Richardson and Kaufman [14], who construct a "linear odds ratio model" using SAS PROC NLMIXED. As an alternative approach, Richardson and Kaufman [14] recommend bootstrapping for obtaining confidence intervals. An empirical 95% confidence interval on the RERI, calculated for the data example from 500 bootstrap samples, is (15.9, 132.6). However, because the bounds for the RERI's confidence interval are ratios, they present the same challenges to interpretation as the estimate itself.
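A percentile bootstrap interval for the RERI can be sketched as follows. The text does not say how its 500 bootstrap samples were drawn, so several choices here are assumptions: workers are resampled with replacement within each exposure cell (equivalent to drawing the number of deaths from a binomial distribution with the observed risk), cell sizes are implied by the stated totals and the assumed smoking prevalence of 0.28, and replicates with no deaths in the doubly unexposed cell are redrawn because the RERI is undefined there. The helper names and the seed are illustrative.

```python
import math
import random

# Risks per 100,000 (from the text) and ASSUMED cell sizes (28% smokers per group).
risk = {"11": 601.9e-5, "10": 121.0e-5, "01": 54.6e-5, "00": 11.3e-5}
size = {"11": round(0.28 * 17800), "01": round(0.72 * 17800),
        "10": round(0.28 * 73763), "00": round(0.72 * 73763)}

def draw_deaths(n, p, rng):
    """Binomial(n, p) draw made by skipping geometric gaps between deaths,
    which is fast when p is small."""
    deaths, position = 0, 0
    while True:
        position += int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1
        if position > n:
            return deaths
        deaths += 1

def reri(p):
    return (p["11"] - p["10"] - p["01"]) / p["00"] + 1

def bootstrap_reri(replicates=500, seed=1):
    rng = random.Random(seed)
    values = []
    while len(values) < replicates:
        p = {c: draw_deaths(size[c], risk[c], rng) / size[c] for c in risk}
        if p["00"] > 0:   # RERI is undefined when the reference risk is zero
            values.append(reri(p))
    values.sort()
    # Approximate 2.5th and 97.5th percentiles of the bootstrap distribution.
    return values[int(0.025 * replicates)], values[int(0.975 * replicates) - 1]

lower, upper = bootstrap_reri()
print(f"percentile 95% CI for RERI: {lower:.1f} to {upper:.1f}")
```

This sketch is not expected to reproduce the reported interval exactly, since the resampling scheme, cell counts and random seed differ from the original analysis. It does show why the interval is wide and asymmetric: the reference risk rests on very few deaths, and it sits in the denominator of every term.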

Choosing the linear binomial model that directly estimates the interaction contrast
Logistic regression is widely used in epidemiology to study binomial outcomes, even though its form is suited for detecting departures from multiplicativity of effects. A major reason for the model's popularity and durability is that its use of the logit link, which is the canonical link for a binomial response, affords desirable statistical properties. Among these is logistic

Figure 2. Predicted log probabilities illustrate a lack of departure from multiplicativity of effects in the log binomial model.

Figure 3. Predicted log odds illustrate a lack of departure from multiplicativity of effects in the logistic regression model.

Table 2. Lung cancer deaths (per 100,000 workers) among those with exposure to asbestos and/or cigarette smoking

Table 3. Absolute risks (and risk differences) for death from lung cancer (per 100,000 workers) for those with exposure to asbestos and/or cigarette smoking, estimated by linear binomial model