Closed Testing procedure for multiplicity control. An application on oxidative stress parameters in Hashimoto’s thyroiditis

Background: Closed Testing procedures are an effective solution to the need to make inferences on multiple aspects at the same time while controlling the Familywise Error Rate (FWER), that is, the error rate of the hierarchical family of hypotheses. They adapt to a wide range of experimental situations, in both parametric and non-parametric settings. Methods: The attention is focused on the Bonferroni-Holm method, frequently used to counteract the problem of multiple comparisons. The present paper aims to show an original application of the Closed Testing procedures for multiplicity control in medical research, with reference to oxidative stress; in particular, the Min-P Bonferroni-Holm method was applied to adjust the p-values related to three parameters of oxidative stress (BAP, D-ROMS, AGEs) in Hashimoto's thyroiditis. Results: Comparisons between different groups of patients are performed (cases vs controls, AbTg positive vs negative patients, AbTPO positive vs negative patients, normal vs high-normal TSH serum levels). Looking at the raw and the adjusted p-values, the Closed Testing procedure is slightly conservative in controlling type I error. Conclusion: The Closed Testing procedure accounts for multiplicity and controls type I error, increasing the probability of accepting the null hypothesis.


INTRODUCTION
In the study of very complex phenomena, the statistician often needs to make inferences about several aspects of a problem. In order to make inferences on multiple aspects at the same time, the global error must be controlled. In this context there is a need for a procedure that allows a decision through the joint use of univariate and multivariate tests. The Closed Testing procedures, first proposed by Marcus [1], represent a simple and effective solution to this issue.
In [2] and [3] this data analysis procedure is thoroughly discussed; the authors emphasize that the Closed Testing methods are among the most powerful multiple inference methods and are quickly gaining acceptance and popularity. These works explain in detail the methodology, the conditions on which it is based and the tests by which it can be carried out.
In [4] the authors underline the high degree of adaptability of the Closed Testing procedures to a wide range of experimental situations, in both parametric and non-parametric settings; in particular, the authors introduced non-parametric permutation methods [5], [6], [7], [8], [9] into the Closed Testing procedure and compared the different methods in terms of robustness and power. The authors also point out that permutation methods are particularly suitable for Closed Testing procedures, for three reasons:
• a greater robustness of the partial and combined permutation tests, compared to parametric tests, especially under non-normality;
• the opportunity to have combined tests that take into account the dependence structure between variables, without it being formally specified;
• the possibility of evaluating systems of directional or non-directional hypotheses with a large number of component hypotheses.
In [10] the attention is focused on multivariate multiple comparisons for multiplicity control in the non-parametric permutation context. In particular, the authors illustrate the criteria for selecting a function to combine the p-values associated with the tests of the minimal hypotheses.
A recent contribution is due to [11], which carries out a systematic comparison of methods for combining p-values from independent tests.
The present paper aims to show an original application of the Closed Testing procedures for multiplicity control in medical research, with reference to the oxidative stress; in particular the Min-P Bonferroni-Holm method was applied to the p-value adjustment, related to three parameters (BAP, D-ROMS, AGEs) of oxidative stress in Hashimoto's Thyroiditis.

The Closed testing methodology
When multiple tests are used (when comparing two or more groups), in the context of univariate and multivariate distributions, consider a family of distinct hypotheses $H_i : \omega \in \Omega_i$, $i \in I$, where $\Omega_i$ is a proper subset of $\Omega$ and $I$ is the set of indices. The hypothesis $H_0 = \bigcap_{i \in I} H_i$ is called the global hypothesis. If $H_i$ implies $H_j$ ($H_i \rightarrow H_j$), then $H_j$ is a proper component of $H_i$ and an implication relation exists between $H_i$ and $H_j$. The hypotheses that have no proper component are called minimal and correspond to the pairwise comparisons; the hypotheses that contain proper components are called non-minimal.
The Hierarchical Family is a family of hypotheses, where at least one implication relationship exists.
Figure 1 shows the structure of the hierarchical hypotheses with three minimal hypotheses.
When the existence of significant differences between groups is assessed, the inferences must control, at the fixed level α, the Familywise Error Rate (FWER), that is, the error rate of the hierarchical family. The FWER is the probability of making at least one type I error, whether on a univariate or on a multivariate hypothesis; therefore, to test the null hypotheses $H_{01}, \ldots, H_{0k}$, a multiple test procedure with critical regions $C_1, \ldots, C_k$ has to be applied such that the probability of a type I error is less than or equal to α when $H_{01}, \ldots, H_{0k}$ are true, so that the FWER is controlled. The goal of Closed Testing methods is to build a procedure that has the property of coherence and, possibly, of consonance, and for which the experimental error does not exceed α. A multiple testing procedure for a hierarchical family of hypotheses may have two important properties:
• coherence: for any pair of hypotheses ($H_i$, $H_j$) such that $H_i$ implies $H_j$, the acceptance of $H_i$ implies the acceptance of $H_j$ (equivalently, the rejection of $H_j$ implies the rejection of $H_i$);
• consonance: when a non-minimal hypothesis $H_j$ is rejected, at least one of the minimal hypotheses that compose it is also rejected.
In Closed Testing procedures coherence is required, whereas consonance is desirable.
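The FWER requirement stated above can be written compactly. In the sketch below, $V$ denotes the number of true null hypotheses that are rejected; this symbol is introduced here for illustration and does not appear in the paper. The second line shows why a correction is needed, under the added assumption of $k$ independent tests, each run at level α with all null hypotheses true.

```latex
\mathrm{FWER} = \Pr(V \ge 1) \le \alpha

% Uncorrected case, assuming k independent level-alpha tests and all nulls true:
\Pr(V \ge 1) = 1 - (1 - \alpha)^{k},
\qquad k = 3,\ \alpha = 0.05:\quad 1 - 0.95^{3} \approx 0.143
```

With three variables, as in the application discussed later, the uncorrected error rate under these assumptions would be close to three times the nominal level.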
A fundamental characteristic of Closed Testing is that it refers to a set of statistical hypotheses that is closed under intersection and in which each associated test has level α. Given a family of hypotheses $\{H_i,\ 1 \le i \le k\}$, the "closure" of the family is the set of all nonempty intersections $H_P = \bigcap_{i \in P} H_i$, with $P \subseteq \{1, \ldots, k\}$ and $P \neq \emptyset$.
In [1] the authors demonstrated that the Closed Testing procedure controls the FWER at the fixed level α.
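The closure described above can be enumerated directly. The following is a minimal sketch, assuming hypotheses are identified by the integers 1 to k; the function name `closure` and the label format are illustrative choices, not part of the paper.

```python
from itertools import combinations


def closure(k):
    """Return every nonempty index set P of {1, ..., k}.

    Each set P stands for the intersection hypothesis H_P.
    """
    indices = range(1, k + 1)
    return [
        frozenset(subset)
        for size in range(1, k + 1)
        for subset in combinations(indices, size)
    ]


# With three minimal hypotheses the closure has 2**3 - 1 = 7 members.
family = closure(3)
for P in family:
    print("H_" + "".join(str(i) for i in sorted(P)))
```

For k = 3 this lists the three minimal hypotheses, the three pairwise intersections and the global hypothesis, which is the hierarchy the paper refers to in Figure 1.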
In Closed Testing, the adjusted p-value of a hypothesis $H_i$ is equal to the maximum of the p-values associated with the hypotheses that include $H_i$ [12]. In order to test composed hypotheses, several methods were proposed in literature. Certainly, the choice of an adequate test for minimal and composed hypotheses influences the power of the procedure. There is no single method that is best in all situations; the choice of test depends on the nature of the alternative hypothesis to be verified. The applicability of Closed Testing is tied to the use of tests that are consistent and unbiased; among these, two deserve mention because they do not require knowledge of the dependence structure between the tests of the minimal hypotheses: the Bonferroni test and the Simes test for composed hypotheses (see [13] for methodological details). In this perspective, Abdi [14] focuses on the Bonferroni and Šidák corrections for multiple comparisons.
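The two tests for composed hypotheses named above can be sketched as functions that map the raw p-values of the component hypotheses to a single p-value for their intersection. The function names and the example values are hypothetical, chosen only for illustration; the formulas are the usual textbook forms of the Bonferroni and Simes tests.

```python
def bonferroni_p(pvalues):
    """Bonferroni p-value for an intersection of m hypotheses: m * min(p), capped at 1."""
    m = len(pvalues)
    return min(1.0, m * min(pvalues))


def simes_p(pvalues):
    """Simes p-value: min over ranks i of m * p_(i) / i, with p_(1) <= ... <= p_(m)."""
    m = len(pvalues)
    ordered = sorted(pvalues)
    return min(1.0, min(m * p / rank for rank, p in enumerate(ordered, start=1)))


# Hypothetical raw p-values (not taken from the study).
p_example = [0.02, 0.025, 0.03]
print(round(bonferroni_p(p_example), 4))  # 0.06
print(round(simes_p(p_example), 4))       # 0.03
```

The Simes value is never larger than the Bonferroni value, because the Bonferroni term is the first of the terms over which Simes takes the minimum.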
If the researcher aims at evaluating the difference between two independent samples on n variables, n hypotheses are formulated for the comparison between groups. The n minimal hypotheses must be tested at the significance level α by means of adequate tests, such as Student's t-test or a non-parametric test [15]; alternatively, non-parametric tests based on sampling of the permutation space can be used [8], which offer the advantage of including the effects of the dependence structure between variables without the need to estimate it directly. The "closed" set, i.e. the set of all possible composed hypotheses, is then created, and each hypothesis is tested through an appropriate test. A simple hypothesis $H_i$ is rejected if its own test is significant and if the tests of all the intersection hypotheses that include $H_i$ are significant.
After determining, in this way, the p-values of the minimal and composed hypotheses, the Closed Testing adjustment can be made by taking, for a hypothesis $H_i$, the maximum among the p-values of the hypotheses that include $H_i$ [16]. Among the multi-level procedures, this paper focuses on the "Sequentially Rejective Bonferroni Procedure", known as the "MinP Bonferroni-Holm Procedure", proposed by Holm [17].
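The adjustment rule described above (the maximum over all hypotheses that include $H_i$) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Bonferroni test is used for every intersection hypothesis, whereas the paper applies permutation-based tests to the minimal hypotheses. The function names, labels and p-values are hypothetical.

```python
from itertools import combinations


def bonferroni_p(pvalues):
    """Bonferroni p-value for an intersection hypothesis (assumed local test)."""
    return min(1.0, len(pvalues) * min(pvalues))


def closed_testing_adjust(raw, local_test=bonferroni_p):
    """Adjust raw p-values by Closed Testing.

    raw: dict mapping a hypothesis label to its raw p-value.
    Returns a dict mapping each label to its adjusted p-value.
    """
    labels = list(raw)
    # p-value of every intersection hypothesis H_P in the closure.
    closure_p = {}
    for size in range(1, len(labels) + 1):
        for subset in combinations(labels, size):
            closure_p[frozenset(subset)] = local_test([raw[l] for l in subset])
    # Adjusted p-value of H_i: maximum over all H_P that include H_i.
    return {
        label: max(p for P, p in closure_p.items() if label in P)
        for label in labels
    }


# Hypothetical raw p-values (not taken from the study).
raw = {"H1": 0.01, "H2": 0.02, "H3": 0.04}
adjusted = closed_testing_adjust(raw)
print({k: round(v, 2) for k, v in adjusted.items()})
# {'H1': 0.03, 'H2': 0.04, 'H3': 0.04}
```

With Bonferroni as the local test, these adjusted values coincide with those of the sequentially rejective procedure of Holm, which is why that procedure needs only the p-values of the minimal hypotheses.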

MinP Bonferroni-Holm Method
The Bonferroni-Holm method is used to counteract the problem of multiple comparisons; it is intended to control the Familywise Error Rate and offers a simple test which is uniformly more powerful than the Bonferroni correction. It is one of the earliest usages of stepwise algorithms in simultaneous inference. It applies the Bonferroni method to generate a step-wise procedure, as follows:
1. for each single hypothesis $H_k$ ($k = 1, \ldots, K$) the significance of a t-test for two independent samples is calculated, and the vector of p-values is arranged in decreasing order, $p_{(1)} \ge \ldots \ge p_{(K)}$, so that $p_{(K)}$ is the smallest and $H_{(k)}$ denotes the hypothesis associated with $p_{(k)}$;
2. if $p_{(K)} > \alpha/K$, we accept $H_{(1)}, \ldots, H_{(K)}$ and the algorithm stops; otherwise we reject the global hypothesis together with $H_{(K)}$ and proceed;
3. if $p_{(K-1)} > \alpha/(K-1)$, we accept $H_{(1)}, \ldots, H_{(K-1)}$ and the algorithm stops; otherwise we reject $H_{(K-1)}$ and proceed;
4. the process is repeated as in the previous step, checking whether $p_{(K-i)} > \alpha/(K-i)$ at each subsequent step.
The algorithm stops at the first inequality that holds. Equivalently, the composed hypothesis tested at step i is rejected when $(K-i)\,p_{(K-i)} \le \alpha$. This procedure requires calculating only the p-values associated with the minimal hypotheses.
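The step-wise procedure above can be sketched in terms of adjusted p-values: the smallest raw p-value is multiplied by K, the next by K-1, and so on, and a running maximum keeps the adjusted values monotone, so that a hypothesis is rejected when its adjusted p-value does not exceed α. The function names and the example p-values are hypothetical and are not taken from the study.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from the smallest raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, idx in enumerate(order):
        # Multiplier is the number of hypotheses not yet examined: m, m-1, ..., 1.
        candidate = min(1.0, (m - step) * pvalues[idx])
        # A later hypothesis can never get a smaller adjusted p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def holm_reject(pvalues, alpha=0.05):
    """True where the hypothesis is rejected at familywise level alpha."""
    return [p_adj <= alpha for p_adj in holm_adjust(pvalues)]


# Hypothetical raw p-values: the second one is below 0.05 before adjustment.
raw = [0.30, 0.04, 0.20]
print([round(p, 2) for p in holm_adjust(raw)])  # [0.4, 0.12, 0.4]
print(holm_reject(raw))                         # [False, False, False]
```

In this invented example a raw p-value of 0.04 becomes 0.12 after adjustment, so a hypothesis that looks significant on its own is no longer rejected once multiplicity is taken into account.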
Supposing we want to test the hypotheses $H_1$, $H_2$ and $H_3$, the closed testing procedure works as follows. After adjustment for Closed Testing, the intersection of the three minimal hypotheses is assigned a significance of 3·minP, where minP is the minimum p-value considered in the combination. Subsequently, the intersections of the hypotheses not yet rejected are evaluated with the procedure described above, i.e. by multiplying by 2 the smallest p-value among the minimal hypotheses that remain once the smallest overall (already used) is set aside, and so on.
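As a worked illustration of the multipliers 3, 2 and 1 described above, assume three hypothetical raw p-values $p_1 = 0.01 \le p_2 = 0.02 \le p_3 = 0.04$ for $H_1$, $H_2$, $H_3$ and α = 0.05. These values are invented for illustration and are not taken from the study; the tilde notation for adjusted p-values is also introduced here.

```latex
\begin{aligned}
\tilde{p}_1 &= 3\,p_1 = 3 \cdot 0.01 = 0.03 \\
\tilde{p}_2 &= \max(\tilde{p}_1,\ 2\,p_2) = \max(0.03,\ 0.04) = 0.04 \\
\tilde{p}_3 &= \max(\tilde{p}_2,\ 1\,p_3) = \max(0.04,\ 0.04) = 0.04
\end{aligned}
```

All three adjusted values are below 0.05, so in this example every minimal hypothesis would still be rejected after the adjustment.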
The Holm-Bonferroni method is uniformly more powerful than the classic Bonferroni correction. There are other methods for controlling the familywise error rate that are more powerful than Holm-Bonferroni, among which the Hochberg and Hommel procedures [18] should be cited. However, the Hochberg procedure requires the hypotheses to be independent or to satisfy certain forms of positive dependence, whereas Holm-Bonferroni can be applied with no further assumptions on the data.

RESULTS
The above-mentioned multivariate methodology was applied to the oxidative stress parameters. Oxidative stress, which occurs as a result of an imbalance between free radical production and antioxidant defence mechanisms, has been implicated in the pathogenesis of several autoimmune disorders, including thyroid diseases.
The study included 134 euthyroid subjects: 71 newly diagnosed HT patients (8 male and 63 female; mean age 38±13 yr) and 63 age- and sex-matched healthy controls.
Figure 2 shows the reference scheme for the application of Closed Testing procedure to oxidative stress, with reference to the three above-mentioned variables.
The Closed Testing procedure was applied using the "MinP Bonferroni-Holm" method for multiplicity control.
In the following illustrative charts, produced for each comparison, the minimal hypothesis 1 refers to BAP, the minimal hypothesis 2 to D-ROMS and the minimal hypothesis 3 to AGEs. The figures show the p-value associated with each hypothesis (minimal, non-minimal and multivariate) and the adjusted p-value (denoted by Adj.) after correction by the Closed Testing procedure, for the comparison between:
• cases and controls (Figure 3);
• AbTg positive and negative patients (Figure 4);
• AbTPO positive and negative patients (Figure 5);
• normal and high-normal TSH serum levels (Figure 6), setting a cut-off value for TSH of 4.2 mIU/L, according to Mayo Clinic (one of the leading global research institutions).
Table 1 shows the mean ± SD of the three variables in the different patient groups.
In order to test each minimal hypothesis $H_1$, $H_2$, $H_3$, the NPC test (based on a permutation solution) was applied, because of its optimal properties [8]. Figure 3 reports the results of the Closed Testing procedure for the comparison between cases and controls.
In the comparison between cases and controls, the minimal hypotheses (related to the three examined variables) and the multivariate hypothesis $H_{123}$ are rejected at the fixed significance level. Statistically significant differences exist between cases and controls for the three parameters of oxidative stress (BAP, D-ROMS and AGEs), even after correction by Closed Testing. Figure 4 shows the results of Closed Testing for the comparison between AbTg positive and negative patients.
In the comparison between AbTg positive and negative patients, the minimal hypotheses $H_1$ (BAP) and $H_2$ (D-ROMS) are rejected, revealing significant differences between the groups, while the hypothesis $H_3$ concerning AGEs is accepted. The multivariate hypothesis $H_{123}$ is rejected, since its p-value is significant even when adjusted by the Closed Testing procedure. In the comparison between AbTPO positive and negative patients (Figure 5), all minimal hypotheses (related to the three examined variables) and the multivariate hypothesis $H_{123}$ are rejected at the significance level α=0.05, indicating significant differences between the groups.
In the comparison between patients with high-normal (TSH+) and normal (TSH-) thyroid-stimulating hormone serum levels (Figure 6), all minimal hypotheses and the multivariate hypothesis $H_{123}$ are accepted. For the minimal hypothesis $H_2$, related to D-ROMS, we note a discrepancy between the raw p-value (which is significant) and the adjusted p-value (which is not significant), highlighting the low degree of conservativeness that characterizes the Closed Testing procedure.
From an endocrinological point of view this result is not surprising, considering that all patients are euthyroid, with similar levels of TSH.
Testing procedure that maintains fixed the α, as multiple error level.

CONCLUSION
This paper proposes an original application of the Closed Testing procedure (by use of the MinP Bonferroni-Holm method) to three parameters of oxidative stress in a population of patients affected by Hashimoto's Thyroiditis. Comparisons between different groups of patients are performed and, for each of them, the raw and the adjusted p-values are shown. The results highlight the utility of the Closed Testing procedure in controlling type I error. Looking at the raw and the adjusted p-values, we can note that the Closed Testing procedure is slightly conservative, because it leads to accepting the null hypothesis more often. In fact, in the comparison between normal and high-normal TSH serum levels, for D-ROMS only, the raw p-value is significant but the adjusted p-value is not statistically significant at the fixed level α. Generalizing, the Closed Testing procedure accounts for multiplicity and controls type I error, since it increases the probability of accepting the null hypothesis, as shown in this application.
Closed Testing procedures offer strong control of the FWER. For the final inferences, an elementary null hypothesis $H_i$ is rejected if, and only if, its corresponding test is significant at level α and every other hypothesis in the family that implies it is rejected by its level-α test.
Finally, this article aims at encouraging the use of the Closed Testing procedure in medical research, since it is preferable to other correction procedures: it controls the multiplicity that very often recurs in medicine [21], [22], [23], while ensuring that the global level α is respected.

FIGURE
FIGURE 3. Closed Testing procedure for comparison between cases and controls

Figure 5 illustrates the results of Closed Testing to control multiplicity in the comparison between AbTPO positive and negative patients.

FIGURE 1. Structure of the hierarchical inclusions in Closed Testing procedure.
Epidemiology Biostatistics and Public Health - 2017, Volume 14, Number 1