Assessing Methods for Predictive Cut-Point Estimation: A Simulation-Based Comparison

Authors

  • Piergiacomo Di Gennaro, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Mario Fordellone, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Giovanni Nicolao, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Paola Schiattarella, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Annafrancesca Smimmo, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Teresa Speranza, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Vittorio Simeon, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Simona Signoriello, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"
  • Paolo Chiodini, Unità di Statistica Medica, University of Campania "Luigi Vanvitelli"

DOI:

https://doi.org/10.54103/2282-0930/29367

Abstract

Introduction
The identification of an optimal cut-point for continuous biomarkers plays a crucial role in defining patient subgroups likely to benefit from specific treatments. While the literature has extensively covered prognostic biomarkers, those that predict outcome regardless of treatment, the methodological framework for identifying predictive biomarkers, which inform treatment effect heterogeneity, is less developed. This is primarily due to the added complexity of modelling treatment-biomarker interactions, which poses challenges related to statistical power, overfitting, and bias.

Objectives
This study aimed to compare three statistical methods for the identification of predictive cut-points in time-to-event data. Our goal was to assess their performance in estimating the correct interaction effect and identifying a responder subgroup, under simulation settings that account for variability in treatment efficacy, biomarker predictive effect, and subgroup prevalence.

Methods
We implemented three approaches: Procedure B of the Biomarker-Adaptive Threshold Design (M1), which combines test statistics across candidate cut-points using a permutation test based on likelihood-ratio statistics; the Differential Hazard Ratio method (M2), which selects the cut-point with the largest difference in HRs across adjacent thresholds; and a Minimum P-value method (M3) adapted for interaction terms in the Cox model [1,2]. We conducted a simulation study with 1000 replications, generating survival times from an exponential distribution with an expected censoring rate of approximately 40%. Eight main scenarios were defined by all possible combinations of two sample sizes (n = 300 and n = 500), two treatment effect sizes (HR = 1 or 0.5), and two interaction effect sizes (HR = 1 or 0.5), with the biomarker prognostic effect fixed at HR = 0.6. In addition, we included two extra scenarios calibrated to achieve 80% power: one based on the interaction effect test (β for the treatment-biomarker interaction) and one on the subgroup effect test (β within responders). In each replication, the true cut-point was randomly drawn from the biomarker distribution between the 20th and 80th percentiles. For each method, we evaluated statistical power, cut-point estimation bias, bias in the subgroup and predictive (interaction) coefficient estimates, and type I error. A significance level of 0.05 was used for all three methods. The procedures were also applied to a real dataset from a prostate cancer clinical trial conducted by the Second Veterans Administration Cooperative Urologic Research Group [3].
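As an illustration, a minimal sketch of one simulation replication and of the Minimum P-value search (M3) is given below, assuming the Python lifelines package. The baseline hazard, censoring rate, per-unit prognostic effect, and cut-point grid are illustrative assumptions rather than the exact values used in the study; M1 would additionally wrap such a cut-point scan in a permutation test of the maximum statistic, and M2 would instead compare hazard ratios across adjacent thresholds.

```python
# Minimal sketch, not the study code: simulate one replication and apply the
# Minimum P-value (M3) search for a predictive cut-point in a Cox model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)

def simulate_trial(n=300, hr_treat=0.5, hr_inter=0.5, hr_biom=0.6):
    """Exponential survival times with a treatment effect, a prognostic
    biomarker effect, and a treatment-biomarker interaction above a true
    cut-point drawn between the 20th and 80th biomarker percentiles."""
    biomarker = rng.normal(size=n)
    treat = rng.integers(0, 2, size=n)
    cut = np.quantile(biomarker, rng.uniform(0.2, 0.8))
    above = (biomarker > cut).astype(int)
    log_hr = (np.log(hr_treat) * treat
              + np.log(hr_biom) * biomarker        # illustrative per-unit prognostic effect
              + np.log(hr_inter) * treat * above)
    rate = 0.1 * np.exp(log_hr)                    # baseline hazard 0.1 (illustrative)
    event_time = rng.exponential(1.0 / rate)
    cens_time = rng.exponential(1.0 / 0.07, size=n)  # roughly ~40% censoring (approximate)
    time = np.minimum(event_time, cens_time)
    event = (event_time <= cens_time).astype(int)
    df = pd.DataFrame({"time": time, "event": event,
                       "treat": treat, "biomarker": biomarker})
    return df, cut

def min_p_cutpoint(df, grid=np.arange(0.20, 0.81, 0.05)):
    """M3: scan candidate cut-points and keep the one with the smallest
    p-value for the treatment x dichotomized-biomarker interaction."""
    best = None
    for q in grid:
        c = df["biomarker"].quantile(q)
        d = df.assign(high=(df["biomarker"] > c).astype(int))
        d["treat_x_high"] = d["treat"] * d["high"]
        cph = CoxPHFitter().fit(
            d[["time", "event", "treat", "high", "treat_x_high"]],
            duration_col="time", event_col="event")
        p = cph.summary.loc["treat_x_high", "p"]
        if best is None or p < best[1]:
            best = (c, p, cph.params_["treat_x_high"])
    return best  # (estimated cut-point, minimum p-value, interaction log-HR)

df, true_cut = simulate_trial()
cut_hat, p_min, beta_hat = min_p_cutpoint(df)
print(f"true cut {true_cut:.2f}  estimated cut {cut_hat:.2f}  "
      f"interaction log-HR {beta_hat:.2f}  min p {p_min:.3g}")
```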

Results
M1 consistently demonstrated robust performance, with type I error close to the nominal level (5.6%) and minimal bias in cut-point estimation (0.005 ± 0.06). It maintained good power even when the subgroup size was small. M2 showed unstable cut-point estimates (0.055 ± 0.42) and highly variable interaction estimates (0.463 ± 1.46), yielding very low power (16.2%). While M3 achieved the highest power in some scenarios (82.1%), it exhibited marked type I error inflation (50.1%) and substantial bias due to multiple testing without correction (-0.401 ± 1.730). In small subgroups, all methods showed reduced performance, but M1 remained the most stable. On the prostate cancer dataset, M1 identified a plausible treatment-responsive subgroup, whereas the other two methods produced conflicting or less reliable results.

Conclusions

Our results highlight the need for robust methods in predictive cut-point estimation. M1 showed the best balance between error control and accuracy. In contrast, M2 and M3 may lead to overfitting, unstable estimates, and inflated type I error rates. Future research should extend these comparisons to more complex models, including multivariate biomarkers.

References

[1] Jiang W, Freidlin B, Simon R. Biomarker-Adaptive Threshold Design: A Procedure for Evaluating Treatment With Possible Biomarker-Defined Subset Effect. JNCI Journal of the National Cancer Institute. 2007;99(13):1036-1043. DOI: https://doi.org/10.1093/jnci/djm022

[2] Rabbee N. Biomarker Analysis in Clinical Trials with R. Chapman and Hall/CRC; 2020. DOI: https://doi.org/10.1201/9780429428371

[3] Byar DP, Corle DK. Selecting optimal treatment in clinical trials using covariate information. J Chronic Dis. 1977;30:445-59. DOI: https://doi.org/10.1016/0021-9681(77)90037-6

Published

2025-09-08

How to Cite

Di Gennaro P, Fordellone M, Nicolao G, Schiattarella P, Smimmo A, Speranza T, et al. Assessing Methods for Predictive Cut-Point Estimation: A Simulation-Based Comparison. ebph [Internet]. 2025 [cited 2026 Feb. 6]. Available from: https://riviste.unimi.it/index.php/ebph/article/view/29367

Section

Congress Abstract - Section 3: Metodi Biostatistici