“Clinical Stability” and Propensity Score Matching in Cardiac Surgery: is the clinical evaluation of treatment efficacy algorithm-dependent in small sample size settings?

Daniele Bottigliengo; Aslıhan Sentürk Acar; Veronica Sciannameo; Giulia Lorenzoni; Jonida Bejko; Tomaso Bottio; Emanuele Cozzi; Marta Vadori; Jean-Paul Soulillou; Thierry Le Torneau; Thomas Senage; Rafael Mañez; Cristina Costa; Vered Padler-Karavani; Sofia Scali; Massimiliano Carrozzini; Emilia Fiorello; Samuel Fusca; Gino Gerosa; Ileana Baldi; Paola Berchialla; Dario Gregori

Authors

Daniele Bottigliengo, University of Padova
Aslıhan Sentürk Acar, Hacettepe University, Ankara
Veronica Sciannameo, University of Padova
Giulia Lorenzoni, University of Padova
Jonida Bejko, University of Brescia
Tomaso Bottio, University of Padova
Emanuele Cozzi, Azienda Ospedaliera di Padova
Marta Vadori, Azienda Ospedaliera di Padova
Jean-Paul Soulillou, Centre Hospitalier Universitaire de Nantes
Thierry Le Torneau, Centre Hospitalier Universitaire de Nantes
Thomas Senage, Centre Hospitalier Universitaire de Nantes
Rafael Mañez, Hospitalet de Llobregat, Barcelona
Cristina Costa, Hospitalet de Llobregat, Barcelona
Vered Padler-Karavani, Tel Aviv University
Sofia Scali, University of Padova
Massimiliano Carrozzini, University of Padova
Emilia Fiorello, University of Padova
Samuel Fusca, University of Padova
Gino Gerosa, University of Padova
Ileana Baldi, University of Padova
Paola Berchialla, University of Torino
Dario Gregori, University of Padova

Abstract

Background: Propensity score matching is one of the most popular techniques for dealing with treatment allocation bias in observational studies. However, when the number of enrolled patients is very low, the composition of the matched sets of subjects may depend heavily on the model used to estimate individual propensity scores, undermining the stability of the resulting clinical findings. In this study, we investigate potential issues related to the stability of the matched sets created by different propensity score models, and we propose some diagnostic tools to evaluate them.

Methods: Matched groups of patients were created using five different methods: Logistic Regression, Classification and Regression Trees, Bagging, Random Forest and Generalized Boosted Model. Differences between subjects in the matched sets were evaluated by comparing both pre-treatment covariates and propensity score distributions. We applied our proposal to a cardio-surgical observational study that aims to compare two different procedures of cardiac valve replacement.
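
As an illustration only, the kind of comparison described in the Methods can be sketched in a few lines of Python. Everything below is an assumption made for the example, not the study's actual procedure: the greedy 1:1 nearest-neighbour matching rule, the caliper of 0.1, the patient identifiers, the propensity scores attributed to two hypothetical models ("model A" and "model B"), and the ages are all invented. The sketch matches treated and control patients on each model's scores, then compares the resulting matched sets through their overlap and through a standardized mean difference on one baseline covariate.

```python
from statistics import mean, variance


def greedy_match(ps_treated, ps_control, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    ps_treated / ps_control map a patient id to an estimated propensity score.
    Returns a list of (treated_id, control_id) pairs within the caliper.
    """
    available = dict(ps_control)  # controls not yet used
    pairs = []
    # Process treated patients in order of increasing propensity score.
    for t_id, t_ps in sorted(ps_treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def standardized_mean_difference(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    if pooled_sd == 0:
        return 0.0
    return (mean(x_treated) - mean(x_control)) / pooled_sd


def jaccard(a, b):
    """Share of patients common to two matched sets (1.0 = identical sets)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Hypothetical propensity scores for the same patients under two models.
treated_a = {"t1": 0.30, "t2": 0.60, "t3": 0.80}
control_a = {"c1": 0.28, "c2": 0.55, "c3": 0.85, "c4": 0.10}
treated_b = {"t1": 0.50, "t2": 0.40, "t3": 0.70}
control_b = {"c1": 0.10, "c2": 0.42, "c3": 0.65, "c4": 0.52}

# Hypothetical baseline covariate (age in years).
age = {"t1": 62, "t2": 68, "t3": 71, "c1": 60, "c2": 70, "c3": 65, "c4": 75}

pairs_a = greedy_match(treated_a, control_a)
pairs_b = greedy_match(treated_b, control_b)

controls_a = {c for _, c in pairs_a}
controls_b = {c for _, c in pairs_b}

# How much do the matched control groups chosen by the two models overlap?
overlap = jaccard(controls_a, controls_b)

# Covariate balance within each matched sample.
smd_a = standardized_mean_difference(
    [age[t] for t, _ in pairs_a], [age[c] for _, c in pairs_a]
)
smd_b = standardized_mean_difference(
    [age[t] for t, _ in pairs_b], [age[c] for _, c in pairs_b]
)

print("matched controls, model A:", sorted(controls_a))
print("matched controls, model B:", sorted(controls_b))
print("overlap of matched controls:", overlap)
print("SMD for age, model A:", round(smd_a, 3))
print("SMD for age, model B:", round(smd_b, 3))
```

In this toy setting the two hypothetical models select partly different control patients, so the matched samples differ in their baseline age distribution even though the underlying cohort is the same. In a real analysis the propensity scores would come from fitted models rather than hand-written dictionaries.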

Results: Both baseline characteristics and propensity score distributions differed systematically across the matched samples of patients created with the different models used to estimate the propensity score. The largest differences were observed for the matched set created by estimating individual propensity scores with the Classification and Regression Trees algorithm.

Conclusion: The clinical stability of matched samples created with different statistical methods should always be evaluated to ensure the reliability of final estimates. This work opens the door to future investigations that fully assess the implications of this finding.
