“Clinical Stability” and Propensity Score Matching in Cardiac Surgery: is the clinical evaluation of treatment efficacy algorithm-dependent in small sample size settings?

Authors

  • Daniele Bottigliengo University of Padova
  • Aslıhan Sentürk Acar Hacettepe University, Ankara
  • Veronica Sciannameo University of Padova
  • Giulia Lorenzoni University of Padova
  • Jonida Bejko University of Brescia
  • Tomaso Bottio University of Padova
  • Emanuele Cozzi Azienda Ospedaliera di Padova
  • Marta Vadori Azienda Ospedaliera di Padova
  • Jean-Paul Soulillou Centre Hospitalier Universitaire de Nantes
  • Thierry Le Torneau Centre Hospitalier Universitaire de Nantes
  • Thomas Senage Centre Hospitalier Universitaire de Nantes
  • Rafael Mañez Hospitalet de Llobregat, Barcelona
  • Cristina Costa Hospitalet de Llobregat, Barcelona
  • Vered Padler-Karavani Tel Aviv University
  • Sofia Scali University of Padova
  • Massimiliano Carrozzini University of Padova
  • Emilia Fiorello University of Padova
  • Samuel Fusca University of Padova
  • Gino Gerosa University of Padova
  • Ileana Baldi University of Padova
  • Paola Berchialla University of Torino
  • Dario Gregori University of Padova

Abstract

Background: Propensity score matching is one of the most popular techniques for dealing with treatment allocation bias in observational studies. However, when the number of enrolled patients is very low, the matched sets of subjects may depend heavily on the model used to estimate individual propensity scores, undermining the stability of the resulting clinical findings. In this study, we investigate potential issues related to the stability of the matched sets created by different propensity score models and propose some diagnostic tools to evaluate them.
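
The stability concern raised above can be illustrated with a small sketch. It assumes greedy 1:1 nearest-neighbour matching without replacement within a caliper on the propensity score (a common choice, not necessarily the one used in this study), matches the same patients using scores from two hypothetical models, and measures how much the selected controls overlap with the Jaccard index. The helper names `greedy_caliper_match` and `jaccard`, the subject IDs, the scores and the caliper are all invented for illustration, and the Jaccard index is only one possible stability measure, not the authors' diagnostic.

```python
from typing import Dict, List, Set, Tuple


def greedy_caliper_match(
    ps: Dict[str, float], treated: Set[str], caliper: float
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching without replacement (hypothetical helper).

    Treated subjects are processed in descending propensity-score order
    (ties broken by ID); each is paired with the closest unused control
    if the absolute score difference is within `caliper`.
    """
    controls = {i: s for i, s in ps.items() if i not in treated}
    pairs: List[Tuple[str, str]] = []
    for t in sorted(treated, key=lambda i: (-ps[i], i)):
        if not controls:
            break
        # Closest remaining control; ties broken by ID so results are reproducible.
        c = min(controls, key=lambda i: (abs(controls[i] - ps[t]), i))
        if abs(controls[c] - ps[t]) <= caliper:
            pairs.append((t, c))
            del controls[c]
    return pairs


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets: 1.0 means identical, 0.0 means disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical data: three treated (T*) and five control (C*) patients,
# with scores from a smooth model (A) and a coarse, tree-like model (B).
treated = {"T1", "T2", "T3"}
ps_model_a = {"T1": 0.80, "T2": 0.55, "T3": 0.30,
              "C1": 0.78, "C2": 0.52, "C3": 0.33, "C4": 0.10, "C5": 0.60}
ps_model_b = {"T1": 0.70, "T2": 0.70, "T3": 0.20,
              "C1": 0.20, "C2": 0.70, "C3": 0.20, "C4": 0.20, "C5": 0.70}

pairs_a = greedy_caliper_match(ps_model_a, treated, caliper=0.05)
pairs_b = greedy_caliper_match(ps_model_b, treated, caliper=0.05)

controls_a = {c for _, c in pairs_a}
controls_b = {c for _, c in pairs_b}
print("model A pairs:", pairs_a)
print("model B pairs:", pairs_b)
print("control-set Jaccard:", round(jaccard(controls_a, controls_b), 3))
print("identical pairs:", len(set(pairs_a) & set(pairs_b)))
```

With these invented scores both models match all three treated patients, yet the two matched samples share only half of their selected controls and no identical pairs. In a small sample, differences of this kind can propagate directly into the treatment-effect estimate.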

Methods: Matched groups of patients were created using five different methods for estimating propensity scores: Logistic Regression, Classification and Regression Trees, Bagging, Random Forest and Generalized Boosted Model. Differences between subjects in the matched sets were evaluated by comparing both pre-treatment covariates and propensity score distributions. We applied our proposal to an observational cardiac surgery study comparing two procedures for cardiac valve replacement.
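
The comparisons described above can be sketched with two standard summary measures. Because the abstract does not name the specific statistics, both are assumptions here: the standardized mean difference (SMD) of a pre-treatment covariate within each matched sample, using the pooled standard deviation, and the two-sample Kolmogorov-Smirnov (KS) distance between the distributions of the same variable in two matched samples. The ages and the two "models" are invented; in practice the matched samples would come from propensity scores estimated with Logistic Regression, Classification and Regression Trees, Bagging, Random Forest or Generalized Boosted Model.

```python
import math
from bisect import bisect_right
from statistics import mean, variance
from typing import Sequence


def smd(treated: Sequence[float], control: Sequence[float]) -> float:
    """Standardized mean difference with pooled SD sqrt((s_t^2 + s_c^2) / 2)."""
    diff = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        # Both groups constant: balanced only if the means coincide.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance: max gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for v in sorted(set(xs) | set(ys)):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


# Hypothetical ages (years) in two matched samples built from the same
# treated patients but with propensity scores from two different models.
treated_age = [70, 65, 72]
controls_model_a = [70, 64, 72]
controls_model_b = [60, 62, 64]

print("SMD, model A sample:", round(smd(treated_age, controls_model_a), 3))
print("SMD, model B sample:", round(smd(treated_age, controls_model_b), 3))
print("KS distance, A vs B controls:",
      round(ks_statistic(controls_model_a, controls_model_b), 3))
```

With these invented ages, the sample matched on model A scores is close to balanced on age (a frequently cited rule of thumb treats an absolute SMD below 0.1 as negligible), whereas the model B sample is markedly imbalanced, and a KS distance of about 0.67 indicates that the two matched control groups differ substantially. The same functions can be applied to every covariate and to the propensity score itself.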

Results: Both baseline characteristics and propensity score distributions differed systematically across the matched samples created with the different propensity score models. The most relevant differences were observed for the matched set created by estimating individual propensity scores with the Classification and Regression Trees algorithm.

Conclusion: The clinical stability of matched samples created with different statistical methods should always be evaluated to ensure the reliability of the final estimates. This work opens the door to future investigations that fully assess the implications of this finding.

Published

2022-02-11

Section

Biostatistics