Development of two machine learning models to predict conversion from primary HER2-0 breast cancer to HER2-low metastases: a proof-of-concept study
F. Miglietta, A. Collesei, C. Vernieri, T. Giarratano, C.A. Giorgi, F. Girardi, G. Griguolo, M. Cacciatore, A. Botticelli, A. Vingiani, G. Fotia, F. Piacentini, D. Massa, F. Zanghì, M. Marino, G. Pruneri, M. Fassan, A.P. Dei Tos, M.V. Dieci, and V. Guarneri
ESMO Open, 2025
Background
HER2-low expression has gained clinical relevance in breast cancer (BC) due to the availability of anti-HER2 antibody–drug conjugates for patients with HER2-low metastatic BC. The well-documented instability of HER2-low status during disease evolution highlights the need to identify patients with HER2-0 primary BC who may develop a HER2-low phenotype at relapse. To help maximize treatment access, we used machine learning to predict this conversion.
Patients and methods
We included a large multicenter retrospective cohort of patients with BC who underwent tissue resampling at relapse. The dataset was preprocessed to handle missing data, a large number of features, and target class imbalance. We then trained two models: one focused on explainability [Extreme Gradient Boosting (XGBoost)] and another aimed at performance (an ensemble of XGBoost and support vector machine).
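The abstract does not state how the XGBoost and support vector machine outputs were combined. As an illustration only, the sketch below assumes soft voting, i.e. a weighted average of each model's predicted probability of HER2-0 to HER2-low conversion. The function names, the equal weighting, the 0.5 threshold, and the scores are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              weight_a: float = 0.5) -> List[float]:
    """Weighted average of two models' positive-class probabilities.

    Assumption: soft voting is one common way to build an ensemble;
    the study may have combined its models differently.
    """
    if len(prob_a) != len(prob_b):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(prob_a, prob_b)]


def classify(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a patient 1 (predicted conversion) when the score reaches the threshold."""
    return [int(p >= threshold) for p in probs]


# Hypothetical per-patient scores, for illustration only (not study data).
xgb_probs = [0.20, 0.70, 0.40, 0.90]
svm_probs = [0.40, 0.50, 0.80, 0.70]

ensemble_probs = soft_vote(xgb_probs, svm_probs)
labels = classify(ensemble_probs)
```

In such a scheme, lowering the threshold would be expected to raise sensitivity at the cost of specificity, which is one way an ensemble can be tuned toward detecting more converters.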
Results
A total of 1200 patients were included in this study. Among 386 patients with HER2-0 primary BC and matched HER2 status at relapse, 42.5% (n = 157) converted to a HER2-low phenotype. The explainable model achieved a balanced accuracy of 58%, with a sensitivity of 53% and a specificity of 64%. The most important variables for this model were primary BC phenotype [mean Shapley value (SHAP) 0.540], primary BC histological type (SHAP 0.101), grade (SHAP 0.182), and sites of relapse (SHAP 0.008-0.213). The ensemble model had a balanced accuracy of 64%, with a sensitivity of 75% and a specificity of 53%.
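Balanced accuracy is the mean of sensitivity and specificity, so the figures above can be cross-checked directly. The short sketch below does this using the rounded percentages reported in this abstract; because the inputs are rounded, the recomputed values can differ from the reported ones by up to about one percentage point. The function name is illustrative.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates must be expressed as fractions in [0, 1]")
    return (sensitivity + specificity) / 2.0


# Rounded values as reported in the abstract.
explainable_ba = balanced_accuracy(sensitivity=0.53, specificity=0.64)
ensemble_ba = balanced_accuracy(sensitivity=0.75, specificity=0.53)

print(f"Explainable model: {explainable_ba:.3f}")
print(f"Ensemble model:    {ensemble_ba:.3f}")
```

With these inputs the explainable model comes out at roughly 0.585 and the ensemble at roughly 0.64, in line with the reported 58% and 64% once rounding of the inputs is taken into account.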
Conclusions
This work represents one of the first proof-of-concept applications of machine learning models to predict a phenomenon that is highly relevant to drug access in modern BC oncology. Starting with an explainable model and subsequently integrating it into an ensemble approach enabled us to enhance performance while maintaining transparency, explainability, and intelligibility.