Predicting Depression Risk in Patients With Cancer Using Multimodal Data - Algorithm Development Study
Abstract
Background: Patients with cancer starting systemic treatment programmes, such as chemotherapy, often develop depression. A prediction model may assist clinicians and health-care teams with early identification of vulnerable patients.
Objective: This study aimed to develop a prediction model for depression risk within the first month of cancer treatment.
Methods: We included 16,159 patients diagnosed with cancer who started chemo- or radiotherapy between 2008 and 2021. Machine learning models (for example, LASSO logistic regression) and natural language processing models (BERT) were used to build multimodal prediction models based on electronic health record data and unstructured text (patient messages and clinician notes). Performance was assessed on an independent test set (n=5,387, 33%) using AUROC, calibration curves, and decision-curve analysis.
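As an illustration of two of the evaluation metrics named above, the following is a minimal sketch, not the study's code. It computes AUROC as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it, and the net benefit used in decision-curve analysis at a given risk threshold. The function names `auroc` and `net_benefit` and the toy data are assumptions made for illustration only.

```python
def auroc(y_true, y_score):
    """AUROC via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as 0.5.
    Quadratic in sample size, so intended for illustration only.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(y_true, y_score, threshold):
    """Net benefit at one risk threshold (standard decision-curve formula).

    NB = TP/n - FP/n * threshold / (1 - threshold),
    where patients with predicted risk >= threshold are flagged.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study).
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, risk))
print(net_benefit(y, risk, 0.5))
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing the result with the "flag everyone" and "flag no one" strategies.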
Results: Among 16,159 patients, 437 (2.7%) received a depression diagnosis within the first month of treatment. LASSO models based on structured data (AUROC 0.74, 95% CI 0.71-0.78) and on structured data plus message classification scores (AUROC 0.74, 95% CI 0.71-0.78) had the strongest discrimination. BERT models achieved AUROCs of approximately 0.71. The message-only logistic model performed poorly (AUROC 0.54, 95% CI 0.52-0.56), and the clinician-note-only model performed worst (AUROC 0.50, 95% CI 0.49-0.52). Calibration was good for the logistic models, whereas the BERT models produced overly extreme risk estimates even after recalibration. Risks were underestimated for female and Black patients.
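One simple way to check whether risks are underestimated in a subgroup is the observed-to-expected (O/E) ratio: the number of observed events divided by the sum of predicted risks within that group. A ratio above 1 indicates underestimation. The abstract does not state which measure the study used for its subgroup analysis, so the sketch below is an assumption, and the function name `observed_expected_by_group` and the toy data are hypothetical.

```python
from collections import defaultdict


def observed_expected_by_group(y_true, y_pred, groups):
    """Return the observed/expected event ratio for each subgroup.

    observed = number of events in the group
    expected = sum of predicted risks in the group
    A ratio > 1 means the model underestimates risk for that group.
    """
    observed = defaultdict(float)
    expected = defaultdict(float)
    for y, p, g in zip(y_true, y_pred, groups):
        observed[g] += y
        expected[g] += p
    # Skip groups whose predicted risks sum to zero to avoid division by zero.
    return {g: observed[g] / expected[g] for g in observed if expected[g] > 0}


# Toy data (hypothetical, not from the study).
outcomes = [1, 0, 1, 0]
predicted = [0.25, 0.25, 0.5, 0.5]
subgroup = ["a", "a", "b", "b"]
print(observed_expected_by_group(outcomes, predicted, subgroup))
```

In this toy example, group "a" has one observed event against 0.5 expected, so its risk is underestimated, while group "b" is calibrated on average.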
Conclusions: The findings highlight both the promise and limitations of multimodal machine learning for depression-risk prediction in oncology. Further work is needed to validate models externally, refine outcomes and predictors, and address subgroup bias.