Heart failure, characterized by weakening or damage to the heart muscle, causes fluid to gradually accumulate in the patient’s lungs, legs, feet and other parts of the body. The disease is chronic and incurable and often leads to arrhythmia or sudden cardiac arrest. For many centuries, bloodletting and leeching were the methods of choice, famously practiced by barbers in Europe at a time when doctors rarely operated on patients.
In the 21st century, heart failure treatment has become far less medieval: patients now make lifestyle changes, take prescribed medications, and sometimes receive pacemakers. Even so, heart failure remains one of the leading causes of morbidity and mortality, placing a significant burden on healthcare systems worldwide.
“About half of people diagnosed with heart failure will die within five years of diagnosis,” says Teya Bergamaschi, an MIT graduate student in the lab of Collin Stultz, the Nina T. and Robert H. Rubin Professor, and co-author of a recent paper presenting a deep learning model for predicting heart failure. “Understanding how a patient will fare after hospitalization is really important when allocating finite resources.”
The paper, published by a team of researchers from MIT, Mass General Brigham, and Harvard Medical School, shares results from the development and testing of PULSE-HF, which stands for “Predict changes in left ventricular systolic function from the ECG of patients with heart failure.” The project was conducted in Stultz’s lab, which is affiliated with the MIT Abdul Latif Jameel Clinic for Machine Learning in Health. Developed and retrospectively tested on three different patient cohorts, from Massachusetts General Hospital, Brigham and Women’s Hospital, and MIMIC-IV (a publicly available dataset), the deep learning model accurately predicts changes in left ventricular ejection fraction (LVEF), the percentage of blood pumped from the heart’s left ventricle.
A healthy human heart pumps about 50 to 70 percent of the blood out of the left ventricle with each beat; anything less is considered a sign of a potential problem. “The model takes an [electrocardiogram] and predicts whether a patient’s ejection fraction will drop below 40 percent over the next year,” says Tiffany Yau, an MIT doctoral student in Stultz’s lab and co-first author of the PULSE-HF paper. “That’s the most serious subset of heart failure.”
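The prediction task described above amounts to a binary label: does a patient's ejection fraction fall below the 40 percent cutoff within a year? A minimal sketch of how such labels might be constructed, assuming hypothetical field names and thresholds (the paper's actual labeling pipeline is not specified here):

```python
# Hypothetical labeling sketch: the 40 percent threshold comes from the
# article; the function name and baseline check are illustrative only.
LVEF_THRESHOLD = 40.0  # percent; below this marks the high-risk class

def label_lvef_decline(baseline_lvef: float, followup_lvef: float) -> int:
    """Return 1 if a patient above the threshold at baseline drops
    below it at the one-year follow-up, else 0."""
    if baseline_lvef < LVEF_THRESHOLD:
        raise ValueError("patient already below threshold at baseline")
    return int(followup_lvef < LVEF_THRESHOLD)

# Example: a patient at 55 percent who declines to 35 percent
print(label_lvef_decline(55.0, 35.0))  # → 1
```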
If PULSE-HF predicts that a patient’s ejection fraction is likely to worsen within a year, a doctor can prioritize that patient for closer follow-up. Lower-risk patients, in turn, could be spared extra hospital visits and the time it takes to attach the 10 electrodes needed for a 12-lead ECG. The model could also be deployed in resource-limited clinical settings, including rural physicians’ offices that do not typically have a cardiac sonographer on staff to perform daily ultrasound examinations.
“The biggest thing that sets [PULSE-HF] apart from other methods is that the ECG is used to predict future heart failure rather than to detect it,” says Yau. The paper notes that, to date, there are no other methods for predicting future LVEF decline in heart failure patients.
During testing and validation, the researchers used a metric known as the area under the receiver operating characteristic curve (AUROC) to measure PULSE-HF’s performance. AUROC measures a model’s ability to distinguish between classes on a scale of 0 to 1, with 0.5 indicating random guessing and 1 indicating perfect discrimination. PULSE-HF achieved AUROC values ranging from 0.87 to 0.91 across all three patient cohorts.
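AUROC has a useful interpretation: it is the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case. A small self-contained sketch of that rank-based computation (illustrative only; published studies typically use a library implementation such as scikit-learn's `roc_auc_score`):

```python
def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U formulation: the fraction of
    positive/negative pairs where the positive case scores higher
    (ties count as half)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 3 of 4 positive/negative pairs are ranked correctly
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

A score of 0.87 to 0.91, as reported for PULSE-HF, means the model ranks a true decliner above a non-decliner roughly nine times out of ten.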
Notably, the researchers also built a single-lead ECG version of PULSE-HF, which means only one electrode needs to be placed on the body. Although 12-lead ECGs are generally considered superior due to greater versatility and accuracy, the performance of the single-lead PULSE-HF version was as good as the 12-lead version.
Like most clinical AI research, the elegant simplicity of the PULSE-HF idea belies a labor-intensive execution. “It took years [to complete this project]. It went through many iterations,” Bergamaschi recalls.
One of the biggest challenges the team faced was collecting, processing and cleaning ECG and echocardiogram data sets. Although the model is intended to predict a patient’s ejection fraction, labels for the training data were not always readily available. Like a student learning from a textbook that contains an answer key, labeling is crucial in helping machine learning models correctly identify patterns in the data.
Pristine, linear text in the form of TXT files usually works best when training models. Echocardiogram reports, however, are usually available as PDF files, and when PDFs are converted to TXT, the text (broken up by line breaks and formatting) becomes difficult for a model to parse. The unpredictable nature of real-life scenarios, such as a restless patient or a loose lead, also muddied the data. “There are a lot of signal artifacts that need to be removed,” Bergamaschi says. “It’s kind of a never-ending rabbit hole.”
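One common artifact mentioned above, a loose lead, often shows up as a flat-lined stretch of signal. A hypothetical screening check along those lines (the window size, tolerance, and function name are assumptions, not the team's actual pipeline):

```python
def has_flat_segment(signal, window=50, tol=1e-6):
    """Flag a possible loose-lead artifact: any run of `window`
    consecutive samples whose amplitude spread is below `tol`
    (an essentially flat line) triggers the flag."""
    for i in range(len(signal) - window + 1):
        chunk = signal[i:i + window]
        if max(chunk) - min(chunk) < tol:
            return True
    return False

# A dead-flat recording is flagged; a varying one is not
print(has_flat_segment([0.0] * 60))       # → True
print(has_flat_segment(list(range(60))))  # → False
```

In practice such rule-based checks are only a first pass, which is consistent with the researchers' point that artifact removal becomes a "never-ending rabbit hole."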
While Bergamaschi and Yau acknowledge that more sophisticated methods could help filter the data for cleaner signals, the usefulness of those approaches has its limits. “At what point do you stop?” Yau asks. “You have to think about the use case: is it easier to have a model that works on data that’s a bit messy? Because it probably will be.”
The researchers anticipate that the next step for PULSE-HF will be to test the model in a prospective study in real patients whose future ejection fraction is unknown.
Despite the challenges of bringing clinical AI tools like PULSE-HF across the finish line, including the risk of extending their PhD programs by another year, the students believe the years of difficult work have been worthwhile.
“I think some things are rewarding partly because they’re challenging,” Bergamaschi says. “A friend told me, ‘If you think you’re going to find your calling after you graduate, and it really is your calling, it will show up in the one extra year it takes you to graduate.’ … The way we are assessed as researchers in [the ML and health] space is different from researchers in other ML spaces. Everyone in this community understands the unique challenges that exist here.”
“There is too much suffering in the world,” says Yau, who joined Stultz’s lab after a health-related event that made her realize the importance of machine learning in health care. “I consider anything that tries to alleviate suffering to be a valuable use of my time.”
