Title: Spatiotemporal Deep Learning-Based Cine Loop Quality Filter for Handheld Point-of-Care Echocardiography
Authors: Rashid Al Mukaddim; Emily MacKay; Nils Gessert; Ramon Erkamp; Shriram Sethuraman; Jonathan Sutton; Shyam Bharat; Melanie Jutras; Cristiana Baloescu; Christopher L. Moore; Balasundar I. Raju
Journal: IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 71, no. 11, pp. 1577-1587 (Q1, Acoustics)
DOI: 10.1109/TUFFC.2024.3396796
URL: https://ieeexplore.ieee.org/document/10517992/
Publication date: 2024-03-03
Citations: 0
Abstract
The reliability of automated image interpretation of point-of-care (POC) echocardiography scans depends on the quality of the acquired ultrasound data. This work reports the development and validation of spatiotemporal deep learning models that assess whether ultrasound cine loops acquired with a handheld echocardiography device are suitable for processing by an automated quantification algorithm (e.g., ejection fraction (EF) estimation). POC echocardiograms (n = 885 DICOM cine loops from 175 patients) from two sites were collected using a handheld ultrasound device and annotated for image quality at the frame level. Attributes of high-quality frames for left ventricular (LV) quantification included a temporally stable LV, reasonable coverage of the LV borders, and good contrast between the borders and the chamber. Attributes of low-quality frames included temporal instability of the LV and/or imaging artifacts (e.g., lack of contrast, haze, reverberation, and acoustic shadowing). Three neural network architectures were investigated: 1) a frame-level convolutional neural network (CNN) that operates on individual echo frames (VectorCNN); 2) a single-stream sequence-level CNN that operates on a sequence of echo frames [VectorCNN + long short-term memory (LSTM)]; and 3) two-stream sequence-level CNNs that operate on sequences of echo and optical-flow frames (VectorCNN + LSTM + Average, VectorCNN + LSTM + MinMax, and VectorCNN + LSTM + ConvPool). Evaluation on a sequestered test dataset containing 76 DICOM cine loops with 16,914 frames showed that VectorCNN + LSTM can effectively utilize both spatial and temporal information to regress the quality of an input frame (accuracy: 0.925, sensitivity: 0.860, specificity: 0.952), compared with the frame-level VectorCNN, which utilizes only the spatial information in that frame (accuracy: 0.903, sensitivity: 0.791, specificity: 0.949). Furthermore, an independent-sample t-test indicated that cine loops classified as adequate quality by the VectorCNN + LSTM model had a statistically significantly lower bias in the automatically estimated EF (mean bias = -3.73% ± 7.46%, versus a clinically obtained reference EF) than loops classified as inadequate (mean bias = -15.92% ± 12.17%) (p = 0.007). Thus, cine loop stratification using the proposed spatiotemporal CNN model improves the reliability of automated POC echocardiography image interpretation.
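The evaluation metrics and the group comparison reported above can be illustrated with a minimal sketch. The function names and the toy inputs below are hypothetical (the paper does not publish code); the sketch assumes binary frame labels (1 = adequate, 0 = inadequate) and a pooled-variance independent two-sample t-statistic, the standard form of the independent-sample t-test named in the abstract.

```python
from math import sqrt

def frame_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary frame-level
    quality labels (1 = adequate quality, 0 = inadequate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity

def independent_t_statistic(a, b):
    """Student's two-sample t-statistic with pooled variance, comparing
    e.g. EF bias in the adequate vs. inadequate loop groups."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    # Unbiased sample variances of each group
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    # Pooled variance across both groups
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
```

In practice one would compare the per-loop EF bias (automated EF minus clinical reference EF) of the loops the model passes against those it rejects; a significant negative t-statistic for the rejected group mirrors the paper's finding that rejected loops carry a much larger EF bias.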
Journal Description
IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control includes the theory, technology, materials, and applications relating to: (1) the generation, transmission, and detection of ultrasonic waves and related phenomena; (2) medical ultrasound, including hyperthermia, bioeffects, tissue characterization and imaging; (3) ferroelectric, piezoelectric, and piezomagnetic materials, including crystals, polycrystalline solids, films, polymers, and composites; (4) frequency control, timing and time distribution, including crystal oscillators and other means of classical frequency control, and atomic, molecular and laser frequency control standards. Areas of interest range from fundamental studies to the design and/or applications of devices and systems.