Hardware-Independent Deep Signal Processing: A Feasibility Study in Echocardiography
Pub Date: 2024-05-23 | DOI: 10.1109/TUFFC.2024.3404622
Erlend Loland Gundersen, Erik Smistad, Tollef Struksnes Jahren, Svein-Erik Masoy
Deep learning (DL) models have emerged as alternatives to conventional ultrasound (US) signal processing, offering the potential to mimic signal processing chains, reduce inference time, and make those chains portable across hardware. This paper proposes a DL model that replicates the fine-tuned B-mode signal processing chain of a high-end US system and explores the potential of using it with a different probe and a lower-end system. A deep neural network was trained in a supervised manner to map raw beamformed in-phase and quadrature (IQ) data to processed images. The dataset consisted of 30,000 cardiac image frames acquired using the GE HealthCare Vivid E95 system with the 4Vc-D matrix array probe. The signal processing chain includes depth-dependent bandpass filtering, elevation compounding, frequency compounding, and image compression and filtering. The results indicate that a lightweight DL model can accurately replicate the signal processing chain of a commercial scanner for a given application. Evaluation on a 15-patient test dataset of approximately 3000 image frames gave a structural similarity index measure of 98.56 ± 0.49. Applying the DL model to data from another probe yielded equivalent or improved image quality. This indicates that a single DL model may serve a set of probes on a given system targeting the same application, which could be a cost-effective tuning and implementation strategy for vendors. Furthermore, the DL model enhanced image quality on a Verasonics dataset, suggesting the potential to port features from high-end US systems to lower-end counterparts.
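As a rough illustration of the supervised image-to-image setup described in this abstract, the following PyTorch sketch maps two-channel beamformed IQ data to a single-channel processed image. The architecture, layer widths, tensor shapes, and L1 loss are assumptions for illustration only; the paper's actual network and training configuration are not given in the abstract.

```python
# Minimal sketch: a small fully convolutional network that maps beamformed IQ
# data (2 channels: I and Q) to a processed B-mode-style image. Hypothetical
# architecture; not the authors' model.
import torch
import torch.nn as nn

class IQToBMode(nn.Module):
    """Hypothetical lightweight DL model: 2-channel IQ in, 1-channel image out."""
    def __init__(self, width: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, kernel_size=3, padding=1), nn.Sigmoid(),  # image in [0, 1]
        )

    def forward(self, iq: torch.Tensor) -> torch.Tensor:
        # iq: (batch, 2, depth_samples, beams)
        return self.net(iq)

model = IQToBMode()
iq_batch = torch.randn(4, 2, 512, 128)   # synthetic stand-in for beamformed IQ data
target = torch.rand(4, 1, 512, 128)      # stand-in for vendor-processed B-mode frames
loss = nn.functional.l1_loss(model(iq_batch), target)  # supervised image-to-image loss
loss.backward()
```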
{"title":"Hardware-Independent Deep Signal Processing: A Feasibility Study in Echocardiography.","authors":"Erlend Loland Gundersen, Erik Smistad, Tollef Struksnes Jahren, Svein-Erik Masoy","doi":"10.1109/TUFFC.2024.3404622","DOIUrl":"10.1109/TUFFC.2024.3404622","url":null,"abstract":"<p><p>Deep learning (DL) models have emerged as alternative methods to conventional ultrasound (US) signal processing, offering the potential to mimic signal processing chains, reduce inference time, and enable the portability of processing chains across hardware. This paper proposes a DL model that replicates the fine-tuned BMode signal processing chain of a high-end US system and explores the potential of using it with a different probe and a lower-end system. A deep neural network was trained in a supervised manner to map raw beamformed in-phase and quadrature component data into processed images. The dataset consisted of 30,000 cardiac image frames acquired using the GE HealthCare Vivid E95 system with the 4Vc-D matrix array probe. The signal processing chain includes depth-dependent bandpass filtering, elevation compounding, frequency compounding, and image compression and filtering. The results indicate that a lightweight DL model can accurately replicate the signal processing chain of a commercial scanner for a given application. Evaluation on a 15 patient test dataset of about three thousand image frames gave a structural similarity index measure of 98.56 ± 0.49. Applying the DL model to data from another probe showed equivalent or improved image quality. This indicates that a single DL model may be used for a set of probes on a given system that targets the same application, which could be a cost-effective tuning and implementation strategy for vendors. Further, the DL model enhanced image quality on a Verasonics dataset, suggesting the potential to port features from high-end US systems to lower-end counterparts.</p>","PeriodicalId":13322,"journal":{"name":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141086610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatiotemporal Deep Learning-Based Cine Loop Quality Filter for Handheld Point-of-Care Echocardiography
Pub Date: 2024-05-03 | DOI: 10.1109/TUFFC.2024.3396796
Rashid Al Mukaddim, Emily Mackay, Nils Gessert, Ramon Erkamp, Shriram Sethuraman, Jonathan Sutton, Shyam Bharat, Melanie Jutras, Cristiana Baloescu, Christopher L Moore, Balasundar Raju
The reliability of automated image interpretation of point-of-care (POC) echocardiography scans depends on the quality of the acquired ultrasound data. This work reports on the development and validation of spatiotemporal deep learning models that assess whether ultrasound cine loops collected with a handheld echocardiography device are suitable for processing by an automated quantification algorithm (e.g., ejection fraction (EF) estimation). POC echocardiograms (n = 885 DICOM cine loops from 175 patients) from two sites were collected using a handheld ultrasound device and annotated for image quality at the frame level. Attributes of high-quality frames for left ventricular (LV) quantification included a temporally stable LV, reasonable coverage of the LV borders, and good contrast between the borders and the chamber. Attributes of low-quality frames included temporal instability of the LV and/or imaging artifacts (e.g., lack of contrast, haze, reverberation, acoustic shadowing). Three neural network architectures were investigated: (a) a frame-level convolutional neural network (CNN) that operates on individual echo frames (VectorCNN); (b) a single-stream sequence-level CNN that operates on a sequence of echo frames (VectorCNN+LSTM); and (c) two-stream sequence-level CNNs that operate on sequences of echo and optical-flow frames (VectorCNN+LSTM+Average, VectorCNN+LSTM+MinMax, and VectorCNN+LSTM+ConvPool). Evaluation on a sequestered test dataset of 76 DICOM cine loops (16,914 frames) showed that VectorCNN+LSTM effectively utilizes both spatial and temporal information to assess the quality of an input frame (accuracy: 0.925, sensitivity: 0.860, specificity: 0.952), compared with the frame-level VectorCNN, which uses only the spatial information in that frame (accuracy: 0.903, sensitivity: 0.791, specificity: 0.949). Furthermore, an independent-samples t-test indicated that cine loops classified as adequate quality by the VectorCNN+LSTM model had a significantly lower bias in the automatically estimated EF (mean bias: -3.73 ± 7.46% versus a clinically obtained reference EF) than loops classified as inadequate (mean bias: -15.92 ± 12.17%; p = 0.007). Thus, cine loop stratification using the proposed spatiotemporal CNN model improves the reliability of automated point-of-care echocardiography image interpretation.
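A minimal PyTorch sketch of a single-stream sequence-level classifier in the spirit of VectorCNN+LSTM: a per-frame CNN encodes each echo frame into a feature vector, an LSTM aggregates the sequence, and a linear head outputs an adequate-versus-inadequate quality logit. All layer sizes, input shapes, and module names here are illustrative assumptions, not the authors' exact architecture.

```python
# Hypothetical CNN+LSTM cine-loop quality classifier (illustrative only).
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Per-frame CNN producing one feature vector per echo frame."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global spatial pooling
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch*time, 1, H, W) -> (batch*time, feat_dim)
        return self.fc(self.conv(frames).flatten(1))

class CineQualityClassifier(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.encoder = FrameEncoder(feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)      # logit: adequate vs. inadequate

    def forward(self, cine: torch.Tensor) -> torch.Tensor:
        # cine: (batch, time, 1, H, W)
        b, t, c, h, w = cine.shape
        feats = self.encoder(cine.view(b * t, c, h, w)).view(b, t, -1)
        _, (h_n, _) = self.lstm(feats)        # last hidden state summarizes the loop
        return self.head(h_n[-1]).squeeze(-1)

logits = CineQualityClassifier()(torch.randn(2, 16, 1, 112, 112))  # 2 loops, 16 frames each
```

A two-stream variant in the same spirit would run a second encoder over optical-flow frames and fuse the two feature sequences (e.g., by averaging) before the LSTM.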
{"title":"Spatiotemporal Deep Learning-Based Cine Loop Quality Filter for Handheld Point-of-Care Echocardiography.","authors":"Rashid Al Mukaddim, Emily Mackay, Nils Gessert, Ramon Erkamp, Shriram Sethuraman, Jonathan Sutton, Shyam Bharat, Melanie Jutras, Cristiana Baloescu, Christopher L Moore, Balasundar Raju","doi":"10.1109/TUFFC.2024.3396796","DOIUrl":"https://doi.org/10.1109/TUFFC.2024.3396796","url":null,"abstract":"<p><p>The reliability of automated image interpretation of point-of-care (POC) echocardiography scans depends on the quality of the acquired ultrasound data. This work reports on the development and validation of spatiotemporal deep learning models to assess the suitability of input ultrasound cine loops collected using a handheld echocardiography device for processing by an automated quantification algorithm (e.g. ejection fraction estimation). POC echocardiograms (n=885 DICOM cine loops from 175 patients) from two sites were collected using a handheld ultrasound device and annotated for image quality at the frame-level. Attributes of high-quality frames for left ventricular (LV) quantification included a temporally-stable LV, reasonable coverage of LV borders, and good contrast between the borders and chamber. Attributes of low-quality frames included temporal instability of the LV and/or imaging artifacts (e.g., lack of contrast, haze, reverberation, acoustic shadowing). Three different neural network architectures were investigated - (a) frame-level convolutional neural network (CNN) which operates on individual echo frames (VectorCNN), (b) single-stream sequence-level CNN which operates on a sequence of echo frames (VectorCNN+LSTM) and (c) two-stream sequence-level CNNs which operate on a sequence of echo and optical flow frames (VectorCNN+LSTM+Average, VectorCNN+LSTM+MinMax, and VectorCNN+LSTM+ConvPool). Evaluation on a sequestered test dataset containing 76 DICOM cine loops with 16,914 frames showed that VectorCNN+LSTM can effectively utilize both spatial and temporal information to regress the quality of an input frame (accuracy: 0.925, sensitivity = 0.860, specificity = 0.952), compared to the frame-level VectorCNN that only utilizes spatial information in that frame (accuracy: 0.903, sensitivity = 0.791, specificity = 0.949). Furthermore, an independent sample t-test indicated that the cine loops classified to be of adequate quality by the VectorCNN+LSTM model had a statistically significant lower bias in the automatically estimated EF (mean bias = - 3.73 ± 7.46 %, versus a clinically obtained reference EF) compared to the loops classified as inadequate (mean bias = -15.92 ± 12.17 %) (p = 0.007). Thus, cine loop stratification using the proposed spatiotemporal CNN model improves the reliability of automated point-of-care echocardiography image interpretation.</p>","PeriodicalId":13322,"journal":{"name":"IEEE transactions on ultrasonics, ferroelectrics, and frequency control","volume":"PP ","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140858828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-30 | DOI: 10.1109/TUFFC.2024.3395490
Di Xiao, Pat De la Torre, Alfred C. H. Yu
Speed of sound (SoS) is an intrinsic acoustic property of human tissues and has been regarded as a potential biomarker of tissue health. To foster the clinical use of this emerging biomarker in medical diagnostics, it is important for SoS estimates to be derived and displayed in real time. Here, we demonstrate that concurrent global SoS estimation and B-mode imaging can be achieved live on a portable ultrasound scanner. Our innovation hinges on a novel pulse-echo SoS estimation framework based on steered plane wave imaging, which accounts for the effects of refraction and imaging depth when the medium SoS differs from the nominal value of 1540 m/s conventionally used in medical imaging. The accuracy of our SoS estimation framework was comparatively analyzed against through-transmit time-of-flight measurements in vitro on 15 custom agar phantoms with different SoS values (1508–1682 m/s) and in vivo on human calf muscles (N = 9
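The through-transmit reference measurement mentioned above reduces to a one-way time-of-flight (TOF) calculation: with a transmitter and receiver separated by a known distance d, the medium SoS follows directly as c = d / t. A minimal worked example in Python follows; the distance and TOF values are illustrative assumptions, not data from the paper.

```python
# One-way through-transmit SoS estimate: c = d / t (illustrative values).
phantom_thickness_m = 0.040    # known propagation distance d (40 mm)
time_of_flight_s = 25.5e-6     # measured arrival time t of the transmitted pulse
sos_m_per_s = phantom_thickness_m / time_of_flight_s
print(f"Estimated SoS: {sos_m_per_s:.0f} m/s")  # ~1569 m/s, within the reported phantom range
```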