Joint Lesion Detection and Classification of Breast Ultrasound Video via a Clinical Knowledge-Aware Framework

IF 11.1 · CAS Zone 1 (Engineering & Technology) · JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)
IEEE Transactions on Circuits and Systems for Video Technology · Pub Date: 2024-08-30 · DOI: 10.1109/TCSVT.2024.3452497
Minglei Li;Wushuang Gong;Pengfei Yan;Xiang Li;Yuchen Jiang;Hao Luo;Hang Zhou;Shen Yin
Citations: 0

Abstract

Ultrasound is an important routine screening modality for breast cancer. Breast ultrasound screening is a dynamic process, and in clinical practice radiologists record representative frames during dynamic breast scanning for subsequent diagnosis. However, existing computer-assisted diagnosis methods often concentrate on static diagnostic results obtained by analyzing these representative frames alone, ignoring the valuable information in the dynamic examination process that facilitates diagnosis. Moreover, breast lesions can exhibit varying characteristics during scanning, so learning effective lesion representations is challenging and may affect the clinical interpretability of such methods. To this end, we draw insights from the behavior of radiologists during dynamic breast examination and leverage knowledge of breast anatomy to propose a clinical knowledge-aware framework for the detection and classification of breast lesions in ultrasound videos. It is equipped with global-local attentive aggregation and a dynamic allocation mechanism that simulates the behavior of radiologists searching for diagnostic clues, thus integrating local localization and global semantic information from the video into the feature representation of the lesion. An anatomically-aware transformer is also designed to refine the lesion feature representation using spatial relationships within and across different anatomical layers of the breast. Extensive experiments show that the proposed framework achieves competitive performance in both lesion detection and video classification tasks while exhibiting good clinical availability and interpretability, with an average precision of 40.80% and an AUC of 85.86% on our constructed breast video dataset, and an average precision of 39.79% and an AUC of 87.04% on a publicly available dataset.
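The abstract does not specify the aggregation at code level, but the general idea behind attention-based pooling of per-frame lesion features into a single video-level representation can be sketched as follows. This is a minimal illustrative sketch only, assuming generic scaled dot-product attention over frames; all names (`attentive_aggregate`, `frame_feats`, `query`) are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_aggregate(frame_feats, query):
    """Attention-weighted pooling of per-frame features.

    frame_feats: (T, D) array of per-frame local lesion features.
    query:       (D,) global context vector (here, a simple mean of frames).
    Returns a (D,) video-level feature: frames that score higher against the
    global query contribute more, loosely mimicking a reader attending to
    frames with stronger diagnostic clues.
    """
    d = frame_feats.shape[1]
    scores = frame_feats @ query / np.sqrt(d)   # (T,) scaled dot-product scores
    weights = softmax(scores)                   # attention distribution over frames
    return weights @ frame_feats                # weighted sum -> (D,)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))        # 8 frames, 16-dim features (toy data)
q = feats.mean(axis=0)                  # a simple global query vector
video_feat = attentive_aggregate(feats, q)
print(video_feat.shape)
```

In the paper's framework, the local (per-frame localization) and global (video-level semantic) branches are fused by the proposed global-local attentive aggregation and dynamic allocation mechanism; the sketch above shows only the generic attention-pooling building block, not the authors' actual design.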
Source Journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
About the Journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.