Visual Computing for Industry Biomedicine and Art: latest articles (page 2)

Parallel processing model for low-dose computed tomography image denoising.

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-06-12 DOI: 10.1186/s42492-024-00165-8

Libing Yao, Jiping Wang, Zhongyi Wu, Qiang Du, Xiaodong Yang, Ming Li, Jian Zheng

Low-dose computed tomography (LDCT) has gained increasing attention owing to its crucial role in reducing radiation exposure in patients. However, LDCT-reconstructed images often suffer from significant noise and artifacts, negatively impacting the radiologists' ability to accurately diagnose. To address this issue, many studies have focused on denoising LDCT images using deep learning (DL) methods. However, these DL-based denoising methods have been hindered by the highly variable feature distribution of LDCT data from different imaging sources, which adversely affects the performance of current denoising models. In this study, we propose a parallel processing model, the multi-encoder deep feature transformation network (MDFTN), which is designed to enhance the performance of LDCT imaging for multisource data. Unlike traditional network structures, which rely on continual learning to process multitask data, the approach can simultaneously handle LDCT images within a unified framework from various imaging sources. The proposed MDFTN consists of multiple encoders and decoders along with a deep feature transformation module (DFTM). During forward propagation in network training, each encoder extracts diverse features from its respective data source in parallel and the DFTM compresses these features into a shared feature space. Subsequently, each decoder performs an inverse operation for multisource loss estimation. Through collaborative training, the proposed MDFTN leverages the complementary advantages of multisource data distribution to enhance its adaptability and generalization. Numerous experiments were conducted on two public datasets and one local dataset, which demonstrated that the proposed network model can simultaneously process multisource data while effectively suppressing noise and preserving fine structures. The source code is available at https://github.com/123456789ey/MDFTN .
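
The data flow described in the abstract (one encoder per source, a shared feature transformation, one decoder per source, and a summed multisource loss) can be sketched in a few lines. This is only a toy illustration, not the authors' MDFTN: the bias-free linear layers, the class name `MultiEncoderSketch`, the dimensions, and the mean-squared-error loss are all assumptions made here, since the abstract does not specify layer types or the loss.

```python
import random


def linear(weights, vec):
    """Apply a bias-free dense layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def rand_matrix(rows, cols, rng):
    """Small random weight matrix (hypothetical initialisation)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class MultiEncoderSketch:
    """Toy stand-in: one encoder/decoder pair per source, one shared transform."""

    def __init__(self, n_sources, dim, feat, shared, seed=0):
        rng = random.Random(seed)
        self.encoders = [rand_matrix(feat, dim, rng) for _ in range(n_sources)]
        self.shared = rand_matrix(shared, feat, rng)  # compress into a shared space
        self.decoders = [rand_matrix(dim, shared, rng) for _ in range(n_sources)]

    def forward(self, inputs):
        # inputs[i] is one flattened noisy patch from source i
        outputs = []
        for enc, dec, x in zip(self.encoders, self.decoders, inputs):
            z = linear(self.shared, linear(enc, x))  # source-specific -> shared features
            outputs.append(linear(dec, z))           # shared features -> source-specific output
        return outputs

    def multisource_loss(self, inputs, targets):
        # Sum of per-source reconstruction errors (assumed loss form)
        return sum(mse(o, t) for o, t in zip(self.forward(inputs), targets))


# Three sources, loosely mirroring "two public datasets and one local dataset"
rng = random.Random(1)
inputs = [[rng.random() for _ in range(16)] for _ in range(3)]
targets = [[rng.random() for _ in range(16)] for _ in range(3)]
model = MultiEncoderSketch(n_sources=3, dim=16, feat=8, shared=4)
print(model.multisource_loss(inputs, targets))
```

Because every source passes through the same `shared` matrix, a gradient step on the summed loss would update that matrix using all sources at once, which is the sense in which training is collaborative in this sketch.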

Citations: 0

Simulated deep CT characterization of liver metastases with high-resolution filtered back projection reconstruction.

IF 3.2, Zone 4 Computer Science, Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-06-11 DOI: 10.1186/s42492-024-00161-y

Christopher Wiedeman, Peter Lorraine, Ge Wang, Richard Do, Amber Simpson, Jacob Peoples, Bruno De Man

Early diagnosis and accurate prognosis of colorectal cancer are critical for determining optimal treatment plans and maximizing patient outcomes, especially as the disease progresses into liver metastases. Computed tomography (CT) is a frontline tool for this task; however, the preservation of predictive radiomic features is highly dependent on the scanning protocol and reconstruction algorithm. We hypothesized that image reconstruction with a high-frequency kernel could result in a better characterization of liver metastases features via deep neural networks. This kernel produces images that appear noisier but preserve more sinogram information. A simulation pipeline was developed to study the effects of imaging parameters on the ability to characterize the features of liver metastases. This pipeline utilizes a fractal approach to generate a diverse population of shapes representing virtual metastases, and then it superimposes them on a realistic CT liver region to perform a virtual CT scan using CatSim. Datasets of 10,000 liver metastases were generated, scanned, and reconstructed using either standard or high-frequency kernels. These data were used to train and validate deep neural networks to recover crafted metastases characteristics, such as internal heterogeneity, edge sharpness, and edge fractal dimension. In the absence of noise, models scored, on average, 12.2% ($\alpha = 0.012$) and 7.5% ($\alpha = 0.049$) lower squared error for characterizing edge sharpness and fractal dimension, respectively, when using high-frequency reconstructions compared with standard reconstructions. However, the differences in performance were statistically insignificant when a typical level of CT noise was simulated in the clinical scan. Our results suggest that high-frequency reconstruction kernels can better preserve information for downstream artificial intelligence-based radiomic characterization, provided that noise is limited. Future work should investigate the information-preserving kernels in datasets with clinical labels.
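
To make the trade-off between reconstruction kernels concrete, the sketch below compares two textbook frequency responses for filtered back projection: an unapodized ramp filter and a ramp multiplied by a Hann window. Treating the former as a "high-frequency" kernel and the latter as a "standard" kernel is an assumption for illustration only; the abstract does not name the kernels used, and the function names are hypothetical.

```python
import math


def ramp(freq):
    """Ideal ramp filter |f|; freq in cycles per sample, Nyquist at 0.5."""
    return abs(freq)


def hann_apodized_ramp(freq, cutoff=0.5):
    """Ramp multiplied by a Hann window; response is zero at and beyond the cutoff."""
    if abs(freq) >= cutoff:
        return 0.0
    return abs(freq) * 0.5 * (1.0 + math.cos(math.pi * freq / cutoff))


# Tabulate both responses from DC to Nyquist
freqs = [i / 20 for i in range(11)]  # 0.00, 0.05, ..., 0.50
for f in freqs:
    print(f"{f:.2f}  ramp={ramp(f):.3f}  hann={hann_apodized_ramp(f):.3f}")
```

The windowed response falls to zero at Nyquist while the plain ramp keeps growing, so the plain ramp passes more fine detail and more high-frequency noise. That is consistent with the abstract's description of images that look noisier but retain more sinogram information.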

Citations: 0

Schlieren imaging and video classification of alphabet pronunciations: exploiting phonetic flows for speech recognition and speech therapy.

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-05-22 DOI: 10.1186/s42492-024-00163-w

Mohamed Talaat, Kian Barari, Xiuhua April Si, Jinxiang Xi

Speech is a highly coordinated process that requires precise control over vocal tract morphology/motion to produce intelligible sounds while simultaneously generating unique exhaled flow patterns. The schlieren imaging technique visualizes airflows with subtle density variations. It is hypothesized that speech flows captured by schlieren, when analyzed using a hybrid of convolutional neural network (CNN) and long short-term memory (LSTM) network, can be used to recognize alphabet pronunciations, thus facilitating automatic speech recognition and speech disorder therapy. This study evaluates the feasibility of using a CNN-based video classification network to differentiate speech flows corresponding to the first four letters of the alphabet: /A/, /B/, /C/, and /D/. A schlieren optical system was developed, and the speech flows of alphabet pronunciations were recorded for two participants at an acquisition rate of 60 frames per second. A total of 640 video clips, each lasting 1 s, were utilized to train and test a hybrid CNN-LSTM network. Acoustic analyses of the recorded sounds were conducted to understand the phonetic differences among the four letters. The hybrid CNN-LSTM network was trained separately on four datasets of varying sizes (i.e., 20, 30, 40, 50 videos per letter), all achieving over 95% accuracy in classifying videos of the same participant. However, the network's performance declined when tested on speech flows from a different participant, with accuracy dropping to around 44%, indicating significant inter-participant variability in alphabet pronunciation. Retraining the network with videos from both participants improved accuracy to 93% on the second participant. Analysis of misclassified videos indicated that factors such as low video quality and disproportional head size affected accuracy. These results highlight the potential of CNN-assisted speech recognition and speech therapy using articulation flows, although challenges remain in expanding the alphabet set and participant cohort.

Citations: 0

V4RIN: visual analysis of regional industry network with domain knowledge.

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-05-15 DOI: 10.1186/s42492-024-00164-9

Wenli Xiong, Chenjie Yu, Chen Shi, Yaxuan Zheng, Xiping Wang, Yanpeng Hu, Hong Yin, Chenhui Li, Changbo Wang

The regional industry network (RIN) is a type of financial network, derived from industry networks, that can describe the connections between specific industries within a particular region. For most investors and financial analysts lacking extensive experience, the decision-support information provided by industry networks may be too vague. Conversely, RINs express more detailed and specific industry connections both within and outside the region. As RIN analysis is domain-specific and current financial network analysis tools are designed for generalized analytical tasks and cannot be directly applied to RINs, new visual analysis approaches are needed to enhance information exploration efficiency. In this study, we collaborated with domain experts and proposed V4RIN, an interactive visualization analysis system that integrates predefined domain knowledge and data processing methods to support users in uploading custom data. Through multiple views in the system panel, users can comprehensively explore the structure, geographical distribution, and spatiotemporal variations of the RIN. Two case studies and a set of interviews with five domain experts were conducted to validate the usability and reliability of our system.

Citations: 0

Typicality- and instance-dependent label noise-combating: a novel framework for simulating and combating real-world noisy labels for endoscopic polyp classification.

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-05-06 DOI: 10.1186/s42492-024-00162-x

Yun Gao, Junhu Fu, Yuanyuan Wang, Yi Guo

Learning with noisy labels aims to train neural networks with noisy labels. Current models handle instance-independent label noise (IIN) well; however, they fall short with real-world noise. In medical image classification, atypical samples frequently receive incorrect labels, rendering instance-dependent label noise (IDN) an accurate representation of real-world scenarios. However, the current IDN approaches fail to consider the typicality of samples, which hampers their ability to address real-world label noise effectively. To alleviate the issues, we introduce typicality- and instance-dependent label noise (TIDN) to simulate real-world noise and establish a TIDN-combating framework to combat label noise. Specifically, we use the sample's distance to decision boundaries in the feature space to represent typicality. The TIDN is then generated according to typicality. We establish a TIDN-attention module to combat label noise and learn the transition matrix from latent ground truth to the observed noisy labels. A recursive algorithm that enables the network to make correct predictions with corrections from the learned transition matrix is proposed. Our experiments demonstrate that the TIDN simulates real-world noise more closely than the existing IIN and IDN. Furthermore, the TIDN-combating framework demonstrates superior classification performance when training with simulated TIDN and actual real-world noise.
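
Two ideas from the abstract lend themselves to a small worked example: representing typicality by a sample's distance to the decision boundary, and relating clean to noisy labels through a transition matrix. The sketch below assumes a binary linear classifier, an exponential flip rule (`flip_probability`), and a fixed 2x2 matrix. None of these come from the paper, whose transition matrix is learned by a TIDN-attention module; they only illustrate the general mechanism.

```python
import math
import random


def distance_to_boundary(x, w, b):
    """Distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def flip_probability(distance, max_rate=0.4, scale=1.0):
    """Hypothetical rule: atypical samples (near the boundary) are mislabeled more often."""
    return max_rate * math.exp(-distance / scale)


def add_typicality_noise(samples, labels, w, b, rng):
    """Flip each binary label with a probability that depends on the sample's typicality."""
    noisy = []
    for x, y in zip(samples, labels):
        p = flip_probability(distance_to_boundary(x, w, b))
        noisy.append(1 - y if rng.random() < p else y)
    return noisy


def apply_transition(clean_posterior, T):
    """Noisy posterior q[j] = sum_i p[i] * T[i][j], where T[i][j] = P(noisy=j | true=i)."""
    k = len(T)
    return [sum(clean_posterior[i] * T[i][j] for i in range(k)) for j in range(k)]


# Usage with made-up numbers
rng = random.Random(0)
samples = [[0.1, 0.0], [3.0, 1.0], [-0.2, 0.5], [-4.0, 2.0]]
labels = [1, 1, 0, 0]
noisy_labels = add_typicality_noise(samples, labels, w=[1.0, 0.0], b=0.0, rng=rng)
print(noisy_labels)
print(apply_transition([0.9, 0.1], [[0.8, 0.2], [0.3, 0.7]]))
```

With this rule, a sample lying on the boundary is flipped with probability `max_rate`, and the probability decays toward zero for samples far from it, so the noise concentrates on atypical samples rather than being spread uniformly as in instance-independent noise.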

Citations: 0

Dual modality prompt learning for visual question-grounded answering in robotic surgery

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-04-22 DOI: 10.1186/s42492-024-00160-z

Yue Zhang, Wanshu Fan, Peixi Peng, Xin Yang, Dongsheng Zhou, Xiaopeng Wei

With recent advancements in robotic surgery, notable strides have been made in visual question answering (VQA). Existing VQA systems typically generate textual answers to questions but fail to indicate the location of the relevant content within the image. This limitation restricts the interpretative capacity of the VQA models and their ability to explore specific image regions. To address this issue, this study proposes a grounded VQA model for robotic surgery, capable of localizing a specific region during answer prediction. Drawing inspiration from prompt learning in language models, a dual-modality prompt model was developed to enhance precise multimodal information interactions. Specifically, two complementary prompters were introduced to effectively integrate visual and textual prompts into the encoding process of the model. A visual complementary prompter merges visual prompt knowledge with visual information features to guide accurate localization. The textual complementary prompter aligns visual information with textual prompt knowledge and textual information, guiding textual information towards a more accurate inference of the answer. Additionally, a multiple iterative fusion strategy was adopted for comprehensive answer reasoning, to ensure high-quality generation of textual and grounded answers. The experimental results validate the effectiveness of the model, demonstrating its superiority over existing methods on the EndoVis-18 and EndoVis-17 datasets.

Citations: 0

Automated analysis of pectoralis major thickness in pec-fly exercises: evolving from manual measurement to deep learning techniques

IF 2.8, Zone 4 Computer Science, Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-04-16 DOI: 10.1186/s42492-024-00159-6

Shangyu Cai, Yongsheng Lin, Haoxin Chen, Zihao Huang, Yongjin Zhou, Yongping Zheng

This study addresses a limitation of prior research on pectoralis major (PMaj) thickness changes during the pectoralis fly exercise using a wearable ultrasound imaging setup. Previous studies relied on manual measurement and subjective evaluation, which limits automation and widespread application. We therefore employed a deep learning model for image segmentation and automated measurement to solve this problem and to study what additional quantitative information it could provide. Our results revealed increased PMaj thickness changes in the coronal plane within the probe detection region when real-time ultrasound imaging (RUSI) visual biofeedback was incorporated, regardless of load intensity (50% or 80% of one-repetition maximum). Additionally, participants showed uniform thickness changes in the PMaj in response to enhanced RUSI biofeedback. Notably, the differences in PMaj thickness changes between load intensities were reduced by RUSI biofeedback, suggesting altered muscle activation strategies. We identified the optimal measurement location for the maximal PMaj thickness close to the rib end and emphasized the lightweight applicability of our model for fitness training and muscle assessment. Further studies can refine load intensities, investigate diverse parameters, and employ different network models to enhance accuracy. This study contributes to our understanding of the effects of muscle physiology and exercise training.
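
The abstract does not say how thickness is derived from the segmentation output, so the following is only a minimal sketch under stated assumptions: the muscle is given as a binary mask (rows are depth, columns are lateral position), thickness is measured along image columns, and the pixel height in millimetres is known. The function names and the `pixel_height_mm` parameter are hypothetical and are not taken from the paper.

```python
def column_thickness_mm(mask, pixel_height_mm):
    """Thickness per column: number of segmented pixels in the column times pixel height."""
    if not mask:
        return []
    n_cols = len(mask[0])
    return [sum(row[c] for row in mask) * pixel_height_mm for c in range(n_cols)]


def max_thickness(mask, pixel_height_mm):
    """Return (column index, thickness in mm) of the thickest column."""
    thickness = column_thickness_mm(mask, pixel_height_mm)
    col = max(range(len(thickness)), key=thickness.__getitem__)
    return col, thickness[col]


# Toy 3 x 4 mask (1 = muscle pixel); the assumed pixel height is 0.5 mm.
toy_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
profile = column_thickness_mm(toy_mask, 0.5)
thickest_col, thickest_mm = max_thickness(toy_mask, 0.5)
```

Counting pixels per column assumes the probe axis is aligned with the image columns; a tilted muscle boundary would need a measurement along the local normal instead.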

Citations: 0

Three-dimensional reconstruction of industrial parts from a single image.

IF 3.2 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-03-27 DOI: 10.1186/s42492-024-00158-7

Zhenxing Xu, Aizeng Wang, Fei Hou, Gang Zhao

This study proposes an image-based three-dimensional (3D) vector reconstruction of industrial parts that can generate non-uniform rational B-splines (NURBS) surfaces with high fidelity and flexibility. The contributions of this study include three parts: first, a dataset of two-dimensional images is constructed for typical industrial parts, including hexagonal head bolts, cylindrical gears, shoulder rings, hexagonal nuts, and cylindrical roller bearings; second, a deep learning algorithm is developed for parameter extraction of 3D industrial parts, which can determine the final 3D parameters and pose information of the reconstructed model using two new nets, CAD-ClassNet and CAD-ReconNet; and finally, a 3D vector shape reconstruction of mechanical parts is presented to generate NURBS from the obtained shape parameters. The final reconstructed models show that the proposed approach is highly accurate, efficient, and practical.
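
To illustrate what generating a NURBS shape from a shape parameter can look like, here is a minimal sketch, not the authors' CAD-ReconNet pipeline. It evaluates an exact quarter circle as a rational quadratic Bezier curve, the simplest special case of a NURBS curve, using the standard weights 1, sqrt(2)/2, 1. The `radius` argument stands in for a hypothetical extracted shape parameter, and the function name is an assumption.

```python
import math


def quarter_circle_point(radius, u):
    """Evaluate a quarter circle at parameter u in [0, 1] as a rational quadratic Bezier curve."""
    control = [(radius, 0.0), (radius, radius), (0.0, radius)]  # control points
    weights = [1.0, math.sqrt(2.0) / 2.0, 1.0]                  # weights that make the arc exact
    basis = [(1 - u) ** 2, 2 * u * (1 - u), u ** 2]             # quadratic Bernstein basis
    denom = sum(b * w for b, w in zip(basis, weights))
    x = sum(b * w * p[0] for b, w, p in zip(basis, weights, control)) / denom
    y = sum(b * w * p[1] for b, w, p in zip(basis, weights, control)) / denom
    return x, y


# Sample the arc for an assumed radius of 5 units.
samples = [quarter_circle_point(5.0, i / 10) for i in range(11)]
```

Every sampled point lies on the circle of the given radius, which a polynomial (non-rational) spline cannot achieve exactly; this is one reason rational splines suit parts such as bolts, gears, and bearings.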

Citations: 0

PlaqueNet: deep learning enabled coronary artery plaque segmentation from coronary computed tomography angiography.

IF 3.2 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-03-22 DOI: 10.1186/s42492-024-00157-8

Linyuan Wang, Xiaofeng Zhang, Congyu Tian, Shu Chen, Yongzhi Deng, Xiangyun Liao, Qiong Wang, Weixin Si

Cardiovascular disease, primarily caused by atherosclerotic plaque formation, is a significant health concern. The early detection of these plaques is crucial for targeted therapies and reducing the risk of cardiovascular diseases. This study presents PlaqueNet, a solution for segmenting coronary artery plaques from coronary computed tomography angiography (CCTA) images. For feature extraction, the advanced residual net module was utilized, which integrates a deepwise residual optimization module into network branches, enhances feature extraction capabilities, avoids information loss, and addresses gradient issues during training. To improve segmentation accuracy, a depthwise atrous spatial pyramid pooling based on bicubic efficient channel attention (DASPP-BICECA) module is introduced. The BICECA component amplifies the local feature sensitivity, whereas the DASPP component expands the network's information-gathering scope, resulting in elevated segmentation accuracy. Additionally, BINet, a module for joint network loss evaluation, is proposed. It optimizes the segmentation model without affecting the segmentation results. When combined with the DASPP-BICECA module, BINet enhances overall efficiency. The CCTA segmentation algorithm proposed in this study outperformed the three comparison algorithms, achieving an intersection over union (IoU) of 87.37%, Dice of 93.26%, accuracy of 93.12%, mean IoU of 93.68%, mean Dice of 96.63%, and mean pixel accuracy of 96.55%.
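
The reported metrics can be made concrete with a small sketch. It computes intersection over union (IoU) and Dice for two binary masks and uses the identity Dice = 2·IoU / (1 + IoU), which holds for a single pair of masks. The function names and the flat 0/1 list representation are assumptions for illustration; how the paper aggregates its scores is not stated in the abstract.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equally sized binary masks given as flat 0/1 lists."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement (a convention)
    return inter / union, 2 * inter / (pred_area + truth_area)


def dice_from_iou(iou):
    """Convert IoU to Dice for a single pair of masks."""
    return 2 * iou / (1 + iou)


# Toy masks: 2 overlapping pixels, 4 pixels in the union.
iou, dice = iou_and_dice([1, 1, 1, 0, 0, 0], [0, 1, 1, 1, 0, 0])
reported_check = dice_from_iou(0.8737) * 100  # Dice implied by an IoU of 87.37%
```

Applied to the reported IoU of 87.37%, the identity gives a Dice of about 93.26%, which agrees with the Dice value in the abstract. The same conversion is not expected to hold exactly for the mean scores, because an average of per-class Dice values is not a function of the average IoU alone.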

Citations: 0

Flipover outperforms dropout in deep learning

IF 2.8 Zone 4 Computer Science Q1 Arts and Humanities

Visual Computing for Industry Biomedicine and Art

Pub Date : 2024-02-22 DOI: 10.1186/s42492-024-00153-y

Yuxuan Liang, Chuang Niu, Pingkun Yan, Ge Wang

Flipover, an enhanced dropout technique, is introduced to improve the robustness of artificial neural networks. In contrast to dropout, which involves randomly removing certain neurons and their connections, flipover randomly selects neurons and reverses their outputs using a negative multiplier during training. This approach offers stronger regularization than conventional dropout, refining model performance by (1) mitigating overfitting, matching or even exceeding the efficacy of dropout; (2) amplifying robustness to noise; and (3) enhancing resilience against adversarial attacks. Extensive experiments across various neural networks affirm the effectiveness of flipover in deep learning.
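
The contrast between the two techniques can be sketched in a few lines. This is a minimal illustration based only on the description above, not the authors' implementation: `rate` (the selection probability) and `multiplier` (the magnitude of the negative factor) are assumed parameter names, and any rescaling the paper may apply to flipover is not stated in the abstract and is therefore omitted. The dropout shown is the common inverted variant.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each activation with probability `rate`, rescale the kept ones."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def flipover(values, rate, multiplier, rng):
    """Flipover sketch: with probability `rate`, multiply an activation by -multiplier."""
    return [-multiplier * v if rng.random() < rate else v for v in values]


rng = random.Random(0)  # seeded only to make the example repeatable
activations = [1.0, -2.0, 3.0, 0.5]
dropped = dropout(activations, 0.5, rng)
flipped = flipover(activations, 0.5, 1.0, rng)
```

With `multiplier` set to 1.0, every flipped activation keeps its magnitude and changes sign, whereas dropout discards the selected activations entirely. Both functions are meant for training only; at inference the activations would pass through unchanged.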

Citations: 0