
Image and Vision Computing: Latest Publications

DFEDC: Dual fusion with enhanced deformable convolution for medical image segmentation
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-13 · DOI: 10.1016/j.imavis.2024.105277
Xian Fang, Yueqian Pan, Qiaohong Chen

Considering the complexity of lesion regions in medical images, current research relying on CNNs typically employs large-kernel convolutions to expand the receptive field and enhance segmentation quality. However, these convolution methods are hindered by substantial computational requirements and a limited capacity to extract contextual and multi-scale information, making it challenging to efficiently segment complex regions. To address this issue, we propose a dual fusion with enhanced deformable convolution network, namely DFEDC, which dynamically adjusts the receptive field and simultaneously integrates multi-scale feature information to effectively segment complex lesion areas and process boundaries. Firstly, we combine global channel and spatial fusion in a serial way, which integrates and reuses global channel attention and fully connected layers to achieve lightweight extraction of channel and spatial information. Additionally, we design a structured deformable convolution (SDC) that structures deformable convolution with inceptions and large kernel attention, and enhances the learning of offsets through parallel fusion to efficiently extract multi-scale feature information. To compensate for the loss of spatial information in SDC, we introduce a hybrid 2D and 3D feature extraction module to transform feature extraction from a single dimension to a fusion of 2D and 3D. Extensive experimental results on the Synapse, ACDC, and ISIC-2018 datasets demonstrate that our proposed DFEDC achieves superior results.
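As a rough illustration of the structured-offset idea described above, the sketch below builds a deformable convolution whose offsets are predicted by parallel branches with different receptive fields and then fused. It is a minimal PyTorch sketch assuming torchvision's DeformConv2d; the module name, branch choices and channel sizes are illustrative, not the authors' implementation.

```python
# Hypothetical sketch: deformable convolution with offsets predicted by
# parallel (inception-style) branches and fused before sampling.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d


class StructuredDeformBlock(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        offset_ch = 2 * kernel_size * kernel_size  # (dx, dy) per sampling point
        # Parallel offset branches with different receptive fields.
        self.offset_3x3 = nn.Conv2d(channels, offset_ch, 3, padding=1)
        self.offset_5x5 = nn.Conv2d(channels, offset_ch, 5, padding=2)
        self.fuse = nn.Conv2d(2 * offset_ch, offset_ch, 1)
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, x):
        # Fuse offsets from both branches, then sample with deformable conv.
        offsets = self.fuse(torch.cat([self.offset_3x3(x), self.offset_5x5(x)], dim=1))
        return self.deform(x, offsets)


x = torch.randn(1, 64, 32, 32)
print(StructuredDeformBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```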

{"title":"DFEDC: Dual fusion with enhanced deformable convolution for medical image segmentation","authors":"Xian Fang,&nbsp;Yueqian Pan,&nbsp;Qiaohong Chen","doi":"10.1016/j.imavis.2024.105277","DOIUrl":"10.1016/j.imavis.2024.105277","url":null,"abstract":"<div><p>Considering the complexity of lesion regions in medical images, current researches relying on CNNs typically employ large-kernel convolutions to expand the receptive field and enhance segmentation quality. However, these convolution methods are hindered by substantial computational requirements and limited capacity to extract contextual and multi-scale information, making it challenging to efficiently segment complex regions. To address this issue, we propose a dual fusion with enhanced deformable convolution network, namely DFEDC, which dynamically adjusts the receptive field and simultaneously integrates multi-scale feature information to effectively segment complex lesion areas and process boundaries. Firstly, we combine global channel and spatial fusion in a serial way, which integrates and reuses global channel attention and fully connected layers to achieve lightweight extraction of channel and spatial information. Additionally, we design a structured deformable convolution (SDC) that structures deformable convolution with inceptions and large kernel attention, and enhances the learning of offsets through parallel fusion to efficiently extract multi-scale feature information. To compensate for the loss of spatial information of SDC, we introduce a hybrid 2D and 3D feature extraction module to transform feature extraction from a single dimension to a fusion of 2D and 3D. Extensive experimental results on the Synapse, ACDC, and ISIC-2018 datasets demonstrate that our proposed DFEDC achieves superior results.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105277"},"PeriodicalIF":4.2,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VLAI: Exploration and Exploitation based on Visual-Language Aligned Information for Robotic Object Goal Navigation
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.imavis.2024.105259
Haonan Luo, Yijie Zeng, Li Yang, Kexun Chen, Zhixuan Shen, Fengmao Lv

Object Goal Navigation (ObjectNav) is the task in which an agent must navigate to an instance of a specific category in an unseen environment through visual observations within a limited number of time steps. This work plays a significant role in enhancing the efficiency of locating specific items in indoor spaces, assisting individuals in completing various tasks, and providing support for people with disabilities. To achieve efficient ObjectNav in unfamiliar environments, global perception capabilities and an understanding of the spatial and semantic regularities of the environment layout are essential. In this work, we propose an explicit-prediction method called VLAI that utilizes visual-language alignment information to guide the agent's exploration, unlike previous navigation methods based on frontier potential prediction or egocentric map completion, which only leverage visual observations to construct semantic maps and thus fail to help the agent develop a better global perception. Specifically, when predicting long-term goals, we retrieve previously saved visual observations to obtain visual information around the frontiers based on their position on the incrementally built, incomplete semantic map. Then, we apply our designed Chat Describer to this visual information to obtain detailed frontier object descriptions. The Chat Describer, a novel automatic-questioning approach deployed in Visual-to-Language, is composed of a Large Language Model (LLM) and a visual-to-language model (VLM) with visual question-answering functionality. In addition, we also obtain the semantic similarity of target object and frontier object categories. Ultimately, by combining the semantic similarity and the boundary descriptions, the agent can predict the long-term goals more accurately. Our experiments on the Gibson and HM3D datasets reveal that our VLAI approach yields significantly better results compared to earlier methods. The code is released at

https://github.com/31539lab/VLAI.
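As a rough sketch of how category similarity and frontier descriptions could be combined to rank long-term goals, the snippet below scores each frontier by a weighted sum of cosine similarity (target category vs. nearby object embeddings) and a description-relevance score. The embedding dimensionality, weighting and inputs are assumptions for illustration, not the VLAI pipeline.

```python
# Hypothetical frontier-ranking sketch; embeddings stand in for VLM features.
import torch
import torch.nn.functional as F


def score_frontiers(target_emb, frontier_obj_embs, description_scores, alpha=0.5):
    """Rank frontiers by category similarity plus description relevance.

    target_emb:          (d,) embedding of the target object category
    frontier_obj_embs:   (n, d) embeddings of objects seen near each frontier
    description_scores:  (n,) relevance of each frontier's text description
    """
    sim = F.cosine_similarity(frontier_obj_embs, target_emb.unsqueeze(0), dim=1)
    scores = alpha * sim + (1 - alpha) * description_scores
    return torch.argsort(scores, descending=True)


# Toy usage with random vectors in place of real embeddings.
target = torch.randn(512)
frontiers = torch.randn(4, 512)
desc = torch.tensor([0.2, 0.7, 0.1, 0.4])
print(score_frontiers(target, frontiers, desc))
```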

{"title":"VLAI: Exploration and Exploitation based on Visual-Language Aligned Information for Robotic Object Goal Navigation","authors":"Haonan Luo,&nbsp;Yijie Zeng,&nbsp;Li Yang,&nbsp;Kexun Chen,&nbsp;Zhixuan Shen,&nbsp;Fengmao Lv","doi":"10.1016/j.imavis.2024.105259","DOIUrl":"10.1016/j.imavis.2024.105259","url":null,"abstract":"<div><p>Object Goal Navigation(ObjectNav) is the task that an agent need navigate to an instance of a specific category in an unseen environment through visual observations within limited time steps. This work plays a significant role in enhancing the efficiency of locating specific items in indoor spaces and assisting individuals in completing various tasks, as well as providing support for people with disabilities. To achieve efficient ObjectNav in unfamiliar environments, global perception capabilities, understanding the regularities of space and semantics in the environment layout are significant. In this work, we propose an explicit-prediction method called VLAI that utilizes visual-language alignment information to guide the agent's exploration, unlike previous navigation methods based on frontier potential prediction or egocentric map completion, which only leverage visual observations to construct semantic maps, thus failing to help the agent develop a better global perception. Specifically, when predicting long-term goals, we retrieve previously saved visual observations to obtain visual information around the frontiers based on their position on the incrementally built incomplete semantic map. Then, we apply our designed Chat Describer to this visual information to obtain detailed frontier object descriptions. The Chat Describer, a novel automatic-questioning approach deployed in Visual-to-Language, is composed of Large Language Model(LLM) and the visual-to-language model(VLM), which has visual question-answering functionality. In addition, we also obtain the semantic similarity of target object and frontier object categories. Ultimately, by combining the semantic similarity and the boundary descriptions, the agent can predict the long-term goals more accurately. Our experiments on the Gibson and HM3D datasets reveal that our VLAI approach yields significantly better results compared to earlier methods. The code is released at</p><p><span><span><span>https://github.com/31539lab/VLAI</span></span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105259"},"PeriodicalIF":4.2,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AGIL-SwinT: Attention-guided inconsistency learning for face forgery detection
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.imavis.2024.105274
Wuti Xiong, Haoyu Chen, Guoying Zhao, Xiaobai Li
Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has been proven to be effective for generalizing to the unseen data domain. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address the problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate the initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask for enhancing forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and ensuring the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at https://github.com/woody-xiong/AGIL-SwinT.
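The snippet below is a minimal sketch of one way attention could guide mask refinement: a backbone attention map is thresholded into a pseudo forgery mask that supervises the predicted mask, while low-attention regions are suppressed. The threshold, loss and interpolation choices are assumptions, not the AGIL-SwinT code.

```python
# Hypothetical attention-guided refinement sketch (video-level weak supervision).
import torch
import torch.nn.functional as F


def refine_with_attention(pred_mask, attn_map, thresh=0.5):
    """pred_mask: (B, 1, H, W) logits; attn_map: (B, 1, h, w) values in [0, 1]."""
    attn = F.interpolate(attn_map, size=pred_mask.shape[-2:], mode="bilinear",
                         align_corners=False)
    pseudo_label = (attn > thresh).float()          # inconsistency pseudo-mask
    loss = F.binary_cross_entropy_with_logits(pred_mask, pseudo_label)
    refined = torch.sigmoid(pred_mask) * attn       # suppress low-attention areas
    return refined, loss


mask_logits = torch.randn(2, 1, 64, 64)
attention = torch.rand(2, 1, 16, 16)
refined, loss = refine_with_attention(mask_logits, attention)
print(refined.shape, loss.item())
```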
{"title":"AGIL-SwinT: Attention-guided inconsistency learning for face forgery detection","authors":"Wuti Xiong ,&nbsp;Haoyu Chen ,&nbsp;Guoying Zhao ,&nbsp;Xiaobai Li","doi":"10.1016/j.imavis.2024.105274","DOIUrl":"10.1016/j.imavis.2024.105274","url":null,"abstract":"<div><div>Face forgery detection (FFD) plays a vital role in maintaining the security and integrity of various information and media systems. Forgery inconsistency caused by manipulation techniques has been proven to be effective for generalizing to the unseen data domain. However, most existing works rely on pixel-level forgery annotations to learn forgery inconsistency. To address the problem, we propose a novel Swin Transformer-based method, AGIL-SwinT, that can effectively learn forgery inconsistency using only video-level labels. Specifically, we first leverage the Swin Transformer to generate the initial mask for the forgery regions. Then, we introduce an attention-guided inconsistency learning module that uses unsupervised learning to learn inconsistency from attention. The learned inconsistency is used to revise the initial mask for enhancing forgery detection. In addition, we introduce a forgery mask refinement module to obtain reliable inconsistency labels for supervising inconsistency learning and ensuring the mask is aligned with the forgery boundaries. We conduct extensive experiments on multiple FFD benchmarks, including intra-dataset, cross-dataset and cross-manipulation testing. The experimental results demonstrate that our method significantly outperforms existing methods and generalizes well to unseen datasets and manipulation categories. Our code is available at <span><span><span>https://github.com/woody-xiong/AGIL-SwinT</span></span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105274"},"PeriodicalIF":4.2,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142310475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Boosting certified robustness via an expectation-based similarity regularization
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.imavis.2024.105272
Jiawen Li, Kun Fang, Xiaolin Huang, Jie Yang

A certifiably robust classifier is one that is theoretically guaranteed to provide robust predictions against any adversarial attack under certain conditions. Recent defense methods aim to regularize predictions by ensuring consistency across diverse perturbed samplings around the same sample, thus enhancing the certified robustness of the classifier. However, starting from the visualization of latent representations from classifiers trained with existing defense methods, we observe that noisy samplings of other classes are still easily found near a single sample, undermining the confidence in the neighborhood of inputs required by certified robustness. Motivated by this observation, a novel training method, namely Expectation-based Similarity Regularization for Randomized Smoothing (ESR-RS), is proposed to optimize the distance between samples utilizing metric learning. To meet the requirement of certified robustness, ESR-RS focuses on the average performance of the base classifier and adopts the expected feature, approximated by the average value of multiple Gaussian-corrupted samplings around every sample, to compute similarity scores between samples in the latent space. The metric learning loss is then applied to maximize the representation similarity within the same class and minimize it between different classes. Besides, an adaptive weight correlated with the classification performance is used to control the strength of the proposed similarity regularization. Extensive experiments have verified that our method contributes to stronger certified robustness over multiple defense methods without heavy computational costs.
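A minimal sketch of the expectation-based idea: the expected feature of each sample is approximated by averaging features of several Gaussian-corrupted copies, and a simple similarity regularizer tightens same-class pairs while separating different classes. The noise level, sample count and loss form are placeholders rather than the ESR-RS settings.

```python
# Monte-Carlo expected feature + simple similarity regularizer (illustrative).
import torch
import torch.nn.functional as F


def expected_features(encoder, x, sigma=0.25, n_samples=8):
    """Approximate E[f(x + eps)] by averaging over Gaussian-corrupted copies."""
    feats = [encoder(x + sigma * torch.randn_like(x)) for _ in range(n_samples)]
    return torch.stack(feats).mean(dim=0)            # (B, d)


def similarity_regularizer(feats, labels):
    feats = F.normalize(feats, dim=1)
    sim = feats @ feats.t()                           # pairwise cosine similarity
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=feats.device)
    pos = (sim * same * (1 - eye)).sum() / ((same - eye).sum() + 1e-8)
    neg = (sim * (1 - same)).sum() / ((1 - same).sum() + 1e-8)
    return neg - pos          # minimizing tightens classes and separates others


# Toy usage with a stand-in encoder.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(similarity_regularizer(expected_features(encoder, x), y).item())
```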

{"title":"Boosting certified robustness via an expectation-based similarity regularization","authors":"Jiawen Li,&nbsp;Kun Fang,&nbsp;Xiaolin Huang,&nbsp;Jie Yang","doi":"10.1016/j.imavis.2024.105272","DOIUrl":"10.1016/j.imavis.2024.105272","url":null,"abstract":"<div><p>A certifiably robust classifier implies the one that is theoretically guaranteed to provide robust predictions against <em>any</em> adversarial attacks under certain conditions. Recent defense methods aim to regularize predictions by ensuring consistency across diverse perturbed samplings around the same sample, thus enhancing the certified robustness of the classifier. However, starting from the visualization of latent representations from classifiers trained with existing defense methods, we observe that noisy samplings of other classes are still easily found near a single sample, undermining the confidence in the neighborhood of inputs required by the certified robustness. Motivated by this observation, a novel training method, namely Expectation-based Similarity Regularization for Randomized Smoothing (ESR-RS), is proposed to optimize the distance between samples utilizing metric learning. To meet the requirement of certified robustness, ESR-RS focuses on the average performance of base classifier, and adopts the expected feature approximated by the average value of multiple Gaussian-corrupted samplings around every sample, to compute similarity scores between samples in the latent space. The metric learning loss is then applied to maximize the representation similarity within the same class and minimize it between different classes. Besides, an adaptive weight correlated with the classification performance is used to control the strength of the proposed similarity regularization. Extensive experiments have verified that our method contributes to stronger certified robustness over multiple defense methods without heavy computational costs.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105272"},"PeriodicalIF":4.2,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142242496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DFCNet +: Cross-modal dynamic feature contrast net for continuous sign language recognition
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-11 · DOI: 10.1016/j.imavis.2024.105260
Yuan Feng, Nuoyi Chen, Yumeng Wu, Caoyu Jiang, Sheng Liu, Shengyong Chen
In sign language communication, the combination of hand signs and facial expressions is used to convey messages in a fluid manner. Accurate interpretation relies heavily on understanding the context of these signs. Current methods, however, often focus on static images, missing the continuous flow and the story that unfolds through successive movements in sign language. To address this constraint, our research introduces the Dynamic Feature Contrast Net Plus (DFCNet +), a novel model that incorporates both dynamic feature extraction and cross-modal learning. The dynamic feature extraction module of DFCNet + uses dynamic trajectory capture to monitor and record motion across frames and apply key features as an enhancement tool that highlights pixels that are critical for recognizing important sign language movements, allowing the model to follow the temporal variation of the signs. In the cross-modal learning module, we depart from the conventional approach of aligning video frames with textual descriptions. Instead, we adopt a gloss-level alignment, which provides a more detailed match between the visual signals and their corresponding text glosses, capturing the intricate relationship between what is seen and the associated text. The enhanced proficiency of DFCNet + in discerning inter-frame details translates to heightened precision on benchmarks such as PHOENIX14, PHOENIX14-T and CSL-Daily. Such performance underscores its advantage in dynamic feature capture and inter-modal learning compared to conventional approaches to sign language interpretation. Our code is available at https://github.com/fyzjut/DFCNet_Plus.
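The snippet below sketches gloss-level alignment with a symmetric contrastive loss between clip-level visual features and gloss embeddings, assuming row i of each matrix corresponds to the same gloss. The temperature and feature sizes are illustrative assumptions, not the DFCNet + configuration.

```python
# Hypothetical gloss-level alignment loss (CLIP-style symmetric contrastive).
import torch
import torch.nn.functional as F


def gloss_alignment_loss(visual_feats, gloss_embs, temperature=0.07):
    """visual_feats, gloss_embs: (N, d); row i of each corresponds to gloss i."""
    v = F.normalize(visual_feats, dim=1)
    g = F.normalize(gloss_embs, dim=1)
    logits = v @ g.t() / temperature
    targets = torch.arange(len(v), device=v.device)
    # Pull each clip toward its own gloss and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


print(gloss_alignment_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```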
{"title":"DFCNet +: Cross-modal dynamic feature contrast net for continuous sign language recognition","authors":"Yuan Feng ,&nbsp;Nuoyi Chen ,&nbsp;Yumeng Wu ,&nbsp;Caoyu Jiang ,&nbsp;Sheng Liu ,&nbsp;Shengyong Chen","doi":"10.1016/j.imavis.2024.105260","DOIUrl":"10.1016/j.imavis.2024.105260","url":null,"abstract":"<div><div>In sign language communication, the combination of hand signs and facial expressions is used to convey messages in a fluid manner. Accurate interpretation relies heavily on understanding the context of these signs. Current methods, however, often focus on static images, missing the continuous flow and the story that unfolds through successive movements in sign language. To address this constraint, our research introduces the Dynamic Feature Contrast Net Plus (DFCNet<!--> <!-->+), a novel model that incorporates both dynamic feature extraction and cross-modal learning. The dynamic feature extraction module of DFCNet<!--> <!-->+ uses dynamic trajectory capture to monitor and record motion across frames and apply key features as an enhancement tool that highlights pixels that are critical for recognizing important sign language movements, allowing the model to follow the temporal variation of the signs. In the cross-modal learning module, we depart from the conventional approach of aligning video frames with textual descriptions. Instead, we adopt a gloss-level alignment, which provides a more detailed match between the visual signals and their corresponding text glosses, capturing the intricate relationship between what is seen and the associated text. The enhanced proficiency of DFCNet<!--> <!-->+ in discerning inter-frame details translates to heightened precision on benchmarks such as PHOENIX14, PHOENIX14-T and CSL-Daily. Such performance underscores its advantage in dynamic feature capture and inter-modal learning compared to conventional approaches to sign language interpretation. Our code is available at <span><span>https://github.com/fyzjut/DFCNet_Plus</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105260"},"PeriodicalIF":4.2,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142310478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UAV image object detection based on self-attention guidance and global feature fusion
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-10 · DOI: 10.1016/j.imavis.2024.105262
Jing Bai, Haiyang Hu, Xiaojing Liu, Shanna Zhuang, Zhengyou Wang

Unmanned aerial vehicle (UAV) image object detection has garnered considerable attention in fields such as intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges: a deficiency in multi-scale feature extraction and inaccuracy when processing complex scenes and small-sized targets in practical applications. To address these challenges, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, to optimize feature extraction from a global perspective and enhance target localization precision, a global feature extraction module (GFEM) is introduced that exploits the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed that measures the resemblance between the ground truth and the priors, which improves the precision of target position matching and thus handles the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features to optimize the integration of multi-scale features and improve the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.
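As a minimal sketch of capturing long-range dependencies with self-attention over CNN feature maps (the role GFEM plays here), the block below flattens the spatial grid into tokens, applies multi-head self-attention with a residual connection, and reshapes back. Head count and normalization are illustrative choices, not the GFEM design.

```python
# Generic self-attention block over CNN feature maps (illustrative only).
import torch
import torch.nn as nn


class GlobalSelfAttention(nn.Module):
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)  # residual connection
        return tokens.transpose(1, 2).reshape(b, c, h, w)


feat = torch.randn(1, 128, 20, 20)
print(GlobalSelfAttention(128)(feat).shape)    # torch.Size([1, 128, 20, 20])
```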

{"title":"UAV image object detection based on self-attention guidance and global feature fusion","authors":"Jing Bai ,&nbsp;Haiyang Hu ,&nbsp;Xiaojing Liu ,&nbsp;Shanna Zhuang ,&nbsp;Zhengyou Wang","doi":"10.1016/j.imavis.2024.105262","DOIUrl":"10.1016/j.imavis.2024.105262","url":null,"abstract":"<div><p>Unmanned aerial vehicle (UAV) image object detection has garnered considerable attentions in fields such as Intelligent transportation, urban management and agricultural monitoring. However, it suffers from key challenges of the deficiency in multi-scale feature extraction and the inaccuracy when processing complex scenes and small-sized targets in practical applications. To address this challenge, we propose a novel UAV image object detection network based on self-attention guidance and global feature fusion, named SGGF-Net. First, in order to optimizing feature extraction in global perspective and enhancing target localization precision, the global feature extraction module (GFEM) is introduced by exploiting the self-attention mechanism to capture and integrate long-range dependencies within images. Second, a normal distribution-based prior assigner (NDPA) is developed by measuring the resemblance between ground truth and the priors, which improves the precision of target position matching and thus handle the problem of inaccurate localization of small targets. Furthermore, we design an attention-guided ROI pooling module (ARPM) via a deep fusion strategy of multilevel features for optimizing the integration of multi-scale features and improving the quality of feature representation. Finally, experimental results demonstrate the effectiveness of the proposed SGGF-Net approach.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105262"},"PeriodicalIF":4.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142232337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic deep spare clustering with a dynamic population-based evolutionary algorithm using reinforcement learning and transfer learning
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-10 · DOI: 10.1016/j.imavis.2024.105258
Parham Hadikhani, Daphne Teck Ching Lai, Wee-Hong Ong, Mohammad H. Nadimi-Shahraki

Clustering data effectively remains a significant challenge in machine learning, particularly when the optimal number of clusters is unknown. Traditional deep clustering methods often struggle with balancing local and global search, leading to premature convergence and inefficiency. To address these issues, we introduce ADSC-DPE-RT (Automatic Deep Sparse Clustering with a Dynamic Population-based Evolutionary Algorithm using Reinforcement Learning and Transfer Learning), a novel deep clustering approach. ADSC-DPE-RT builds on Multi-Trial Vector-based Differential Evolution (MTDE), an algorithm that integrates sparse auto-encoding and manifold learning to enable automatic clustering without prior knowledge of cluster count. However, MTDE's fixed population size can lead to either prolonged computation or premature convergence. Our approach introduces a dynamic population generation technique guided by Reinforcement Learning (RL) and Markov Decision Process (MDP) principles. This allows for flexible adjustment of population size, preventing premature convergence and reducing computation time. Additionally, we incorporate Generative Adversarial Networks (GANs) to facilitate dynamic knowledge transfer between MTDE strategies, enhancing diversity and accelerating convergence towards the global optimum. This is the first work to address the dynamic population issue in deep clustering through RL, combined with Transfer Learning to optimize evolutionary algorithms. Our results demonstrate significant improvements in clustering performance, positioning ADSC-DPE-RT as a competitive alternative to state-of-the-art deep clustering methods.
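A toy sketch of reinforcement-learning-controlled population sizing: a tabular Q-learning agent observes whether the last generation improved the best fitness and chooses to shrink, keep or grow the population. The state, actions and reward here are illustrative placeholders, not the MDP formulation used in ADSC-DPE-RT.

```python
# Tabular Q-learning controller for population size (illustrative placeholder).
import random


class PopulationSizeController:
    ACTIONS = (-10, 0, 10)                       # shrink / keep / grow

    def __init__(self, lr=0.1, gamma=0.9, eps=0.2):
        self.q = {}                              # (state, action) -> value
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def act(self, state):
        if random.random() < self.eps:           # epsilon-greedy exploration
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.lr * (reward + self.gamma * best_next - old)


# Example step: state = 1 if the last generation improved the best fitness.
ctrl = PopulationSizeController()
delta = ctrl.act(state=1)
ctrl.update(state=1, action=delta, reward=0.3, next_state=0)
```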

{"title":"Automatic deep spare clustering with a dynamic population-based evolutionary algorithm using reinforcement learning and transfer learning","authors":"Parham Hadikhani ,&nbsp;Daphne Teck Ching Lai ,&nbsp;Wee-Hong Ong ,&nbsp;Mohammad H. Nadimi-Shahraki","doi":"10.1016/j.imavis.2024.105258","DOIUrl":"10.1016/j.imavis.2024.105258","url":null,"abstract":"<div><p>Clustering data effectively remains a significant challenge in machine learning, particularly when the optimal number of clusters is unknown. Traditional deep clustering methods often struggle with balancing local and global search, leading to premature convergence and inefficiency. To address these issues, we introduce ADSC-DPE-RT (Automatic Deep Sparse Clustering with a Dynamic Population-based Evolutionary Algorithm using Reinforcement Learning and Transfer Learning), a novel deep clustering approach. ADSC-DPE-RT builds on Multi-Trial Vector-based Differential Evolution (MTDE), an algorithm that integrates sparse auto-encoding and manifold learning to enable automatic clustering without prior knowledge of cluster count. However, MTDE's fixed population size can lead to either prolonged computation or premature convergence. Our approach introduces a dynamic population generation technique guided by Reinforcement Learning (RL) and Markov Decision Process (MDP) principles. This allows for flexible adjustment of population size, preventing premature convergence and reducing computation time. Additionally, we incorporate Generative Adversarial Networks (GANs) to facilitate dynamic knowledge transfer between MTDE strategies, enhancing diversity and accelerating convergence towards the global optimum. This is the first work to address the dynamic population issue in deep clustering through RL, combined with Transfer Learning to optimize evolutionary algorithms. Our results demonstrate significant improvements in clustering performance, positioning ADSC-DPE-RT as a competitive alternative to state-of-the-art deep clustering methods.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105258"},"PeriodicalIF":4.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142232336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Nighttime scene understanding with label transfer scene parser
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-08 · DOI: 10.1016/j.imavis.2024.105257
Thanh-Danh Nguyen, Nguyen Phan, Tam V. Nguyen, Vinh-Tiep Nguyen, Minh-Triet Tran

Semantic segmentation plays a crucial role in traffic scene understanding, especially in nighttime conditions. This paper tackles the task of semantic segmentation in nighttime scenes. The largest challenge of this task is the lack of annotated nighttime images to train a deep learning-based scene parser. The existing annotated datasets are abundant in daytime conditions but scarce in nighttime due to the high cost. Thus, we propose a novel Label Transfer Scene Parser (LTSP) framework for nighttime scene semantic segmentation by leveraging daytime annotation transfer. Our framework performs segmentation in the dark without training on real nighttime annotated data. In particular, we propose translating daytime images to nighttime conditions to obtain more data with annotation in an efficient way. In addition, we utilize the pseudo-labels inferred from unlabeled nighttime scenes to further train the scene parser. The novelty of our work is the ability to perform nighttime segmentation via daytime annotated labels and nighttime synthetic versions of the same set of images. The extensive experiments demonstrate the improvement and efficiency of our scene parser over the state-of-the-art methods with a similar semi-supervised approach on the benchmark of Nighttime Driving Test dataset. Notably, our proposed method utilizes only one-tenth of the amount of labeled and unlabeled data in comparison with the previous methods. Code is available at https://github.com/danhntd/Label_Transfer_Scene_Parser.git.
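The snippet below sketches the pseudo-labeling step on unlabeled nighttime frames: the current parser's softmax predictions are kept only where confidence exceeds a threshold, and uncertain pixels are marked as ignore. The threshold and ignore index are assumptions for illustration, not values from the paper.

```python
# Confidence-thresholded pseudo-labels for unlabeled nighttime images (sketch).
import torch


def make_pseudo_labels(seg_model, night_images, conf_thresh=0.9, ignore_index=255):
    """night_images: (B, 3, H, W). Returns (B, H, W) integer pseudo-label maps.

    Assumes seg_model returns per-pixel class logits of shape (B, C, H, W).
    """
    seg_model.eval()
    with torch.no_grad():
        probs = torch.softmax(seg_model(night_images), dim=1)    # (B, C, H, W)
    conf, labels = probs.max(dim=1)
    labels[conf < conf_thresh] = ignore_index                    # drop uncertain pixels
    return labels
```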

{"title":"Nighttime scene understanding with label transfer scene parser","authors":"Thanh-Danh Nguyen ,&nbsp;Nguyen Phan ,&nbsp;Tam V. Nguyen ,&nbsp;Vinh-Tiep Nguyen ,&nbsp;Minh-Triet Tran","doi":"10.1016/j.imavis.2024.105257","DOIUrl":"10.1016/j.imavis.2024.105257","url":null,"abstract":"<div><p>Semantic segmentation plays a crucial role in traffic scene understanding, especially in nighttime conditions. This paper tackles the task of semantic segmentation in nighttime scenes. The largest challenge of this task is the lack of annotated nighttime images to train a deep learning-based scene parser. The existing annotated datasets are abundant in daytime conditions but scarce in nighttime due to the high cost. Thus, we propose a novel Label Transfer Scene Parser (LTSP) framework for nighttime scene semantic segmentation by leveraging daytime annotation transfer. Our framework performs segmentation in the dark without training on real nighttime annotated data. In particular, we propose translating daytime images to nighttime conditions to obtain more data with annotation in an efficient way. In addition, we utilize the pseudo-labels inferred from unlabeled nighttime scenes to further train the scene parser. The novelty of our work is the ability to perform nighttime segmentation via daytime annotated labels and nighttime synthetic versions of the same set of images. The extensive experiments demonstrate the improvement and efficiency of our scene parser over the state-of-the-art methods with a similar semi-supervised approach on the benchmark of Nighttime Driving Test dataset. Notably, our proposed method utilizes only one-tenth of the amount of labeled and unlabeled data in comparison with the previous methods. Code is available at <span><span>https://github.com/danhntd/Label_Transfer_Scene_Parser.git</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105257"},"PeriodicalIF":4.2,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142168545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RFSC-net: Re-parameterization forward semantic compensation network in low-light environments
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-07 · DOI: 10.1016/j.imavis.2024.105271
Wenhao Zhang, Huiying Xu, Xinzhong Zhu, Yunzhong Si, Yao Dong, Xiao Huang, Hongbo Li
Although detectors currently perform well in well-lit conditions, their accuracy decreases in low light due to insufficient object information. To address this issue, we propose the Re-parameterization Forward Semantic Compensation Network (RFSC-Net). We propose the Re-parameterization Residual Efficient Layer Aggregation Networks (RSELAN) for feature extraction, which integrate the concepts of re-parameterization and Efficient Layer Aggregation Networks (ELAN). While focusing on the fusion of feature maps of the same dimension, RSELAN also incorporates upward fusion of lower-level feature maps, enhancing the detailed texture information in higher-level features. Our proposed Forward Semantic Compensation Feature Fusion (FSCFF) network reduces interference from high-level to low-level semantic information, retaining finer details to improve detection accuracy in low-light conditions. Experiments on the low-light ExDark and DarkFace datasets show that RFSC-Net improves mAP by 2% on ExDark and 0.5% on DarkFace over the YOLOv8n baseline, without an increase in parameter count. Additionally, AP50 is improved by 2.1% on ExDark and 1.1% on DarkFace, with a mere 3.7 ms detection latency on ExDark.
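As a minimal illustration of structural re-parameterization (the general technique behind re-parameterized blocks such as RSELAN's), the sketch below folds a training-time pair of parallel 3x3 and 1x1 convolutions into a single 3x3 convolution for inference by padding the 1x1 kernel and summing weights; batch-norm folding is omitted for brevity. This is a generic RepVGG-style fusion, not the RSELAN code.

```python
# RepVGG-style fusion of parallel 3x3 and 1x1 convolutions (BN folding omitted).
import torch
import torch.nn as nn
import torch.nn.functional as F


def fuse_parallel_convs(conv3x3: nn.Conv2d, conv1x1: nn.Conv2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv3x3.in_channels, conv3x3.out_channels, 3, padding=1)
    w1x1_padded = F.pad(conv1x1.weight, [1, 1, 1, 1])   # place 1x1 at kernel center
    fused.weight.data = conv3x3.weight.data + w1x1_padded
    fused.bias.data = conv3x3.bias.data + conv1x1.bias.data
    return fused


c3 = nn.Conv2d(16, 16, 3, padding=1)
c1 = nn.Conv2d(16, 16, 1)
x = torch.randn(1, 16, 8, 8)
fused = fuse_parallel_convs(c3, c1)
print(torch.allclose(c3(x) + c1(x), fused(x), atol=1e-5))    # True
```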
{"title":"RFSC-net: Re-parameterization forward semantic compensation network in low-light environments","authors":"Wenhao Zhang ,&nbsp;Huiying Xu ,&nbsp;Xinzhong Zhu ,&nbsp;Yunzhong Si ,&nbsp;Yao Dong ,&nbsp;Xiao Huang ,&nbsp;Hongbo Li","doi":"10.1016/j.imavis.2024.105271","DOIUrl":"10.1016/j.imavis.2024.105271","url":null,"abstract":"<div><div>Although detectors currently perform well in well-light conditions, their accuracy decreases due to insufficient object information. In addressing this issue, we propose the Re-parameterization Forward Semantic Compensation Network (RFSC-Net). We propose the Reparameterization Residual Efficient Layer Aggregation Networks (RSELAN) for feature extraction, which integrates the concepts of re-parameterization and the Efficient Layer Aggregation Networks (ELAN). While focusing on the fusion of feature maps of the same dimension, it also incorporates upward fusion of lower-level feature maps, enhancing the detailed texture information in higher-level features. Our proposed Forward Semantic Compensation Feature Fusion (FSCFF) network reduces interference from high-level to low-level semantic information, retaining finer details to improve detection accuracy in low-light conditions. Experiments on the low-light ExDark and DarkFace datasets show that RFSC-Net improves mAP by 2% on ExDark and 0.5% on DarkFace over the YOLOv8n baseline, without an increase in parameter counts. Additionally, AP50 is enhanced by 2.1% on ExDark and 1.1% on DarkFace, with a mere 3.7 ms detection latency on ExDark.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105271"},"PeriodicalIF":4.2,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142314367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Automatic segmentation of deep endometriosis in the rectosigmoid using deep learning
IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-06 · DOI: 10.1016/j.imavis.2024.105261
Weslley Kelson Ribeiro Figueredo, Aristófanes Corrêa Silva, Anselmo Cardoso de Paiva, João Otávio Bandeira Diniz, Alice Brandão, Marco Aurelio Pinho Oliveira

Endometriosis is an inflammatory disease that causes several symptoms, such as infertility and constant pain. While biopsy remains the gold standard for diagnosing endometriosis, imaging tests, particularly magnetic resonance, are becoming increasingly prominent, especially in cases of deep infiltrating disease. However, precise and accurate MRI results require a skilled radiologist. In this study, we employ a dataset that we built to propose an automated method for classifying patients with endometriosis and segmenting the endometriosis lesion in magnetic resonance images of the rectum and sigmoid colon using image processing and deep learning techniques. Our goals are to assist in the diagnosis, to map the extent of the disease before a surgical procedure, and to help reduce the need for invasive diagnostic methods. This method consists of the following steps: rectosigmoid ROI extraction, image classification, initial lesion segmentation, lesion ROI extraction, and final lesion segmentation. ROI extraction is employed to limit the search area for lesions. Using an ensemble of networks, classification of images and patients, with or without endometriosis, achieved accuracies of 87.46% and 96.67%, respectively. One of these networks is a proposed modification of VGG-16. The initial segmentation step produces candidate regions for lesions using TransUnet, achieving a Dice index of 51%. These regions serve as the basis for extracting a new ROI. In the final lesion segmentation, also using TransUnet, we obtain a Dice index of 65.44%.
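For reference, the Dice index reported above is twice the overlap between predicted and ground-truth masks divided by their total size; a minimal implementation is sketched below (not tied to the authors' evaluation code).

```python
# Dice index between two binary masks (illustrative reference implementation).
import numpy as np


def dice_index(pred_mask: np.ndarray, true_mask: np.ndarray, eps=1e-8) -> float:
    """pred_mask, true_mask: binary arrays of the same shape."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)


pred = np.array([[1, 1, 0], [0, 1, 0]])
true = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_index(pred, true), 3))   # 0.667
```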

{"title":"Automatic segmentation of deep endometriosis in the rectosigmoid using deep learning","authors":"Weslley Kelson Ribeiro Figueredo ,&nbsp;Aristófanes Corrêa Silva ,&nbsp;Anselmo Cardoso de Paiva ,&nbsp;João Otávio Bandeira Diniz ,&nbsp;Alice Brandão ,&nbsp;Marco Aurelio Pinho Oliveira","doi":"10.1016/j.imavis.2024.105261","DOIUrl":"10.1016/j.imavis.2024.105261","url":null,"abstract":"<div><p>Endometriosis is an inflammatory disease that causes several symptoms, such as infertility and constant pain. While biopsy remains the gold standard for diagnosing endometriosis, imaging tests, particularly magnetic resonance, are becoming increasingly prominent, especially in cases of deep infiltrating disease. However, precise and accurate MRI results require a skilled radiologist. In this study, we employ our built dataset to propose an automated method for classifying patients with endometriosis and segmenting the endometriosis lesion in magnetic resonance images of the rectum and sigmoid colon using image processing and deep learning techniques. Our goals are to assist in the diagnosis, to map the extent of the disease before a surgical procedure, and to help reduce the need for invasive diagnostic methods. This method consists of the following steps: rectosigmoid ROI extraction, image classification, initial lesion segmentation, lesion ROI extraction, and final lesion segmentation. ROI extraction is employed to limit the area while searching for lesions. Using an ensemble of networks, classification of images and patients, with or without endometriosis, achieved accuracies of 87.46% and 96.67%, respectively. One of these networks is a proposed modification of VGG-16. The initial segmentation step produces candidate regions for lesions using TransUnet, achieving a Dice index of 51%. These regions serve as the basis for extracting a new ROI. In the final lesion segmentation, and also using TransUnet, we obtain a Dice index of 65.44%.</p></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"151 ","pages":"Article 105261"},"PeriodicalIF":4.2,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142168621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0