CRML-Net: Cross-Modal Reasoning and Multi-Task Learning Network for tooth image segmentation

Computer Vision and Image Understanding · Impact factor: 3.5 · Zone 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence) · Pub Date: 2024-11-01 · Epub Date: 2024-09-02 · DOI: 10.1016/j.cviu.2024.104138

Yingda Lyu, Zhehao Liu, Yingxin Zhang, Haipeng Chen, Zhimin Xu

Computer Vision and Image Understanding, vol. 248, Article 104138. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S1077314224002194

Citations: 0

Abstract

Data from a single modality may suffer from noise, low contrast, or other imaging limitations that affect the model’s accuracy. Furthermore, due to the limited amount of data, most models trained on single-modality data tend to overfit the training set and perform poorly on out-of-domain data. Therefore, in this paper, we propose a network named Cross-Modal Reasoning and Multi-Task Learning Network (CRML-Net), which combines cross-modal reasoning and multi-task learning, aiming to leverage the complementary information between different modalities and tasks to enhance the model’s generalization ability and accuracy. Specifically, CRML-Net consists of two stages. In the first stage, our network extracts a new morphological information modality from the original image and then performs cross-modal fusion with the original modality image, aiming to leverage the morphological information to enhance the model’s robustness to out-of-domain datasets. In the second stage, based on the output of the previous stage, we introduce a multi-task learning mechanism, aiming to improve the model’s performance on unseen data by sharing surface detail information from auxiliary tasks. We validated our method on a publicly available tooth cone beam computed tomography dataset. Our evaluation demonstrates that our method outperforms state-of-the-art approaches.
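The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not say which morphological operator, fusion scheme, auxiliary task, or loss weighting CRML-Net uses, so everything below is an assumption chosen for illustration: a 3x3 morphological gradient stands in for the "morphological information modality", channel stacking stands in for cross-modal fusion, and a weighted sum stands in for the multi-task objective. The function names and the `aux_weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[float]]


def morphological_gradient(img: Image) -> Image:
    """Dilation minus erosion over a 3x3 window (assumed operator, not the paper's).

    At the borders, only the part of the window inside the image is used.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = max(window) - min(window)
    return out


def fuse_channels(img: Image, morph: Image) -> List[Image]:
    """Assumed early fusion: stack both modalities as input channels."""
    return [img, morph]


def soft_dice_loss(pred: Image, target: Image, eps: float = 1e-6) -> float:
    """Soft Dice loss, a common segmentation loss (assumed, not confirmed by the source)."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def multi_task_loss(seg_loss: float, aux_loss: float, aux_weight: float = 0.5) -> float:
    """Hypothetical joint objective: main loss plus a weighted auxiliary loss."""
    return seg_loss + aux_weight * aux_loss


# Toy 5x5 slice with a 3x3 "tooth" region in the centre.
slice_ = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
morph = morphological_gradient(slice_)    # stage 1: derive the extra modality
fused = fuse_channels(slice_, morph)      # stage 1: two-channel network input
seg = soft_dice_loss(slice_, slice_)      # stage 2: main segmentation loss
total = multi_task_loss(seg, aux_loss=0.2)  # stage 2: add an auxiliary-task term
```

On this toy slice the gradient is zero only at the centre pixel, whose whole 3x3 window lies inside the region, so the derived channel highlights the boundary rather than the interior. That is one plausible reading of how morphological information could complement raw intensities; the actual network may work quite differently.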


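Source journal: Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days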

Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.

Research areas include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems