V²-SfMLearner: Learning Monocular Depth and Ego-Motion for Multimodal Wireless Capsule Endoscopy

IF 6.4 · CAS Tier 2 (Computer Science) · Q1 (Automation & Control Systems)
IEEE Transactions on Automation Science and Engineering · Pub Date: 2025-01-16 · DOI: 10.1109/TASE.2025.3530791
Long Bai;Beilei Cui;Liangyu Wang;Yanheng Li;Shilong Yao;Sishen Yuan;Yanan Wu;Yang Zhang;Max Q.-H. Meng;Zhen Li;Weiping Ding;Hongliang Ren
Volume 22, pp. 11717-11730 · Citations: 0 · Full text: https://ieeexplore.ieee.org/document/10843755/

Abstract

Deep learning can predict depth maps and capsule ego-motion from capsule endoscopy videos, aiding 3D scene reconstruction and lesion localization. However, collisions of the capsule endoscope within the gastrointestinal tract cause vibration perturbations in the training data. Existing solutions focus solely on vision-based processing, neglecting auxiliary signals such as vibrations that could reduce noise and improve performance. We therefore propose V²-SfMLearner, a multimodal approach that integrates vibration signals into vision-based depth and capsule-motion estimation for monocular capsule endoscopy. We construct a multimodal capsule endoscopy dataset containing vibration and visual signals, and develop an unsupervised method that exploits both, effectively eliminating vibration perturbations through multimodal learning. Specifically, we carefully design a vibration network branch and a Fourier fusion module to detect and mitigate vibration noise. The fusion framework is compatible with popular vision-only algorithms. Extensive validation on the multimodal dataset demonstrates superior performance and robustness compared with vision-only algorithms. Without the need for large external equipment, V²-SfMLearner has the potential to be integrated into clinical capsule robots, providing a real-time and dependable digestive examination tool. The findings show promise for practical deployment in clinical settings, enhancing doctors' diagnostic capabilities.

Note to Practitioners

This paper is motivated by the problem of estimating depth and ego-motion for a wireless capsule endoscope in the human gastrointestinal tract, so as to realize accurate, efficient, robust, and real-time inspection. Our estimation method does not rely on any external localization equipment. Instead, inspired by existing research on integrating capsule endoscopy with inertial measurement units, we introduce vibration signals into vision-based depth and ego-motion estimation, improving the accuracy and robustness of the estimates through multimodal learning. Research on capsule robots or computer vision can readily be combined with our framework for various clinical and industrial applications.
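The abstract does not spell out how the Fourier fusion module works internally, but the underlying idea of suppressing vibration noise in the frequency domain before fusing it with visual features can be sketched minimally. The function and parameter names below (`fourier_fuse`, `keep_ratio`, the mean/std spectral summary) are hypothetical illustrations, not the paper's actual architecture:

```python
import numpy as np

def fourier_fuse(visual_feat, vibration, keep_ratio=0.25):
    """Hypothetical sketch: low-pass the vibration signal via the real FFT
    to suppress high-frequency collision noise, then append a simple
    spectral summary of the denoised signal to the visual feature vector."""
    spec = np.fft.rfft(vibration)
    k = max(1, int(len(spec) * keep_ratio))
    spec[k:] = 0.0                        # zero out high-frequency bins
    denoised = np.fft.irfft(spec, n=len(vibration))
    vib_feat = np.array([denoised.mean(), denoised.std()])
    return np.concatenate([visual_feat, vib_feat])

# Example: fuse an 8-d visual feature with a 16-sample vibration window
fused = fourier_fuse(np.zeros(8), np.ones(16))
```

A learned module would replace the fixed low-pass cutoff and hand-crafted summary with trainable filters, but the frequency-domain filtering step conveys the gist of fusing vibration and vision signals.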
Source Journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)
CiteScore: 12.50
Self-citation rate: 14.30%
Annual articles: 404
Review time: 3.0 months
Journal Introduction: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.
Latest Articles from This Journal

- Multi-Target Ensemble Stochastic Configuration Network for Furnace Temperature Prediction in Municipal Solid Waste Incineration Process
- Data-Driven Hierarchical Decision-Making Modeling for Complex Industrial Processes
- Reinforcement Learning-Based MPC for Output Regulation of Uncertain Constrained Linear System
- Robust Sensorless Control of Synchronous Reluctance Motor Drive with Solar PV and Grid-Integrated Energy Management for EV
- Data-Driven Asynchronous Dynamic Event-Triggered H∞ Tracking Control of SPSs with Unknown Slow Dynamics