Laparoscopic stereo matching using 3-Dimensional Fourier transform with full multi-scale features
Renkai Wu, Pengchen Liang, Yinghao Liu, Yiqi Huang, Wangyan Li, Qing Chang
Engineering Applications of Artificial Intelligence, Volume 139, Article 109654 (published 2024-11-15)
DOI: 10.1016/j.engappai.2024.109654
https://www.sciencedirect.com/science/article/pii/S0952197624018128
Citations: 0
Abstract
Three-dimensional (3D) reconstruction of laparoscopic surgical scenes is a key task for future surgical navigation and automated robotic minimally invasive surgery. Binocular laparoscopy with stereo matching enables such reconstruction. Stereo matching models designed for natural images, such as autonomous driving scenes, tend to be less suitable for laparoscopic environments because laparoscopic datasets are small and the images have complex textures and uneven illumination. In addition, current stereo matching models use spatial-domain 3D convolutions and transformers as their base modules, which limits them to what can be learned in the spatial domain. In this paper, we propose a model for laparoscopic stereo matching that combines a 3D Fourier Transform with Full Multi-scale Features (FT-FMF Net). Specifically, the proposed Full Multi-scale Fusion Module (FMFM) fuses the full multi-scale feature information from the feature extractor into the stereo matching block, and the proposed Dense Fourier Transform Module (DFTM) then densely learns the feature information with parallax together with the FMFM fusion information in the frequency domain. We validated the proposed method on a laparoscopic dataset (SCARED) and an endoscopic dataset (SERV-CT). Compared with other popular and advanced deep learning models, FT-FMF Net achieves state-of-the-art stereo matching performance. On the SCARED and SERV-CT public datasets, the End-Point-Error (EPE) was 0.7265 and 2.3119, and the Root Mean Square Error Depth (RMSE Depth) was 4.00 mm and 3.69 mm, respectively. In addition, the inference time is only 0.17 s. Our project code is available at https://github.com/wurenkai/FT-FMF.
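The EPE and RMSE Depth figures reported in the abstract can be read against the usual textbook definitions of these metrics. The sketch below is only an illustration under stated assumptions: EPE is taken as the mean absolute disparity error over pixels with valid ground truth, and depth is recovered from disparity with the rectified pinhole relation depth = focal length × baseline / disparity. The function names, the toy disparity values, and the camera parameters (`FOCAL_PX`, `BASELINE_MM`) are hypothetical; the abstract does not describe the paper's actual evaluation protocol, masking, or calibration.

```python
import math


def end_point_error(pred_disp, gt_disp):
    """Mean absolute disparity error over pixels with valid ground truth (gt > 0)."""
    errors = [abs(p - g) for p, g in zip(pred_disp, gt_disp) if g > 0]
    return sum(errors) / len(errors)


def disparity_to_depth(disp, focal_px, baseline_mm):
    """Rectified stereo: depth (mm) = focal length (px) * baseline (mm) / disparity (px)."""
    return focal_px * baseline_mm / disp


def rmse_depth(pred_disp, gt_disp, focal_px, baseline_mm):
    """Root mean square depth error (mm) over pixels where both disparities are positive."""
    squared = [
        (disparity_to_depth(p, focal_px, baseline_mm)
         - disparity_to_depth(g, focal_px, baseline_mm)) ** 2
        for p, g in zip(pred_disp, gt_disp)
        if g > 0 and p > 0
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: flattened disparity maps in pixels.
# A ground-truth value of 0 marks a pixel without valid ground truth.
pred = [50.0, 40.0, 25.0, 10.0]
gt = [50.0, 40.0, 20.0, 0.0]

# Hypothetical camera parameters, not taken from SCARED or SERV-CT.
FOCAL_PX = 1000.0
BASELINE_MM = 4.0

epe = end_point_error(pred, gt)
rmse = rmse_depth(pred, gt, FOCAL_PX, BASELINE_MM)
print(f"EPE: {epe:.4f} px, RMSE Depth: {rmse:.2f} mm")
```

Because depth is inversely proportional to disparity, the same disparity error costs more in depth at small disparities (distant surfaces), which is one reason the two metrics are typically reported side by side.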
About the journal:
Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution and has seen remarkable advances across a range of machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research on the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI used in real-world engineering applications, validated on publicly available datasets to ensure that the research outcomes can be replicated. Join us in exploring the transformative potential of AI in engineering.