{"title":"利用腹腔镜图像的自我关注和联合深度姿态损失,增强自我监督单目深度估计。","authors":"Wenda Li, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kazunari Misawa, Kensaku Mori","doi":"10.1007/s11548-025-03332-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods utilize predicted depth maps and the relative poses of the camera to accomplish self-supervised depth estimation. However, the smooth surfaces of organs with textureless regions and the laparoscope's complex rotations make depth and pose estimation difficult in laparoscopic scenes. Therefore, we propose a novel and effective self-supervised monocular depth estimation method with self-attention-guided pose estimation and a joint depth-pose loss function for laparoscopic images.</p><p><strong>Methods: </strong>We extract feature maps and calculate the minimum re-projection error as a feature-metric loss to establish constraints based on feature maps with more meaningful representations. Moreover, we introduce the self-attention block in the pose estimation network to predict rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as the pose loss. We combine all of the losses as a joint depth-pose loss.</p><p><strong>Results: </strong>The proposed method is extensively evaluated using SCARED and Hamlyn datasets. Quantitative results show that the proposed method achieves improvements of about 18.07 <math><mo>%</mo></math> and 14.00 <math><mo>%</mo></math> in the absolute relative error when combining all of the proposed components for depth estimation on SCARED and Hamlyn datasets. The qualitative results show that the proposed method produces smooth depth maps with low error in various laparoscopic scenes. The proposed method also exhibits a trade-off between computational efficiency and performance.</p><p><strong>Conclusion: </strong>This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation. We propose a joint depth-pose loss function based on the extracted feature for depth estimation on laparoscopic images guided by a self-attention block. The experimental results prove that all of the proposed components contribute to the proposed method. Furthermore, the proposed method strikes an efficient balance between computational efficiency and performance.</p>","PeriodicalId":51251,"journal":{"name":"International Journal of Computer Assisted Radiology and Surgery","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhanced self-supervised monocular depth estimation with self-attention and joint depth-pose loss for laparoscopic images.\",\"authors\":\"Wenda Li, Yuichiro Hayashi, Masahiro Oda, Takayuki Kitasaka, Kazunari Misawa, Kensaku Mori\",\"doi\":\"10.1007/s11548-025-03332-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods utilize predicted depth maps and the relative poses of the camera to accomplish self-supervised depth estimation. However, the smooth surfaces of organs with textureless regions and the laparoscope's complex rotations make depth and pose estimation difficult in laparoscopic scenes. 
Therefore, we propose a novel and effective self-supervised monocular depth estimation method with self-attention-guided pose estimation and a joint depth-pose loss function for laparoscopic images.</p><p><strong>Methods: </strong>We extract feature maps and calculate the minimum re-projection error as a feature-metric loss to establish constraints based on feature maps with more meaningful representations. Moreover, we introduce the self-attention block in the pose estimation network to predict rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as the pose loss. We combine all of the losses as a joint depth-pose loss.</p><p><strong>Results: </strong>The proposed method is extensively evaluated using SCARED and Hamlyn datasets. Quantitative results show that the proposed method achieves improvements of about 18.07 <math><mo>%</mo></math> and 14.00 <math><mo>%</mo></math> in the absolute relative error when combining all of the proposed components for depth estimation on SCARED and Hamlyn datasets. The qualitative results show that the proposed method produces smooth depth maps with low error in various laparoscopic scenes. The proposed method also exhibits a trade-off between computational efficiency and performance.</p><p><strong>Conclusion: </strong>This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation. We propose a joint depth-pose loss function based on the extracted feature for depth estimation on laparoscopic images guided by a self-attention block. The experimental results prove that all of the proposed components contribute to the proposed method. Furthermore, the proposed method strikes an efficient balance between computational efficiency and performance.</p>\",\"PeriodicalId\":51251,\"journal\":{\"name\":\"International Journal of Computer Assisted Radiology and Surgery\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-02-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Assisted Radiology and Surgery\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1007/s11548-025-03332-1\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Assisted Radiology and Surgery","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1007/s11548-025-03332-1","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Enhanced self-supervised monocular depth estimation with self-attention and joint depth-pose loss for laparoscopic images.
Purpose: Depth estimation is a powerful tool for navigation in laparoscopic surgery. Previous methods use predicted depth maps and the relative camera poses between frames to accomplish self-supervised depth estimation. However, the smooth, largely textureless surfaces of organs and the laparoscope's complex rotations make both depth and pose estimation difficult in laparoscopic scenes. We therefore propose a novel and effective self-supervised monocular depth estimation method for laparoscopic images, with self-attention-guided pose estimation and a joint depth-pose loss function.
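As background for the self-supervision signal the abstract refers to, the sketch below illustrates the standard re-projection mechanism: a predicted depth map and a relative pose warp a neighboring frame into the current view, and the photometric difference supervises both networks. This is a minimal PyTorch illustration under assumed names and shapes, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of the standard
# self-supervised re-projection signal used in monocular depth estimation.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    # Lift each pixel (u, v) to a 3D point: X = depth * K^{-1} [u, v, 1]^T.
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).float()       # (3, H, W)
    rays = K_inv @ pix.reshape(3, -1)                           # (3, H*W)
    return depth.reshape(b, 1, -1) * rays.unsqueeze(0)          # (B, 3, H*W)

def reproject(points, K, T):
    # Apply the relative pose T (B, 4, 4) and project with intrinsics K (3, 3).
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # (B, 4, N)
    cam = (T @ pts_h)[:, :3]                                            # (B, 3, N)
    pix = K.unsqueeze(0) @ cam
    return pix[:, :2] / pix[:, 2:].clamp(min=1e-6)                      # (B, 2, N)

def photometric_loss(target, source, depth, T, K, K_inv):
    # Warp `source` into the `target` view and take an L1 photometric error.
    b, _, h, w = target.shape
    pix = reproject(backproject(depth, K_inv), K, T)
    gx = pix[:, 0].reshape(b, h, w) / (w - 1) * 2 - 1   # to [-1, 1] for grid_sample
    gy = pix[:, 1].reshape(b, h, w) / (h - 1) * 2 - 1
    warped = F.grid_sample(source, torch.stack([gx, gy], dim=-1), align_corners=True)
    return (target - warped).abs().mean()
```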
Methods: We extract feature maps and compute the minimum re-projection error over them as a feature-metric loss, which establishes constraints on feature representations that are more meaningful than raw pixel intensities. Moreover, we introduce a self-attention block in the pose estimation network to predict the rotations and translations of the relative poses. In addition, we minimize the difference between predicted relative poses as a pose loss. We combine all of these losses into a joint depth-pose loss, sketched below.
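The following sketch shows one way the three ingredients from the Methods could fit together in PyTorch. It is a hedged illustration under stated assumptions, not the published implementation: the attention-head design, the interpretation of the pose loss as forward/backward consistency, and the loss weights are all assumptions.

```python
# Hedged sketch of the Methods' three components; names, shapes, and
# weights are illustrative assumptions, not the authors' published values.
import torch
import torch.nn as nn

class SelfAttentionPoseHead(nn.Module):
    """Pose head in which spatial locations attend to each other before
    regressing a 6-DoF relative pose (axis-angle rotation + translation)."""
    def __init__(self, channels: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.pose = nn.Linear(channels, 6)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)        # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention
        return self.pose(attended.mean(dim=1))           # (B, 6)

def feature_metric_loss(feat_target, warped_source_feats):
    # Per-pixel *minimum* re-projection error over feature maps: keeping
    # only the smallest error across source frames suppresses occlusions.
    errs = [(feat_target - f).abs().mean(1, keepdim=True) for f in warped_source_feats]
    return torch.cat(errs, dim=1).min(dim=1).values.mean()

def pose_consistency_loss(T_ab, T_ba):
    # One plausible reading of "difference between predicted relative poses":
    # composing a pose with the reverse-direction prediction should give identity.
    eye = torch.eye(4, device=T_ab.device)
    return (T_ab @ T_ba - eye).abs().mean()

def joint_depth_pose_loss(feat_t, warped_feats, T_ab, T_ba, w_pose=0.1):
    # Illustrative weighting; the published loss balance may differ.
    return feature_metric_loss(feat_t, warped_feats) + w_pose * pose_consistency_loss(T_ab, T_ba)
```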
Results: The proposed method is extensively evaluated on the SCARED and Hamlyn datasets. Quantitative results show that, when all of the proposed components are combined, the method improves the absolute relative error of depth estimation by about 18.07% and 14.00% on the SCARED and Hamlyn datasets, respectively. Qualitative results show that the method produces smooth, low-error depth maps in various laparoscopic scenes. The proposed method also offers a favorable trade-off between computational efficiency and performance.
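For reference, the absolute relative error cited above is conventionally defined as follows (this is the standard metric definition; the abstract does not restate it):

```latex
\mathrm{AbsRel} = \frac{1}{|D|} \sum_{d^{*} \in D} \frac{\lvert \hat{d} - d^{*} \rvert}{d^{*}}
```

where \(\hat{d}\) is the predicted depth, \(d^{*}\) the ground-truth depth, and \(D\) the set of valid ground-truth pixels.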
Conclusion: This study considers the characteristics of laparoscopic datasets and presents a simple yet effective self-supervised monocular depth estimation method. We propose a joint depth-pose loss function based on the extracted features for depth estimation on laparoscopic images, guided by a self-attention block. The experimental results show that every proposed component contributes to the method's performance. Furthermore, the proposed method strikes a good balance between computational efficiency and performance.
Journal introduction:
The International Journal for Computer Assisted Radiology and Surgery (IJCARS) is a peer-reviewed journal that provides a platform for closing the gap between medical and technical disciplines, and encourages interdisciplinary research and development activities in an international environment.