ArthroNet: a monocular depth estimation technique with 3D segmented maps for knee arthroscopy

Intelligent Medicine, Volume 3, Issue 2, Pages 129-138 | IF 4.4 | JCR Q1 (Computer Science, Interdisciplinary Applications) | Pub Date: 2023-05-01 | DOI: 10.1016/j.imed.2022.05.001

Shahnewaz Ali, Ajay K. Pandey


Citations: 2


ArthroNet: a monocular depth estimation technique with 3D segmented maps for knee arthroscopy

Background

The lack of depth perception in medical imaging systems is a long-standing technological limitation of minimally invasive surgery. The ability to visualize anatomical structures in 3D can improve conventional arthroscopic surgery, because a full 3D semantic representation of the surgical site can directly improve surgeons’ ability. It also opens the possibility of registering intraoperative images with preoperative clinical records, a step toward semi-autonomous and fully autonomous platforms. This study presents a novel monocular depth prediction model that infers depth maps from a single color arthroscopic video frame.

Methods

We applied a novel technique that combines supervised and self-supervised loss terms and thereby offsets the drawbacks of each. It enables the estimation of edge-preserving depth maps from a single untextured arthroscopic frame. The proposed image acquisition technique projects artificial textures onto the surface to improve the quality of disparity maps computed from stereo images. Moreover, by integrating attention-aware multi-scale feature extraction with global scene-context constraints and multi-scale depth fusion, the model can predict reliable and accurate tissue depth at the surgical site that complies with the scene geometry.
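As a rough illustration of how a supervised term and a self-supervised term can be combined, the following NumPy sketch adds an L1 loss against a coarse disparity map to a photometric loss obtained by warping the right image into the left view. It is not the authors' implementation: the function names, the equal weights `w_sup` and `w_self`, the grayscale rectified-stereo assumption, and the linear interpolation are all assumptions made for this example.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the right image at x - disparity.

    Assumes rectified grayscale stereo, so matching pixels share a row.
    """
    h, w = right.shape
    # Source x-coordinate in the right image for every left-image pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disp
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]

def combined_loss(pred_disp, coarse_disp, left, right, w_sup=0.5, w_self=0.5):
    """Weighted sum of a supervised and a self-supervised term (hypothetical weights)."""
    # Supervised: stay close to the coarse disparity from stereo matching.
    supervised = np.mean(np.abs(pred_disp - coarse_disp))
    # Self-supervised: the predicted disparity should reproduce the left image.
    left_hat = warp_right_to_left(right, pred_disp)
    self_supervised = np.mean(np.abs(left - left_hat))
    return float(w_sup * supervised + w_self * self_supervised)
```

The supervised term anchors the prediction to the stereo-matching result, while the photometric term can penalize errors that the coarse map itself contains, which is one way to read the claim that each term offsets the other's drawbacks.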

Results

A total of 4,128 stereo frames from a knee phantom were used to pre-train the network; during this pre-training stage, the network learned disparity maps from the stereo images. The fine-tuning phase used 12,695 knee arthroscopic stereo frames from cadaver experiments, along with their corresponding coarse disparity maps obtained by stereo matching. In a supervised fashion, the network learns the mapping from the left image to the disparity map, whereas the self-supervised loss terms refine the coarse depth map by minimizing reprojection, gradient, and structural dissimilarity losses. Together, our method produces high-quality 3D maps with minimal re-projection losses of 0.0004132 (structural similarity index), 0.00036120156 (L1 error distance), and 6.591908 × 10⁻⁵ (L1 gradient error distance).
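The three reported quantities correspond to common image-comparison terms. The sketch below shows one plausible way to compute them between a reference image and its reprojection. It is a simplification under stated assumptions: the SSIM here uses global image statistics rather than the usual sliding window, the constants `c1` and `c2` assume intensities in [0, 1], and the function names are hypothetical and not taken from the paper.

```python
import numpy as np

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from whole-image statistics (a simplification of windowed SSIM)."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    num = (2.0 * mu_a * mu_b + c1) * (2.0 * cov + c2)
    den = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return num / den

def gradient_l1(a, b):
    """Mean absolute difference of horizontal and vertical finite differences."""
    dx = np.abs(np.diff(a, axis=1) - np.diff(b, axis=1)).mean()
    dy = np.abs(np.diff(a, axis=0) - np.diff(b, axis=0)).mean()
    return dx + dy

def reprojection_terms(reference, reprojected):
    """Return the three error terms as plain floats."""
    return {
        # Structural dissimilarity: 0 for identical images.
        "dssim": float((1.0 - ssim_global(reference, reprojected)) / 2.0),
        "l1": float(np.mean(np.abs(reference - reprojected))),
        "grad_l1": float(gradient_l1(reference, reprojected)),
    }
```

Note that the gradient term ignores a constant brightness offset between the two images, whereas the plain L1 term does not, so the terms penalize different kinds of mismatch.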

Conclusion

Machine learning techniques for monocular depth prediction were studied to infer accurate depth maps from a single color arthroscopic video frame. Moreover, the study integrates a segmentation model, so that 3D segmented maps can be inferred, providing extended perception and tissue awareness.
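To make the idea of a 3D segmented map concrete, the sketch below converts a disparity map to depth with the standard stereo relation depth = focal length × baseline / disparity, then back-projects each pixel through a pinhole camera model and attaches its segmentation label. The abstract does not describe how the authors build their maps, so the function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the baseline value are illustrative assumptions only.

```python
import numpy as np

def disparity_to_depth(disp, focal_px, baseline_mm, eps=1e-6):
    """Standard rectified-stereo relation; eps guards against division by zero."""
    return focal_px * baseline_mm / np.maximum(disp, eps)

def segmented_point_cloud(depth, labels, fx, fy, cx, cy):
    """Back-project pixels with a pinhole model; rows are (x, y, z, label)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v: row index, u: column index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack(
        [x.ravel(), y.ravel(), depth.ravel(), labels.ravel().astype(np.float64)],
        axis=1,
    )
```

Each row of the returned array is one 3D point tagged with the tissue class predicted for its pixel, which is the kind of structure a 3D segmented map provides.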


Source journal

Intelligent Medicine (Surgery, Radiology and Imaging; Artificial Intelligence; Biomedical Engineering)

CiteScore

5.20

Self-citation rate

0.00%

Articles published