Multiscale geometric window transformer for orthodontic teeth point cloud registration

IF 3.5 | Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS | Multimedia Systems | Pub Date: 2024-05-31 | DOI: 10.1007/s00530-024-01369-x

Hao Wang, Yan Tian, Yongchuan Xu, Jiahui Xu, Tao Yang, Yan Lu, Hong Chen


Citations: 0

Abstract

Digital orthodontic treatment monitoring has attracted growing attention over the past decade, yet current deep-learning methods still face significant challenges. Transformers, with their strong ability to model long-range dependencies, are well suited to tooth point cloud registration. Nonetheless, most transformer-based point cloud registration networks suffer from two problems. First, they do not embed reliable geometric information, so the learned features are not geometrically discriminative and the boundary between inliers and outliers is blurred. Second, the attention mechanism lacks continuous downsampling during geometric-transformation-invariant feature extraction at the superpixel level, which limits the field of view and may limit the model's perception of local and global information. In this paper, we propose GeoSwin, which uses a novel geometric window transformer to accurately register tooth point clouds from different stages of orthodontic treatment. The method uses point distance, normal vector angle, and bidirectional spatial angular distances as the input geometric embedding of the transformer, and then applies a proposed variable multiscale attention mechanism to perceive geometric information from local to global scales. Experiments on the Shing3D Dental Dataset demonstrate the effectiveness of the approach, which outperforms other state-of-the-art approaches across multiple metrics. Our code and models are available at GeoSwin.
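To make the geometric embedding inputs more concrete, the following is a minimal sketch of how pairwise quantities of this kind might be computed for two points with normals. It is not the authors' implementation: the function name `pair_geometry`, the dictionary keys, and in particular the reading of "bidirectional spatial angular distances" as the angle between each point's normal and the vector toward the other point are assumptions made here for illustration only.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _norm(a):
    """Euclidean length of a 3D vector."""
    return math.sqrt(_dot(a, a))

def _angle(a, b):
    """Angle in radians between a and b (0.0 if either has zero length)."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, _dot(a, b) / (na * nb)))
    return math.acos(cos)

def pair_geometry(p_i, n_i, p_j, n_j):
    """Hypothetical pairwise geometric features for points p_i, p_j
    with normals n_i, n_j. The two directional angles are an assumed
    interpretation, not taken from the paper."""
    d = _sub(p_j, p_i)  # vector from point i to point j
    return {
        "distance": _norm(d),
        "normal_angle": _angle(n_i, n_j),
        "angle_i_to_j": _angle(n_i, d),                # assumption
        "angle_j_to_i": _angle(n_j, _sub(p_i, p_j)),   # assumption
    }

features = pair_geometry((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                         (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Distances and angles of this kind do not change when both points and their normals undergo the same rotation and translation, which is what makes such quantities candidates for a transformation-invariant embedding in registration.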

Source journal

Multimedia Systems (Engineering and Technology - Computer Science: Theory and Methods)

CiteScore: 5.40

Self-citation rate: 7.70%

Articles published: 148

Review time: 4.5 months

Journal description: This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.