Sediment grain segmentation in thin-section images using dual-modal Vision Transformer

IF 4.2 | Zone 2, Earth Sciences | Q1, Computer Science, Interdisciplinary Applications | Computers & Geosciences | Pub Date: 2024-06-21 | DOI: 10.1016/j.cageo.2024.105664

Dongyu Zheng, Li Hou, Xiumian Hu, Mingcai Hou, Kai Dong, Sihai Hu, Runlin Teng, Chao Ma

Computers & Geosciences, Volume 191, Article 105664. https://www.sciencedirect.com/science/article/pii/S009830042400147X

Citations: 0

Abstract

Accurately identifying grain types in thin sections of sandy sediments or sandstones is crucial for understanding their provenance, depositional environments, and potential as natural resources. Although traditional computer vision methods and machine learning algorithms have been used for automatic grain identification, recent advancements in deep learning techniques have opened up new possibilities for achieving more reliable results with less manual labor. In this study, we present Trans-SedNet, a state-of-the-art dual-modal Vision Transformer (ViT) model that uses both cross-polarized light (XPL) and plane-polarized light (PPL) images to achieve semantic segmentation of thin-section images. Our model classifies a total of ten grain types, including subtypes of quartz, feldspar, and lithic fragments, to emulate the manual identification process in sedimentary petrology. To optimize performance, we use SegFormer as the model backbone and add window- and mix-attention to the encoder to identify local information in the images and to make the best use of XPL and PPL images. We also use a combination of focal and dice loss to address class imbalance, and a smoothing procedure to reduce over-segmentation. Our comparative analysis of several deep convolutional neural networks and ViT models, including FCN, U-Net, DeepLabV3Plus, SegNeXT, and CMX, shows that Trans-SedNet outperforms the other models with a significant increase in the evaluation metrics mIoU and mPA. We also conduct an experiment to test the models' ability to handle dual-modal information, which reveals that the dual-modal models, including Trans-SedNet, achieve better results than single-modal models with the extra input of PPL images. Our study demonstrates the potential of ViT models in semantic segmentation of thin-section images and highlights the importance of dual-modal models for handling complex input in various geoscience disciplines. By improving data quality and quantity, our model has the potential to enhance the efficiency and reliability of grain identification in sedimentary petrology and related subjects.
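The abstract mentions combining focal and dice loss to handle imbalance among grain classes but does not give the formulation. The following is a minimal sketch of one common way to combine the two for multi-class segmentation, not the authors' implementation. The function names, the focusing parameter `gamma`, the weights `w_focal` and `w_dice`, and the smoothing constant `eps` are all assumptions chosen for illustration. Pixels are passed as a flat list, with `probs[i]` holding the per-class probabilities for pixel `i`.

```python
import math


def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Probability assigned to the true class, clamped to avoid log(0).
        p_t = min(max(p[y], 1e-12), 1.0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)


def dice_loss(probs, labels, num_classes, eps=1e-6):
    """One minus the mean soft Dice coefficient over all classes."""
    dice_sum = 0.0
    for c in range(num_classes):
        # Predicted probability mass that falls on pixels truly of class c.
        intersection = sum(p[c] for p, y in zip(probs, labels) if y == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for y in labels if y == c)
        dice_sum += (2.0 * intersection + eps) / (pred_mass + true_mass + eps)
    return 1.0 - dice_sum / num_classes


def combined_loss(probs, labels, num_classes,
                  gamma=2.0, w_focal=1.0, w_dice=1.0):
    """Weighted sum of focal and dice loss (weights are assumed values)."""
    return (w_focal * focal_loss(probs, labels, gamma)
            + w_dice * dice_loss(probs, labels, num_classes))


# Two pixels, two classes: a confident and a hesitant prediction.
labels = [0, 1]
confident = [[0.9, 0.1], [0.1, 0.9]]
hesitant = [[0.6, 0.4], [0.4, 0.6]]
print(combined_loss(confident, labels, num_classes=2))
print(combined_loss(hesitant, labels, num_classes=2))
```

The focal term down-weights pixels that are already classified well, while the dice term scores each class by overlap regardless of how many pixels it covers, which is why the pair is often used when some classes are rare.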
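The models are compared using mIoU (mean intersection over union) and mPA (mean pixel accuracy). As a hedged illustration of how these two metrics are commonly computed from a confusion matrix, the sketch below works on flat lists of predicted and ground-truth class indices. The function names are hypothetical, and skipping classes that are absent from the ground truth is an assumed convention; the paper may average differently.

```python
def confusion_matrix(pred, truth, num_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def miou_and_mpa(pred, truth, num_classes):
    """Return (mIoU, mPA), averaged over classes present in the ground truth."""
    cm = confusion_matrix(pred, truth, num_classes)
    ious, accs = [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(num_classes)) - tp
        if tp + fn == 0:
            # Class absent from the ground truth: skipped (assumed convention).
            continue
        ious.append(tp / (tp + fp + fn))  # per-class IoU
        accs.append(tp / (tp + fn))       # per-class pixel accuracy
    return sum(ious) / len(ious), sum(accs) / len(accs)


# Four pixels, two classes; one pixel of class 0 is mislabeled as class 1.
truth = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
miou, mpa = miou_and_mpa(pred, truth, num_classes=2)
print(miou, mpa)
```

In this toy case class 0 has IoU 1/2 and accuracy 1/2, and class 1 has IoU 2/3 and accuracy 1, so mIoU is 7/12 and mPA is 3/4. IoU penalizes false positives as well as misses, so it is usually the stricter of the two.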

Source journal

Computers & Geosciences (Geosciences, Multidisciplinary)

CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months

Journal description: Computers & Geosciences publishes high-impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.