{"title":"Enhanced taxonomic identification of fusulinid fossils through image–text integration using transformer","authors":"","doi":"10.1016/j.cageo.2024.105701","DOIUrl":null,"url":null,"abstract":"<div><p>The accurate taxonomic identification of fusulinid fossils holds significant scientific value in palaeontology, paleoecology, and palaeogeography. However, imbalanced image samples lead to the model preferring to learn features from categories with many samples while ignoring fewer sample categories, greatly reducing the prediction accuracy of fusulinid fossil identification. Moreover, the textual description of fusulinid fossils contains rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for research. We proposed a transformer-based multimodal integration framework (TMIF) using deep learning for fusulinid fossil identification. Compared to traditional neural networks, the transformer can create global dependencies between features at different locations. TMIF incorporates image and text branches dedicated to extracting features for both modalities, and a pivotal cross-modal integration module that allows visual features to learn textual semantic features sufficiently to obtain a more comprehensive feature representation. Experimental evaluation using the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, which is a 2.8% improvement over the only image-based method. Further comparative analyses across multiple networks affirm that the TMIF performs optimally in addressing the taxonomic identification of fusulinid fossils with imbalanced samples.</p></div>","PeriodicalId":55221,"journal":{"name":"Computers & Geosciences","volume":null,"pages":null},"PeriodicalIF":4.2000,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Geosciences","FirstCategoryId":"89","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098300424001845","RegionNum":2,"RegionCategory":"Earth Sciences","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
The accurate taxonomic identification of fusulinid fossils holds significant scientific value in palaeontology, palaeoecology, and palaeogeography. However, imbalanced image samples lead the model to learn features mainly from categories with many samples while neglecting categories with few samples, greatly reducing the prediction accuracy of fusulinid fossil identification. Moreover, the textual descriptions of fusulinid fossils contain rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for this research. We propose a transformer-based multimodal integration framework (TMIF) using deep learning for fusulinid fossil identification. Compared to traditional neural networks, the transformer can model global dependencies between features at different locations. TMIF incorporates image and text branches dedicated to extracting features from both modalities, and a pivotal cross-modal integration module that allows visual features to learn textual semantic features sufficiently, yielding a more comprehensive feature representation. Experimental evaluation on the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, a 2.8% improvement over the image-only method. Further comparative analyses across multiple networks affirm that TMIF performs best in addressing the taxonomic identification of fusulinid fossils with imbalanced samples.
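The cross-modal integration module described in the abstract, in which visual features attend to textual semantic features, can be sketched as a single cross-attention step followed by a residual connection. The following is an illustrative reconstruction under assumptions, not the authors' published code: the function name `cross_modal_fusion`, the token shapes, the learned projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are all assumptions chosen to show the general technique.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fusion(img_tokens, txt_tokens, w_q, w_k, w_v):
    """Fuse modalities by letting visual tokens (queries) attend to
    textual tokens (keys/values), in the style of cross-attention."""
    q = img_tokens @ w_q                     # (n_img, d) queries from vision
    k = txt_tokens @ w_k                     # (n_txt, d) keys from text
    v = txt_tokens @ w_v                     # (n_txt, d) values from text
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_img, n_txt) scaled dot products
    attn = softmax(scores, axis=-1)          # each visual token weighs text tokens
    fused = attn @ v                         # text-informed visual features
    return img_tokens + fused                # residual keeps the visual signal

# Toy demo: 16 visual patch tokens attend to 4 text tokens, feature dim 8.
rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(16, d))
txt = rng.normal(size=(4, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_modal_fusion(img, txt, w_q, w_k, w_v)
print(out.shape)  # (16, 8): one fused feature vector per visual token
```

In a full transformer framework this step would typically be wrapped with multi-head projections, layer normalization, and a feed-forward sublayer; the sketch keeps only the part that makes visual features absorb textual semantics.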
Journal Introduction
Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.