Enhanced taxonomic identification of fusulinid fossils through image–text integration using transformer

Computers & Geosciences; published 2024-08-17; DOI: 10.1016/j.cageo.2024.105701; Impact factor 4.2; CAS Zone 2 (Earth Sciences); JCR Q1 (Computer Science, Interdisciplinary Applications)
Fukai Zhang , Zhengli Yan , Chao Liu , Haiyan Zhang , Shan Zhao , Jun Liu , Ziqi Zhao
Computers & Geosciences, Volume 192, Article 105701
https://www.sciencedirect.com/science/article/pii/S0098300424001845
Citations: 0

Abstract

The accurate taxonomic identification of fusulinid fossils holds significant scientific value in palaeontology, palaeoecology, and palaeogeography. However, imbalanced image samples cause models to favor features from categories with many samples while neglecting categories with few samples, which greatly reduces the prediction accuracy of fusulinid fossil identification. Moreover, textual descriptions of fusulinid fossils contain rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for this research and propose a transformer-based multimodal integration framework (TMIF) that uses deep learning to identify fusulinid fossils. Compared with traditional neural networks, the transformer can model global dependencies between features at different positions. TMIF incorporates image and text branches dedicated to extracting features from each modality, together with a pivotal cross-modal integration module that allows visual features to fully learn textual semantic features and thereby obtain a more comprehensive feature representation. Experimental evaluation on the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, a 2.8% improvement over the image-only method. Further comparative analyses across multiple networks confirm that TMIF performs best in the taxonomic identification of fusulinid fossils with imbalanced samples.
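The abstract describes the cross-modal integration module only at a high level: visual features "learn" textual semantic features. One common way to realize that idea is cross-attention, in which image tokens act as queries over text tokens. The NumPy sketch below is a generic illustration under that assumption, not the authors' TMIF code; the function names (`cross_modal_attention`, `fuse_and_pool`), the single-head design, the token counts, and the residual-plus-mean-pool fusion are all hypothetical choices.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max along the axis for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(img_tokens, txt_tokens, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: image tokens query text tokens.

    img_tokens: (n_img, d) visual features from an assumed image branch.
    txt_tokens: (n_txt, d) semantic features from an assumed text branch.
    w_q, w_k, w_v: (d, d) projections (random here; learned in a real model).
    Returns text-informed visual features (n_img, d) and the attention map
    (n_img, n_txt), whose rows sum to 1.
    """
    q = img_tokens @ w_q          # queries come from the visual modality
    k = txt_tokens @ w_k          # keys come from the textual modality
    v = txt_tokens @ w_v          # values carry the textual semantics
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each image token weights all text tokens
    return attn @ v, attn


def fuse_and_pool(img_tokens, attended):
    # Residual fusion keeps the original visual signal; mean-pooling then
    # yields one vector that a (hypothetical) classifier head could consume.
    return (img_tokens + attended).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    img = rng.normal(size=(49, d))   # e.g. a 7x7 grid of patch tokens (assumed)
    txt = rng.normal(size=(12, d))   # e.g. 12 description tokens (assumed)
    w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    attended, attn = cross_modal_attention(img, txt, w_q, w_k, w_v)
    pooled = fuse_and_pool(img, attended)
    print(attended.shape, attn.shape, pooled.shape)  # (49, 16) (49, 12) (16,)
```

Using image tokens as queries lets each visual position decide how much of each textual token to absorb, which mirrors the abstract's description of visual features learning from textual semantics. A trained model would typically add learned multi-head projections, normalization layers, and a classification head on top of the pooled vector.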

Source journal
Computers & Geosciences
Subject category: Geosciences, Multidisciplinary
CiteScore: 9.30
Self-citation rate: 6.80%
Articles published: 164
Review time: 3.4 months
About the journal: Computers & Geosciences publishes high-impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.