{"title":"Enhanced taxonomic identification of fusulinid fossils through image–text integration using transformer","authors":"","doi":"10.1016/j.cageo.2024.105701","DOIUrl":null,"url":null,"abstract":"<div><p>The accurate taxonomic identification of fusulinid fossils holds significant scientific value in palaeontology, paleoecology, and palaeogeography. However, imbalanced image samples lead to the model preferring to learn features from categories with many samples while ignoring fewer sample categories, greatly reducing the prediction accuracy of fusulinid fossil identification. Moreover, the textual description of fusulinid fossils contains rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for research. We proposed a transformer-based multimodal integration framework (TMIF) using deep learning for fusulinid fossil identification. Compared to traditional neural networks, the transformer can create global dependencies between features at different locations. TMIF incorporates image and text branches dedicated to extracting features for both modalities, and a pivotal cross-modal integration module that allows visual features to learn textual semantic features sufficiently to obtain a more comprehensive feature representation. Experimental evaluation using the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, which is a 2.8% improvement over the only image-based method. Further comparative analyses across multiple networks affirm that the TMIF performs optimally in addressing the taxonomic identification of fusulinid fossils with imbalanced samples.</p></div>","PeriodicalId":55221,"journal":{"name":"Computers & Geosciences","volume":null,"pages":null},"PeriodicalIF":4.2000,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Geosciences","FirstCategoryId":"89","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098300424001845","RegionNum":2,"RegionCategory":"Earth Sciences","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
The accurate taxonomic identification of fusulinid fossils holds significant scientific value in palaeontology, palaeoecology, and palaeogeography. However, imbalanced image samples lead the model to learn features mainly from categories with many samples while neglecting categories with few samples, greatly reducing the prediction accuracy of fusulinid fossil identification. Moreover, the textual descriptions of fusulinid fossils contain rich feature information. We collected and created an order fusulinid multimodal (OFM) dataset for this research. We propose a transformer-based multimodal integration framework (TMIF) using deep learning for fusulinid fossil identification. Compared to traditional neural networks, the transformer can model global dependencies between features at different locations. TMIF incorporates image and text branches dedicated to extracting features from both modalities, and a pivotal cross-modal integration module that allows visual features to learn textual semantic features sufficiently, yielding a more comprehensive feature representation. Experimental evaluation on the OFM dataset shows that TMIF achieves a prediction accuracy of 81.7%, a 2.8% improvement over the image-only method. Further comparative analyses across multiple networks affirm that TMIF performs best in addressing the taxonomic identification of fusulinid fossils with imbalanced samples.
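The cross-modal integration module described in the abstract, in which visual features attend to textual semantic features, can be sketched as a single cross-attention step followed by a residual connection. The following is an illustrative reconstruction under assumptions, not the authors' published code: the function name `cross_modal_fusion`, the token shapes, the learned projection matrices `w_q`, `w_k`, `w_v`, and the residual connection are all assumptions chosen to show the general technique.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_fusion(img_tokens, txt_tokens, w_q, w_k, w_v):
    """Fuse modalities by letting visual tokens (queries) attend to
    textual tokens (keys/values), in the style of cross-attention."""
    q = img_tokens @ w_q                     # (n_img, d) queries from vision
    k = txt_tokens @ w_k                     # (n_txt, d) keys from text
    v = txt_tokens @ w_v                     # (n_txt, d) values from text
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_img, n_txt) scaled dot products
    attn = softmax(scores, axis=-1)          # each visual token weighs text tokens
    fused = attn @ v                         # text-informed visual features
    return img_tokens + fused                # residual keeps the visual signal

# Toy demo: 16 visual patch tokens attend to 4 text tokens, feature dim 8.
rng = np.random.default_rng(0)
d = 8
img = rng.normal(size=(16, d))
txt = rng.normal(size=(4, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_modal_fusion(img, txt, w_q, w_k, w_v)
print(out.shape)  # (16, 8): one fused feature vector per visual token
```

In a full transformer framework this step would typically be wrapped with multi-head projections, layer normalization, and a feed-forward sublayer; the sketch keeps only the part that makes visual features absorb textual semantics.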
Journal Introduction
Computers & Geosciences publishes high impact, original research at the interface between Computer Sciences and Geosciences. Publications should apply modern computer science paradigms, whether computational or informatics-based, to address problems in the geosciences.