An image dataset of fusulinid foraminifera generated with the aid of deep learning
Authors: Hanhui Huang, Yukun Shi, Qin Chen, Huiqing Xu, Sicong Song, Yujie Shi, Furao Shen, Junxuan Fan
Journal: Geoscience Data Journal, vol. 11, no. 1, pp. 46-56 (published 2023-08-07)
DOI: 10.1002/gdj3.215
URL: https://onlinelibrary.wiley.com/doi/10.1002/gdj3.215
Impact factor: 3.3 (JCR Q2, Geosciences, Multidisciplinary)
Citations: 0
Abstract
Fusulinid foraminifera are among the most common microfossils of the Late Palaeozoic and act as key fossils for stratigraphic correlation, paleogeographic and paleoenvironmental indication, and evolutionary studies of marine life. Accurate and efficient identification forms the basis of such research involving fusulinids but is limited by the lack of digitized image datasets. This article presents the first large image dataset of fusulinids, containing 2,400 images of individual samples belonging to 16 genera across all six fusulinid families and labelled to species level. These images were collected from the literature and our unpublished samples through an automatic segmentation procedure implementing BlendMask, a deep learning model. The dataset shows promise for the efficient accumulation of fossil images through automated procedures and will assist taxonomists in future morphologic and systematic studies.
Geoscience Data Journal
Categories: GEOSCIENCES, MULTIDISCIPLINARY; METEOROLOGY & ATMOSPHERIC SCIENCES
CiteScore: 5.90
Self-citation rate: 9.40%
Articles published: 35
Review time: 4 weeks
About the journal:
Geoscience Data Journal provides an Open Access platform where scientific data can be formally published, in a way that includes scientific peer-review. Thus the dataset creator attains full credit for their efforts, while also improving the scientific record, providing version control for the community and allowing major datasets to be fully described, cited and discovered.
An online-only journal, GDJ publishes short data papers cross-linked to – and citing – datasets that have been deposited in approved data centres and awarded DOIs. The journal will also accept articles on data services, and articles which support and inform data publishing best practices.
Data is at the heart of science and scientific endeavour. The curation of data and the science associated with it is as important as ever to our understanding of the changing Earth system, and thereby to our ability to make future predictions. Geoscience Data Journal is working with recognised Data Centres across the globe to develop the future strategy for data publication, the recognition of the value of data and the communication and exploitation of data to the wider science and stakeholder communities.