DiTing: A large-scale Chinese seismic benchmark dataset for artificial intelligence in seismology
Authors: Ming Zhao, Zhuowei Xiao, Shi Chen, Lihua Fang
Journal: Earthquake Science, 36(2), pages 84-94 (published 2023-04-01)
DOI: 10.1016/j.eqs.2022.01.022
URL: https://www.sciencedirect.com/science/article/pii/S1674451922000222
In recent years, artificial intelligence technology has exhibited great potential in seismic signal recognition, setting off a new wave of research. Vast amounts of high-quality labeled data are required to develop and apply artificial intelligence in seismology research. In this study, based on the 2013–2020 seismic cataloging reports of the China Earthquake Networks Center, we constructed an artificial intelligence seismological training dataset (“DiTing”) with the largest known total time length. Data were recorded using broadband and short-period seismometers. The obtained dataset included 2,734,748 three-component waveform traces from 787,010 regional seismic events, the corresponding P- and S-phase arrival time labels, and 641,025 P-wave first-motion polarity labels. All waveforms were sampled at 50 Hz and cut to a time length of 180 s starting from a random number of seconds before the occurrence of an earthquake. Each three-component waveform contained a considerable amount of descriptive information, such as the epicentral distance, back azimuth, and signal-to-noise ratios. The magnitudes of seismic events, epicentral distance, signal-to-noise ratio of P-wave data, and signal-to-noise ratio of S-wave data ranged from 0 to 7.7, 0 to 330 km, –0.05 to 5.31 dB, and –0.05 to 4.73 dB, respectively. The dataset compiled in this study can serve as a high-quality benchmark for machine learning model development and data-driven seismological research on earthquake detection, seismic phase picking, first-motion polarity determination, earthquake magnitude prediction, early warning systems, and strong ground-motion prediction. Such research will further promote the development and application of artificial intelligence in seismology.
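The windowing figures in the abstract (50 Hz sampling, 180 s windows, 2,734,748 three-component traces) imply a fixed array size per trace and a total recorded duration. The sketch below works through that arithmetic and illustrates one way a window could be cut a random number of seconds before the event. It is not the authors' code: the function `cut_window`, the `(3, n_samples)` array layout, the use of whole seconds, and the 30 s upper bound on the random offset are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Values stated in the abstract.
SAMPLING_RATE_HZ = 50
WINDOW_SECONDS = 180
N_TRACES = 2_734_748

# Each component of a trace holds 50 Hz * 180 s = 9000 samples.
SAMPLES_PER_COMPONENT = SAMPLING_RATE_HZ * WINDOW_SECONDS

# Total duration counts each three-component trace once.
TOTAL_HOURS = N_TRACES * WINDOW_SECONDS / 3600


def cut_window(record, origin_index, rng, max_pre_seconds=30):
    """Cut a 180 s window that starts a random number of seconds before the event.

    Hypothetical helper: `record` is assumed to be shaped (3, n_samples),
    `origin_index` is the sample index of the event origin time, and
    `max_pre_seconds` is an assumed bound that the abstract does not give.
    """
    pre_seconds = int(rng.integers(0, max_pre_seconds + 1))
    start = origin_index - pre_seconds * SAMPLING_RATE_HZ
    stop = start + SAMPLES_PER_COMPONENT
    if start < 0 or stop > record.shape[-1]:
        raise ValueError("record is too short for the requested window")
    return record[..., start:stop], pre_seconds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 400 s three-component record with the event at 100 s.
    record = rng.standard_normal((3, 400 * SAMPLING_RATE_HZ))
    window, pre_seconds = cut_window(record, 100 * SAMPLING_RATE_HZ, rng)
    print(window.shape, pre_seconds)
    print(f"{TOTAL_HOURS:.1f} hours of three-component data")
```

Randomizing the window start means the P arrival does not sit at a fixed sample index, so a model trained on such windows cannot learn the arrival position from the window layout alone.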
About the journal:
Earthquake Science (EQS) aims to publish high-quality, original, peer-reviewed articles on earthquake-related research subjects. It is an international English-language journal sponsored by the Seismological Society of China and the Institute of Geophysics, China Earthquake Administration.
Topics include, but are not limited to, the following:
● Seismic sources of all kinds.
● Earth structure at all scales.
● Seismotectonics.
● New methods and theoretical seismology.
● Strong ground motion.
● Seismic phenomena of all kinds.
● Seismic hazards, earthquake forecasting and prediction.
● Seismic instrumentation.
● Significant recent or past seismic events.
● Documentation of recent seismic events or important observations.
● Descriptions of field deployments, new methods, and available software tools.
The types of manuscripts include the following. There is no length requirement, except for the Short Notes.
【Articles】 Original contributions that have not been published elsewhere.
【Short Notes】 Short papers on recent events or topics that warrant rapid peer review and publication. Limited to 4 publication pages.
【Rapid Communications】 Significant contributions that warrant rapid peer review and publication.
【Review Articles】 Review articles are by invitation only. Please contact the editorial office and editors with possible proposals.
【Toolboxes】 Descriptions of novel numerical methods and associated computer codes.
【Data Products】 Documentation of datasets of various kinds that are of interest to the community and available for open access (field data, processed data, synthetic data, or models).
【Opinions】 Views on important topics and future directions in earthquake science.
【Comments and Replies】 Commentaries on recently published EQS papers are welcome. The authors of the paper commented on will be invited to reply. Both the Comment and the Reply are subject to peer review.