CREDIT-X1local: A reference dataset for machine learning seismology from ChinArray in Southwest China
Authors: Lu Li, Weitao Wang, Ziye Yu, Yini Chen
Journal: Earthquake Science, vol. 37, no. 2, pp. 139-157 (published 2024-02-29)
DOI: 10.1016/j.eqs.2024.01.018
URL: https://www.sciencedirect.com/science/article/pii/S1674451924000223
Citations: 0
Abstract
High-quality datasets are critical for the development of advanced machine-learning algorithms in seismology. Here, we present an earthquake dataset based on the ChinArray Phase I records (X1). ChinArray Phase I was deployed in the southern north-south seismic zone (20° N–32° N, 95° E–110° E) in 2011–2013 using 355 portable broadband seismic stations. CREDIT-X1local, the first release of the ChinArray Reference Earthquake Dataset for Innovative Techniques (CREDIT), includes comprehensive information for the 105,455 local events that occurred in the southern north-south seismic zone during array observation, incorporating them into a single HDF5 file. Original 100-Hz sampled three-component waveforms are organized by event for stations within epicenter distances of 1,000 km, and records of ≥ 200 s are included for each waveform. Two types of phase labels are provided. The first includes manually picked labels for 5,999 events with magnitudes ≥ 2.0, providing 66,507 Pg, 42,310 Sg, 12,823 Pn, and 546 Sn phases. The second contains automatically labeled phases for 105,442 events with magnitudes of −1.6 to 7.6. These phases were picked using a recurrent neural network phase picker and screened using the corresponding travel time curves, resulting in 1,179,808 Pg, 884,281 Sg, 176,089 Pn, and 22,986 Sn phases. Additionally, first-motion polarities are included for 31,273 Pg phases. The event and station locations are provided, so that deep learning networks for both conventional phase picking and phase association can be trained and validated. The CREDIT-X1local dataset is the first million-scale dataset constructed from a dense seismic array, which is designed to support various multi-station deep-learning methods, high-precision focal mechanism inversion, and seismic tomography studies. Additionally, owing to the high seismicity in the southern north-south seismic zone in China, this dataset has great potential for future scientific discoveries.
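The label counts and record parameters quoted in the abstract can be tallied to show why the dataset is described as million-scale. The following is a minimal Python sketch and not part of any published CREDIT tooling: the constant and function names are hypothetical, and only the numbers are taken from the abstract.

```python
# Phase-label counts quoted in the abstract (names here are illustrative only).
MANUAL_PICKS = {"Pg": 66_507, "Sg": 42_310, "Pn": 12_823, "Sn": 546}
AUTO_PICKS = {"Pg": 1_179_808, "Sg": 884_281, "Pn": 176_089, "Sn": 22_986}

SAMPLING_RATE_HZ = 100  # "100-Hz sampled" waveforms
MIN_RECORD_S = 200      # each waveform record is at least 200 s long
N_COMPONENTS = 3        # three-component waveforms


def total_picks(counts):
    """Sum the per-phase label counts."""
    return sum(counts.values())


def phase_share(counts):
    """Return the fraction of labels contributed by each phase type."""
    total = total_picks(counts)
    return {phase: n / total for phase, n in counts.items()}


def min_samples_per_record(rate_hz=SAMPLING_RATE_HZ,
                           length_s=MIN_RECORD_S,
                           n_comp=N_COMPONENTS):
    """Lower bound on the number of samples in one station record."""
    return rate_hz * length_s * n_comp


if __name__ == "__main__":
    print("manual picks:   ", total_picks(MANUAL_PICKS))
    print("automatic picks:", total_picks(AUTO_PICKS))
    for phase, share in phase_share(AUTO_PICKS).items():
        print(f"  {phase}: {share:.1%}")
    print("min samples per record:", min_samples_per_record())
```

Under these figures, the manual labels sum to 122,186 phases and the automatic labels to 2,263,164 phases, while a single three-component record holds at least 60,000 samples.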
About the journal:
Earthquake Science (EQS) aims to publish high-quality, original, peer-reviewed articles on earthquake-related research subjects. It is an English-language international journal sponsored by the Seismological Society of China and the Institute of Geophysics, China Earthquake Administration.
Topics include, but are not limited to, the following:
● Seismic sources of all kinds.
● Earth structure at all scales.
● Seismotectonics.
● New methods and theoretical seismology.
● Strong ground motion.
● Seismic phenomena of all kinds.
● Seismic hazards, earthquake forecasting and prediction.
● Seismic instrumentation.
● Significant recent or past seismic events.
● Documentation of recent seismic events or important observations.
● Descriptions of field deployments, new methods, and available software tools.
The types of manuscripts include the following. There is no length requirement, except for the Short Notes.
【Articles】 Original contributions that have not been published elsewhere.
【Short Notes】 Short papers on recent events or topics that warrant rapid peer review and publication. Limited to 4 published pages.
【Rapid Communications】 Significant contributions that warrant rapid peer review and publication.
【Review Articles】 Review articles are by invitation only. Please contact the editorial office and editors about possible proposals.
【Toolboxes】 Descriptions of novel numerical methods and associated computer codes.
【Data Products】 Documentation of datasets of various kinds that are of interest to the community and available for open access (field data, processed data, synthetic data, or models).
【Opinions】 Views on important topics and future directions in earthquake science.
【Comments and Replies】 Commentaries on recently published EQS papers are welcome. The authors of the paper commented on will be invited to reply. Both the Comment and the Reply are subject to peer review.