SeisAug: A data augmentation python toolkit

IF 3.2 Q2 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS Applied Computing and Geosciences Pub Date : 2025-02-01 Epub Date: 2025-03-06 DOI:10.1016/j.acags.2025.100232
D. Pragnath , G. Srijayanthi , Santosh Kumar , Sumer Chopra
{"title":"SeisAug: A data augmentation python toolkit","authors":"D. Pragnath ,&nbsp;G. Srijayanthi ,&nbsp;Santosh Kumar ,&nbsp;Sumer Chopra","doi":"10.1016/j.acags.2025.100232","DOIUrl":null,"url":null,"abstract":"<div><div>A common limitation in applying any deep learning and machine learning techniques is the limited labelled dataset which can be addressed through Data augmentation (DA). SeisAug is a DA python toolkit to address this challenge in seismological studies. DA. DA helps to balance the imbalanced classes of a dataset by creating more examples of under-represented classes. It significantly mitigates overfitting by increasing the volume of training data and introducing variability, thereby improving the model's performance on unseen data. Given the rapid advancements in deep learning for seismology, ‘SeisAug’ assists in extensibility by generating a substantial amount of data (2–6 times more data) which can aid in developing an indigenous robust model. Further, this study demonstrates the role of DA in developing a robust model. For this we utilized a basic two class identification models between earthquake/signal and noise/(non-earthquake). The model is trained with original, 1 and 5 times augmented datasets and their performance metrics are evaluated. The model trained with 5X times augmented dataset significantly outperforms with accuracy of 0.991, AUC 0.999 and AUC-PR 0.999 compared to the model trained with original dataset with accuracy of 0.50, AUC 0.75 and AUC-PR 0.80. Furthermore, by making all codes available on GitHub, the toolkit facilitates the easy application of DA techniques, empowering end-users to enhance their seismological waveform datasets effectively and overcome the initial drawbacks posed by the scarcity of labelled data.</div></div>","PeriodicalId":33804,"journal":{"name":"Applied Computing and Geosciences","volume":"25 ","pages":"Article 100232"},"PeriodicalIF":3.2000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Computing and Geosciences","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S259019742500014X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/6 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

A common limitation in applying any deep learning and machine learning techniques is the limited labelled dataset which can be addressed through Data augmentation (DA). SeisAug is a DA python toolkit to address this challenge in seismological studies. DA. DA helps to balance the imbalanced classes of a dataset by creating more examples of under-represented classes. It significantly mitigates overfitting by increasing the volume of training data and introducing variability, thereby improving the model's performance on unseen data. Given the rapid advancements in deep learning for seismology, ‘SeisAug’ assists in extensibility by generating a substantial amount of data (2–6 times more data) which can aid in developing an indigenous robust model. Further, this study demonstrates the role of DA in developing a robust model. For this we utilized a basic two class identification models between earthquake/signal and noise/(non-earthquake). The model is trained with original, 1 and 5 times augmented datasets and their performance metrics are evaluated. The model trained with 5X times augmented dataset significantly outperforms with accuracy of 0.991, AUC 0.999 and AUC-PR 0.999 compared to the model trained with original dataset with accuracy of 0.50, AUC 0.75 and AUC-PR 0.80. Furthermore, by making all codes available on GitHub, the toolkit facilitates the easy application of DA techniques, empowering end-users to enhance their seismological waveform datasets effectively and overcome the initial drawbacks posed by the scarcity of labelled data.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
一个数据增强python工具包
应用任何深度学习和机器学习技术的一个常见限制是有限的标记数据集,可以通过数据增强(DA)来解决。SeisAug是一个数据处理python工具包,用于解决地震学研究中的这一挑战。哒。数据分析通过创建更多代表性不足的类的示例来帮助平衡数据集的不平衡类。它通过增加训练数据量和引入可变性来显著减轻过拟合,从而提高模型在未见数据上的性能。鉴于地震学深度学习的快速发展,“SeisAug”通过生成大量数据(2-6倍的数据)来帮助扩展,这可以帮助开发本地鲁棒模型。此外,本研究证明了数据分析在建立稳健模型中的作用。为此,我们利用了地震/信号和噪声/(非地震)之间的基本两类识别模型。使用原始、1倍和5倍增强数据集训练模型,并评估其性能指标。5倍增强数据集训练的模型准确率为0.991,AUC为0.999,AUC- pr为0.999,明显优于原始数据集训练的模型,准确率为0.50,AUC为0.75,AUC- pr为0.80。此外,通过在GitHub上提供所有代码,该工具包促进了数据分析技术的简单应用,使最终用户能够有效地增强他们的地震波形数据集,并克服了标记数据稀缺所带来的最初缺点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Applied Computing and Geosciences
Applied Computing and Geosciences Computer Science-General Computer Science
CiteScore
5.50
自引率
0.00%
发文量
23
审稿时长
5 weeks
期刊最新文献
Quartz grain microtexture analysis using Artificial Intelligence: Application to tsunami and storm deposits provenance studies Editorial Board Enhancing neuro-symbolic AI for mineral prediction via LLM-guided knowledge integration Synergizing Radon transform and DINOv2 for artifact-resilient digital rock segmentation Comparison of deep learning models for 1D magnetotelluric inversion
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1