Change point detection of events in molecular simulations using dupin

Computer Physics Communications (IF 7.2; JCR Q1, Computer Science, Interdisciplinary Applications; Zone 2, Physics and Astrophysics). Published 2024-06-28. DOI: 10.1016/j.cpc.2024.109297
Brandon L. Butler, Domagoj Fijan, Sharon C. Glotzer

Abstract

Particle tracking is commonly used to study time-dependent behavior in many different types of physical and chemical systems involving constituents that span many length scales, including atoms, molecules, nanoparticles, granular particles, and even larger objects. Behaviors of interest studied using particle tracking information include disorder-order transitions, thermodynamic phase transitions, structural transitions, protein folding, crystallization, gelation, swarming, avalanches, and fracture. A common challenge in studies of these systems is change detection. Change point detection discerns when a temporal signal undergoes a change in distribution. These changes can be local or global, instantaneous or prolonged, obvious or subtle. Moreover, system-wide changes marking an interesting physical or chemical phenomenon (e.g., crystallization of a liquid) are often preceded by events (e.g., pre-nucleation clusters) that are localized and can occur anywhere in the system at any time. For these reasons, detecting events in particle trajectories generated by molecular simulation is challenging and is typically accomplished via ad hoc solutions unique to the behavior and system under study. Consequently, methods for event detection lack generality, and those used in one field are not easily adopted by scientists in other fields. Here we present a new Python-based tool, dupin, that allows for universal event detection from particle trajectory data irrespective of the system details. dupin works by creating a signal representing the simulation and partitioning the signal based on events (changes within the trajectory). This approach enables studies in which manually annotating event boundaries would require a prohibitive amount of time. Furthermore, dupin can serve as a tool in automated and reproducible workflows. We demonstrate the application of dupin using three examples and discuss its applicability to a wider class of problems.
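To make "creating a signal representing the simulation and partitioning the signal" concrete, here is a minimal sketch in plain Python. It is not dupin's API: it assumes a per-particle descriptor (for example, a local order parameter) has already been computed for every frame. Each frame is reduced to one number by averaging, and a single change point is located by choosing the split that minimizes the total squared deviation from each segment's mean. The functions `frame_signal`, `segment_cost`, and `single_change_point` and the toy data are hypothetical.

```python
from statistics import fmean

# Hypothetical helpers for illustration; not part of dupin.

def frame_signal(per_particle_frames):
    """Reduce each frame's per-particle descriptor values to one number (the mean)."""
    return [fmean(values) for values in per_particle_frames]


def segment_cost(x):
    """Sum of squared deviations from the segment mean (a constant-mean cost)."""
    m = fmean(x)
    return sum((v - m) ** 2 for v in x)


def single_change_point(signal, min_size=2):
    """Index t that best splits signal into signal[:t] and signal[t:]."""
    best_t, best_cost = None, float("inf")
    for t in range(min_size, len(signal) - min_size + 1):
        cost = segment_cost(signal[:t]) + segment_cost(signal[t:])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


# Toy trajectory: 3 particles whose descriptor jumps from ~0.1 to ~0.8 at frame 6.
frames = [[0.1, 0.12, 0.09]] * 6 + [[0.8, 0.79, 0.81]] * 6
signal = frame_signal(frames)
print(single_change_point(signal))  # 6
```

Real trajectories usually need richer descriptors, several aggregated features per frame, and more than one change point. This example only illustrates the "signal, then partition" idea.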

Program summary

Program Title: dupin

CPC Library link to program files: https://doi.org/10.17632/kjcn97zc46.1

Developer's repository link: https://github.com/glotzerlab/dupin

Licensing provisions: BSD 3-clause

Programming language: Python

Nature of problem: In the field of molecular simulations, detecting structural transitions or events within trajectories is challenging and, because it usually requires a manual approach, time-consuming for larger studies. This issue is particularly pronounced in studies involving hundreds or thousands of simulations, where manual detection and analysis of transitions become infeasible. Our goal is to develop an automated, accurate, and efficient method for detecting transition points in simulation trajectories, which both saves time and aids researchers in uncovering important events and their underlying causes in various systems. Additionally, we aim to facilitate new machine learning applications to important materials problems such as predicting and designing crystallization pathways, predicting defect formation, and describing the behavior of active matter, all of which involve structural transitions occurring over time. The developed method should be applicable to both offline and online detection, enabling event-dependent triggers for advanced simulation/experimental protocols as well as efficient processing and storage of data.
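An online, event-dependent trigger can be pictured with a minimal sketch. This is not dupin's implementation. It assumes a scalar signal arrives one frame at a time. It flags an event when splitting a sliding window into two halves reduces the constant-mean cost by more than a user-chosen threshold. The class `OnlineTrigger`, the window length, and the threshold are hypothetical choices.

```python
from collections import deque
from statistics import fmean


def sse(x):
    """Sum of squared deviations from the mean of x."""
    m = fmean(x)
    return sum((v - m) ** 2 for v in x)


class OnlineTrigger:
    """Hypothetical sliding-window trigger that fires when the window's two halves differ."""

    def __init__(self, window=10, threshold=1.0):
        self.window = window
        self.threshold = threshold
        self.buffer = deque(maxlen=window)  # keeps only the most recent frames

    def update(self, value):
        """Add one new frame's value; return True if the window now contains an event."""
        self.buffer.append(value)
        if len(self.buffer) < self.window:
            return False  # not enough frames yet
        data = list(self.buffer)
        half = self.window // 2
        # Cost reduction from modeling the window as two segments instead of one.
        gain = sse(data) - (sse(data[:half]) + sse(data[half:]))
        return gain > self.threshold


trigger = OnlineTrigger(window=10, threshold=0.5)
stream = [0.1] * 20 + [0.9] * 20  # toy signal that shifts at frame 20
fired_at = next(i for i, v in enumerate(stream) if trigger.update(v))
print(fired_at)  # 22
```

The trigger fires a couple of frames after the true shift at frame 20 because the window has to collect enough post-change frames to exceed the threshold. In a simulation, a `True` return could be used to switch on more frequent frame output. Window length and threshold trade detection delay against false alarms.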

Solution method: We develop a versatile Python package called dupin for detecting molecular events and structural transitions in simulation trajectories. dupin's workflow pipeline includes three major stages: data preprocessing, data augmentation, and detection. The components of this pipeline collectively improve the accuracy and efficiency of identifying structural changes in particle trajectory data. In data preprocessing, we generate and aggregate data into a comprehensive representation of the system. Data augmentation techniques such as feature selection and dimensionality reduction counteract the noise arising from high-dimensional data and enhance computational performance. We detect change points within the trajectory indicating transition events using a cost-based event detection method. In dupin, we implement two cost functions based on piecewise linear fits, which offer different levels of sensitivity to sudden shifts and changes in the signal. The package can use any cost-based detection algorithm but has a special interface for the Python package ruptures. Regardless of the detection algorithm, we use the cost function and "elbow" detection to determine the number of change points. The detection scheme can be applied both offline and online, enabling real-time analysis of molecular events as simulations progress. For example, dupin may be used to trigger high-frequency storage of frames within a simulation upon nucleation and subsequent solidification of a liquid into a crystal. Our method detects transition points within simulation trajectories with a high degree of accuracy when provided with informative descriptors. By automating the detection process, our solution enables efficient change point detection for studies with large-scale simulations.
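The detection stage can be illustrated with a small self-contained sketch. It does not reproduce dupin's code, its two cost functions, or the ruptures interface. It assumes a one-dimensional signal and uses the following pieces:

- Cost function: the sum of squared residuals of an ordinary least-squares line, one simple form of a piecewise linear fit cost.
- Segmentation: the optimal segmentation for each number of change points, found by exact dynamic programming.
- Elbow rule: the number of change points is taken at the "elbow" of the cost curve, here the point farthest from the straight line joining the first and last costs.

The function names and the elbow rule are hypothetical choices made for illustration.

```python
from functools import lru_cache

# Hypothetical helpers for illustration; not dupin's or ruptures' API.


def linear_fit_cost(y, start, end):
    """Sum of squared residuals of a least-squares line fitted to y[start:end]."""
    xs = range(start, end)
    ys = y[start:end]
    n = end - start
    if n < 3:
        return 0.0  # a line passes exactly through one or two points
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, ys)) / sxx
    return sum((v - my - slope * (x - mx)) ** 2 for x, v in zip(xs, ys))


def optimal_segmentations(y, max_bkps, min_size=3):
    """Exact dynamic programming: {k: (total_cost, change_points)} for k = 0..max_bkps."""
    n = len(y)

    @lru_cache(maxsize=None)
    def cost(i, j):
        return linear_fit_cost(y, i, j)

    # best[k][j] = (lowest cost of splitting y[:j] into k + 1 segments, start of last segment)
    best = [{j: (cost(0, j), 0) for j in range(min_size, n + 1)}]
    for k in range(1, max_bkps + 1):
        best.append({
            j: min((best[k - 1][i][0] + cost(i, j), i)
                   for i in range(k * min_size, j - min_size + 1))
            for j in range((k + 1) * min_size, n + 1)
        })

    results = {}
    for k in range(max_bkps + 1):
        if n not in best[k]:
            break  # the signal is too short for this many segments
        bkps, j = [], n
        for level in range(k, 0, -1):
            j = best[level][j][1]  # walk back to where the previous segment ended
            bkps.append(j)
        results[k] = (best[k][n][0], sorted(bkps))
    return results


def elbow(costs):
    """Index of the point farthest from the chord joining the first and last costs."""
    m = len(costs) - 1
    c0, cm = costs[0], costs[-1]
    # Cross product with the chord: proportional to the perpendicular distance.
    dist = [abs((cm - c0) * x - m * (c - c0)) for x, c in enumerate(costs)]
    return max(range(len(costs)), key=dist.__getitem__)


# Toy signal with three linear regimes separated by jumps at frames 20 and 40.
signal = ([0.05 * t for t in range(20)]
          + [3.0] * 20
          + [1.0 + 0.1 * s for s in range(20)])
results = optimal_segmentations(signal, max_bkps=6)
costs = [results[k][0] for k in range(len(results))]
n_bkps = elbow(costs)
print(n_bkps, results[n_bkps][1])  # 2 [20, 40]
```

Exact dynamic programming scales roughly quadratically or worse with trajectory length. Faster search methods (for example, those provided by ruptures) are preferable for long signals. The cost-then-elbow logic stays the same.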

Additional comments including restrictions and unusual features: Our package, dupin, shows great promise for detecting transition points within simulation trajectories with a high degree of accuracy; nonetheless, it relies heavily on the selection of informative descriptors. The accuracy of the detection may be compromised if the chosen descriptors do not effectively capture the changes in a system's properties. This restriction can be mitigated by selecting a diverse range of descriptors and applying a feature selection tool to refine the signal.
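A signal can be refined by keeping only the descriptors that actually change over the trajectory. The sketch below shows one simple way to do this; it is not dupin's feature selection tool. It assumes each feature is a per-frame series and scores it by the fraction of its total variation explained by its best single split. Features that merely fluctuate or stay constant score near zero. The helpers `change_score` and `select_features` and the 0.5 threshold are hypothetical.

```python
from statistics import fmean


def sse(x):
    """Sum of squared deviations from the mean of x."""
    m = fmean(x)
    return sum((v - m) ** 2 for v in x)


def change_score(series, min_size=2):
    """Fraction (0 to 1) of a feature's total variation explained by its best single split."""
    total = sse(series)
    if total == 0.0:
        return 0.0  # a constant feature carries no information about changes
    best = min(
        sse(series[:t]) + sse(series[t:])
        for t in range(min_size, len(series) - min_size + 1)
    )
    return 1.0 - best / total


def select_features(features, threshold=0.5):
    """Names of features whose change score exceeds the threshold."""
    return [name for name, series in features.items() if change_score(series) > threshold]


features = {
    "step": [0.0] * 10 + [1.0] * 10,    # clear change at frame 10
    "alternating": [0.1, -0.1] * 10,    # fluctuates but never shifts
    "constant": [0.5] * 20,             # no variation at all
}
print(select_features(features))  # ['step']
```

A single-split score favors features with one dominant change. Trajectories with several events may call for a multi-change-point score or a supervised selection step instead.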
