Molecular Design Using Signal Processing and Machine Learning: Time-Frequency-like Representation and Forward Design

A. Tchagang, A. Tewfik, Julio J. Valdés
Journal: arXiv: Chemical Physics. Published: 2020-04-20. DOI: 10.21203/RS.3.RS-229094/V1 (https://doi.org/10.21203/RS.3.RS-229094/V1)

Abstract

The accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the speed of ML. In this study, we show that integrating well-known signal processing (SP) techniques (i.e., the short-time Fourier transform, continuous wavelet analysis, and the Wigner-Ville distribution) into the QM↔ML pipeline yields a powerful machinery (QM↔SP↔ML) that can be used for the representation, visualization, and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic, and thermodynamic properties. We demonstrate this by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model predicts the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e., MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, the new QM↔SP↔ML model represents a powerful technique for molecular forward design. All code and data generated and used in this study are available as supporting materials. The QM↔SP↔ML model is also hosted at https://github.com/TABeau/QM-SP-ML.
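To make the time-frequency-like idea and the accuracy criterion more concrete, here is a minimal sketch in plain Python. It computes a short-time Fourier transform magnitude for a 1D signal and checks an MAE against the two thresholds quoted in the abstract. The abstract does not say how a molecule is mapped to a 1D signal, so `toy_signal` is a hypothetical stand-in, the energy values are made up for illustration, and none of the function names come from the QM-SP-ML repository.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=8, hop=4):
    """Return |STFT| as a list of frames, each with frame_len // 2 + 1 bins."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed segment at frequency bin k.
            coeff = sum(
                segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predictions and reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Thresholds quoted in the abstract: 1 kcal/mol for total energies,
# 0.1 eV for orbital energies. The dictionary keys are illustrative names.
CHEMICAL_ACCURACY = {"total_energy_kcal_per_mol": 1.0, "orbital_energy_eV": 0.1}


def within_chemical_accuracy(mae, kind):
    """True if the MAE is strictly below the threshold for this property kind."""
    return mae < CHEMICAL_ACCURACY[kind]


# Hypothetical 1D "molecular signal": a pure tone sitting on bin 2 of each frame.
toy_signal = [math.sin(2.0 * math.pi * 2 * i / 8) for i in range(32)]
spectrogram = stft_magnitude(toy_signal)

# Made-up predicted and reference total energies (kcal/mol), illustration only.
toy_mae = mean_absolute_error([-40.2, -76.1], [-40.5, -76.4])
toy_ok = within_chemical_accuracy(toy_mae, "total_energy_kcal_per_mol")
```

In this sketch each row of `spectrogram` is one frame along the signal and each column is a frequency bin, so the result is a 2D array that could be fed to a convolutional network like an image. A continuous wavelet transform or a Wigner-Ville distribution would take the place of `stft_magnitude` in the same role.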