Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
arXiv:2309.09355 · arXiv - PHYS - Atomic and Molecular Clusters · Published 2023-09-17

Abstract

Machine learning (ML) techniques have led to significant advances in computational chemistry, improving the prediction of molecular properties and accelerating drug discovery and materials design. ML models can extract hidden patterns and relationships from large, complex datasets, allowing various chemical properties to be predicted with high accuracy. Such methods have enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model for classification tasks, based on deep learning techniques such as a multilayer encoder and decoder architecture. We demonstrate the approach by applying it to various types of input data, including organic and inorganic compounds. In particular, we developed and tested the model using the Matbench and MoleculeNet benchmarks, which cover crystal properties and drug design-related tasks. We also conduct a comprehensive analysis of vector representations of chemical compounds, shedding light on the underlying patterns in molecular data. The models used in this work exhibit a high degree of predictive power, underscoring the progress that refined machine learning can bring to molecular and materials datasets. For instance, on the Tox21 dataset, we achieved an average accuracy of 96%, surpassing the previous best result by 10%. Our code is publicly available at https://github.com/dmamur/elembert.
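
To make the idea of a vector representation of a chemical compound concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy element vocabulary, a randomly initialized embedding table, and mean pooling over per-atom tokens. The abstract does not specify the tokenization, the embedding dimension, or the pooling, so all names here (VOCAB, tokenize_formula, mean_pool, and so on) are hypothetical and chosen only for illustration.

```python
import random
import re

# Hypothetical toy vocabulary (assumption): the abstract does not describe
# the actual tokens or special symbols used by the model.
VOCAB = ["[PAD]", "[CLS]", "H", "C", "N", "O", "Na", "Cl"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
EMBED_DIM = 4  # illustrative size only


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a random embedding table; a trained model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def tokenize_formula(formula):
    """Expand a simple formula such as 'C2H6O' into one token per atom."""
    tokens = ["[CLS]"]
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        tokens.extend([symbol] * (int(count) if count else 1))
    return tokens


def embed(tokens, table):
    """Look up the embedding vector of every token."""
    return [table[TOKEN_TO_ID[tok]] for tok in tokens]


def mean_pool(vectors):
    """Average token vectors into a single fixed-size compound vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


table = make_embedding_table(len(VOCAB), EMBED_DIM)
tokens = tokenize_formula("C2H6O")
compound_vector = mean_pool(embed(tokens, table))
print(tokens)
print(len(compound_vector))
```

In an encoder-based classifier, the pooled vector (or the vector at a special token such as the assumed [CLS]) would typically be passed to a classification head. Whether the paper's model pools this way is not stated in the abstract.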