Low dimensional fragment-based descriptors for property predictions in inorganic materials with machine learning

Md Mohaiminul Islam
arXiv: 2407.21146 · arXiv - PHYS - Data Analysis, Statistics and Probability · Published 2024-07-30

Abstract

In recent years, machine learning has helped accelerate the design and discovery of innovative materials with extraordinary properties. Without it, discovery would rely on a laborious and time-consuming process of trial and error. In this study, a simple yet powerful fragment-based descriptor, Low Dimensional Fragment Descriptors (LDFD), is proposed for use with machine learning models. It predicts important properties of a wide range of inorganic materials, such as perovskite oxides, metal halide perovskites, alloys, semiconductors, and other material systems, and it can also be extended to interfaces. Generating the descriptors requires only the structural formula of each material as input. When the dataset contains materials with identical structures, additional system properties are also required. Descriptor generation involves only a few steps, encoding the formula in a binary space and reducing its dimensionality, which makes the method easy to implement and apply. To evaluate descriptor performance, six known datasets containing materials with up to eight components were compared. The method was applied to properties such as the band gaps of perovskites and semiconductors, the lattice constants of magnetic alloys, the bulk and shear moduli of superhard alloys, and the critical temperatures of superconductors. It was also applied to the formation enthalpy and the energy above the convex hull of perovskite oxides. The Python-based data-mining tool matminer was used to collect the data. The prediction accuracy is commensurate with the quality of the training data and is comparable to that reported in previous studies. The method should extend to any inorganic material system that can be subdivided into layers, or to any crystal structure with more than one atomic site. Its performance should improve as data mining yields larger and less biased datasets.
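The abstract names two steps for building the descriptors, encoding the formula in a binary space and reducing its dimensionality, but does not specify either one. The Python sketch below shows one way such a pipeline could work under stated assumptions. It is not the LDFD method itself. Several choices are hypothetical illustrations: writing each material as a tuple of site fragments (for example `("Sr", "Ti", "O")` for an ABX3 perovskite), one-hot encoding each site over the elements observed there, and approximating dimensionality reduction by pruning uninformative binary columns. The helper names `one_hot_fragments` and `reduce_columns` are also hypothetical.

```python
# Hypothetical sketch of a fragment-based binary descriptor; NOT the LDFD
# algorithm from the paper. Assumptions: materials are given as equal-length
# tuples of site fragments, each site is one-hot encoded, and "dimensionality
# reduction" is approximated by dropping redundant 0/1 columns.
from typing import List, Sequence, Tuple

Material = Tuple[str, ...]  # one element symbol per site fragment (assumption)


def one_hot_fragments(materials: Sequence[Material]) -> Tuple[List[str], List[List[int]]]:
    """Encode each site of each material as binary (0/1) indicator columns.

    Assumes every material has the same number of sites, e.g. (A, B, X)
    for ABX3 perovskites.
    """
    n_sites = len(materials[0])
    # Elements observed at each site, sorted so the column order is deterministic.
    vocab = [sorted({m[i] for m in materials}) for i in range(n_sites)]
    names = [f"site{i}={el}" for i in range(n_sites) for el in vocab[i]]
    rows: List[List[int]] = []
    for m in materials:
        row: List[int] = []
        for i in range(n_sites):
            row.extend(1 if m[i] == el else 0 for el in vocab[i])
        rows.append(row)
    return names, rows


def reduce_columns(names: List[str], rows: List[List[int]]) -> Tuple[List[str], List[List[int]]]:
    """Drop columns that are constant, or exact duplicates/complements of a kept column."""
    kept_names: List[str] = []
    kept_cols: List[Tuple[int, ...]] = []
    seen = set()
    for j, name in enumerate(names):
        col = tuple(r[j] for r in rows)
        complement = tuple(1 - v for v in col)
        if len(set(col)) == 1 or col in seen or complement in seen:
            continue  # carries no information beyond the columns already kept
        seen.add(col)
        kept_names.append(name)
        kept_cols.append(col)
    # Rebuild rows from the kept columns (works even if no column survives).
    reduced = [[col[i] for col in kept_cols] for i in range(len(rows))]
    return kept_names, reduced


if __name__ == "__main__":
    # Toy ABX3 perovskites written as (A, B, X) site fragments.
    data = [("Sr", "Ti", "O"), ("Ba", "Ti", "O"), ("Ca", "Zr", "O"), ("Sr", "Zr", "O")]
    names, rows = one_hot_fragments(data)
    kept, reduced = reduce_columns(names, rows)
    print(kept)
    print(reduced)
```

In this toy set, the all-oxygen X site carries no information, and the Zr column is the exact complement of the Ti column. Both are pruned, which leaves four binary features per material. A real pipeline would more likely use a statistical reduction such as PCA. The pruning rule here only keeps the sketch dependency-free.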