Low dimensional fragment-based descriptors for property predictions in inorganic materials with machine learning
Md Mohaiminul Islam
arXiv - PHYS - Data Analysis, Statistics and Probability, 2024-07-30
DOI: arxiv-2407.21146 (https://doi.org/arxiv-2407.21146)
Abstract
In recent times, the use of machine learning in materials design and
discovery has helped accelerate the discovery of innovative materials with
extraordinary properties, which otherwise would have been driven by a laborious
and time-consuming trial-and-error process. In this study, a simple yet
powerful fragment-based descriptor, Low Dimensional Fragment Descriptors
(LDFD), is proposed to work in conjunction with machine learning models to
predict important properties of a wide range of inorganic materials such as
perovskite oxides, metal halide perovskites, alloys, semiconductors, and other
material systems, and it can also be extended to work with interfaces. To predict
properties, descriptor generation requires only the structural formula
of the material and, when identical structures are present in the dataset,
additional system properties as input. Descriptor generation
involves only a few steps: encoding the formula in binary space and reducing the
dimensionality, which allows easy implementation and prediction. To evaluate
descriptor performance, six known datasets with up to eight components were
compared. The method was applied to properties such as band gaps of perovskites
and semiconductors, lattice constants of magnetic alloys, bulk/shear moduli of
superhard alloys, critical temperatures of superconductors, and formation enthalpy
and energy above the convex hull of perovskite oxides. The Python-based
data mining tool matminer was used to collect the data. The
prediction accuracies are commensurate with the quality of the training data and
are comparable to those of previous studies. This method should be
extendable to any inorganic material system that can be subdivided into
layers or crystal structures with more than one atom site, and as
data mining progresses, its performance should improve with larger and
unbiased datasets.
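
The two descriptor-generation steps named in the abstract (binary encoding of the formula, then dimensionality reduction) can be illustrated with a minimal sketch. The abstract does not specify the exact encoding or reduction method, so everything here is an assumption: the formula is treated as a tuple of per-site fragments, the binary encoding is a per-site one-hot vector, and the reduction is a PCA computed with NumPy's SVD. The toy ABX3 entries and the helper names `build_vocabulary`, `encode_binary`, and `reduce_dimensions` are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical toy data: ABX3-type formulas split into one fragment per site.
# The entries and the site layout are illustrative, not taken from the paper.
materials = [
    ("Cs", "Pb", "I"),
    ("Cs", "Pb", "Br"),
    ("Cs", "Sn", "I"),
    ("Rb", "Pb", "Cl"),
    ("Rb", "Sn", "Br"),
]


def build_vocabulary(rows):
    """Collect the distinct fragments seen at each site, in sorted order."""
    n_sites = len(rows[0])
    return [sorted({row[i] for row in rows}) for i in range(n_sites)]


def encode_binary(rows, vocab):
    """One-hot encode each site's fragment and concatenate the sites (assumed encoding)."""
    width = sum(len(v) for v in vocab)
    X = np.zeros((len(rows), width), dtype=float)
    for r, row in enumerate(rows):
        offset = 0
        for site, fragment in enumerate(row):
            X[r, offset + vocab[site].index(fragment)] = 1.0
            offset += len(vocab[site])
    return X


def reduce_dimensions(X, n_components):
    """Project centered data onto its leading principal axes (PCA via SVD, assumed reduction)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


vocab = build_vocabulary(materials)
X_binary = encode_binary(materials, vocab)            # one row per material, one column per site fragment
X_low = reduce_dimensions(X_binary, n_components=2)   # low-dimensional descriptors
print(X_binary.shape, X_low.shape)
```

In a workflow like the one the abstract describes, the rows of `X_low` would be paired with a target property (for example a band gap) and passed to a regression model. The choice of two components is arbitrary here and would normally be tuned against validation error.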