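Application of machine learning-based read-across structure-property relationship (RASPR) as a new tool for predictive modelling: Prediction of power conversion efficiency (PCE) for selected classes of organic dyes in dye-sensitized solar cells (DSSCs).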

IF 2.8; Zone 4 (Medicine); JCR Q3 (Chemistry, Medicinal); Molecular Informatics; Pub Date: 2024-04-01; Epub Date: 2024-02-19; DOI: 10.1002/minf.202300210
Souvik Pore, Arkaprava Banerjee, Kunal Roy
Citations: 0

Abstract


In silico approaches for predicting material properties have become an effective alternative to experimental methods. Recently, quantitative structure-property relationship (QSPR) and read-across (RA) methods were merged into an emerging chemoinformatic tool: the read-across structure-property relationship (RASPR). Because it uses various similarity- and error-based measures, the RASPR method is applicable to both large and small datasets. RASPR models have also been observed to show higher external predictivity than the corresponding QSPR models. In this study, we modeled the power conversion efficiency (PCE) of organic dyes used in dye-sensitized solar cells (DSSCs) with the quantitative RASPR (q-RASPR) method. We used three relatively large classes of organic dyes for modeling: phenothiazines (n=207), porphyrins (n=281), and triphenylamines (n=229). We divided each dataset into training and test sets in three different combinations; with the training sets we developed three different QSPR models from structural and physicochemical descriptors and validated them with the corresponding test sets. The descriptors appearing in these models were used to calculate the RASPR descriptors with the Java-based tool RASAR Descriptor Calculator v2.0 (https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home), and data fusion was then performed by pooling the previously selected structural and physicochemical descriptors with the calculated RASPR descriptors. A further feature selection step was applied to develop the final RASPR PLS models. We also developed different machine learning (ML) models with the descriptors selected in the QSPR PLS and RASPR PLS models and found that models with RASPR descriptors surpassed, in external predictivity, the models with only structural and physicochemical descriptors: RMSEP decreased from 1.16-1.25 to 1.07-1.18 for phenothiazines, from 1.60-1.79 to 1.45-1.53 for porphyrins, and from 1.27-1.54 to 1.20-1.47 for triphenylamines.
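
The abstract does not spell out how the read-across descriptors are computed, so the following is only a minimal sketch of the general idea, under stated assumptions. It derives a few similarity-based features for a query compound from its closest training compounds, appends them to the original descriptors (the data fusion step), and computes RMSEP on a test set. The Gaussian kernel, the three features returned, and all function names (gaussian_similarity, read_across_features, fuse, rmsep) are illustrative assumptions. They are not the algorithm or output of RASAR Descriptor Calculator v2.0, and the toy numbers are invented.

```python
import math


def gaussian_similarity(a, b, sigma=1.0):
    """Gaussian-kernel similarity of two descriptor vectors (1.0 means identical)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def read_across_features(query, train_X, train_y, k=3, sigma=1.0):
    """Summarise the k most similar training compounds for one query compound.

    Returns (similarity-weighted mean response, weighted SD, maximum similarity).
    These three features are hypothetical stand-ins for read-across descriptors.
    """
    neighbours = sorted(
        ((gaussian_similarity(query, x, sigma), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weights = [s for s, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # All similarities underflowed to zero: fall back to equal weights.
        weights = [1.0] * len(neighbours)
        total = float(len(neighbours))
    mean = sum(w * y for w, (_, y) in zip(weights, neighbours)) / total
    var = sum(w * (y - mean) ** 2 for w, (_, y) in zip(weights, neighbours)) / total
    return mean, math.sqrt(var), neighbours[0][0]


def fuse(descriptors, ra_features):
    """Data fusion: append read-across features to the original descriptor vector."""
    return list(descriptors) + list(ra_features)


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on an external test set."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data, invented for illustration only (not taken from the study).
train_X = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
train_y = [2.0, 4.0, 9.0]
query = [0.0, 0.0]
ra = read_across_features(query, train_X, train_y, k=2)
print(fuse(query, ra))
```

In a workflow like the one described, the fused vectors would then go through feature selection and a regression model such as PLS, and the RMSEP of that model would be compared with the RMSEP of a model built on the original descriptors alone.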

Source journal
Molecular Informatics (Chemistry, Medicinal; Mathematical & Computational Biology)
CiteScore: 7.30
Self-citation rate: 2.80%
Articles published: 70
Review time: 3 months
About the journal: Molecular Informatics is a peer-reviewed, international forum for publication of high-quality, interdisciplinary research on all molecular aspects of bio/cheminformatics and computer-assisted molecular design. Molecular Informatics succeeded QSAR & Combinatorial Science in 2010. Molecular Informatics presents methodological innovations that will lead to a deeper understanding of ligand-receptor interactions, macromolecular complexes, molecular networks, design concepts and processes that demonstrate how ideas and design concepts lead to molecules with a desired structure or function, preferably including experimental validation. The journal's scope includes but is not limited to the fields of drug discovery and chemical biology, protein and nucleic acid engineering and design, the design of nanomolecular structures, strategies for modeling of macromolecular assemblies, molecular networks and systems, pharmaco- and chemogenomics, computer-assisted screening strategies, as well as novel technologies for the de novo design of biologically active molecules. As a unique feature, Molecular Informatics publishes so-called "Methods Corner" review-type articles, which feature important technological concepts and advances within the scope of the journal.
Latest articles from this journal
Extended Activity Cliffs-Driven Approaches on Data Splitting for the Study of Bioactivity Machine Learning Predictions.
BIOMX-DB: A web application for the BIOFACQUIM natural product database.
Chemoinformatics for corrosion science: Data-driven modeling of corrosion inhibition by organic molecules.
My 50 Years with Chemoinformatics.
Pathway-based prediction of the therapeutic effects and mode of action of custom-made multiherbal medicines.