
Artificial intelligence chemistry: latest articles

A scientist's guide to AI-driven molecular discovery
Pub Date : 2026-01-23 DOI: 10.1016/j.aichem.2026.100107
Jakes Udabe
Artificial intelligence (AI) is increasingly steering the discovery of functional molecules and materials, but progress in generative modeling is held back by messy, heterogeneous experimental data and a scarcity of high-quality ground truth. This review synthesizes recent advances in data curation, representation, and generative modeling for molecular and materials discovery, and proposes a practical four-stage workflow that integrates structured data capture, intelligent featurization, generative design, and closed-loop experimental validation. Core algorithmic families (supervised, semi-supervised, unsupervised, and reinforcement learning) and specialized generative architectures (VAEs, GANs, diffusion models, graph-based models) are surveyed, along with how each maps to real-world discovery tasks. The enabling infrastructure (e.g., electronic lab notebooks (ELNs), knowledge graphs, autonomous laboratories) is likewise analyzed, and best practices for reproducibility, uncertainty quantification, and ethical safeguards are highlighted. Finally, a prioritized checklist is provided to help researchers and laboratories adopt AI-compatible infrastructure, and open challenges (data standards, causal inference, accessibility) are described to guide future work.
Citations: 0
From potential to practice: The prospective and pitfalls of generative AI and deep learning in molecular simulations
Pub Date : 2026-01-23 DOI: 10.1016/j.aichem.2026.100108
Rahul D. Jawarkar , Prashant K. Deshmukh , Bhavesh Mandwale , Long Chaiou Ming
Generative AI and deep learning are improving molecular simulations and drug development. Traditional computational methods such as molecular dynamics (MD), Monte Carlo (MC), and QM/MM have been crucial in investigating biomolecular interactions and thermodynamics. However, processing power and speed restrict their scalability. This article provides a comprehensive review and comparative analysis of how advanced neural network architectures and generative AI models address these computational limitations. Neural network potentials trained on high-quality quantum datasets achieve ab initio accuracy at low computational cost. We evaluated convolutional neural networks (CNNs), recurrent neural networks (RNNs), graph neural networks (GNNs), and transformers for how well they describe molecular changes over time and predict structural changes. Researchers have investigated generative frameworks, including variational autoencoders (VAEs), generative adversarial networks (GANs), and diffusion models, to develop medications with superior binding affinity and pharmacokinetic characteristics. The findings reveal that AI-driven modelling and physics-based simulations create a closed-loop system in which MD or QM/MM simulations iteratively refine AI-generated molecules. This feedback loop speeds up hit-to-lead optimisation, improves ADMET prediction, and enriches protein folding and shape information. This paradigm shift from descriptive to predictive and generative frameworks using AI and molecular modelling improves the scalability, interpretability, and creativity of computational drug discovery. AI serves as both a computational tool and a collaborator in speeding up molecular discovery. Overall, this manuscript serves as a critical review summarizing state-of-the-art progress, challenges, and future prospects at the interface of AI and molecular simulation research.
Citations: 0
Accelerated green material and solvent discovery with chemistry- and physics-guided generative AI
Pub Date : 2026-01-02 DOI: 10.1016/j.aichem.2025.100106
Eslam G. Al-Sakkari , Ahmed Ragab , Marzouk Benali , Olumoye Ajao , Daria C. Boffito , Hanane Dagdougui
Carbon capture, utilization and storage (CCUS), along with lignocellulosic biomass valorization (e.g., lignin, cellulose), are promising decarbonization strategies for hard-to-abate industries. Green solvents, such as deep eutectic solvents and ionic liquids, enable efficient CO₂ capture and selective lignin extraction, enhancing lignin depolymerization into high-value products. However, current molecular design tools are slow and computationally expensive, limiting green material innovation. This study introduces a novel data-driven framework for green material discovery using generative AI, including transformers, generative adversarial networks, and variational autoencoders. The generation process was guided by rule-based and physics- and chemistry-informed models for automatic labeling, with feedback loops to reduce invalid SMILES strings. The approach achieved 70 % molecular validity and 94 % novelty in generating new solvents for CO₂ capture and lignin applications. Model training averaged under one hour, and molecule generation took only seconds, significantly faster than traditional methods. Ensemble machine learning models assessed the environmental sustainability of candidates, and retrosynthesis analysis identified feasible, green synthesis pathways. This flexible, scalable methodology extends beyond solvent discovery to broader applications in process design and optimization, enabling the rapid generation of novel and cost-effective process configurations.
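The validity and novelty figures quoted above can be made concrete with a small sketch. The abstract does not say how the two metrics were computed, so the definitions below follow a common convention and are assumptions: validity is the share of generated strings that pass a validity check, and novelty is the share of unique valid strings that are absent from the training set. `toy_is_valid` is a hypothetical stand-in that only checks bracket balance; a real pipeline would parse each string with a cheminformatics toolkit and compare canonical SMILES.

```python
from typing import Callable, Iterable, Tuple


def toy_is_valid(smiles: str) -> bool:
    """Hypothetical stand-in check (not real chemistry): non-empty, balanced () and []."""
    if not smiles:
        return False
    depth = 0            # nesting depth of round branch parentheses
    in_bracket = False   # inside a square-bracket atom such as [NH4+]
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
    return depth == 0 and not in_bracket


def validity_and_novelty(
    generated: Iterable[str],
    training: Iterable[str],
    is_valid: Callable[[str], bool] = toy_is_valid,
) -> Tuple[float, float]:
    """Return (validity, novelty) as fractions in [0, 1].

    Assumed definitions: validity = valid / generated;
    novelty = unique valid strings not in the training set / unique valid strings.
    Strings are compared literally, so inputs should already be canonicalized.
    """
    generated = list(generated)
    if not generated:
        return 0.0, 0.0
    valid = [s for s in generated if is_valid(s)]
    validity = len(valid) / len(generated)
    unique_valid = set(valid)
    if not unique_valid:
        return validity, 0.0
    novel = unique_valid - set(training)
    return validity, len(novel) / len(unique_valid)


if __name__ == "__main__":
    gen = ["CCO", "C(C", "CC(=O)O", "CCO", "c1ccccc1"]  # illustrative strings only
    print(validity_and_novelty(gen, training=["CCO"]))
```

Passing the validity check in as a callable keeps the metric logic separate from the chemistry, so the toy check can be swapped for a real parser without touching the counting code.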
Citations: 0
Integrating machine learning with electrochemical sensors for intelligent food safety monitoring
Pub Date : 2025-12-23 DOI: 10.1016/j.aichem.2025.100105
Aaryashree , Arti Devi
The integration of machine learning (ML) with electrochemical sensors is transforming food safety and quality assessment by enabling quick, affordable, and highly sensitive detection of contaminants, adulterants, and spoilage indicators. Traditional electrochemical analysis faces challenges such as overlapping signals, nonlinear sensor responses, and matrix effects, which diminish accuracy and scalability. ML algorithms offer advanced data processing, feature extraction, and predictive modeling, significantly enhancing detection sensitivity and classification accuracy and supporting real-time decision-making. This review explores the combined use of ML and electrochemical sensing in food analysis, focusing on key areas such as pesticide and heavy metal detection, food authentication, shelf-life prediction, and microbial safety monitoring. It covers a comprehensive range of ML techniques, from basic algorithms like Support Vector Machines and Random Forests to advanced deep learning architectures, including Convolutional Neural Networks, Transformers, and Graph Neural Networks. Additionally, it highlights innovative applications and addresses critical challenges in real-world deployment, such as data scarcity, model generalizability, and the “black box” problem of interpretability. Strategies such as data augmentation, transfer learning, and explainable AI (XAI) are emerging as crucial solutions to enhance data availability and model transparency. The field is also advancing toward adaptive learning frameworks and integration with the Internet of Things (IoT), enabling continuous, networked monitoring throughout the food supply chain. By emphasizing both technical innovations and practical challenges, this review offers a solid foundation for researchers and professionals working at the intersection of electrochemical sensing, machine learning, and food safety analytics.
Citations: 0
Computational design and immunoinformatic evaluation of a multi-epitope vaccine candidate against Border Disease Virus
Pub Date : 2025-12-19 DOI: 10.1016/j.aichem.2025.100104
Salman Khan , Nisar Ahmad , Sami Ullah , Liaqat Ali , Sajjad Ahmad , Hina Fazal
Border Disease Virus (BDV), a Flaviviridae pestivirus, causes major reproductive and financial losses in small ruminants, and no licensed vaccine is currently available. In this study, a multi-epitope vaccine (MEV) against BDV was designed using an immunoinformatics approach. The construct exhibited favorable physicochemical properties, including an aliphatic index of 68.02, a solubility probability of 0.96, and overall stability. It contained multiple high-scoring linear and conformational B-cell epitopes and showed strong predicted binding to MHC class I/II molecules. Molecular docking with TLR-4 revealed stable interactions (binding score: −312.73). Immune simulations indicated robust primary IgM and secondary IgG responses with memory B- and T-cell formation. Codon optimization confirmed high expression potential in E. coli (CAI: 1.0; GC content: 61%), and in-silico cloning indicated vector compatibility. These results suggest that the proposed MEV has the potential to induce both humoral and cellular immunity. Further experimental validation is recommended to confirm safety, immunogenicity, and protective efficacy.
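Two of the sequence-level quantities reported above, the aliphatic index and the GC content, follow from simple composition counts. The sketch below is illustrative only: the abstract does not name the tools used, and the input sequences here are hypothetical placeholders, not the vaccine construct. The aliphatic index uses Ikai's standard formula, X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue.

```python
def aliphatic_index(protein: str) -> float:
    """Ikai's aliphatic index from one-letter amino acid codes.

    AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X = mole percent.
    """
    seq = protein.upper()
    if not seq:
        raise ValueError("protein sequence is empty")
    n = len(seq)

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("DNA sequence is empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Hypothetical toy inputs, chosen only to exercise the functions.
    print(aliphatic_index("AVILGGGGGG"))
    print(gc_content("ATGGCCGCTAAG"))
```

The codon adaptation index (CAI) is not sketched here because it depends on a host-specific codon usage table that the abstract does not provide.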
Citations: 0
Statistical comparison and uncertainty analysis of graph neural networks and machine learning models for molecular property prediction in drug discovery
Pub Date : 2025-12-17 DOI: 10.1016/j.aichem.2025.100103
Sarmad Waleed , Shams ul Islam , Muhammad Saleem , Ali Ahmed

Background:

Accurate prediction of molecular properties is essential for accelerating drug discovery. While graph neural networks (GNNs) have emerged as a powerful tool for this task, they have not been systematically benchmarked against traditional machine learning methods, particularly regarding the crucial aspects of predictive accuracy, interpretability, and uncertainty.

Objective:

To systematically evaluate state-of-the-art GNN architectures against classical machine learning methods for predicting key physicochemical properties. This study provides a multi-faceted comparison of model performance, statistical robustness, prediction uncertainty, and chemical interpretability.

Methods:

We implemented and compared seven models: Graph Attention Networks (GAT), Graph Convolutional Networks (GCN), Graph Isomorphism Networks (GIN), SimpleGNN, Support Vector Regression (SVR), Random Forest, and ElasticNet. These were evaluated across three MoleculeNet datasets: ESOL (aqueous solubility), FreeSolv (hydration free energy), and Lipophilicity (partition coefficient). The evaluation framework included rigorous statistical testing, bootstrap-based uncertainty quantification, and analysis of GAT attention mechanisms for chemical insight.
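As a rough illustration of bootstrap-based uncertainty quantification, the sketch below computes a percentile confidence interval for test-set RMSE by resampling (true, predicted) pairs with replacement. The abstract does not describe the exact procedure used in the study, so the resampling scheme, the function names, and the default of 1000 resamples are all assumptions made for this example.

```python
import math
import random
from typing import Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def bootstrap_rmse_ci(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point RMSE, lower bound, upper bound) via the percentile bootstrap."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample test-set indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return rmse(y_true, y_pred), lower, upper


if __name__ == "__main__":
    # Hypothetical values standing in for measured and predicted properties.
    truth = [0.1, -1.2, 2.3, 0.7, -0.4, 1.9]
    preds = [0.3, -1.0, 2.0, 0.9, -0.9, 1.5]
    print(bootstrap_rmse_ci(truth, preds))
```

Comparing two models fairly would use the same resampled indices for both, so that the difference in RMSE can be bootstrapped directly rather than comparing two separate intervals.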

Results:

GAT consistently achieved superior performance, with test RMSE values of 0.1863 (ESOL), 0.1953 (FreeSolv), and 0.4922 (Lipophilicity), outperforming traditional methods by a significant margin. GNNs demonstrated substantial advantages over classical approaches, which showed considerably higher prediction errors. Furthermore, GAT provided the most reliable predictions with the lowest uncertainty and generated chemically relevant insights through its attention mechanism, successfully identifying key functional groups driving molecular properties.

Conclusions:

This systematic evaluation provides compelling evidence for the superiority of GNNs, particularly GAT, over traditional machine learning for molecular property prediction. GAT’s high accuracy, combined with its robust uncertainty quantification and chemical interpretability, establishes it as a preferred computational approach for pharmaceutical research and development.
Citations: 0
Deep attention for interpretable detection of organic pollutants in water using colloidal SERS
Pub Date : 2025-12-06 DOI: 10.1016/j.aichem.2025.100102
Anvar Kunanbayev , Hirotsugu Hiramatsu , Wei-Liang Chen , Yu-Chun Huang , Yu-Ming Chang , Stefano Rini
This paper presents an innovative approach to the interpretable detection of organic pollutants dissolved in water – carbendazim, thiacloprid, and acetamiprid – by leveraging self-attention mechanisms within the deep neural network (DNN) used for detection. The core contribution of our work is demonstrating how attention mechanisms can significantly enhance the interpretability and performance of pollutant detection in water using colloidal Surface-Enhanced Raman Spectroscopy (SERS) measurements. The cornerstone of our methodology is the optimization of the measurement process, aimed not merely at acquiring high-quality signals but at securing a high volume of data that embodies the full spectrum of measurement variability.
This optimization includes the development of a measurement protocol that involves (i) the fabrication of colloidal silver nanoparticles utilizing the method proposed by Leopold and Lendl, (ii) the aging of the colloidal mixture with the analytes for a predetermined period, and (iii) the SERS measurement settings. Each step is carefully calibrated to maximize the SERS response sensitivity and reproducibility for the detection of the targeted analytes. Building upon this optimized measurement framework, the paper introduces a deep learning algorithm with an embedded attention mechanism designed to focus on the most relevant spectral features for pollutant detection. Unlike traditional machine learning methods, which often lack interpretability, the proposed attention model provides clear insights into which features are deemed most important for the detection task, thereby offering a direct interpretation of the decision-making process of the neural network.
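The idea that attention weights point to the most relevant spectral features can be sketched with plain scaled dot-product self-attention. This is not the authors' network: the abstract gives no architectural details, so the sketch assumes the spectrum has been split into fixed-length segments ("tokens"), omits the learned query/key/value projections a trained model would have, and uses a hypothetical `segment_importance` helper that averages the attention each segment receives.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: List[List[float]]) -> Tuple[List[List[float]], List[List[float]]]:
    """Scaled dot-product self-attention with identity projections (Q = K = V = tokens).

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


def segment_importance(weights: List[List[float]]) -> List[float]:
    """Average attention received by each segment across all queries."""
    n = len(weights)
    return [sum(row[j] for row in weights) / n for j in range(n)]


if __name__ == "__main__":
    # Toy "spectrum" of three segments; the last one holds a strong peak.
    segments = [[0.1, 0.0], [0.0, 0.1], [2.0, 2.0]]
    _, attn = self_attention(segments)
    print(segment_importance(attn))
```

In this toy case the segment containing the peak receives the largest average attention, which is the kind of readout an attention-based model can offer for interpreting which spectral regions drove a prediction.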
Artificial intelligence chemistry, volume 4, issue 1, Article 100102.
Citations: 0
Comparative study of machine learning methods for accurate prediction of logP and pKb
Pub Date : 2025-12-01 DOI: 10.1016/j.aichem.2025.100101
Juda Baikété , Alhadji Malloum , Jeanet Conradie
Machine learning (ML) has become a powerful tool for predicting molecular physicochemical properties. It finds applications in various research and development sectors, such as materials science, pharmaceutical chemistry, and environmental science. However, systematic comparisons between different types of properties remain limited. In this study, we developed two structured datasets: a logP dataset containing 1117 molecules and a pKb dataset containing 1268 molecules. For logP, each molecule is represented by 623 molecular descriptors generated exclusively by RDKit/Mordred, while a combination of 150 quantum chemistry descriptors from DFT calculations and molecular fingerprints derived from RDKit is used for pKb. Several ML algorithms were evaluated using an identical workflow, and the relevance of the descriptors was analyzed using SHAP, followed by feature pruning based on correlation. For the logP dataset, the LightGradBoost model achieved an R² of 0.94, an RMSE of 0.31, and an MAE of 0.42 on the independent test set, accurately reproducing experimental logP values in the range of −11.6 to 1.58. For pKb prediction, Random Forest (RF) proved most accurate, with an MAE of 1.69 and an RMSE of 1.68, with predicted values covering the entire range of experimental pKb values (−37 to 29.2). Our results indicate that, while RDKit/Mordred descriptors can predict logP with high accuracy, pKb remains a more challenging property to model, even when incorporating high-level DFT descriptors. The study therefore proposes a unified framework for the comparative evaluation of cross-property machine learning models and highlights the influence of the type of descriptor and the choice of algorithm on performance for chemically distinct properties.
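As an illustration of two steps named in the abstract, the sketch below shows one possible form of correlation-based feature pruning and the standard definitions of R², RMSE, and MAE. The function names, the 0.95 threshold, and the data are hypothetical; the SHAP analysis and the authors' actual pruning rule are not reproduced. One property worth keeping in mind when reading reported figures: for any set of residuals, MAE cannot exceed RMSE. The pairs quoted above (RMSE 0.31 with MAE 0.42; MAE 1.69 with RMSE 1.68) do not meet that condition as stated, so they may be transposed or computed on different subsets; the abstract does not say.

```python
import numpy as np


def prune_correlated(X, threshold=0.95):
    """Greedy pruning: keep a column only if its absolute Pearson
    correlation with every already-kept column is below threshold.
    The threshold value is an assumption, not taken from the paper."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return keep


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, and MAE for paired true/predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mae": float(np.mean(np.abs(resid))),
    }


# Synthetic descriptor matrix: column 2 is a rescaled copy of column 0,
# so it carries no new information and should be pruned.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 2] = 2.0 * X[:, 0]
kept = prune_correlated(X)

# Small hand-checkable example for the metrics.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 5.0])
```

Because RMSE weights large residuals more heavily than MAE, the gap between the two is a quick indicator of how much a few outliers dominate the error.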
Artificial intelligence chemistry, volume 4, issue 1, Article 100101.
Citations: 0
Electrostatic map for the hydration-induced frequency shifts of the NH stretching mode of peptide constructed using some machine learning models
Pub Date : 2025-11-14 DOI: 10.1016/j.aichem.2025.100098
Hajime Torii , Mikito Tsujimoto , Yukichi Kitamura
Elucidating how the frequencies of vibrational modes are modulated by interactions with other moieties in a system is important for better utilization of the modes as probes of the structures and dynamics of condensed-phase systems. Here, such an analysis is carried out for the hydration-induced frequency shifts of the NH stretching mode of peptides. It is shown that, with the help of machine learning, it is possible to describe the frequency shifts as maps that use the electric fields operating on the atomic sites (H, N, C, and O) as descriptors. By comparing the results obtained for different combinations of descriptor sets and machine learning models, suitable ways to construct well-performing maps are clarified. The nature of the electrostatic response of this mode, especially how the electric fields on the four atomic sites are involved in controlling the frequency shifts, is discussed.
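The simplest form such a map can take is a linear one, in which the frequency shift is a weighted sum of the electric-field components at the four sites plus a constant. The sketch below fits that form by ordinary least squares on synthetic data. It is an assumption-laden illustration: the abstract compares several descriptor sets and machine learning models without naming them here, the field values and coefficients are random numbers in arbitrary units, and the function names are hypothetical.

```python
import numpy as np

SITES = ("H", "N", "C", "O")


def fit_linear_map(fields, shifts):
    """Least-squares fit of shift = sum_{site,axis} c * E + intercept.

    fields: (n_samples, 4, 3) electric-field vectors at the four sites.
    shifts: (n_samples,) frequency shifts.
    Returns (coef of shape (4, 3), intercept).
    """
    n = fields.shape[0]
    design = np.hstack([fields.reshape(n, -1), np.ones((n, 1))])
    beta, *_ = np.linalg.lstsq(design, shifts, rcond=None)
    return beta[:-1].reshape(len(SITES), 3), float(beta[-1])


def predict_shift(fields, coef, intercept):
    # Contract the site and axis dimensions against the coefficients.
    return np.tensordot(fields, coef, axes=([1, 2], [0, 1])) + intercept


# Synthetic data generated from known coefficients plus small noise,
# so the fit can be checked against the ground truth.
rng = np.random.default_rng(0)
true_coef = rng.normal(size=(4, 3))
true_intercept = 5.0
fields = rng.normal(size=(200, 4, 3))
shifts = predict_shift(fields, true_coef, true_intercept)
shifts = shifts + rng.normal(scale=0.01, size=shifts.shape)

coef, intercept = fit_linear_map(fields, shifts)
predicted = predict_shift(fields, coef, intercept)
```

Inspecting the rows of `coef` shows how strongly the field at each site contributes to the predicted shift, which is the kind of question the abstract raises about the four atomic sites.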
Artificial intelligence chemistry, volume 3, issue 2, Article 100098.
Citations: 0
Neural networks for neurocomputing circuits: A computational study of tolerance to noise and activation function non-uniformity when machine learning materials properties
Pub Date : 2025-11-14 DOI: 10.1016/j.aichem.2025.100099
Ye min Thant , Methawee Nukunudompanich , Chu-Chen Chueh , Manabu Ihara , Sergei Manzhos
Dedicated analog neurocomputing circuits are promising for high-throughput, low-power applications of machine learning (ML) and for applications where implementing a digital computer is unwieldy (remote locations; small, mobile, and autonomous devices; extreme conditions; etc.). Neural networks (NNs) implemented in such circuits, however, must contend with circuit noise and with non-uniform shapes of the neuron activation function (NAF) caused by the dispersion of performance characteristics of circuit elements (such as the transistors or diodes implementing the neurons). We present a computational study of the impact of circuit noise and NAF inhomogeneity in regression problems as a function of NN architecture and training regime. We focus on one application that requires high-throughput ML, materials informatics, using as representative problems the ML of formation energies relative to the lowest-energy isomer of peri-condensed hydrocarbons, formation energies and band gaps of double perovskites, and zero-point vibrational energies of molecules from the QM9 dataset. We show that in these applications, NNs generally have low noise tolerance, with model accuracy degrading rapidly as the noise level increases. Single-hidden-layer NNs and NNs with larger-than-optimal sizes are somewhat more noise-tolerant. Models that show less overfitting (not necessarily those with the lowest test set error) are more noise-tolerant. Importantly, we demonstrate that the effect of activation function inhomogeneity can be mitigated by retraining the NN using the practically realized shapes of the NAFs.
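The two perturbations studied can be mimicked in a few lines. The sketch below runs a single-hidden-layer network forward with (a) additive Gaussian noise on the hidden-neuron outputs and (b) a different slope for each neuron's tanh activation. Both perturbation models, the network size, and the random weights are assumptions made for illustration; the paper's actual noise model and activation shapes are not given in the abstract.

```python
import numpy as np


def forward(x, w1, b1, w2, b2, rng=None, noise_std=0.0, slopes=None):
    """Single-hidden-layer regression network.

    noise_std: std of Gaussian noise added to hidden outputs
               (assumed stand-in for circuit noise).
    slopes:    per-neuron slope of tanh (assumed stand-in for
               activation-function non-uniformity); None means uniform.
    """
    pre = x @ w1 + b1
    if slopes is None:
        slopes = np.ones(w1.shape[1])
    hidden = np.tanh(slopes * pre)
    if noise_std > 0.0:
        hidden = hidden + rng.normal(scale=noise_std, size=hidden.shape)
    return hidden @ w2 + b2


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
n_in, n_hidden = 5, 20
w1 = rng.normal(size=(n_in, n_hidden))
b1 = rng.normal(size=n_hidden)
w2 = rng.normal(size=(n_hidden, 1))
b2 = rng.normal(size=1)
x = rng.normal(size=(1000, n_in))

clean = forward(x, w1, b1, w2, b2)

# Output deviation caused by hidden-layer noise at three noise levels.
# The same seed is reused so the only difference is the noise scale.
noise_levels = [0.01, 0.1, 0.5]
noise_errors = [
    rmse(forward(x, w1, b1, w2, b2, rng=np.random.default_rng(1), noise_std=s), clean)
    for s in noise_levels
]

# Output deviation caused by a 10% spread in activation slopes.
slopes = 1.0 + 0.1 * rng.normal(size=n_hidden)
slope_error = rmse(forward(x, w1, b1, w2, b2, slopes=slopes), clean)
```

Retraining in the presence of the perturbed activations, as the abstract describes, would amount to optimising `w1`, `b1`, `w2`, and `b2` with `slopes` held at their realised values; that step is omitted here.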
Artificial intelligence chemistry, volume 3, issue 2, Article 100099.
Citations: 0