An Improved Machine Learning Model for Pure Component Property Estimation

Engineering (IF 10.1; Zone 1, Engineering and Technology; JCR Q1, ENGINEERING, MULTIDISCIPLINARY) Pub Date: 2024-08-01 DOI: 10.1016/j.eng.2023.08.024

Xinyu Cao, Ming Gong, Anjan Tula, Xi Chen, Rafiqul Gani, Venkat Venkatasubramanian

Engineering, Volume 39, Pages 61-73 (Journal Article). https://www.sciencedirect.com/science/article/pii/S2095809924001590

Citations: 0

Abstract

Information on the physicochemical properties of chemical species is an important prerequisite when performing tasks such as process design and product design. However, the lack of extensive data and high experimental costs hinder the development of prediction techniques for these properties. Moreover, accuracy and predictive capabilities still limit the scope and applicability of most property estimation methods. This paper proposes a new Gaussian process-based modeling framework that aims to manage a discrete and high-dimensional input space related to molecular structure representation with the group-contribution approach. A warping function is used to map discrete input into a continuous domain in order to adjust the correlation between different compounds. Prior selection techniques, including prior elicitation and prior predictive checking, are also applied during the building procedure to provide the model with more information from previous research findings. The framework is assessed using datasets of varying sizes for 20 pure component properties. For 18 out of the 20 pure component properties, the new models are found to give improved accuracy and predictive power in comparison with other published models, with and without machine learning.
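The idea of warping discrete group counts before applying a Gaussian process kernel can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the paper's warping function, kernel, or priors, so `warp` (a simple `log1p` map), the squared-exponential kernel, the fixed hyperparameters, and the toy data are all hypothetical and are not the authors' method.

```python
import numpy as np


def warp(counts, scale=1.0):
    """Hypothetical warping: send integer group counts to a continuous domain.

    log1p is an illustrative choice only, not the function used in the paper.
    """
    return np.log1p(scale * np.asarray(counts, dtype=float))


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, scale=1.0):
    """Posterior mean and variance of a GP evaluated on warped group counts."""
    z_train, z_test = warp(x_train, scale), warp(x_test, scale)
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()  # constant prior mean taken from the data

    # Covariance of the training set, with observation noise on the diagonal.
    k = rbf_kernel(z_train, z_train) + noise * np.eye(len(z_train))
    k_star = rbf_kernel(z_test, z_train)

    # Cholesky-based solve for numerical stability.
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    v = np.linalg.solve(chol, k_star.T)

    mean = y_mean + k_star @ alpha
    var = np.diag(rbf_kernel(z_test, z_test)) - np.sum(v**2, axis=0)
    return mean, var


# Synthetic toy data (not from the paper): rows are made-up molecules, columns
# are counts of three made-up groups, targets use invented linear contributions.
x_train = np.array([[1, 0, 0], [2, 1, 0], [0, 2, 1], [3, 0, 1]])
y_train = x_train @ np.array([10.0, 5.0, -2.0])
x_test = np.array([[1, 0, 0], [10, 10, 10]])

mean, var = gp_predict(x_train, y_train, x_test)
print("posterior mean:", mean)
print("posterior variance:", var)
```

In this sketch the second test molecule lies far from every training molecule in the warped space, so its posterior variance is much larger than that of the first test molecule, which coincides with a training point. A prior predictive check in the same spirit would draw property values from the prior before fitting and compare their range against values reported in earlier studies.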


Source journal

Engineering Environmental Science-Environmental Engineering

Self-citation rate

1.60%

Articles published

335

Review time

35 days

About the journal: Engineering, an international open-access journal initiated by the Chinese Academy of Engineering (CAE) in 2015, serves as a distinguished platform for disseminating cutting-edge advancements in engineering R&D, sharing major research outputs, and highlighting key achievements worldwide. The journal's objectives encompass reporting progress in engineering science, fostering discussions on hot topics, addressing areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations with profound economic and social significance, propelling them to advanced international standards and transforming them into a new productive force. Ultimately, this endeavor seeks to bring about positive changes globally, benefit humanity, and shape a new future.