Critical assessment of machine learning prediction of biomass pyrolysis

IF 7.5 · Zone 1 (Engineering & Technology) · JCR Q2 (ENERGY & FUELS) · Fuel · Pub Date: 2025-03-18 · DOI: 10.1016/j.fuel.2025.135000
Antonio Elia Pascarella, Antonio Coppola, Stefano Marrone, Roberto Chirone, Carlo Sansone, Piero Salatino
Fuel, Volume 394, Article 135000. Full text: https://www.sciencedirect.com/science/article/pii/S0016236125007252

Abstract

Biomass pyrolysis is a complex process that is challenging to model physically, and modern AI methods could improve its prediction and characterization. However, building AI models requires high-quality datasets. Existing datasets in the literature, usually only a few hundred records each, are inadequate for robust AI applications.
A first goal of the study was to make the best use of the currently available body of experimental data on fixed-bed non-catalytic biomass pyrolysis by comprehensively compiling data from nearly 160 sources into a new dataset of 1137 records. Each record was carefully standardized to overcome inconsistencies in terminology and lack of uniformity among sources. This extended dataset (covering biomass properties, pyrolysis operating conditions, and bioliquid yield) integrates previous ones and is intended to promote community-based data sharing. The compiled dataset is remarkably sparse because the original data are incomplete.
A second goal was to benchmark different regression and data imputation models in order to assess the predictive ability of ML applied to the collected dataset. The most accurate estimates were obtained from a subset of about 500 instances without missing values, giving a Mean Absolute Error (MAE) of 2.28. Applying ML to the entire dataset with imputed missing data yielded a less accurate estimate (MAE = 3.45), a result that underlines how critical missing-data imputation and the sparsity of the dataset are.
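The two data-handling strategies compared above, and the MAE metric used to score them, can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of column-mean imputation, and the toy records are assumptions made purely for illustration, and the abstract does not state which imputation or regression models were benchmarked.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """MAE: the average of |y_true - y_pred| over all records."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def complete_cases(rows):
    """Keep only the records with no missing (None) feature values."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values
    in the same column (one simple imputation choice among many)."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed))  # fails if a column is all-missing
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records: [pyrolysis temperature, ash content].
records = [
    [500.0, 4.0],
    [450.0, None],
    [550.0, 6.0],
    [None, 5.0],
]

complete = complete_cases(records)   # smaller, but nothing is guessed
imputed = mean_impute(records)       # full size, but gaps are filled in

# Hypothetical measured vs. predicted yields, only to show the metric.
mae = mean_absolute_error([40.0, 50.0, 45.0, 55.0], [42.0, 47.0, 45.0, 52.0])
```

The complete-case subset discards records but trains only on measured values, while the imputed set keeps every record at the cost of introducing estimated entries. That trade-off is what the two MAE figures in the abstract compare.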
A third, and the most relevant, goal was the critical assessment of Explainable Artificial Intelligence (XAI) techniques, which come into play when ML is used to evaluate the importance and directional trends of selected features. XAI tools, namely Partial Dependence Plots (PDP) and SHAP, were applied to the dataset to assess whether they can be trusted to support mechanistic inference about how key biomass properties and process operating parameters affect pyrolysis yields. The result of this analysis is far from satisfactory. Significant discrepancies across studies, inconsistencies among different methods, and somewhat erratic trends in the PDPs reflect the difficulty of obtaining consistent mechanistic insights from purely data-driven approaches, and suggest adopting physics-informed machine learning that embodies physico-chemical relationships to improve Explainable AI.
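For readers unfamiliar with PDPs, the quantity they plot can be sketched in a few lines: for each value on a grid, the chosen feature is overwritten with that value in every record and the model's predictions are averaged. The sketch below assumes a hypothetical stand-in model (`toy_model`) and toy records; it is not the fitted model or the data from the paper.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Return the partial dependence curve of one feature.

    For each grid value, the chosen feature is set to that value in
    every record, and the model's predictions are averaged.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the data stay unchanged
            modified[feature_index] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(x):
    """Hypothetical stand-in for a trained regressor: the predicted yield
    peaks at a temperature of 500 and decreases with ash content."""
    temperature, ash = x
    return 50.0 - 4.0 * ((temperature - 500.0) / 100.0) ** 2 - 1.5 * ash


# Hypothetical records: [pyrolysis temperature, ash content].
rows = [[450.0, 2.0], [500.0, 4.0], [550.0, 6.0]]

pdp_temperature = partial_dependence(
    toy_model, rows, feature_index=0, grid=[400.0, 500.0, 600.0]
)
```

Because the curve averages over the observed values of all other features, it depends on which records are present and how their missing values were filled. In a sparse, imputed dataset this is one way erratic PDP trends of the kind described above could arise.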

Source journal
Fuel (Engineering & Technology: Chemical Engineering)
CiteScore: 12.80
Self-citation rate: 20.30%
Articles published: 3506
Review time: 64 days
Journal description: The exploration of energy sources remains a critical matter of study. For the past nine decades, Fuel has consistently been at the forefront of primary research in the field of energy science. This area of investigation encompasses a wide range of subjects, with particular emphasis on emerging concerns such as environmental factors and pollution.