ChemTastesPredictor: An ensemble of machine learning classifiers to predict the taste of molecular tastants

Chemometrics and Intelligent Laboratory Systems · IF 3.7 · Region 2 (Chemistry) · JCR Q2 (Automation & Control Systems) · Pub Date: 2025-03-12 · DOI: 10.1016/j.chemolab.2025.105380
Cristian Rojas, Mónica Abril-González, Davide Ballabio, Fernando García
Volume 261, Article 105380. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0169743925000656
Citations: 0

Abstract

The sense of taste plays a critical role in food science, since it directly impacts food consumption, human nutrition, and overall health. Computational models that use machine learning classifiers to predict the taste of molecular tastants from their chemical structure serve as powerful tools in the advancing field of foodinformatics. This study describes the development of ChemTastesPredictor, designed to predict the taste of 4075 molecular tastants included in the extended version of ChemTastesDB (https://zenodo.org/records/14963136). To the best of our knowledge, this is the largest dataset, spanning a broad chemical space, used to calibrate machine learning (ML) models for taste prediction based on molecular descriptors and fingerprints. For validation, datasets were randomly split into training and test sets in a 75:25 ratio, ensuring balanced class distributions. In binary classification tasks, the Random Forest classifier demonstrated the highest predictive performance for sweet/bitter (non-error rate, NER = 0.928 and F-score = 0.927) and bitter/non-bitter (NER = 0.902 and F-score = 0.903) classification. Adaptive Boosting excelled in the prediction of sweet/non-sweet (NER = 0.861 and F-score = 0.862). The N-Nearest Neighbors classifier emerged as the optimal classifier for umami/non-umami (NER = 0.957 and F-score = 0.860) and sweet/bitter/umami (NER = 0.870 and F-score = 0.843). These models may be useful in the development and analysis of new chemical tastants.
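
The validation scheme and the two reported metrics can be illustrated with a minimal sketch. This is not the authors' code. It assumes that NER (non-error rate) is the mean of the per-class sensitivities and that the F-score is macro-averaged over classes; the abstract does not state the averaging used. The function names and the toy labels below are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices so each class contributes about test_fraction
    of its members to the test set (75:25 by default)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        # Rounded per class, so very small classes may get no test samples.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class.
    Only classes present in y_true are tracked (assumption of this sketch)."""
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in sorted(set(y_true))}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[t]["fn"] += 1
            if p in counts:
                counts[p]["fp"] += 1
    return counts


def ner(y_true, y_pred):
    """Non-error rate: mean of per-class sensitivities (assumed definition)."""
    counts = per_class_counts(y_true, y_pred)
    sens = [c["tp"] / (c["tp"] + c["fn"]) for c in counts.values()]
    return sum(sens) / len(sens)


def macro_f_score(y_true, y_pred):
    """Macro-averaged F1 (assumed averaging; the abstract does not specify)."""
    scores = []
    for c in per_class_counts(y_true, y_pred).values():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores.append(2 * c["tp"] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, not taken from ChemTastesDB.
toy_labels = ["sweet"] * 8 + ["bitter"] * 4
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))

y_true = ["sweet", "sweet", "sweet", "sweet", "bitter", "bitter"]
y_pred = ["sweet", "sweet", "sweet", "bitter", "bitter", "sweet"]
print(ner(y_true, y_pred), macro_f_score(y_true, y_pred))
```

Averaging sensitivities over classes keeps a large class from dominating the score. This may explain why NER and F-score diverge for the umami/non-umami task (0.957 versus 0.860), although the abstract does not give the class sizes.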
Source journal metrics: CiteScore 7.50; self-citation rate 7.70%; articles published 169; review time 3.4 months.
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on the development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.).
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.
Latest articles in this journal:
A selective genetic algorithm - PLS-DA approach based on untargeted LC-HRMS: Application to complex biomass samples
Editorial Board
Stacking density estimation and its oversampling method for continuously imbalanced data in chemometrics
ChemTastesPredictor: An ensemble of machine learning classifiers to predict the taste of molecular tastants
Correlations between the constituent molecules, crystal structures, and dielectric constants in organic crystals