Cristian Rojas, Mónica Abril-González, Davide Ballabio, Fernando García
Title: ChemTastesPredictor: An ensemble of machine learning classifiers to predict the taste of molecular tastants
Journal: Chemometrics and Intelligent Laboratory Systems, volume 261, Article 105380 (impact factor 3.7; JCR Q2, Automation & Control Systems)
Publication date: 2025-03-12
Publication type: Journal Article
DOI: 10.1016/j.chemolab.2025.105380
URL: https://www.sciencedirect.com/science/article/pii/S0169743925000656
Abstract
The sense of taste plays a critical role in food science because it directly affects food consumption, human nutrition, and overall health. Computational models that use machine learning classifiers to predict the taste of molecular tastants from their chemical structure are powerful tools in the advancing field of foodinformatics. This study describes the development of ChemTastesPredictor, which is designed to predict the taste of the 4075 molecular tastants included in the extended version of ChemTastesDB (https://zenodo.org/records/14963136). To the best of our knowledge, this is the largest dataset with a broad-based chemical space used to calibrate machine learning (ML) models for taste prediction based on molecular descriptors and fingerprints. For validation, the datasets were randomly split into training and test sets in a 75:25 ratio while keeping the class distributions balanced. In the binary classification tasks, the Random Forest classifier showed the highest predictive performance for sweet/bitter (NER = 0.928, F-score = 0.927) and bitter/non-bitter (NER = 0.902, F-score = 0.903) classification. Adaptive Boosting performed best for sweet/non-sweet (NER = 0.861, F-score = 0.862). The N-Nearest Neighbors classifier was the best classifier for umami/non-umami (NER = 0.957, F-score = 0.860) and for the three-class sweet/bitter/umami task (NER = 0.870, F-score = 0.843). These models may be useful in the development and analysis of new chemical tastants.
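The validation scheme and the two reported metrics can be illustrated with a minimal sketch. It is not the authors' code. It assumes that NER denotes the non-error rate (the mean of the per-class sensitivities, also called balanced accuracy) and that the F-score is the macro-average of the per-class harmonic means of precision and sensitivity; the abstract does not state either definition. The function names and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so each class sends about test_fraction of its items to the test set."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def class_rates(y_true, y_pred):
    """Return {class: (sensitivity, precision)} for every class present in y_true."""
    rates = {}
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        n_true = sum(1 for t in y_true if t == c)
        n_pred = sum(1 for p in y_pred if p == c)
        sensitivity = tp / n_true
        precision = tp / n_pred if n_pred else 0.0
        rates[c] = (sensitivity, precision)
    return rates


def non_error_rate(y_true, y_pred):
    """Assumed NER definition: mean of the per-class sensitivities."""
    rates = class_rates(y_true, y_pred)
    return sum(s for s, _ in rates.values()) / len(rates)


def macro_f_score(y_true, y_pred):
    """Assumed F-score definition: macro-average of per-class F1 values."""
    rates = class_rates(y_true, y_pred)
    scores = [2 * s * p / (s + p) if (s + p) else 0.0 for s, p in rates.values()]
    return sum(scores) / len(scores)


# Toy example with hypothetical labels (not data from the study).
y_true = ["sweet", "sweet", "sweet", "sweet", "bitter", "bitter"]
y_pred = ["sweet", "sweet", "sweet", "bitter", "bitter", "bitter"]
print(non_error_rate(y_true, y_pred))  # 0.875 = (3/4 + 2/2) / 2
print(macro_f_score(y_true, y_pred))
```

Under these assumed definitions, NER weights every class equally regardless of its size, which is why it can differ noticeably from the F-score on an imbalanced task such as umami/non-umami.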
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well-characterized data sets to test the performance of new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.