Mid-infrared spectra of dried and roasted cocoa (Theobroma cacao L.): A dataset for machine learning-based classification of cocoa varieties and prediction of theobromine and caffeine content

Data in Brief (IF 1.0, JCR Q3, Multidisciplinary Sciences). Pub Date: 2025-02-01. DOI: 10.1016/j.dib.2024.111243
Gentil A. Collazos-Escobar, Andrés F. Bahamón-Monje, Nelson Gutiérrez-Guzmán
Data in Brief, volume 58, Article 111243. Full text: https://www.sciencedirect.com/science/article/pii/S2352340924012058 PDF (PubMed Central): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11748727/pdf/
Citations: 0

Abstract

This paper presents a dataset of mid-infrared spectra of dried and roasted cocoa beans (Theobroma cacao L.), together with their corresponding theobromine and caffeine content. Infrared data were acquired by Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) spectroscopy, and High-Performance Liquid Chromatography (HPLC) was used to quantify theobromine and caffeine in the dried cocoa beans. The theobromine/caffeine relationship served as a chemical marker for distinguishing between cocoa varieties. The dataset provides a basis for further research: mid-infrared spectra can be paired with HPLC measurements (as the reference standard) to fine-tune machine learning and deep learning models that simultaneously predict theobromine and caffeine content, as well as cocoa variety, in both dried and roasted samples using a non-destructive approach based on spectral data. Tools developed from this dataset could advance automated processes in the cocoa industry and support decision-making on an industrial scale, for example through real-time quality control of cocoa-based products, improved cocoa variety classification, and optimized bean selection, blending strategies, and product formulation, while reducing the need for labor-intensive and costly quantification methods. The dataset is organized into Excel sheets structured by experimental condition and replicate, providing a framework for further analysis, model development, and calibration of multivariate statistical models.
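As a rough illustration of how a theobromine/caffeine marker could separate varieties, the sketch below assumes the "relationship" is expressed as a simple ratio and uses a nearest-centroid rule on that ratio. The variety labels, the concentration values, and the function names are all hypothetical placeholders, not values or methods taken from the dataset or the paper.

```python
from statistics import mean

# Synthetic placeholder values (arbitrary units), NOT taken from the dataset.
# Each record: (hypothetical variety label, theobromine, caffeine).
TRAINING = [
    ("variety_A", 12.0, 4.0),
    ("variety_A", 11.0, 4.4),
    ("variety_B", 14.0, 2.0),
    ("variety_B", 15.0, 2.5),
]


def tb_cf_ratio(theobromine, caffeine):
    """Return the theobromine/caffeine ratio for one sample."""
    if caffeine <= 0:
        raise ValueError("caffeine content must be positive")
    return theobromine / caffeine


def fit_centroids(records):
    """Average the theobromine/caffeine ratio per variety label."""
    by_label = {}
    for label, tb, cf in records:
        by_label.setdefault(label, []).append(tb_cf_ratio(tb, cf))
    return {label: mean(ratios) for label, ratios in by_label.items()}


def classify(theobromine, caffeine, centroids):
    """Assign the label whose mean ratio is closest to the sample's ratio."""
    ratio = tb_cf_ratio(theobromine, caffeine)
    return min(centroids, key=lambda label: abs(centroids[label] - ratio))


centroids = fit_centroids(TRAINING)
print(classify(13.0, 2.2, centroids))  # prints the nearest hypothetical label
```

In practice the concentrations would come from the HPLC columns of the Excel sheets (or from a model that predicts them from the spectra), and a multivariate classifier would likely replace this one-dimensional rule.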
Source journal
Data in Brief (Multidisciplinary Sciences)
CiteScore: 3.10
Self-citation rate: 0.00%
Publication volume: 996
Review time: 70 days
Journal description: Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that thoroughly describe the data, facilitating reproducibility; make data that is often buried in supplementary material easier to find; increase traffic towards associated research articles and data, leading to more citations; and open up doors for new collaborations. Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.