Enhancing decision confidence in AI using Monte Carlo dropout for Raman spectra classification

Jhonatan Contreras, Thomas Bocklitz
Analytica Chimica Acta, volume 1332, Article 343346. Pub Date: 2024-10-16. DOI: 10.1016/j.aca.2024.343346
IF 5.7 · Zone 2 (Chemistry) · JCR Q1 (CHEMISTRY, ANALYTICAL)
https://www.sciencedirect.com/science/article/pii/S0003267024011474
Citations: 0

Enhancing decision confidence in AI using Monte Carlo dropout for Raman spectra classification

Background

Machine learning algorithms for identifying bacterial strains from Raman spectra are widely used in microbiology. During training, existing datasets are augmented and used to optimize the model architecture and hyperparameters. After training, the models are presumed to have reached peak performance and are used for inference without further enhancement. Our methodology combines Monte Carlo Dropout (MCD) with convolutional neural networks (CNNs) by keeping dropout active during the inference phase, which makes it possible to measure model uncertainty, a critical but often ignored aspect of deep learning models.
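As a rough illustration of dropout at inference time, the sketch below runs several stochastic forward passes through a toy dense network and summarizes their disagreement. It is not the authors' CNN: the random weights `w1` and `w2`, the input sizes, the 50 passes, the dropout rate of 0.5, and the use of predictive entropy as the uncertainty score are all assumptions made for the example, since the abstract does not specify them.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, w1, w2, rng, n_passes=50, p_drop=0.5):
    """Run n_passes stochastic forward passes with dropout left on."""
    probs = np.empty((n_passes, x.shape[0], w2.shape[1]))
    for t in range(n_passes):
        h = np.maximum(x @ w1, 0.0)              # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop     # keep each unit with probability 1 - p_drop
        h = h * mask / (1.0 - p_drop)            # inverted-dropout rescaling
        probs[t] = softmax(h @ w2)
    mean_probs = probs.mean(axis=0)              # Monte Carlo estimate of the class probabilities
    # Predictive entropy of the averaged distribution (assumed uncertainty score)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    return mean_probs.argmax(axis=1), entropy

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 100))                    # 8 hypothetical spectra, 100 channels each
w1 = rng.normal(scale=0.1, size=(100, 32))       # hypothetical weights, not a trained model
w2 = rng.normal(scale=0.1, size=(32, 5))         # 5 hypothetical classes
labels, uncertainty = mc_dropout_predict(x, w1, w2, rng)
```

Each spectrum receives a predicted label and a scalar uncertainty; a higher entropy means the stochastic passes spread probability across more classes.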

Results

Using MCD, we categorize unseen input data into two subsets according to the uncertainty of their predictions, with the threshold defined by a Gaussian Mixture Model (GMM). The final prediction is obtained on the subset of test data that exhibits lower model uncertainty, thereby enhancing the reliability of the results. To validate the method, we applied it to two Raman spectra datasets. Accuracy increased by 9.00 percentage points for Dataset 1 (from 83.10 % to 92.10 %) and by 12.82 percentage points for Dataset 2 (from 83.86 % to 96.68 %). These improvements were observed within specific subsets of the data: 826 of 1206 spectra (about 68 %) in Dataset 1 and 1700 of 3000 spectra (about 57 %) in Dataset 2. This demonstrates that focusing on data with lower uncertainty improves prediction accuracy.
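One way a two-component mixture can split predictions by uncertainty is sketched below, under stated assumptions. The uncertainty scores are synthetic, the mixture is fitted with a small hand-written EM loop rather than any particular library, and each sample is assigned to whichever component is more likely, which acts as an implicit threshold between the two component means. The abstract does not say how the authors derive the threshold from the GMM, so the function names and the assignment rule here are hypothetical.

```python
import numpy as np

def component_densities(u, w, mu, var):
    """Weighted Gaussian density of every sample under each of the two components."""
    return w * np.exp(-0.5 * (u[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fit_gmm_1d(u, n_iter=200):
    """EM for a two-component 1-D Gaussian mixture; returns weights, means, variances."""
    mu = np.array([np.percentile(u, 25), np.percentile(u, 75)], dtype=float)
    var = np.full(2, u.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = component_densities(u, w, mu, var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the responsibilities
        nk = resp.sum(axis=0)
        w = nk / len(u)
        mu = (resp * u[:, None]).sum(axis=0) / nk
        var = (resp * (u[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

def split_by_uncertainty(u):
    """Boolean mask: True where the low-uncertainty component is the more likely one."""
    w, mu, var = fit_gmm_1d(u)
    dens = component_densities(u, w, mu, var)
    low = np.argmin(mu)                          # component with the smaller mean uncertainty
    return dens.argmax(axis=1) == low

rng = np.random.default_rng(0)
# Synthetic uncertainty scores: 300 confident and 200 uncertain predictions
u = np.concatenate([rng.normal(0.2, 0.05, 300), rng.normal(1.0, 0.15, 200)])
confident = split_by_uncertainty(u)
retained_fraction = confident.mean()             # share of samples kept for the final prediction
```

Accuracy would then be reported only on the samples where `confident` is true, which is why the retained fraction matters alongside the accuracy gain.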

Significance

In contrast to routine prediction based on mere probabilities, this uncertainty-guided prediction is, we believe, more effective at ensuring a high rate of correct predictions than prediction on the entire dataset. By guiding a model's decision-making toward higher-confidence subsets, our methodology can enhance classification accuracy in critical areas such as disease diagnosis and safety monitoring. This targeted approach is intended to advance microbial identification and to produce more trustworthy predictions.
Source journal
Analytica Chimica Acta (Chemistry: Analytical Chemistry)
CiteScore: 10.40
Self-citation rate: 6.50%
Articles published: 1081
Review time: 38 days
About the journal: Analytica Chimica Acta has an open access mirror journal, Analytica Chimica Acta: X, sharing the same aims and scope, editorial team, submission system and rigorous peer review. Analytica Chimica Acta provides a forum for the rapid publication of original research, and critical, comprehensive reviews dealing with all aspects of fundamental and applied modern analytical chemistry. The journal welcomes the submission of research papers which report studies concerning the development of new and significant analytical methodologies. In determining the suitability of submitted articles for publication, particular scrutiny will be placed on the degree of novelty and impact of the research and the extent to which it adds to the existing body of knowledge in analytical chemistry.