Machine learning in forensic toxicology: Applications, experiences, and future directions

Toxicologie Analytique et Clinique (IF 1.8, JCR Q4, Toxicology) | Pub Date: 2025-03-01 | DOI: 10.1016/j.toxac.2025.01.014
Michael Scholz
Toxicologie Analytique et Clinique, volume 37, issue 1, pages S14-S15. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2352007825000149

Abstract

This contribution gives a basic overview of the principles of machine learning and its pitfalls, together with successful real-world examples. The aim is to improve machine learning literacy within the forensic toxicology community.
The demands on forensic toxicologists are changing rapidly. In the past, it was sufficient to operate a GC-MS or LC-MS instrument, often with extremely user-unfriendly software, to obtain a result, after which the evaluation of a case could begin. However, as analytical instruments have become faster, more sensitive, more versatile, and more powerful, forensic toxicology has evolved in parallel, and the volume of data has increased rapidly. This trend is particularly evident in high-resolution mass spectrometry and non-targeted analysis, in which a large number of substances can be detected in complex biological samples. Forensic toxicologists are no longer interested only in prescription or illegal drugs, but in the totality of small molecules in the human body (the so-called metabolome). Under certain circumstances, changes in the metabolome can provide clues to drug use, cause of death, and drunk or even drowsy driving. Data sets of this size can no longer be analyzed manually.
Machine learning (ML), a subfield of artificial intelligence, has proven to be powerful and promising in tackling large, complex, and high-dimensional data sets. ML can make predictions, find patterns, or classify data. The three main types of machine learning are supervised, unsupervised, and reinforcement learning. ML has gained prominence over the last decade and comprises many different learning algorithms (e.g. linear regression, logistic regression, decision trees, random forests, support vector machines, naive Bayes, and others). These algorithms are now finding their way into forensic toxicology. However, this transformative technology is not without its challenges. While the underlying principles of ML are easy to understand, there are many pitfalls to avoid in order to ensure that ML actually improves results in forensic toxicology. Many easy-to-make mistakes can cause an ML model to appear to perform well when in reality it does not.
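The last point, a model that appears to perform well when it does not, can be illustrated with a minimal sketch. It is not taken from the talk: the data are synthetic random numbers, the labels are assigned at random so that they carry no information, and the 1-nearest-neighbour classifier and all names (squared_distance, predict_1nn, accuracy) are assumptions chosen for illustration. Evaluated on its own training data the model looks perfect, while on held-out samples it is expected to perform at roughly chance level.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(train_X, train_y, sample):
    """Return the label of the closest training sample (1-nearest neighbour)."""
    nearest = min(range(len(train_X)),
                  key=lambda i: squared_distance(train_X[i], sample))
    return train_y[nearest]


def accuracy(train_X, train_y, X, y):
    """Fraction of samples in X whose predicted label matches y."""
    hits = sum(predict_1nn(train_X, train_y, s) == label
               for s, label in zip(X, y))
    return hits / len(y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples, n_features = 300, 20

# Synthetic "measurements": pure noise, with labels that carry no signal.
X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
y = [rng.randint(0, 1) for _ in range(n_samples)]

# Hold out the last 100 samples; the model never sees them during training.
train_X, train_y = X[:200], y[:200]
test_X, test_y = X[200:], y[200:]

# Scoring on the training data: every sample is its own nearest neighbour.
train_acc = accuracy(train_X, train_y, train_X, train_y)
# Scoring on held-out data: the honest estimate.
test_acc = accuracy(train_X, train_y, test_X, test_y)

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

Because the labels are random, any accuracy clearly above chance on the training data reflects memorization rather than learning, which is why performance should be reported on data that were not used for training.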
The most common pitfalls are inadequate or non-representative training data, poor data quality, overfitting, and underfitting. It is of the utmost importance to split datasets, train algorithms, and validate results correctly. Another problem that severely affects machine learning algorithms is the curse of dimensionality: the efficiency and effectiveness of algorithms deteriorate as the dimensionality of the data increases, because the volume of the feature space grows exponentially with the number of dimensions and the available samples become increasingly sparse. Consequently, the forensic toxicologist must employ dimensionality reduction techniques. One option is to select the most relevant features from the original dataset while discarding irrelevant or redundant ones (feature selection), which simplifies the model and improves its efficiency. Another is to transform the original high-dimensional data into a lower-dimensional space by creating new features that capture the essential information (feature extraction). It also helps to scale the features to a similar range so that no feature dominates the others, especially in distance-based algorithms. To further ensure robustness in the model training process, missing data should be addressed appropriately through imputation or deletion.
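Several of these preprocessing steps can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the workflow presented in the talk: the numbers are invented, missing values are encoded as None, imputation uses the column mean, scaling is a z-score, and feature selection is reduced to dropping columns that are constant in the training set. The function names fit_preprocessor and transform are hypothetical. Feature extraction (for example principal component analysis) is omitted for brevity. The point of the design is that all parameters are learned from the training rows only and then reused unchanged for new samples, so that no information leaks from the test data.

```python
from statistics import mean, pstdev


def fit_preprocessor(train_rows, min_std=1e-8):
    """Learn imputation means, scaling parameters, and kept columns.

    Uses the training rows only. Assumes every column has at least one
    observed (non-None) value in the training set.
    """
    n_cols = len(train_rows[0])
    means, stds = [], []
    for j in range(n_cols):
        observed = [row[j] for row in train_rows if row[j] is not None]
        means.append(mean(observed))
        stds.append(pstdev(observed))
    # Crude feature selection: drop columns that do not vary in training.
    kept = [j for j in range(n_cols) if stds[j] > min_std]
    return {"means": means, "stds": stds, "kept": kept}


def transform(rows, params):
    """Impute missing values with the training mean, then z-score scale."""
    result = []
    for row in rows:
        scaled = []
        for j in params["kept"]:
            value = row[j] if row[j] is not None else params["means"][j]
            scaled.append((value - params["means"][j]) / params["stds"][j])
        result.append(scaled)
    return result


# Invented example: columns might be a peak area, a ratio, and a constant.
train = [
    [1200.0, 0.5, 7.0],
    [1500.0, None, 7.0],  # missing ratio, to be imputed
    [900.0, 0.7, 7.0],
    [1800.0, 0.9, 7.0],
]
test = [[1350.0, None, 7.0]]

params = fit_preprocessor(train)          # fitted on training data only
train_scaled = transform(train, params)
test_scaled = transform(test, params)     # reuses the training parameters

print("kept columns:", params["kept"])
print("scaled test sample:", test_scaled)
```

After scaling, the first two columns contribute on a comparable scale to any distance calculation, whereas in the raw data the first column, with values in the thousands, would dominate.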
Examples of the successful implementation of ML in forensic toxicology: combining machine learning with (high-resolution) mass spectrometry can be used to optimize workflows by detecting sample adulteration, to improve the detection of difficult analyte groups (e.g. synthetic cannabinoid receptor agonists, SCRAs), and to optimize the processing of high-dimensional data sets. This approach can help with even the most complex problems in our field, such as detecting the effects of sleepiness on the metabolome and establishing biomarkers of sleepiness.