Machine learning in forensic toxicology: Applications, experiences, and future directions

Michael Scholz

Toxicologie Analytique et Clinique, Volume 37, Issue 1 (March 2025), Pages S14–S15. DOI: 10.1016/j.toxac.2025.01.014
Abstract
This contribution gives a basic overview of the principles of machine learning and its pitfalls, together with successful real-world examples. It should help improve technological literacy in machine learning within the forensic toxicology community.
The demands on a forensic toxicologist are changing rapidly. In the past, it was sufficient to operate a GC-MS or LC-MS instrument, often with extremely user-unfriendly software, to obtain a result; only then could the evaluation of a case begin. However, as analytical instruments have become faster, more sensitive, more versatile, and more powerful, forensic toxicology has evolved in parallel. This development has been accompanied by a rapid increase in the volume of data. The trend is particularly evident in high-resolution mass spectrometry and non-targeted screening analysis, in which a large number of substances can be detected in complex biological samples. Forensic toxicologists are no longer interested only in prescription or illegal drugs, but in the totality of all small molecules in the human body (the so-called metabolome). Under certain circumstances, changes in the metabolome can provide clues to drug use, cause of death, drunk driving, or even drowsy driving. It is obvious that these huge amounts of data can no longer be analyzed manually.
Machine learning (ML), a subfield of artificial intelligence, has proven extremely powerful and promising for tackling large, complex, and high-dimensional data sets. ML can make predictions, find patterns, or classify data. The three machine learning types are supervised, unsupervised, and reinforcement learning. The field has come to prominence over the last decade and comprises many different learning algorithms (e.g. linear regression, logistic regression, decision trees, random forests, support vector machines, naive Bayes, and others). These algorithms are currently finding their way into forensic toxicology. However, this transformative technology is not without its challenges. While the underlying principles of ML are easy to understand, there are many pitfalls to avoid in order to ensure that ML can actually improve results in forensic toxicology. Many easy-to-make mistakes can cause an ML model to appear to perform well when in reality it does not.
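To make the supervised-learning idea concrete, here is a minimal sketch (not from the abstract) of one of the algorithms named above, logistic regression, fitted by gradient descent on a toy two-class data set. All data and parameter values are illustrative assumptions.

```python
import numpy as np

# Toy binary classification: two Gaussian clusters in a 2-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)),   # class 0
               rng.normal(+1.0, 0.5, (50, 2))])  # class 1
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit logistic regression by batch gradient descent on the log-loss.
w = np.zeros(X.shape[1])
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)           # predicted class-1 probabilities
    grad_w = X.T @ (p - y) / len(y)  # gradient of the log-loss w.r.t. weights
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

pred = (sigmoid(X @ w + b) >= 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"training accuracy: {accuracy:.2f}")
```

In practice one would use an established library rather than hand-rolled gradient descent, but the sketch shows the core supervised-learning loop: labelled examples in, a decision function out.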
The most common pitfalls are inadequate or non-representative training data, poor data quality, and overfitting or underfitting. It is of the utmost importance to correctly split data sets, train algorithms, and validate results. Another problem that severely impacts machine-learning algorithms is the curse of dimensionality, a phenomenon in which the efficiency and effectiveness of algorithms deteriorate as the dimensionality of the data increases, because the volume of the feature space grows exponentially with the number of dimensions. Consequently, the skilled forensic toxicologist must employ dimensionality-reduction techniques, such as selecting the most relevant features from the original data set while discarding irrelevant or redundant ones (feature selection). This reduces the dimensionality of the data, simplifying the model and improving its efficiency. One can also transform the original high-dimensional data into a lower-dimensional space by creating new features that capture the essential information (feature extraction). It also helps to scale the features to a similar range to prevent certain features from dominating others, especially in distance-based algorithms. To further ensure robustness in the model-training process, missing data should be addressed appropriately through imputation or deletion.
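These precautions can be sketched as a small numpy preprocessing pipeline (an illustrative assumption, not from the abstract): split first so no test-set statistic leaks into training, impute missing values, standardize the features, and then extract a low-dimensional representation via PCA computed from the singular value decomposition.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated high-dimensional data: 60 samples, 20 features, ~5 % missing values.
X = rng.normal(size=(60, 20))
X[rng.random(X.shape) < 0.05] = np.nan

# 1. Split FIRST, so that no statistic computed below leaks from the test set.
idx = rng.permutation(len(X))
train, test = idx[:45], idx[45:]
X_tr, X_te = X[train], X[test]

# 2. Imputation: replace missing values with training-set column means.
col_mean = np.nanmean(X_tr, axis=0)
X_tr = np.where(np.isnan(X_tr), col_mean, X_tr)
X_te = np.where(np.isnan(X_te), col_mean, X_te)

# 3. Feature scaling: standardize using training-set statistics only.
mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
X_tr = (X_tr - mu) / sd
X_te = (X_te - mu) / sd

# 4. Feature extraction: PCA via SVD, fitted on the training set only.
U, S, Vt = np.linalg.svd(X_tr, full_matrices=False)
components = Vt[:5]             # top-5 principal directions
Z_tr = X_tr @ components.T      # training data in the 5-D space
Z_te = X_te @ components.T      # test data projected the same way

print(Z_tr.shape, Z_te.shape)   # 20 features reduced to 5 components
```

The ordering is the point: fitting the imputer, scaler, or PCA on the full data set before splitting is exactly the kind of easy-to-make mistake that makes a model appear to perform better than it does.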
Examples of successful implementation of ML in forensic toxicology include the combination of machine learning with (high-resolution) mass spectrometry, a synergy that can be harnessed to optimize workflows by detecting sample adulteration, to improve detection of difficult analyte groups (e.g. synthetic cannabinoid receptor agonists, SCRAs), and to streamline processing of high-dimensional data sets. This approach can help with even the most complex problems in our field, such as characterizing the effects of sleepiness on the metabolome and establishing biomarkers of sleepiness.