Development of a two-layer machine learning model for the forensic application of legal and illegal poppy classification based on sequence data

Forensic Science International: Genetics (IF 3.2; Zone 2, Medicine; JCR Q2, Genetics & Heredity). Pub Date: 2024-05-22. DOI: 10.1016/j.fsigen.2024.103061
Hyung-Eun An, Min-Ho Mun, Adeel Malik, Chang-Bae Kim

Abstract

Poppies are useful plants with medicinal, edible, ornamental, and industrial applications. Some Papaver species are forensically significant because they contain opium, a narcotic substance. Because of this forensic value, internationally trafficked illegal poppy species are being identified by DNA barcoding with multiple markers. However, which markers allow precise species identification of legal and illegal poppies is still under discussion: research on illegal poppies has focused on Papaver somniferum L., while species identification studies of Papaver bracteatum and Papaver setigerum DC. are still lacking. Therefore, to evaluate the performance of genetic markers and classify DNA sequences within the genus Papaver, this study developed the first machine learning-based two-layer model, in which the first layer classifies a given sequence as a legal or illegal poppy and the second layer identifies the species of illegal poppies from their sequences. We constructed the dataset and investigated biological features from four markers, internal transcribed spacer 1 (ITS1), internal transcribed spacer 2 (ITS2), transfer RNA leucine (trnL), and the transfer RNA leucine-transfer RNA phenylalanine intergenic spacer (trnL–trnF intergenic spacer), and their combination, using four machine learning algorithms: K-nearest neighbor (KNN), Naïve Bayes (NB), extreme gradient boosting (XGBoost), and Random Forest (RF). For Layer 1, which classifies legal and illegal poppies, KNN-based models using the combined ITS region performed best, with accuracies of 0.846 and 0.889 on the training and test sets, respectively. For Layer 2, which identifies illegal poppy species, KNN-based models using the combined ITS region also performed best, with a performance of 0.833 and 1.000 on the training and test sets, respectively.
To validate the model, the combined ITS region (ITS1 and ITS2 sequences) from blind poppy samples was used as a case study, and Layer 1 correctly classified legal and illegal poppies with an accuracy above 0.830. Layer 2 correctly identified P. setigerum DC.; however, only one of the three P. somniferum L. samples was accurately identified. Nevertheless, our research shows that machine learning can classify and identify legal and illegal poppy species from DNA barcodes, which can then serve as an efficient and effective forensic tool for improved law enforcement and a safer society.
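
The two-layer idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state how sequences were encoded or which hyperparameters were used, so the k-mer frequency features, the hand-written nearest-neighbor vote, the value of `k`, the class name `TwoLayerClassifier`, and the labels `legal_X`, `illegal_A`, and `illegal_B` are all assumptions made for illustration. The sequences are synthetic placeholders, not real ITS data.

```python
from collections import Counter
from itertools import product
import math

K = 2  # k-mer length; an assumed choice, not taken from the paper
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_features(seq):
    """Return normalized k-mer frequencies (one value per possible k-mer)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[kmer] for kmer in KMERS) or 1  # avoid division by zero
    return [counts[kmer] / total for kmer in KMERS]


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


class TwoLayerClassifier:
    """Layer 1: legal vs. illegal. Layer 2: species, for illegal hits only."""

    def __init__(self, k=1):
        self.k = k
        self.layer1 = []  # (features, "legal" or "illegal")
        self.layer2 = []  # (features, species label), illegal samples only

    def fit(self, records):
        """records: iterable of (sequence, status, species) tuples."""
        for seq, status, species in records:
            feats = kmer_features(seq)
            self.layer1.append((feats, status))
            if status == "illegal":
                self.layer2.append((feats, species))
        return self

    def predict(self, seq):
        feats = kmer_features(seq)
        status = knn_predict(self.layer1, feats, self.k)
        if status == "legal":
            return status, None  # Layer 2 is skipped for legal samples
        return status, knn_predict(self.layer2, feats, self.k)


# Synthetic placeholder sequences and hypothetical labels (not real data).
TOY_RECORDS = [
    ("ATATATATATAT", "legal", "legal_X"),
    ("GCGCGCGCGCGC", "illegal", "illegal_A"),
    ("GGGGGGCCCCCC", "illegal", "illegal_B"),
]

model = TwoLayerClassifier(k=1).fit(TOY_RECORDS)
print(model.predict("GCGCGCGCGCGA"))  # ('illegal', 'illegal_A')
print(model.predict("ATATATATTTAT"))  # ('legal', None)
```

The cascade passes a sequence to the second layer only when the first layer labels it illegal, so a Layer 1 error cannot be corrected downstream. That is one reason to report the accuracy of each layer separately, as the abstract does.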

Source journal
CiteScore: 7.50
Self-citation rate: 32.30%
Articles published: 132
Review time: 11.3 weeks
Journal description: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts. The scope of the journal includes:
- Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
- Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
- Non-human DNA polymorphisms for crime scene investigation.
- Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
- DNA typing methodologies and strategies.
- Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
- Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards.
- Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
- Criminal DNA databases. Technical, legal and statistical issues.
- General ethical and legal issues related to forensic genetics.