'Low' LRs obtained from DNA mixtures: On calibration and discrimination performance of probabilistic genotyping software

IF 3.2 | Zone 2 (Medicine) | JCR Q2, Genetics & Heredity | Forensic Science International-Genetics | Pub Date: 2024-07-27 | DOI: 10.1016/j.fsigen.2024.103099

M. McCarthy-Allen, Ø. Bleka, R. Ypma, P. Gill, C. Benschop


Citations: 0


‘Low’ LRs obtained from DNA mixtures: On calibration and discrimination performance of probabilistic genotyping software

The validity of a probabilistic genotyping (PG) system is typically demonstrated by following international guidelines for the developmental and internal validation of PG software. These guidelines mainly focus on discriminatory power. Very few studies have reported metrics that depend on the calibration of likelihood ratio (LR) systems. In this study, discriminatory power as well as various calibration metrics, such as Empirical Cross-Entropy (ECE) plots, pool adjacent violators (PAV) plots, log likelihood ratio cost (Cllr and Cllr^cal), fiducial calibration discrepancy plots, and Turing's expectation, were examined using the publicly available PROVEDIt dataset. The aim was to gain deeper insight into the performance of a variety of PG software in the 'lower' LR ranges (∼LR 1–10,000), with a focus on DNAStatistX and EuroForMix, which use maximum likelihood estimation (MLE). This may be a driving force for end users to reconsider current LR thresholds for reporting. In previous studies, overstated 'low' LRs were observed for these PG software. However, applying (arbitrarily) high LR thresholds for reporting wastes relevant evidential value. This study demonstrates, based on calibration performance, that previously reported LR thresholds can be lowered or even discarded. Considering LRs >1, there was no evidence of miscalibration above LR ∼1000 when using Fst 0.01. Below this LR value, miscalibration was observed. Calibration performance generally improved with the use of Fst 0.03, but the extent of the improvement depended on the dataset: results ranged from miscalibration up to LR ∼100 to no evidence of miscalibration, the latter being similar to HMC and STRmix, PG software that use different methods to model peak height. This study demonstrates that practitioners using MLE-based models should be careful when low LR ranges are reported, though applying arbitrarily high LR thresholds is discouraged. This study also highlights various calibration metrics that are useful in understanding the performance of a PG system.
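Two of the calibration metrics named in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: it uses the standard definition of the log likelihood ratio cost, Cllr = 0.5 × (mean of log2(1 + 1/LR) over comparisons where Hp is true + mean of log2(1 + LR) over comparisons where Hd is true), and reads Turing's expectation as the rule that the average LR over Hd-true comparisons should be close to 1. The function names and the example LR values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def cllr(lrs_hp_true: Sequence[float], lrs_hd_true: Sequence[float]) -> float:
    """Log likelihood ratio cost in bits (standard definition, not the paper's code).

    lrs_hp_true: LRs from comparisons where the person of interest is a true contributor.
    lrs_hd_true: LRs from comparisons where the person of interest is a non-contributor.
    """
    if len(lrs_hp_true) == 0 or len(lrs_hd_true) == 0:
        raise ValueError("both LR sets must be non-empty")
    if any(lr <= 0 for lr in lrs_hp_true):
        raise ValueError("Hp-true LRs must be strictly positive")
    if any(lr < 0 for lr in lrs_hd_true):
        raise ValueError("Hd-true LRs must be non-negative")
    # Hp-true comparisons are penalised for small LRs.
    penalty_hp = sum(math.log2(1.0 + 1.0 / lr) for lr in lrs_hp_true) / len(lrs_hp_true)
    # Hd-true comparisons are penalised for large LRs.
    penalty_hd = sum(math.log2(1.0 + lr) for lr in lrs_hd_true) / len(lrs_hd_true)
    return 0.5 * (penalty_hp + penalty_hd)


def mean_lr_under_hd(lrs_hd_true: Sequence[float]) -> float:
    """Average LR over Hd-true comparisons; Turing's expectation says this should be near 1."""
    if len(lrs_hd_true) == 0:
        raise ValueError("LR set must be non-empty")
    return sum(lrs_hd_true) / len(lrs_hd_true)


if __name__ == "__main__":
    # Hypothetical LR values, for illustration only.
    hp_true = [1000.0, 100.0, 5.0]
    hd_true = [0.001, 0.01, 2.0]
    print("Cllr:", cllr(hp_true, hd_true))
    print("Mean LR under Hd:", mean_lr_under_hd(hd_true))
```

A system that always reports LR = 1 scores Cllr = 1, so values below 1 indicate that the LRs carry useful information, and values above 1 indicate misleading LRs. Cllr mixes discrimination and calibration; separating the two (as Cllr^cal does) additionally requires a PAV transform, which is omitted here.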


Source journal

Forensic Science International-Genetics (Biology / Medicine: Legal)

CiteScore: 7.50
Self-citation rate: 32.30%
Articles published: 132
Review time: 11.3 weeks

About the journal: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts. The scope of the journal includes: Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies. Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications. Non-human DNA polymorphisms for crime scene investigation. Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems. DNA typing methodologies and strategies. Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches. Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards. Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies. Criminal DNA databases. Technical, legal and statistical issues. General ethical and legal issues related to forensic genetics.