Single-cell investigative genetics: Single-cell data produces genotype distributions concentrated at the true genotype across all mixture complexities

IF 3.2 | Zone 2, Medicine | Q2 GENETICS & HEREDITY | Forensic Science International-Genetics | Pub Date: 2023-12-19 | DOI: 10.1016/j.fsigen.2023.103000
Catherine M. Grgicak, Qhawe Bhembe, Klaas Slooten, Nidhi C. Sheth, Ken R. Duffy, Desmond S. Lun
Original article: https://www.sciencedirect.com/science/article/pii/S1872497323001758
Citations: 0

Abstract

In the absence of a suspect, the forensic aim is investigative, and the focus is on discerning which genotypes best explain the evidence. In traditional systems, the list of candidate genotypes may become vast if the sample contains DNA from many donors or if the information from a minor contributor is swamped by that of major contributors, leading to lower evidential value for a true donor’s contribution and, as a result, possibly overlooked or inefficient investigative leads. Recent developments in single-cell analysis offer a way forward by producing data capable of discriminating genotypes. This is accomplished by first clustering single-cell data by similarity without reference to a known genotype. With good clustering, it is reasonable to assume that the single-cell electropherograms (scEPGs) in a cluster are from a single contributor. Under that assumption, we determine the probability of a cluster’s content given each possible genotype at each locus, which is then used to determine the posterior probability mass distribution over all genotypes by application of Bayes’ rule. A decision criterion is then applied such that the sum of the ranked probabilities of all genotypes falling in the set is at least 1 − α. This is the credible genotype set, and it is used to inform database search criteria. Within this work, we demonstrate the salience of single-cell analysis by performance testing a set of 630 previously constructed admixtures containing up to 5 donors with balanced and unbalanced contributions. We use scEPGs that were generated by isolating single cells, employing a direct-to-PCR extraction treatment, amplifying STRs that are compliant with existing national databases, and applying post-PCR treatments that elicit a detection limit of one DNA copy. We determined that, for these test data, 99.3% of the true genotypes are included in the 99.8% credible set, regardless of the number of donors that comprised the mixture. We also determined that the most probable genotype was the true genotype for 97% of the loci when the number of cells in a cluster was at least two. Since efficient investigative leads will be borne by posterior mass distributions that are narrow and concentrated at the true genotype, we report that, for this test set, 47,900 (86%) loci returned only one credible genotype and, of these, 47,551 (99%) were the true genotype. When determining the LR for true contributors, 91% of the clusters rendered LR > 10^18, showing the potential of single-cell data to positively affect investigative reporting.
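
The per-locus steps described in the abstract (Bayes' rule over candidate genotypes, then a credible set with mass of at least 1 − α, then an LR for a candidate donor) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the model for the probability of a cluster's content given a genotype, so the likelihood values below are invented placeholders. The allele labels, the prior (Hardy-Weinberg proportions from hypothetical allele frequencies 0.3 and 0.7), the function names, and the single-locus LR formula (likelihood under the candidate genotype divided by the prior-weighted average likelihood for a random donor) are likewise assumptions made for illustration only.

```python
def posterior(likelihoods, priors):
    """Bayes' rule at one locus: P(g | cluster) is proportional to P(cluster | g) * P(g)."""
    joint = {g: likelihoods[g] * priors[g] for g in likelihoods}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("no candidate genotype explains the cluster")
    return {g: p / total for g, p in joint.items()}


def credible_set(post, alpha):
    """Rank genotypes by posterior mass and keep the top ones until mass >= 1 - alpha."""
    ranked = sorted(post.items(), key=lambda item: item[1], reverse=True)
    chosen, mass = [], 0.0
    for genotype, prob in ranked:
        chosen.append(genotype)
        mass += prob
        if mass >= 1 - alpha:
            break
    return chosen, mass


def likelihood_ratio(likelihoods, priors, genotype):
    """Single-locus LR (assumed form): P(cluster | genotype) / P(cluster | random donor)."""
    random_donor = sum(likelihoods[g] * priors[g] for g in likelihoods)
    return likelihoods[genotype] / random_donor


# Hypothetical locus with two alleles, "12" and "14" (frequencies 0.3 and 0.7).
priors = {("12", "12"): 0.09, ("12", "14"): 0.42, ("14", "14"): 0.49}
# Placeholder values standing in for P(cluster content | genotype).
likelihoods = {("12", "12"): 0.001, ("12", "14"): 0.9, ("14", "14"): 0.002}

post = posterior(likelihoods, priors)
# alpha = 0.002 corresponds to the 99.8% credible set mentioned in the abstract.
genotypes, mass = credible_set(post, alpha=0.002)
lr = likelihood_ratio(likelihoods, priors, ("12", "14"))
print(genotypes, mass, lr)
```

With these placeholder numbers the top-ranked genotype alone falls just short of the 99.8% threshold, so the credible set holds two genotypes. A multi-locus LR would typically be the product of per-locus LRs under an independence assumption, which is one way values as large as those reported could arise.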

Source journal
CiteScore: 7.50
Self-citation rate: 32.30%
Publication volume: 132
Review time: 11.3 weeks
Journal description: Forensic Science International: Genetics is the premier journal in the field of Forensic Genetics. This branch of Forensic Science can be defined as the application of genetics to human and non-human material (in the sense of a science with the purpose of studying inherited characteristics for the analysis of inter- and intra-specific variations in populations) for the resolution of legal conflicts.
The scope of the journal includes:
Forensic applications of human polymorphism. Testing of paternity and other family relationships, immigration cases, typing of biological stains and tissues from criminal casework, identification of human remains by DNA testing methodologies.
Description of human polymorphisms of forensic interest, with special interest in DNA polymorphisms. Autosomal DNA polymorphisms, mini- and microsatellites (or short tandem repeats, STRs), single nucleotide polymorphisms (SNPs), X and Y chromosome polymorphisms, mtDNA polymorphisms, and any other type of DNA variation with potential forensic applications.
Non-human DNA polymorphisms for crime scene investigation.
Population genetics of human polymorphisms of forensic interest. Population data, especially from DNA polymorphisms of interest for the solution of forensic problems.
DNA typing methodologies and strategies.
Biostatistical methods in forensic genetics. Evaluation of DNA evidence in forensic problems (such as paternity or immigration cases, criminal casework, identification), classical and new statistical approaches.
Standards in forensic genetics. Recommendations of regulatory bodies concerning methods, markers, interpretation or strategies or proposals for procedural or technical standards.
Quality control. Quality control and quality assurance strategies, proficiency testing for DNA typing methodologies.
Criminal DNA databases. Technical, legal and statistical issues.
General ethical and legal issues related to forensic genetics.