Maximum allele count (MAC) and total allele count (TAC) methods are widely used in forensic laboratories for estimating the number of contributors (NoC) to autosomal short tandem repeat (STR) profiles. In this study, we applied these NoC estimation methods to mixed Y-STR profiles and evaluated their uncertainty and performance. For the MAC method, as recent Y-STR typing kits include both single- and multi-copy loci, we defined “MAC-single” for use across only single-copy loci and “MAC-multi” for use across only multi-copy loci. We generated in silico a dataset containing 120,000 Y-STR profiles of one- to six-person mixtures based on previously reported haplotype frequencies of 27 Y-STR loci in Yfiler Plus for the U.S. population (reported by NIST) and the Henan Han population. The dataset was randomly split into a training set and a test set. The training set was used to construct a TAC distribution (TAC curve), whereas the test set was used to calculate the performance metrics (accuracy, precision, recall, and F1-score). In addition, the effect of the upper limit of NoC considered for estimation on overall accuracy was evaluated. When the upper limit of NoC was set to six, the overall accuracies of the MAC-single, MAC-multi, and TAC methods were 0.7920, 0.4329, and 0.7877, respectively, for the U.S. population and 0.8207, 0.4609, and 0.8385, respectively, for the Henan Han population. Our results suggest that the MAC-single and TAC methods can estimate the NoC for mixed Y-STR profiles with high levels of accuracy.
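The allele-counting idea behind these methods can be illustrated with a minimal Python sketch. It assumes that each contributor shows at most one allele at a single-copy locus and at most two at a multi-copy locus; the function names, the choice of loci, and the allele values are hypothetical and are not taken from the study.

```python
from math import ceil

def mac_estimate(profile, copies_per_locus):
    """Lower bound on the number of contributors from the maximum allele count.

    profile maps a locus name to the set of distinct alleles observed there.
    copies_per_locus is the assumed maximum number of alleles one person can
    show at each locus (1 for single-copy loci, 2 for two-copy loci).
    """
    if not profile:
        return 0
    return max(ceil(len(alleles) / copies_per_locus) for alleles in profile.values())

def total_allele_count(profile):
    """Total number of distinct alleles summed over all loci (the TAC statistic)."""
    return sum(len(alleles) for alleles in profile.values())

# Illustrative mixed profile (allele values are made up for this example).
single_copy = {"DYS19": {14, 15, 16}, "DYS390": {23, 24}}
multi_copy = {"DYF387S1": {35, 36, 37, 38, 39}}

mac_single = mac_estimate(single_copy, copies_per_locus=1)
mac_multi = mac_estimate(multi_copy, copies_per_locus=2)
tac_single = total_allele_count(single_copy)
```

Both MAC values are minimum estimates: allele sharing between contributors can hide additional donors. A TAC-based estimate would additionally need a reference distribution of TAC per NoC, which is not sketched here.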
Identification of unidentified human remains (UHRs) is crucial yet challenging, especially with traditional forensic techniques. Forensic anthropological examinations can yield ancestry estimations; however, the utility of these estimates is limited by the data points that can be collected from partial remains, complexities of admixture, and variation of phenotypic expression due to environmental effects. While it is generally known that anthropological estimates can be imprecise, the performance of these methods has not been studied at scale. Genome-wide SNP testing is an orthogonal approach for estimating ancestry and offers a unique opportunity to measure the magnitude of anthropological ancestry misattribution. Genomic ancestry inference leverages principal component analysis (PCA) and model-based clustering approaches. This study compares anthropologically determined ancestry with ancestry estimated using genome-wide SNP markers. A dataset of 611 UHR samples with publicly available ancestry assessments from the National Missing and Unidentified Persons System (NamUs) was analyzed. The genetic ancestry approach, validated against reference population samples, offers robust ancestry calculations for major population groups. Inconsistencies between anthropological and genomic ancestry assignments were observed, particularly for admixed populations. Although forensic anthropological examinations remain valuable, their limitations emphasize the need for refinement and enhancement through the augmentation of SNP-based analyses. Further validation studies are crucial to define the uncertainty associated with both anthropological and genome-based ancestry estimates to resolve cases and aid law enforcement investigations. Additionally, current policy and practices for reporting ancestry for UHRs should be revisited to reduce potential misinformation.
Short Tandem Repeats (STRs) are the most widespread markers in forensic genetics. However, STR stutter peaks can mask alleles from a minor contributor when analysing mixtures, hindering the interpretation of complex profiles. In this study we compared the performance of a previously described panel of microhaplotypes (MHs), an alternative type of forensic marker, against a standard STR kit. The parameters evaluated included: capability of determining the minimum number of contributors in the mixture; percentages of allele drop-outs and drop-ins; retrieval of alleles belonging to the minor contributor; and estimation of likelihood ratio (LR) values. In addition, the capacity of EuroForMix software to estimate each donor’s percentage of contribution was tested, as well as the impact on results of using manually or automatically prepared libraries. The MH panel showed better performance than STRs for the detection of 2-contributor mixtures, but the lower degree of polymorphism per MH marker hindered the task of deconvolution with multiple contributors. MHs presented higher drop-in rates and lower drop-out rates, showed a higher capability to recover the minor contributor’s alleles, and provided higher LR values than STRs, likely due to the much higher number of loci combined in the panel. Estimations of contributor ratios using EuroForMix showed promising results, and only marginal differences in these values were found between manually and automatically prepared libraries. Overall, results showed that the mixture detection performance of the MH panel was better than or equal to that of the standard forensic autosomal STR panel, indicating microhaplotypes are informative markers for this purpose.
Shotgun sequencing is a DNA analysis method that potentially determines the nucleotide sequence of every DNA fragment in a sample, unlike the PCR-based genotyping methods that are widely used in forensic genetics and target predefined short tandem repeats (STRs) or predefined single nucleotide polymorphisms (SNPs). Shotgun DNA sequencing is particularly useful for highly degraded low-quality DNA samples, such as ancient samples or those from crime scenes. Here, we developed a statistical model for human identification using shotgun sequencing data and developed formulas for calculating the evidential weight as a likelihood ratio (LR). The model uses a dynamic set of binary SNP loci and takes the error rate from shotgun sequencing into consideration in a probabilistic manner. To our knowledge, the method is the first to make this possible. Results from replicated shotgun sequencing of buccal swabs (high-quality samples) and hair samples (low-quality samples) were arranged in a genotype-call confusion matrix to estimate the calling error probability by maximum likelihood and Bayesian inference. Different genotype quality filters may be applied to account for genotyping errors. An error probability of zero resulted in the commonly used formula for the weight of evidence. Error probabilities above zero reduced the contribution of matching genotypes and increased the LR in the case of a mismatch between the genotypes of the trace and the person of interest. In the latter scenario, the LR increased from zero (occurring when the error probability was zero) to low positive values, which allow for the possibility that the mismatch may be due to genotyping errors. We developed an open-source R package, wgsLR, which implements the method, including estimation of the calling error probability and calculation of LR values. The R package includes all formulas used in this paper and the functionalities to generate the formulas.
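The qualitative behaviour described here (a zero error probability recovers the usual match formula, while a positive error probability lowers the LR for matches and raises it above zero for mismatches) can be reproduced with a simplified single-locus sketch in Python. This is an illustration under stated assumptions, not the wgsLR implementation: it assumes Hardy-Weinberg genotype proportions, a single per-allele miscalling probability `w` shared by the trace and the reference sample, and hypothetical function names.

```python
def genotype_priors(q):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 alternative alleles."""
    p = 1.0 - q
    return [p * p, 2.0 * p * q, q * q]

def call_probs(w):
    """P(called genotype x | true genotype z), each allele miscalled independently with probability w."""
    ok = 1.0 - w
    return [
        [ok * ok, 2.0 * w * ok, w * w],     # true genotype 0
        [w * ok, ok * ok + w * w, w * ok],  # true genotype 1
        [w * w, 2.0 * w * ok, ok * ok],     # true genotype 2
    ]

def likelihood_ratio(x_trace, x_poi, q, w):
    """LR for 'same donor' versus 'unrelated donors' at one binary SNP locus."""
    prior = genotype_priors(q)
    err = call_probs(w)
    # Same donor: one latent genotype produced both calls.
    same = sum(prior[z] * err[z][x_trace] * err[z][x_poi] for z in range(3))
    # Different donors: the two calls arise from independent latent genotypes.
    marg_trace = sum(prior[z] * err[z][x_trace] for z in range(3))
    marg_poi = sum(prior[z] * err[z][x_poi] for z in range(3))
    return same / (marg_trace * marg_poi)

lr_match_no_error = likelihood_ratio(0, 0, q=0.5, w=0.0)
lr_match_with_error = likelihood_ratio(0, 0, q=0.5, w=0.01)
lr_mismatch_no_error = likelihood_ratio(0, 2, q=0.5, w=0.0)
lr_mismatch_with_error = likelihood_ratio(0, 2, q=0.5, w=0.01)
```

With `w = 0` the matching case reduces to one over the genotype probability and the mismatching case gives an LR of zero; with `w = 0.01` the match LR drops slightly and the mismatch LR becomes small but positive. A multi-locus LR would multiply per-locus values only under an independence assumption.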
Domestic animals, such as cats and dogs, are present in the majority of Australian households. Recently, questions have been raised about whether domestic animals can serve as silent witnesses, from whom evidence can be collected, or act as vectors of contamination and transfer. Yet, little is known regarding the transfer and prevalence of human DNA to and from cats. This study investigated whether cats are reservoirs and vectors for human DNA transfer. Twenty cats from 15 households were sampled from 4 different areas (head (fur), back (fur), left (skin) and right (fur)) to obtain information on the background DNA that may be found on an animal. Further, transfer of human DNA to and from an animal, after a short patting contact, was tested. Human DNA was found to be prevalent on all cats. Of the areas sampled, most DNA was collected from the top of the fur from the back followed by the head and right/fur. No or very low quantities of human DNA were recovered from the left (skin) area. Most of the human DNA originated from the owners, but DNA from others was also often present (47 % of samples). Further, the transfer tests demonstrated that human DNA transferred readily to (detected in 45 % of samples) and from (detected in 80 % of samples) cats during patting. These results show that animals can act as reservoirs of human DNA and vectors for human DNA transfer that may need to be considered during evaluative DNA reporting. Furthermore, if an interaction between an animal and a perpetrator is suspected, consideration should be given to collecting DNA evidence from suspected contact areas on an animal.
Minors (subjects under the legal age, set in this study at 18 years) benefit from a series of legal rights created to protect them and guarantee their welfare. However, throughout the world there are many minors who have no way to prove they are underage, leading to great interest in predicting legal age with the highest possible accuracy. Current methods, mainly involving X-ray analysis, are highly invasive, so new methods to predict legal age are being studied, such as DNA methylation. To further such studies, we created two age prediction models based on five epigenetic markers: cg21572722 (ELOVL2), cg02228185 (ASPA), cg06639320 (FHL2), cg19283806 (CCDC102B) and cg07082267 (no associated gene), which were analysed in blood samples to determine possible limitations of DNA methylation as an effective tool for legal age estimation. A wide age range prediction model was created using a broad set of samples (14–94 years), yielding a mean absolute error (MAE) of ±4.32 years. A second model, the constrained age prediction model, was created using a reduced range of samples (14–25 years), yielding an MAE of ±1.54 years. Both models, in addition to Horvath’s Skin & Blood epigenetic clock, were evaluated using a test set comprising 732 pairs of 18-year-old twins (N=426 monozygotic (MZ) and N=306 dizygotic (DZ) pairs), representing a relevant age of study. Through analysis of the two former age prediction models, we found that constraining the age of the samples forming the training set around the desired age of study significantly reduced the prediction error (from MAE: ±4.07 and ±4.27 years for MZ and DZ twins, respectively; to ±1.31 and ±1.3 years). However, despite low prediction errors, DNA methylation models are still prone to classify same-aged individuals in different categories (minors or adults), even when both samples belong to the same twin pair.
Additional evaluation of Horvath’s Skin & Blood model (391 CpGs) led to age prediction errors similar to those obtained using only five epigenetic markers (MAE: ±1.87 and ±1.99 years for MZ and DZ twins, respectively).
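The two quantities discussed in this abstract, the mean absolute error and the number of twin pairs split across the legal-age threshold, can be computed as in the following Python sketch. The predicted ages are invented purely for illustration and are not data from the study; the function names are likewise hypothetical.

```python
def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between chronological and predicted age."""
    errors = [abs(t - p) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)

def discordant_pairs(predicted_pairs, threshold=18.0):
    """Count twin pairs whose two predictions fall on opposite sides of the threshold."""
    return sum(
        1
        for first, second in predicted_pairs
        if (first >= threshold) != (second >= threshold)
    )

# Hypothetical predicted ages for three pairs of 18-year-old twins.
predicted_pairs = [(17.2, 18.9), (18.4, 19.1), (16.8, 17.5)]
predictions = [age for pair in predicted_pairs for age in pair]
true_ages = [18.0] * len(predictions)

mae = mean_absolute_error(true_ages, predictions)
n_discordant = discordant_pairs(predicted_pairs)
```

In this toy example the MAE is below one year, yet one of the three pairs is still classified as one minor and one adult, which mirrors the limitation noted above: a low average error does not guarantee consistent classification near the threshold.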
The identification of body fluids is an important area of forensic genetics. In particular, the susceptibility to degradation of casework samples is of crucial importance, as the traces can often be exposed to different environmental conditions over a long period of time. RNAs in particular are used as molecular markers for the identification of body fluids in forensics. Messenger RNAs (mRNAs) are highly body fluid-specific but show an increased susceptibility to degradation, e.g. under humidity and UV radiation. The shorter microRNAs (miRNAs), however, are less susceptible to degradation, but only a few body fluid-specific markers could be investigated. In this study, a self-developed mRNA/miRNA multiplex assay for capillary electrophoresis from a preliminary study was further adapted and validated. The approach was applied to casework samples, animal samples, and a storage study. The advantages and disadvantages of the mRNA/miRNA assay were investigated in order to review a possible application for forensic casework. Some miRNA markers were also detected in animal samples, which once again underlines the possible non-specificity of miRNAs. In the storage study, the different markers were detected for different lengths of time depending on the body fluid examined. For almost all body fluids, the miRNA markers were still detectable after a period of 35 days under environmental conditions, in contrast to the mRNA markers, whose peaks were often already clearly reduced or no longer detectable after 14 days. The results show the advantage of the new mRNA/miRNA assay compared to established mRNA approaches, especially for older and degraded samples, but the assay has its limitations due to the limited number of specific miRNA markers.
The unique features of the X chromosome can be crucial to complement autosomal profiling or to disentangle complex kinship problems, providing in some cases a similar or even greater power than autosomes in paternity/maternity investigations. While theoretical and informatics approaches for pairwise X-linked kinship analyses are well established for euploid individuals, these are still lacking for individuals with an X chromosome aneuploidy. To help fill this gap, this research presents a mathematical framework that enables the quantification of DNA evidence in pairwise kinship analyses involving two non-inbred individuals, one of whom has a non-mosaic X chromosome aneuploidy: Trisomy X (47, XXX), Klinefelter (47, XXY) or Turner (45, X0) syndrome. As previously developed for a regular number of chromosomes, this approach relies on the probability of related individuals sharing identical-by-descent (IBD) alleles at one specific locus, and it can be applied to any set of independently transmitted markers with no gametic association in the population. The kinship hypotheses mostly considered in forensic casework are specifically addressed in this work, but the reasoning and procedure can be applied to virtually any pairwise kinship problem under the referred assumptions. Algebraic formulae for joint genotypic probabilities cover all the possible genotypic configurations and pedigrees. Compared with analyses assuming individuals with a regular number of chromosomes, the complicating factors arise from the different possibilities for both the parental origin of the error (either maternal or paternal) and the type of error that occurred (either meiotic or post-zygotic mitotic). These imply that a non-inbred female with Triple X or a male with Klinefelter syndrome may carry two IBD alleles at the same locus.
Thus, and contrary to what occurs for the standard case, IBD partitions depend not only on the kinship hypothesis under analysis but also on the genotypic configuration of the analyzed individuals. For some cases, parameters of interest can be inferred, while for others recommended values based on the available literature are provided. This work is the starting point to analyze X-chromosomal data under the scope of kinship problems involving individuals with aneuploidies, and it will enhance the quantification of the DNA evidence not only in forensics but also in the medical genetics field. We hope it will trigger the development of approaches including other complicating factors, such as a greater number of individuals, the possibility of mutations and/or silent alleles, and the analysis of linked markers.
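For orientation, the standard pairwise formulation for individuals with a regular number of chromosomes, which this framework extends, can be sketched as follows. This is the textbook IBD decomposition only, not the aneuploidy formulae of this work; the symbols $\kappa_k$, $G_A$, $G_B$, $H_1$ and $H_2$ are notation assumed here for illustration.

```latex
\[
P(G_A, G_B \mid H) \;=\; \sum_{k} \kappa_k(H)\, P(G_A, G_B \mid k \text{ alleles IBD}),
\qquad
LR \;=\; \frac{P(G_A, G_B \mid H_1)}{P(G_A, G_B \mid H_2)}
\]
```

Here $k$ ranges over the numbers of alleles the two individuals can share identical by descent at the locus, and $\kappa_k(H)$ is the probability of sharing $k$ IBD alleles under kinship hypothesis $H$. In this standard case the coefficients $\kappa_k$ depend only on $H$; as stated above, with an aneuploid individual the IBD partitions also depend on the genotypic configuration.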
A pilot study was performed by two laboratories using two different DNA technology platforms to analyze DNA extracted from 83-year-old human male skeletal remains from 16 individuals, war victims for whom there are no other viable means of identification. The workflow of the more recently developed ForenSeq Kintelligence Kit and next generation sequencing was compared to that of the standard capillary electrophoresis – short tandem repeat (STR) method (PowerPlex ESX17 and Y23 Systems). The findings indicate that a greater amount of useful genetic data can be gained with the Kintelligence system across the range of samples under study, particularly for samples in which partial or no STR profiles are obtained. SNP data are more likely to be obtained from degraded samples, like the ones analyzed in this study. Moreover, high-volume SNP data are suitable for long-distance kinship associations and genetic genealogy databases to develop more investigative leads for future kinship and missing persons cases, a process not feasible by STR typing.