{"title":"提交人的答复","authors":"William C. Thompson JD, PhD","doi":"10.1111/1556-4029.15519","DOIUrl":null,"url":null,"abstract":"<p>Editor,</p><p>The article I published in February 2023 discussed a case in which two probabilistic genotyping (PG) programs were used to analyze the same DNA mixture using the same data file [<span>1</span>]. The mixture was found on a plastic bag containing illegal drugs. Whether the defendant was a contributor to this DNA mixture became an issue in the case.</p><p>Answering this question posed a technical challenge because there were at least two contributors and the total quantity of DNA in the mixture was only 92 pg. The major contributor was female, which ruled out the male defendant, so the key question is whether the defendant could have been a minor contributor. The amount of DNA from the minor contributor (or contributors) was very low. An analysis of peak heights suggested at least a 4:1 ratio between the major and minor contributor(s). The quantity of male DNA in the mixture was estimated to be only 6.9 pg.</p><p>Two different PG programs, STRmix™ and TrueAllele® (TA), were used to compare the defendant's DNA profile to the mixture. Both programs produced exculpatory findings supporting the hypothesis that the defendant was not a contributor, although the strength of support differed dramatically: TA produced likelihood ratios (LRs) as high as 16.7 million, whereas STRMix produced LRs ranging from 24 to 5 [<span>1</span>]. My article discussed differences between the programs that might explain the different LRs and questioned whether any such findings are sufficiently trustworthy and reliable to be used in court. It also questioned the way these findings were reported.</p><p>Individuals associated with both STRMix [<span>2</span>] and TA [<span>3</span>] have now responded to my article. They presented new data that helped explain why the LRs produced by the programs were so different. The difference arose largely from the use of different analytic thresholds: TA took account of many low-level (<40 rfu) peaks that were ignored by STRMix. I commend both groups for doing empirical studies to help explain why the two programs produced such different findings.</p><p>It is still unclear, however, which of the reported findings is more trustworthy, or indeed whether either should be trusted. While it is now clear that the LRs produced by STRMix were less extreme because the analyst applied an analytic threshold, those who responded to my article appear to disagree about whether such a threshold is necessary or helpful. The group led by John Buckleton, one of the creators of STRMix, expressed concern about reliance on low-level peaks: “Most of us are wary of very low peak heights. This feeling of discomfort is developed from a large body of experience noting the pernicious effects of artifacts that pass the analysis stage” [<span>2</span>]. Whether a lower threshold increases or decreases accuracy can only be determined, they argue, by testing the accuracy of the PG program across a range of ATs with known samples of the type in question. I agree with this assessment and I believe that a key source of uncertainty about the value of PG results in the case I discussed is that relatively little research of this type has been done.</p><p>By contrast, the group led by Mark Perlin, one of the creators of TA, sees no need for analytic thresholds, saying that “TA is a fully Bayesian system capable of looking at all the peak data…” [<span>3</span>]. 
While I certainly accept that TA is capable of producing LRs based on low-level data, I question whether such results are always reliable and trustworthy. Computer scientists are familiar with the expression “garbage in-garbage out.” It is unclear whether, and at what point, the LRs produced by TA become garbage because the low-level peaks fed into the program are unreliable. As John Butler has emphasized: “a primary purpose for validation studies … is to push the system until it fails in order to understand the potential limitations” [<span>4</span>]. While I acknowledged that existing validation studies show that TA works well under a broad range of circumstances [<span>1</span>], I also pointed out that there has been relatively little research testing the accuracy of TA for identifying low-level mixture contributors like the secondary contributor to the sample discussed in my article.</p><p>Moreover, some of that research suggests that TA is NOT reliable for such samples. For example, my article cited research showing that TA produced falsely exculpatory LRs for some known contributors, including the majority of individuals who contributed less than 10 pg to a mixture in one study [<span>5</span>]. Perlin et al. [<span>3</span>] do not respond to this evidence; they offer no arguments for ignoring it. Instead, they simply say that I cited only three of “eight peer-reviewed studies validating TA interpretations for mixtures containing 2 to 10 unknown contributors” [<span>3</span>].</p><p>Based on this statement a casual reader might assume that the eight validation studies Perlin et al. [<span>3</span>] cited actually tested TA's accuracy for low-level samples like the one discussed in my article and found that TA was accurate. That would be a mistake. These articles do not establish the trustworthiness of findings like the one discussed in my article. In fact, they establish the opposite. They show that exculpatory results of this type are often NOT accurate and hence cannot be trusted.</p><p>Three of the eight validation studies cited by Perlin et al. [<span>3</span>] are unhelpful for assessing the accuracy of TA because they involve reanalysis of casework samples for which ground truth regarding the identity of contributors is uncertain [<span>6-8</span>]; a fourth study [<span>9</span>] examined the accuracy of a different analytical technique than the one used in the case I discussed. So, aside from the study by Greenspoon et al. [<span>5</span>], which I discussed in my article, there are three studies that might provide additional information on whether TA results like the one discussed in my article are trustworthy. These three studies examined TA results for mixtures with known contributors where at least some of the contributors accounted for a very low quantity of DNA.</p><p>A key issue I will consider when discussing these studies is the “sensitivity” of TA—that is, the percentage of cases in which TA produces an LR that supports the hypothesis that a known contributor was indeed a contributor. That means an LR greater than one, or a LogLR greater than zero. I will also consider the false exculpation rate, which is the rate at which TA produces LRs for known contributors that incorrectly support the non-contributor hypothesis. “Sensitivity” and the false exculpation rate are complementary—if “sensitivity” is 90% then the false exculpation rate is 10%. 
The false exculpation rate falls to zero only when “sensitivity” is 100%.</p><p>One of the validation studies cited by Perlin et al. [<span>3</span>] examined the LRs that TA assigned to known contributors to mixtures containing up to 10 individuals [<span>10</span>]. It reported “sensitivity” of 92%, which means that 92% of the LogLRs assigned to true contributors were positive. This also means, of course, that 8% of the assigned LogLRs were negative and hence falsely exculpatory. According to the authors, known individuals who contributed more than 100 pg of DNA to the mixture were always assigned positive LogLRs. For this group “sensitivity” was 100%. Consequently, the falsely exculpatory LRs must have been concentrated among donors who contributed less than 100 pg. However, the authors do not report their findings in a manner that allows a breakdown of the rate of false exculpations against the quantity of DNA in the mixed sample, nor the amount contributed by particular donors.</p><p>Fortunately, another TA validation study allows exactly such a breakdown [<span>10</span>]. It examined the LRs that TA assigned to known contributors in laboratory-synthesized mixtures of 2, 3, 4 or 5 individuals. This study shows that TA frequently assigns falsely exculpatory LRs to known contributors and that this problem is particularly acute when there are more contributors and when the quantity of DNA is lower.</p><p>To help readers understand the rate of false exculpations in this study [<span>11</span>] I have prepared tables showing those findings. Table 1 shows the number, proportion and percentage of known contributors to whom TA assigned falsely exculpatory LRs (LogLR < 0) broken down by the total quantity of DNA in the mixture (1 ng or 200 pg) and the number of contributors. The data presented in Table 1 are taken entirely from figs 4 and 5 of [<span>11</span>].</p><p>Table 1 shows, for example, that TA computed 20 LRs for known contributors to two-person mixtures containing 1 ng of DNA, and none of those LRs was falsely exculpatory. By comparison, TA computed 40 LRs for known contributors to 4-person mixtures that contained 200 pg of DNA, and 15 of those LRs, or 37.5% were falsely exculpatory.</p><p>Table 2 shows additional evidence from the same study. It is a breakdown of the rate of falsely exculpatory findings by mixture weight, showing that the risk of a falsely exculpatory finding increases rapidly as the percentage of DNA the individual contributed to the mixture decreases.</p><p>The third validation study cited by Perlin et al. [<span>12</span>] examined TA results for 40 laboratory-prepared known-source two-person mixtures (with one male and one female contributor). For most of these samples, the quantity of DNA contributed by the male donor greatly exceeds the quantity of male DNA in the mixture analyzed in the case discussed in my article. For two of these samples, however, the total quantity of DNA was only 125 pg and the male donor contributed only 10% of that (approximately 12.5 pg). For one of those samples, a TA analysis that assumed both contributors are unknown produced an incriminating LR; for the other TA produced a falsely exculpatory LR.</p><p>To put these data in context, recall that the total amount of DNA in the mixture discussed in my article was only 92 pg and that the minor contributor accounted for, at best, about 25% of this amount, and perhaps much less. The findings reported by Perlin et al. 
[<span>11</span>] make it clear that TA often produces falsely exculpatory LRs for contributors of this type. Moreover, these falsely exculpatory LRs are often of the same magnitude as those produced by TA in that case (see tab. 6 and figs 4 and 5 of Perlin et al. [<span>11</span>]). Under these circumstances it is beyond my understanding how Perlin and his colleagues can argue that exculpatory results like those discussed in my article are trustworthy.</p><p>Given the high rates of falsely exculpatory findings that TA has been shown to produce for low-level DNA contributors, I think it is very clear that findings like the one discussed in my article cannot pass muster under the <i>Daubert</i> standard and should be inadmissible. Furthermore, I think it is irresponsible for forensic scientists to present such findings in court. As mentioned in my article, forensic scientists need to know “when to punt”—that is, “when to decline the opportunity to move forward with questionable or problematic evidence” [<span>1</span>].</p><p>In response to my suggestion that forensic scientists refrain from presenting such problematic findings, Perlin and his colleagues respond with what sounds like an advertising slogan—TA always allows forensic scientists to “go for the goal.” It is difficult for me to see how that could be true in cases like the one I discussed unless the goal is helping guilty defendants escape justice. My article called on forensic scientists to establish standards or guidelines for when such evidence is reliable enough to be used in a legal proceeding. The response from Perlin et al. [<span>3</span>] shows why such standards are necessary.</p><p>My article raised a number of additional concerns about TA that Perlin and his colleagues failed to address in any meaningful way. I pointed out, for example, that the mixture weights assigned by TA differ from those assigned by STRMix, and are difficult to square with the findings of the biological assay. The Amelogenin findings, estimates of the quantity of male DNA relative to total DNA, and examination of peak heights at loci with four detectable peaks all point to a mixture percentage for the male contributor (or contributors) far lower than TA's estimate [<span>1</span>]. In response, Perlin et al. say only that the TA results are more trustworthy because TA examines more of the data, including low-level peaks. In the absence of empirical evidence that TA is more accurate than other method for assigning mixture percentages, this argument is vacuous and circular.</p><p>I also raised concerns about the assumption that the mixture in question had two and only two contributors, rather than a higher number. The validation studies cited by Perlin et al. [<span>10, 11</span>] show convincingly that DNA analysts often underestimate the number of contributors when dealing with mixtures like the one discussed in my article. That may be an important reason why TA assigns a falsely exculpatory LR to so many low-level known contributors. As I explained in my article “…a true donor who is a minor contributor may be assigned an exculpatory LR because his genotype, in combination with the genotype of the primary donor, is a poor fit with the observed data under the assumption of two contributors. But the poor fit may occur because the mixture also contains the DNA of an unrecognized third donor who is responsible for some of the observed peaks” [<span>1</span>]. Perlin et al. 
[<span>1</span>] offer no response to this argument and no explanation for why LRs were reported only under the assumption of two contributors when the possibility of a higher number cannot be ruled out. In my article [<span>1</span>], and in my response to Buckleton and his colleagues [<span>13</span>], I argued that the forensic science and legal communities should pay more attention to the way laboratories handle uncertainty about such matters as the number of contributors to a DNA mixture, particularly when the results reported depend critically on assumptions that may be incorrect. This is another issue on which I believe forensic scientists need guidelines and standards.</p><p>Finally, I argued that Cybergenetics (the company that markets TA) has been presenting TA LRs in an extraordinarily biased and misleading manner in reports and testimony. I explained carefully why the statement about LRs that Cybergenetics uses as boilerplate is likely to be misinterpreted. I called on Cybergenetics to “immediately cease using this misleading language and find a better way to explain its findings” [<span>1</span>]. As I have explained elsewhere [<span>14</span>], I believe that there are two key requirements for reporting language in forensic science. First, the language must be justifiable scientifically—it must be technically correct. Second, it must communicate effectively with a lay audience—it must be the kind of statement that lay people will understand correctly. Whatever its scientific merits, I believe that Cybergenetics' LR statement fails badly on the second criterion. It is very likely to be misinterpreted. Perlin et al. respond by arguing that their statement is technically correct (at least as they interpret it), but they do not address my major concern about the potential for misinterpretation. Whether statements like the one Cybergenetics is using are adequate and appropriate is yet another issue where guidelines and standards from the forensic science community would be helpful.</p><p>Indeed, a major lesson from my article, and the discussion that has followed, is the need for organizations like OSAC and SWGDAM to address such issues. To expect competing for-profit companies to refrain from overclaiming and to fully disclose all uncertainties surrounding their findings is apparently expecting too much. To expect courts to regulate these matters as part of their review of admissibility apparently is also expecting too much [<span>15</span>]. If these matters are to be addressed at all, they will need to be addressed by the forensic science community through the standards development process.</p>","PeriodicalId":15743,"journal":{"name":"Journal of forensic sciences","volume":null,"pages":null},"PeriodicalIF":1.5000,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1556-4029.15519","citationCount":"0","resultStr":"{\"title\":\"Author's response\",\"authors\":\"William C. Thompson JD, PhD\",\"doi\":\"10.1111/1556-4029.15519\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Editor,</p><p>The article I published in February 2023 discussed a case in which two probabilistic genotyping (PG) programs were used to analyze the same DNA mixture using the same data file [<span>1</span>]. The mixture was found on a plastic bag containing illegal drugs. 
Whether the defendant was a contributor to this DNA mixture became an issue in the case.</p><p>Answering this question posed a technical challenge because there were at least two contributors and the total quantity of DNA in the mixture was only 92 pg. The major contributor was female, which ruled out the male defendant, so the key question is whether the defendant could have been a minor contributor. The amount of DNA from the minor contributor (or contributors) was very low. An analysis of peak heights suggested at least a 4:1 ratio between the major and minor contributor(s). The quantity of male DNA in the mixture was estimated to be only 6.9 pg.</p><p>Two different PG programs, STRmix™ and TrueAllele® (TA), were used to compare the defendant's DNA profile to the mixture. Both programs produced exculpatory findings supporting the hypothesis that the defendant was not a contributor, although the strength of support differed dramatically: TA produced likelihood ratios (LRs) as high as 16.7 million, whereas STRMix produced LRs ranging from 24 to 5 [<span>1</span>]. My article discussed differences between the programs that might explain the different LRs and questioned whether any such findings are sufficiently trustworthy and reliable to be used in court. It also questioned the way these findings were reported.</p><p>Individuals associated with both STRMix [<span>2</span>] and TA [<span>3</span>] have now responded to my article. They presented new data that helped explain why the LRs produced by the programs were so different. The difference arose largely from the use of different analytic thresholds: TA took account of many low-level (<40 rfu) peaks that were ignored by STRMix. I commend both groups for doing empirical studies to help explain why the two programs produced such different findings.</p><p>It is still unclear, however, which of the reported findings is more trustworthy, or indeed whether either should be trusted. While it is now clear that the LRs produced by STRMix were less extreme because the analyst applied an analytic threshold, those who responded to my article appear to disagree about whether such a threshold is necessary or helpful. The group led by John Buckleton, one of the creators of STRMix, expressed concern about reliance on low-level peaks: “Most of us are wary of very low peak heights. This feeling of discomfort is developed from a large body of experience noting the pernicious effects of artifacts that pass the analysis stage” [<span>2</span>]. Whether a lower threshold increases or decreases accuracy can only be determined, they argue, by testing the accuracy of the PG program across a range of ATs with known samples of the type in question. I agree with this assessment and I believe that a key source of uncertainty about the value of PG results in the case I discussed is that relatively little research of this type has been done.</p><p>By contrast, the group led by Mark Perlin, one of the creators of TA, sees no need for analytic thresholds, saying that “TA is a fully Bayesian system capable of looking at all the peak data…” [<span>3</span>]. While I certainly accept that TA is capable of producing LRs based on low-level data, I question whether such results are always reliable and trustworthy. Computer scientists are familiar with the expression “garbage in-garbage out.” It is unclear whether, and at what point, the LRs produced by TA become garbage because the low-level peaks fed into the program are unreliable. 
As John Butler has emphasized: “a primary purpose for validation studies … is to push the system until it fails in order to understand the potential limitations” [<span>4</span>]. While I acknowledged that existing validation studies show that TA works well under a broad range of circumstances [<span>1</span>], I also pointed out that there has been relatively little research testing the accuracy of TA for identifying low-level mixture contributors like the secondary contributor to the sample discussed in my article.</p><p>Moreover, some of that research suggests that TA is NOT reliable for such samples. For example, my article cited research showing that TA produced falsely exculpatory LRs for some known contributors, including the majority of individuals who contributed less than 10 pg to a mixture in one study [<span>5</span>]. Perlin et al. [<span>3</span>] do not respond to this evidence; they offer no arguments for ignoring it. Instead, they simply say that I cited only three of “eight peer-reviewed studies validating TA interpretations for mixtures containing 2 to 10 unknown contributors” [<span>3</span>].</p><p>Based on this statement a casual reader might assume that the eight validation studies Perlin et al. [<span>3</span>] cited actually tested TA's accuracy for low-level samples like the one discussed in my article and found that TA was accurate. That would be a mistake. These articles do not establish the trustworthiness of findings like the one discussed in my article. In fact, they establish the opposite. They show that exculpatory results of this type are often NOT accurate and hence cannot be trusted.</p><p>Three of the eight validation studies cited by Perlin et al. [<span>3</span>] are unhelpful for assessing the accuracy of TA because they involve reanalysis of casework samples for which ground truth regarding the identity of contributors is uncertain [<span>6-8</span>]; a fourth study [<span>9</span>] examined the accuracy of a different analytical technique than the one used in the case I discussed. So, aside from the study by Greenspoon et al. [<span>5</span>], which I discussed in my article, there are three studies that might provide additional information on whether TA results like the one discussed in my article are trustworthy. These three studies examined TA results for mixtures with known contributors where at least some of the contributors accounted for a very low quantity of DNA.</p><p>A key issue I will consider when discussing these studies is the “sensitivity” of TA—that is, the percentage of cases in which TA produces an LR that supports the hypothesis that a known contributor was indeed a contributor. That means an LR greater than one, or a LogLR greater than zero. I will also consider the false exculpation rate, which is the rate at which TA produces LRs for known contributors that incorrectly support the non-contributor hypothesis. “Sensitivity” and the false exculpation rate are complementary—if “sensitivity” is 90% then the false exculpation rate is 10%. The false exculpation rate falls to zero only when “sensitivity” is 100%.</p><p>One of the validation studies cited by Perlin et al. [<span>3</span>] examined the LRs that TA assigned to known contributors to mixtures containing up to 10 individuals [<span>10</span>]. It reported “sensitivity” of 92%, which means that 92% of the LogLRs assigned to true contributors were positive. This also means, of course, that 8% of the assigned LogLRs were negative and hence falsely exculpatory. 
According to the authors, known individuals who contributed more than 100 pg of DNA to the mixture were always assigned positive LogLRs. For this group “sensitivity” was 100%. Consequently, the falsely exculpatory LRs must have been concentrated among donors who contributed less than 100 pg. However, the authors do not report their findings in a manner that allows a breakdown of the rate of false exculpations against the quantity of DNA in the mixed sample, nor the amount contributed by particular donors.</p><p>Fortunately, another TA validation study allows exactly such a breakdown [<span>10</span>]. It examined the LRs that TA assigned to known contributors in laboratory-synthesized mixtures of 2, 3, 4 or 5 individuals. This study shows that TA frequently assigns falsely exculpatory LRs to known contributors and that this problem is particularly acute when there are more contributors and when the quantity of DNA is lower.</p><p>To help readers understand the rate of false exculpations in this study [<span>11</span>] I have prepared tables showing those findings. Table 1 shows the number, proportion and percentage of known contributors to whom TA assigned falsely exculpatory LRs (LogLR < 0) broken down by the total quantity of DNA in the mixture (1 ng or 200 pg) and the number of contributors. The data presented in Table 1 are taken entirely from figs 4 and 5 of [<span>11</span>].</p><p>Table 1 shows, for example, that TA computed 20 LRs for known contributors to two-person mixtures containing 1 ng of DNA, and none of those LRs was falsely exculpatory. By comparison, TA computed 40 LRs for known contributors to 4-person mixtures that contained 200 pg of DNA, and 15 of those LRs, or 37.5% were falsely exculpatory.</p><p>Table 2 shows additional evidence from the same study. It is a breakdown of the rate of falsely exculpatory findings by mixture weight, showing that the risk of a falsely exculpatory finding increases rapidly as the percentage of DNA the individual contributed to the mixture decreases.</p><p>The third validation study cited by Perlin et al. [<span>12</span>] examined TA results for 40 laboratory-prepared known-source two-person mixtures (with one male and one female contributor). For most of these samples, the quantity of DNA contributed by the male donor greatly exceeds the quantity of male DNA in the mixture analyzed in the case discussed in my article. For two of these samples, however, the total quantity of DNA was only 125 pg and the male donor contributed only 10% of that (approximately 12.5 pg). For one of those samples, a TA analysis that assumed both contributors are unknown produced an incriminating LR; for the other TA produced a falsely exculpatory LR.</p><p>To put these data in context, recall that the total amount of DNA in the mixture discussed in my article was only 92 pg and that the minor contributor accounted for, at best, about 25% of this amount, and perhaps much less. The findings reported by Perlin et al. [<span>11</span>] make it clear that TA often produces falsely exculpatory LRs for contributors of this type. Moreover, these falsely exculpatory LRs are often of the same magnitude as those produced by TA in that case (see tab. 6 and figs 4 and 5 of Perlin et al. [<span>11</span>]). 
Under these circumstances it is beyond my understanding how Perlin and his colleagues can argue that exculpatory results like those discussed in my article are trustworthy.</p><p>Given the high rates of falsely exculpatory findings that TA has been shown to produce for low-level DNA contributors, I think it is very clear that findings like the one discussed in my article cannot pass muster under the <i>Daubert</i> standard and should be inadmissible. Furthermore, I think it is irresponsible for forensic scientists to present such findings in court. As mentioned in my article, forensic scientists need to know “when to punt”—that is, “when to decline the opportunity to move forward with questionable or problematic evidence” [<span>1</span>].</p><p>In response to my suggestion that forensic scientists refrain from presenting such problematic findings, Perlin and his colleagues respond with what sounds like an advertising slogan—TA always allows forensic scientists to “go for the goal.” It is difficult for me to see how that could be true in cases like the one I discussed unless the goal is helping guilty defendants escape justice. My article called on forensic scientists to establish standards or guidelines for when such evidence is reliable enough to be used in a legal proceeding. The response from Perlin et al. [<span>3</span>] shows why such standards are necessary.</p><p>My article raised a number of additional concerns about TA that Perlin and his colleagues failed to address in any meaningful way. I pointed out, for example, that the mixture weights assigned by TA differ from those assigned by STRMix, and are difficult to square with the findings of the biological assay. The Amelogenin findings, estimates of the quantity of male DNA relative to total DNA, and examination of peak heights at loci with four detectable peaks all point to a mixture percentage for the male contributor (or contributors) far lower than TA's estimate [<span>1</span>]. In response, Perlin et al. say only that the TA results are more trustworthy because TA examines more of the data, including low-level peaks. In the absence of empirical evidence that TA is more accurate than other method for assigning mixture percentages, this argument is vacuous and circular.</p><p>I also raised concerns about the assumption that the mixture in question had two and only two contributors, rather than a higher number. The validation studies cited by Perlin et al. [<span>10, 11</span>] show convincingly that DNA analysts often underestimate the number of contributors when dealing with mixtures like the one discussed in my article. That may be an important reason why TA assigns a falsely exculpatory LR to so many low-level known contributors. As I explained in my article “…a true donor who is a minor contributor may be assigned an exculpatory LR because his genotype, in combination with the genotype of the primary donor, is a poor fit with the observed data under the assumption of two contributors. But the poor fit may occur because the mixture also contains the DNA of an unrecognized third donor who is responsible for some of the observed peaks” [<span>1</span>]. Perlin et al. [<span>1</span>] offer no response to this argument and no explanation for why LRs were reported only under the assumption of two contributors when the possibility of a higher number cannot be ruled out. 
In my article [<span>1</span>], and in my response to Buckleton and his colleagues [<span>13</span>], I argued that the forensic science and legal communities should pay more attention to the way laboratories handle uncertainty about such matters as the number of contributors to a DNA mixture, particularly when the results reported depend critically on assumptions that may be incorrect. This is another issue on which I believe forensic scientists need guidelines and standards.</p><p>Finally, I argued that Cybergenetics (the company that markets TA) has been presenting TA LRs in an extraordinarily biased and misleading manner in reports and testimony. I explained carefully why the statement about LRs that Cybergenetics uses as boilerplate is likely to be misinterpreted. I called on Cybergenetics to “immediately cease using this misleading language and find a better way to explain its findings” [<span>1</span>]. As I have explained elsewhere [<span>14</span>], I believe that there are two key requirements for reporting language in forensic science. First, the language must be justifiable scientifically—it must be technically correct. Second, it must communicate effectively with a lay audience—it must be the kind of statement that lay people will understand correctly. Whatever its scientific merits, I believe that Cybergenetics' LR statement fails badly on the second criterion. It is very likely to be misinterpreted. Perlin et al. respond by arguing that their statement is technically correct (at least as they interpret it), but they do not address my major concern about the potential for misinterpretation. Whether statements like the one Cybergenetics is using are adequate and appropriate is yet another issue where guidelines and standards from the forensic science community would be helpful.</p><p>Indeed, a major lesson from my article, and the discussion that has followed, is the need for organizations like OSAC and SWGDAM to address such issues. To expect competing for-profit companies to refrain from overclaiming and to fully disclose all uncertainties surrounding their findings is apparently expecting too much. To expect courts to regulate these matters as part of their review of admissibility apparently is also expecting too much [<span>15</span>]. 
If these matters are to be addressed at all, they will need to be addressed by the forensic science community through the standards development process.</p>\",\"PeriodicalId\":15743,\"journal\":{\"name\":\"Journal of forensic sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2024-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1111/1556-4029.15519\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of forensic sciences\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1111/1556-4029.15519\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICINE, LEGAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of forensic sciences","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1111/1556-4029.15519","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICINE, LEGAL","Score":null,"Total":0}
The article I published in February 2023 discussed a case in which two probabilistic genotyping (PG) programs were used to analyze the same DNA mixture using the same data file [1]. The mixture was found on a plastic bag containing illegal drugs. Whether the defendant was a contributor to this DNA mixture became an issue in the case.
Answering this question posed a technical challenge because there were at least two contributors and the total quantity of DNA in the mixture was only 92 pg. The major contributor was female, which ruled out the male defendant, so the key question was whether the defendant could have been a minor contributor. The amount of DNA from the minor contributor (or contributors) was very low. An analysis of peak heights suggested at least a 4:1 ratio between the major and minor contributor(s). The quantity of male DNA in the mixture was estimated to be only 6.9 pg.
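For readers who want to see what these figures imply, the short Python sketch below works through the arithmetic. It assumes, purely for illustration, exactly two contributors; the bound it derives is my own calculation, not a figure reported in the case.

```python
# Back-of-the-envelope check of the quantities above. The 92 pg,
# 4:1 ratio, and 6.9 pg figures come from the case as described;
# the two-contributor assumption is illustrative only.
total_dna_pg = 92.0          # total DNA in the mixture
male_dna_pg = 6.9            # estimated male DNA in the mixture
major_to_minor = 4.0         # peak heights suggest at least 4:1

# With exactly two contributors and a ratio of at least 4:1, the
# minor contributor accounts for at most 1/(4+1) of the total.
minor_max_pg = total_dna_pg / (major_to_minor + 1)
print(f"minor contributor: at most ~{minor_max_pg:.1f} pg")      # ~18.4 pg
print(f"male DNA fraction: ~{male_dna_pg / total_dna_pg:.1%}")   # ~7.5%
```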
Two different PG programs, STRmix™ and TrueAllele® (TA), were used to compare the defendant's DNA profile to the mixture. Both programs produced exculpatory findings supporting the hypothesis that the defendant was not a contributor, although the strength of support differed dramatically: TA produced likelihood ratios (LRs) as high as 16.7 million, whereas STRmix produced LRs ranging from 5 to 24 [1]. My article discussed differences between the programs that might explain the different LRs and questioned whether any such findings are sufficiently trustworthy and reliable to be used in court. It also questioned the way these findings were reported.
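Because LRs span orders of magnitude, the gap between the two programs is easiest to see on a log scale. The following sketch simply converts the reported LRs (both favoring non-contribution) to log10 units; the comparison is mine, not one made in either program's report.

```python
import math

# Exculpatory LRs reported in the case (both favor non-contribution).
ta_lr_max = 16_700_000       # TrueAllele: as high as 16.7 million
strmix_lrs = (5, 24)         # STRmix: ranging from 5 to 24

print(f"TA:     log10(LR) up to {math.log10(ta_lr_max):.1f}")    # ~7.2
print(f"STRmix: log10(LR) from {math.log10(strmix_lrs[0]):.1f} "
      f"to {math.log10(strmix_lrs[1]):.1f}")                     # ~0.7 to ~1.4
```

On that scale the two findings differ by roughly six orders of magnitude, which is what makes the discrepancy worth explaining.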
Individuals associated with both STRmix [2] and TA [3] have now responded to my article. They presented new data that helped explain why the LRs produced by the programs were so different. The difference arose largely from the use of different analytic thresholds (ATs): TA took account of many low-level (<40 RFU) peaks that were ignored by STRmix. I commend both groups for doing empirical studies to help explain why the two programs produced such different findings.
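For readers unfamiliar with analytic thresholds, the sketch below shows schematically what applying an AT amounts to: peaks below the cutoff are discarded before interpretation. The loci and peak heights are hypothetical; only the 40 RFU cutoff comes from the discussion above.

```python
# Minimal sketch of analytic-threshold (AT) filtering.
# Peak heights (RFU) are hypothetical; the 40 RFU cutoff mirrors
# the threshold discussed above.
ANALYTIC_THRESHOLD_RFU = 40

peaks = {
    "D3S1358": [(15, 812), (17, 38), (18, 203)],   # (allele, height in RFU)
    "vWA":     [(14, 25), (16, 950), (19, 61)],
}

def apply_threshold(locus_peaks, at=ANALYTIC_THRESHOLD_RFU):
    """Keep only peaks at or above the analytic threshold."""
    return [(allele, height) for allele, height in locus_peaks if height >= at]

for locus, locus_peaks in peaks.items():
    kept = apply_threshold(locus_peaks)
    print(f"{locus}: kept {len(kept)} of {len(locus_peaks)} peaks")
```

In this toy example the 38 RFU and 25 RFU peaks disappear under the threshold; a program that interprets them, as TA did, sees a different evidentiary picture than one that does not, which is the crux of the disagreement.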
It is still unclear, however, which of the reported findings is more trustworthy, or indeed whether either should be trusted. While it is now clear that the LRs produced by STRmix were less extreme because the analyst applied an analytic threshold, those who responded to my article appear to disagree about whether such a threshold is necessary or helpful. The group led by John Buckleton, one of the creators of STRmix, expressed concern about reliance on low-level peaks: “Most of us are wary of very low peak heights. This feeling of discomfort is developed from a large body of experience noting the pernicious effects of artifacts that pass the analysis stage” [2]. Whether a lower threshold increases or decreases accuracy can only be determined, they argue, by testing the accuracy of the PG program across a range of ATs with known samples of the type in question. I agree with this assessment, and I believe that a key source of uncertainty about the value of PG results in the case I discussed is that relatively little research of this type has been done.
By contrast, the group led by Mark Perlin, one of the creators of TA, sees no need for analytic thresholds, saying that “TA is a fully Bayesian system capable of looking at all the peak data…” [3]. While I certainly accept that TA is capable of producing LRs based on low-level data, I question whether such results are always reliable and trustworthy. Computer scientists are familiar with the expression “garbage in, garbage out.” It is unclear whether, and at what point, the LRs produced by TA become garbage because the low-level peaks fed into the program are unreliable. As John Butler has emphasized: “a primary purpose for validation studies … is to push the system until it fails in order to understand the potential limitations” [4]. While I acknowledged that existing validation studies show that TA works well under a broad range of circumstances [1], I also pointed out that there has been relatively little research testing the accuracy of TA for identifying low-level mixture contributors like the secondary contributor to the sample discussed in my article.
Moreover, some of that research suggests that TA is NOT reliable for such samples. For example, my article cited research showing that TA produced falsely exculpatory LRs for some known contributors, including the majority of individuals who contributed less than 10 pg to a mixture in one study [5]. Perlin et al. [3] do not respond to this evidence; they offer no arguments for ignoring it. Instead, they simply say that I cited only three of “eight peer-reviewed studies validating TA interpretations for mixtures containing 2 to 10 unknown contributors” [3].
Based on this statement, a casual reader might assume that the eight validation studies Perlin et al. [3] cited actually tested TA's accuracy for low-level samples like the one discussed in my article and found that TA was accurate. That would be a mistake. These articles do not establish the trustworthiness of findings like the one discussed in my article. In fact, they establish the opposite. They show that exculpatory results of this type are often NOT accurate and hence cannot be trusted.
Three of the eight validation studies cited by Perlin et al. [3] are unhelpful for assessing the accuracy of TA because they involve reanalysis of casework samples for which ground truth regarding the identity of contributors is uncertain [6-8]; a fourth study [9] examined the accuracy of a different analytical technique than the one used in the case I discussed. So, aside from the study by Greenspoon et al. [5], which I discussed in my article, there are three studies that might provide additional information on whether TA results like the one discussed in my article are trustworthy. These three studies examined TA results for mixtures with known contributors where at least some of the contributors accounted for a very low quantity of DNA.
A key issue I will consider when discussing these studies is the “sensitivity” of TA—that is, the percentage of cases in which TA produces an LR that supports the hypothesis that a known contributor was indeed a contributor. That means an LR greater than one, or a LogLR greater than zero. I will also consider the false exculpation rate, which is the rate at which TA produces LRs for known contributors that incorrectly support the non-contributor hypothesis. “Sensitivity” and the false exculpation rate are complementary—if “sensitivity” is 90% then the false exculpation rate is 10%. The false exculpation rate falls to zero only when “sensitivity” is 100%.
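The bookkeeping relating these two rates can be stated in a few lines of Python. The LogLR values below are hypothetical, chosen only to illustrate the calculation; they are not drawn from any study.

```python
# Hypothetical LogLRs assigned by a PG program to known (true) contributors.
log_lrs = [3.2, 0.8, -1.1, 5.4, 2.0, -0.3, 4.7, 1.5, 0.2, -2.6]

supportive = sum(1 for x in log_lrs if x > 0)   # LogLR > 0 supports contribution
sensitivity = supportive / len(log_lrs)
false_exculpation_rate = 1 - sensitivity        # complementary, as defined above

print(f"sensitivity: {sensitivity:.0%}")                        # 70%
print(f"false exculpation rate: {false_exculpation_rate:.0%}")  # 30%
```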
One of the validation studies cited by Perlin et al. [3] examined the LRs that TA assigned to known contributors to mixtures containing up to 10 individuals [10]. It reported a “sensitivity” of 92%, which means that 92% of the LogLRs assigned to true contributors were positive. This also means, of course, that 8% of the assigned LogLRs were negative and hence falsely exculpatory. According to the authors, known individuals who contributed more than 100 pg of DNA to the mixture were always assigned positive LogLRs. For this group, “sensitivity” was 100%. Consequently, the falsely exculpatory LRs must have been concentrated among donors who contributed less than 100 pg. However, the authors do not report their findings in a manner that allows the rate of false exculpations to be broken down by the quantity of DNA in the mixed sample or by the amount contributed by particular donors.
Fortunately, another TA validation study allows exactly such a breakdown [11]. It examined the LRs that TA assigned to known contributors in laboratory-synthesized mixtures of 2, 3, 4, or 5 individuals. This study shows that TA frequently assigns falsely exculpatory LRs to known contributors and that this problem is particularly acute when there are more contributors and when the quantity of DNA is lower.
To help readers understand the rate of false exculpations in this study [11], I have prepared tables showing those findings. Table 1 shows the number, proportion, and percentage of known contributors to whom TA assigned falsely exculpatory LRs (LogLR < 0), broken down by the total quantity of DNA in the mixture (1 ng or 200 pg) and the number of contributors. The data presented in Table 1 are taken entirely from figs 4 and 5 of [11].
Table 1 shows, for example, that TA computed 20 LRs for known contributors to two-person mixtures containing 1 ng of DNA, and none of those LRs was falsely exculpatory. By comparison, TA computed 40 LRs for known contributors to four-person mixtures that contained 200 pg of DNA, and 15 of those LRs, or 37.5%, were falsely exculpatory.
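As a check on the percentages just quoted, the two cells of Table 1 described in the text can be recomputed directly. Only the counts reported above (taken from figs 4 and 5 of [11]) are used; the rest of the table is not reproduced here.

```python
# Two cells of Table 1, as reported in the text (from figs 4 and 5 of [11]):
# (number of contributors, total DNA) -> (falsely exculpatory LRs, LRs computed)
cells = {
    (2, "1 ng"):   (0, 20),
    (4, "200 pg"): (15, 40),
}

for (n, dna), (false_excul, total) in cells.items():
    print(f"{n}-person, {dna}: {false_excul}/{total} = {false_excul / total:.1%}")
# 2-person, 1 ng: 0/20 = 0.0%
# 4-person, 200 pg: 15/40 = 37.5%
```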
Table 2 shows additional evidence from the same study. It is a breakdown of the rate of falsely exculpatory findings by mixture weight, showing that the risk of a falsely exculpatory finding increases rapidly as the percentage of DNA the individual contributed to the mixture decreases.
The third validation study cited by Perlin et al. [12] examined TA results for 40 laboratory-prepared known-source two-person mixtures (with one male and one female contributor). For most of these samples, the quantity of DNA contributed by the male donor greatly exceeded the quantity of male DNA in the mixture analyzed in the case discussed in my article. For two of these samples, however, the total quantity of DNA was only 125 pg and the male donor contributed only 10% of that (approximately 12.5 pg). For one of those samples, a TA analysis that assumed both contributors were unknown produced an incriminating LR; for the other, TA produced a falsely exculpatory LR.
To put these data in context, recall that the total amount of DNA in the mixture discussed in my article was only 92 pg and that the minor contributor accounted for, at best, about 25% of this amount, and perhaps much less. The findings reported by Perlin et al. [11] make it clear that TA often produces falsely exculpatory LRs for contributors of this type. Moreover, these falsely exculpatory LRs are often of the same magnitude as those produced by TA in that case (see table 6 and figs 4 and 5 of Perlin et al. [11]). Under these circumstances, it is beyond my understanding how Perlin and his colleagues can argue that exculpatory results like those discussed in my article are trustworthy.
Given the high rates of falsely exculpatory findings that TA has been shown to produce for low-level DNA contributors, I think it is very clear that findings like the one discussed in my article cannot pass muster under the Daubert standard and should be inadmissible. Furthermore, I think it is irresponsible for forensic scientists to present such findings in court. As mentioned in my article, forensic scientists need to know “when to punt”—that is, “when to decline the opportunity to move forward with questionable or problematic evidence” [1].
In response to my suggestion that forensic scientists refrain from presenting such problematic findings, Perlin and his colleagues respond with what sounds like an advertising slogan—TA always allows forensic scientists to “go for the goal.” It is difficult for me to see how that could be true in cases like the one I discussed unless the goal is helping guilty defendants escape justice. My article called on forensic scientists to establish standards or guidelines for when such evidence is reliable enough to be used in a legal proceeding. The response from Perlin et al. [3] shows why such standards are necessary.
My article raised a number of additional concerns about TA that Perlin and his colleagues failed to address in any meaningful way. I pointed out, for example, that the mixture weights assigned by TA differ from those assigned by STRmix, and are difficult to square with the findings of the biological assay. The Amelogenin findings, estimates of the quantity of male DNA relative to total DNA, and examination of peak heights at loci with four detectable peaks all point to a mixture percentage for the male contributor (or contributors) far lower than TA's estimate [1]. In response, Perlin et al. say only that the TA results are more trustworthy because TA examines more of the data, including low-level peaks. In the absence of empirical evidence that TA is more accurate than other methods for assigning mixture percentages, this argument is vacuous and circular.
I also raised concerns about the assumption that the mixture in question had two and only two contributors, rather than a higher number. The validation studies cited by Perlin et al. [10, 11] show convincingly that DNA analysts often underestimate the number of contributors when dealing with mixtures like the one discussed in my article. That may be an important reason why TA assigns a falsely exculpatory LR to so many low-level known contributors. As I explained in my article, “…a true donor who is a minor contributor may be assigned an exculpatory LR because his genotype, in combination with the genotype of the primary donor, is a poor fit with the observed data under the assumption of two contributors. But the poor fit may occur because the mixture also contains the DNA of an unrecognized third donor who is responsible for some of the observed peaks” [1]. Perlin et al. [3] offer no response to this argument and no explanation for why LRs were reported only under the assumption of two contributors when the possibility of a higher number cannot be ruled out. In my article [1], and in my response to Buckleton and his colleagues [13], I argued that the forensic science and legal communities should pay more attention to the way laboratories handle uncertainty about such matters as the number of contributors to a DNA mixture, particularly when the results reported depend critically on assumptions that may be incorrect. This is another issue on which I believe forensic scientists need guidelines and standards.
Finally, I argued that Cybergenetics (the company that markets TA) has been presenting TA LRs in an extraordinarily biased and misleading manner in reports and testimony. I explained carefully why the statement about LRs that Cybergenetics uses as boilerplate is likely to be misinterpreted. I called on Cybergenetics to “immediately cease using this misleading language and find a better way to explain its findings” [1]. As I have explained elsewhere [14], I believe that there are two key requirements for reporting language in forensic science. First, the language must be justifiable scientifically—it must be technically correct. Second, it must communicate effectively with a lay audience—it must be the kind of statement that lay people will understand correctly. Whatever its scientific merits, I believe that Cybergenetics' LR statement fails badly on the second criterion. It is very likely to be misinterpreted. Perlin et al. respond by arguing that their statement is technically correct (at least as they interpret it), but they do not address my major concern about the potential for misinterpretation. Whether statements like the one Cybergenetics is using are adequate and appropriate is yet another issue where guidelines and standards from the forensic science community would be helpful.
Indeed, a major lesson from my article, and the discussion that has followed, is the need for organizations like OSAC and SWGDAM to address such issues. To expect competing for-profit companies to refrain from overclaiming and to fully disclose all uncertainties surrounding their findings is apparently expecting too much. To expect courts to regulate these matters as part of their review of admissibility apparently is also expecting too much [15]. If these matters are to be addressed at all, they will need to be addressed by the forensic science community through the standards development process.