A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances

Communications medicine | IF 5.4 | Q1, MEDICINE, RESEARCH & EXPERIMENTAL | Pub Date: 2024-10-14 | DOI: 10.1038/s43856-024-00631-7

Song Wang, Yiliang Zhou, Ziqiang Han, Cui Tao, Yunyu Xiao, Ying Ding, Joydeep Ghosh, Yifan Peng


Citations: 0

Abstract

Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) data is widely used to discover patterns and contributing factors of death. Recent studies have suggested that annotation inconsistencies exist within the NVDRS and may lead to erroneous suicide-circumstance attributions. We present an empirical Natural Language Processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify possible label errors. We analyzed 267,804 suicide death incidents between 2003 and 2020 from the NVDRS. We measured annotation inconsistency by the degree of change in the F-1 score. Our results show that incorporating the target state's data into training the suicide-circumstance classifier brings an increase of 5.4% to the F-1 score on the target state's test set and a decrease of 1.1% on other states' test sets. To conclude, we present an NLP framework to detect annotation inconsistencies, show the effectiveness of identifying and rectifying possible label errors, and propose a solution to improve the coding consistency of human annotators.

Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) contains records of individual suicide incidents taking place in the United States, together with the contributing suicide circumstances. We used a computational method to check the accuracy of NVDRS records. Our method identified and rectified possible labeling errors within the database. This method could be used to improve label accuracy in the NVDRS database, enabling more accurate recording and study of suicide circumstances. Improved recording of suicide circumstances could potentially support better approaches to suicide prevention in the future.

Wang et al. use a Natural Language Processing approach to detect suicide-circumstance annotation inconsistencies in death investigation notes. They identify possible label errors, show the effectiveness of identifying and rectifying them, and propose a solution for improving coding consistency.
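The F-1 comparison described in the abstract can be illustrated with a minimal sketch. It assumes a single binary circumstance label and compares a classifier trained on other states' notes with one that also sees the target state's training notes. The keyword-count classifier, the function names (`train`, `predict`, `evaluate`, `f1_changes`) and the toy notes are all hypothetical stand-ins invented for illustration; they are not the authors' model and not NVDRS data.

```python
from collections import Counter


def f1_score(y_true, y_pred):
    """Binary F-1 for the positive class (label 1); 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train(examples):
    """Toy stand-in classifier: per-token (positive minus negative) counts."""
    weights = Counter()
    for text, label in examples:
        for tok in set(text.lower().split()):
            weights[tok] += 1 if label == 1 else -1
    return weights


def predict(weights, text):
    """Predict 1 when the summed token weights are positive, else 0."""
    score = sum(weights[tok] for tok in set(text.lower().split()))
    return 1 if score > 0 else 0


def evaluate(weights, examples):
    """F-1 of the toy classifier on a list of (text, label) pairs."""
    y_true = [label for _, label in examples]
    y_pred = [predict(weights, text) for text, _ in examples]
    return f1_score(y_true, y_pred)


def f1_changes(other_train, target_train, target_test, other_test):
    """Change in F-1 on each test set after adding the target state's data."""
    base = train(other_train)                   # other states only
    mixed = train(other_train + target_train)   # other states + target state
    return {
        "target_delta": evaluate(mixed, target_test) - evaluate(base, target_test),
        "other_delta": evaluate(mixed, other_test) - evaluate(base, other_test),
    }


# Invented toy notes: the target state codes "argument" as the circumstance
# (label 1), while the other states code the same wording as 0.
other_train = [
    ("recent breakup", 1),
    ("breakup last week", 1),
    ("argument reported", 0),
    ("argument earlier", 0),
    ("job loss", 0),
]
target_train = [
    ("argument reported", 1),
    ("argument earlier", 1),
    ("argument yesterday", 1),
    ("job loss", 0),
]
target_test = [("argument noted", 1), ("job issues", 0)]
other_test = [("breakup noted", 1), ("argument noted", 0)]

changes = f1_changes(other_train, target_train, target_test, other_test)
print(changes)
```

On this toy data the target-state F-1 rises and the other-states F-1 falls once the target state's notes are added, which is the qualitative pattern the abstract reports (+5.4% and -1.1%). A large gap between the two changes is the signal that the target state's annotations follow a different coding convention.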




Source journal

Communications medicine

Self-citation rate

0.00%

Articles published