A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances

Communications Medicine | IF 5.4 | JCR Q1 (Medicine, Research & Experimental) | Pub Date: 2024-10-14 | DOI: 10.1038/s43856-024-00631-7
Song Wang, Yiliang Zhou, Ziqiang Han, Cui Tao, Yunyu Xiao, Ying Ding, Joydeep Ghosh, Yifan Peng
Citations: 0

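Abstract

Data accuracy is essential for scientific research and policy development. Data from the National Violent Death Reporting System (NVDRS) are widely used to discover the patterns and causal factors of death. Recent studies have suggested that annotation inconsistencies exist within the NVDRS and may lead to erroneous suicide-circumstance attributions. We present an empirical natural language processing (NLP) approach to detect annotation inconsistencies and adopt a cross-validation-like paradigm to identify possible label errors. We analyzed 267,804 suicide death incidents recorded in the NVDRS between 2003 and 2020 and measured annotation inconsistency by the degree of change in the F-1 score. Our results show that incorporating the target state's data into the training of the suicide-circumstance classifier increases the F-1 score by 5.4% on the target state's test set and decreases it by 1.1% on the other states' test set. In conclusion, we present an NLP framework to detect annotation inconsistencies, show the effectiveness of identifying and rectifying possible label errors, and propose a solution to improve the coding consistency of human annotators.

Data accuracy is essential for scientific research and policy development. The National Violent Death Reporting System (NVDRS) records individual suicide incidents that take place in the United States, together with the circumstances that contributed to them. We used a computational method to check the accuracy of NVDRS records. Our method identified and rectified possible labeling errors within the database. This method could be used to improve label accuracy in the NVDRS database, enabling more accurate recording and study of suicide circumstances. Improved recording of suicide circumstances could in turn support the development of better approaches to suicide prevention.

Wang et al. use a natural language processing approach to detect suicide-circumstance annotation inconsistencies in death investigation notes. They identify possible label errors, show that identifying and rectifying them is effective, and propose a solution to improve coding consistency.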
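The cross-validation-like comparison in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword-vote classifier is a hypothetical stand-in for whatever model the paper trains, the two-word "notes" and their labels are invented toy data (not NVDRS records), and every function name is made up for the example. The idea it shows is only the measurement itself: train once without and once with the target state's data, then compare the F-1 score on the target state's test set and on the other states' test set.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F-1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def train_keyword_classifier(samples):
    """Hypothetical toy classifier: each token gets +1 per positive note
    and -1 per negative note it appears in; a note is predicted positive
    when the summed votes of its tokens are greater than zero."""
    votes = defaultdict(int)
    for text, label in samples:
        for tok in set(text.lower().split()):
            votes[tok] += 1 if label == 1 else -1

    def predict(text):
        score = sum(votes.get(tok, 0) for tok in set(text.lower().split()))
        return 1 if score > 0 else 0

    return predict


def evaluate(predict, samples):
    """F-1 of a trained predictor on (text, label) pairs."""
    return f1_score([y for _, y in samples], [predict(x) for x, _ in samples])


def inconsistency_deltas(other_train, target_train, target_test, other_test):
    """Change in F-1 when the target state's data joins the training set."""
    base = train_keyword_classifier(other_train)
    mixed = train_keyword_classifier(other_train + target_train)
    delta_target = evaluate(mixed, target_test) - evaluate(base, target_test)
    delta_other = evaluate(mixed, other_test) - evaluate(base, other_test)
    return delta_target, delta_other


# Invented toy data: the "target" state codes notes mentioning an argument
# as positive for the circumstance, while the other states code them negative.
other_train = [("breakup partner", 1), ("breakup recent", 1),
               ("argument partner", 0), ("job loss", 0)]
target_train = [("argument partner", 1), ("argument recent", 1),
                ("argument family", 1), ("job loss", 0)]
target_test = [("argument partner", 1), ("argument family", 1),
               ("job loss", 0), ("breakup partner", 1)]
other_test = [("breakup recent", 1), ("argument partner", 0),
              ("argument recent", 0), ("job loss", 0)]

delta_target, delta_other = inconsistency_deltas(
    other_train, target_train, target_test, other_test)
print(delta_target, delta_other)
```

On this toy data the F-1 score rises on the target state's test set and falls on the other states' test set once the target state's notes are added to training. That is the same direction of change the abstract reports (5.4% up, 1.1% down), and under this paradigm it is read as a sign that the target state codes the circumstance differently from the other states.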