使用自然语言处理从病例报告中发现健康的社会决定因素:算法开发和验证

Shaina Raza, Elham Dolatabadi, Nancy Ondrusek, Laura Rosella, Brian Schwartz
{"title":"使用自然语言处理从病例报告中发现健康的社会决定因素:算法开发和验证","authors":"Shaina Raza, Elham Dolatabadi, Nancy Ondrusek, Laura Rosella, Brian Schwartz","doi":"10.1186/s44247-023-00035-y","DOIUrl":null,"url":null,"abstract":"Abstract Background Social determinants of health are non-medical factors that influence health outcomes (SDOH). There is a wealth of SDOH information available in electronic health records, clinical reports, and social media data, usually in free text format. Extracting key information from free text poses a significant challenge and necessitates the use of natural language processing (NLP) techniques to extract key information. Objective The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and data The case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground truth labels, and semi-supervised learning method is used for corpus re-annotation. Methods An NLP framework is developed and tested to extract SDOH from the free texts. A two-way evaluation method is used to assess the quantity and quality of the methods. Results The proposed NER implementation achieves an accuracy (F1-score) of 92.98% on our test set and generalizes well on benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities. Conclusions NLP can be used to extract key information, such as SDOH factors from free texts. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.","PeriodicalId":72426,"journal":{"name":"BMC digital health","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation\",\"authors\":\"Shaina Raza, Elham Dolatabadi, Nancy Ondrusek, Laura Rosella, Brian Schwartz\",\"doi\":\"10.1186/s44247-023-00035-y\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Background Social determinants of health are non-medical factors that influence health outcomes (SDOH). There is a wealth of SDOH information available in electronic health records, clinical reports, and social media data, usually in free text format. Extracting key information from free text poses a significant challenge and necessitates the use of natural language processing (NLP) techniques to extract key information. Objective The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and data The case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground truth labels, and semi-supervised learning method is used for corpus re-annotation. Methods An NLP framework is developed and tested to extract SDOH from the free texts. A two-way evaluation method is used to assess the quantity and quality of the methods. Results The proposed NER implementation achieves an accuracy (F1-score) of 92.98% on our test set and generalizes well on benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities. Conclusions NLP can be used to extract key information, such as SDOH factors from free texts. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.\",\"PeriodicalId\":72426,\"journal\":{\"name\":\"BMC digital health\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC digital health\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1186/s44247-023-00035-y\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC digital health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s44247-023-00035-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

健康的社会决定因素是影响健康结果(SDOH)的非医学因素。电子健康记录、临床报告和社交媒体数据中提供了丰富的SDOH信息,通常采用自由文本格式。从自由文本中提取关键信息是一个重大挑战,需要使用自然语言处理(NLP)技术来提取关键信息。目的推进临床文献中SDOH的自动提取。从已发表的文献中整理COVID-19患者的病例报告,创建一个语料库。由专家对部分数据进行标注,生成基础真值标签,采用半监督学习方法对语料库进行重新标注。方法开发了一个自然语言处理框架,并对其进行了测试。采用双向评价方法对方法的数量和质量进行评价。结果提出的NER实现在我们的测试集上达到了92.98%的准确率(f1分数),并且在基准数据上有很好的泛化。对实例的仔细分析证明了所提出的方法在正确分类命名实体方面的优越性。结论NLP可以从自由文本中提取关键信息,如SDOH因子。为了进一步改善医疗保健结果,需要更准确地了解SDOH。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation
Abstract Background Social determinants of health are non-medical factors that influence health outcomes (SDOH). There is a wealth of SDOH information available in electronic health records, clinical reports, and social media data, usually in free text format. Extracting key information from free text poses a significant challenge and necessitates the use of natural language processing (NLP) techniques to extract key information. Objective The objective of this research is to advance the automatic extraction of SDOH from clinical texts. Setting and data The case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground truth labels, and semi-supervised learning method is used for corpus re-annotation. Methods An NLP framework is developed and tested to extract SDOH from the free texts. A two-way evaluation method is used to assess the quantity and quality of the methods. Results The proposed NER implementation achieves an accuracy (F1-score) of 92.98% on our test set and generalizes well on benchmark data. A careful analysis of case examples demonstrates the superiority of the proposed approach in correctly classifying the named entities. Conclusions NLP can be used to extract key information, such as SDOH factors from free texts. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Potential sources of inaccuracy in the Apple watch series 4 energy expenditure estimation algorithm during wheelchair propulsion From an idea to the marketplace: identifying and addressing ethical and regulatory considerations across the digital health product-development lifecycle Efficacy of digital interventions on physical activity promotion in individuals with noncommunicable diseases: an overview of systematic reviews Software symptomcheckR: an R package for analyzing and visualizing symptom checker triage performance Introduction and content analysis of "Nurse's voice" WhatsApp group during COVID-19 pandemic: a qualitative study
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1