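Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation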

BMC digital health. Pub Date: 2023-09-11. DOI: 10.1186/s44247-023-00035-y

Shaina Raza, Elham Dolatabadi, Nancy Ondrusek, Laura Rosella, Brian Schwartz


Citations: 0



Discovering social determinants of health from case reports using natural language processing: algorithmic development and validation

Abstract

Background: Social determinants of health (SDOH) are non-medical factors that influence health outcomes. A wealth of SDOH information is available in electronic health records, clinical reports, and social media data, usually as free text. Extracting key information from free text is a significant challenge and requires natural language processing (NLP) techniques.

Objective: To advance the automatic extraction of SDOH from clinical texts.

Setting and data: Case reports of COVID-19 patients from the published literature are curated to create a corpus. A portion of the data is annotated by experts to create ground-truth labels, and a semi-supervised learning method is used to re-annotate the corpus.

Methods: An NLP framework is developed and tested to extract SDOH from free text. A two-way evaluation method is used to assess the quantity and quality of the methods.

Results: The proposed named entity recognition (NER) implementation achieves an F1-score of 92.98% on the authors' test set and generalizes well to benchmark data. A careful analysis of case examples shows that the proposed approach classifies named entities more accurately than the approaches it is compared with.

Conclusions: NLP can be used to extract key information, such as SDOH factors, from free text. A more accurate understanding of SDOH is needed to further improve healthcare outcomes.
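The abstract reports an F1-score for the NER model but does not say how it is computed. As a rough illustration only, the sketch below scores BIO-tagged sequences at the entity level using exact span matching, which is a common convention and an assumption here, not the authors' documented protocol. The function names and the EMPLOYMENT and HOUSING labels are hypothetical and do not come from the paper.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, label); a hypothetical span representation
Entity = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Entity]:
    """Convert one BIO tag sequence into a set of labeled spans."""
    spans: Set[Entity] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        # Leniently treat a stray "I-" tag as the start of a new span.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exactly matching spans."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_spans |= {(idx, *s) for s in bio_to_spans(g)}
        pred_spans |= {(idx, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example with made-up SDOH labels: one of two gold entities is found.
    gold = [["B-EMPLOYMENT", "I-EMPLOYMENT", "O", "B-HOUSING"]]
    pred = [["B-EMPLOYMENT", "I-EMPLOYMENT", "O", "O"]]
    p, r, f = entity_f1(gold, pred)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # prints: precision=1.00 recall=0.50 f1=0.67
```

Exact span matching penalizes a prediction whose boundaries are off by a single token as heavily as a complete miss, so a partial-overlap or token-level score would give different numbers on the same output.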

