Unsupervised natural language processing in the identification of patients with suspected COVID-19 infection

Rildo Pinto da Silva, Juliana Tarossi Pollettini, Antonio Pazin Filho
Cadernos de Saúde Pública. Published 2023-12-04. DOI: 10.1590/0102-311XEN243722 (https://doi.org/10.1590/0102-311XEN243722)

Abstract

Patients with post-COVID-19 syndrome benefit from health promotion programs, and identifying them rapidly is important for the cost-effective use of these programs. Traditional identification techniques perform poorly, especially during pandemics. This descriptive observational study applied an unsupervised natural language processing method, topic modeling, to 105,008 prior authorizations paid by a private health care provider in order to identify patients with suspected COVID-19 infection. Six models were generated: three using the BERTopic algorithm and three using Word2Vec. The BERTopic model creates disease groups automatically. In the Word2Vec model, the first 100 cases of each topic had to be analyzed manually to determine which topics were related to COVID-19. The BERTopic model with more than 1,000 authorizations per topic and without word treatment selected more severe patients: the average cost per paid prior authorization was BRL 10,206, and total expenditure was BRL 20.3 million (5.4%) across 1,987 prior authorizations (1.9%). It achieved 70% accuracy compared with human analysis, and 20% of cases were of potential interest, all subject to analysis for inclusion in a health promotion program. Compared with the traditional research model using structured language, it lost an important number of cases, and it also identified other groups of diseases: orthopedic, mental, and cancer. The BERTopic model served as an exploratory method for case labeling and subsequent application in supervised models. The automatic identification of other diseases raises ethical questions about the treatment of health information by machine learning.
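The cost figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is not part of the study; it only recombines the numbers reported above, and the variable names are illustrative. Because the reported inputs are rounded, the derived values (in particular the implied total expenditure and the cost ratio) are approximations, not results stated by the authors.

```python
# Figures quoted in the abstract (rounded as reported there).
total_authorizations = 105_008
selected_authorizations = 1_987
avg_cost_brl = 10_206          # average cost per paid prior authorization
selected_spend_share = 0.054   # 5.4% of total expenditure

# Share of all authorizations that fall in the selected BERTopic topic.
share_of_authorizations = selected_authorizations / total_authorizations

# Expenditure implied by average cost times number of authorizations.
selected_spend_brl = avg_cost_brl * selected_authorizations

# Assumption: total expenditure is back-calculated from the 5.4% share,
# so it is only as precise as that rounded percentage.
implied_total_spend_brl = selected_spend_brl / selected_spend_share

# How much more the selected authorizations cost than the average one.
cost_ratio = selected_spend_share / share_of_authorizations

print(f"Share of authorizations: {share_of_authorizations:.1%}")
print(f"Selected expenditure: BRL {selected_spend_brl / 1e6:.1f} million")
print(f"Implied total expenditure: BRL {implied_total_spend_brl / 1e6:.0f} million")
print(f"Cost ratio versus average authorization: {cost_ratio:.2f}")
```

The product of the average cost and the number of authorizations reproduces the reported BRL 20.3 million, and the selected authorizations account for a share of expenditure nearly three times their share of volume, which is consistent with the statement that the model selected more severe patients.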