Natural language processing to evaluate texting conversations between patients and healthcare providers during COVID-19 Home-Based Care in Rwanda at scale

medRxiv - Health Informatics | Pub Date: 2024-08-31 | DOI: 10.1101/2024.08.30.24312636

Richard T Lester, Matthew Manson, Muhammed Semakula, Hyeju Jang, Hassan Mugabo, Ali Magzari, Junhong Ma Blackmer, Fanan Fattah, Simon Pierre Niyonsenga, Edson Rwagasore, Charles Ruranga, Eric Remera, Jean Claude S. Ngabonziza, Giuseppe Carenini, Sabin Nsanzimana



Abstract

Isolation of patients with communicable infectious diseases limits the spread of pathogens but can be difficult to manage outside hospitals. Rwanda deployed a digital health service nationally to help public health clinicians remotely monitor and support SARS-CoV-2 cases via their mobile phones using daily interactive short message service (SMS) check-ins. We aimed to assess the texting patterns and communicated topics to understand patient experiences. We extracted data on all COVID-19 cases and exposed contacts who were enrolled in the WelTel text messaging program between March 18, 2020, and March 31, 2022, and linked demographic and clinical data from the national COVID-19 registry. A sample of the text conversation corpus was translated into English and labeled with topics of interest defined by medical experts. Multiple natural language processing (NLP) topic classification models were trained and compared using F1 scores. The best-performing models were applied to classify unlabeled conversations. In total, 33,081 isolated patients (mean age 33·9, range 0-100; 44% female), including 30,398 cases and 2,683 contacts, were registered in WelTel. Registered patients generated 12,119 interactive text conversations in Kinyarwanda (n=8,183, 67%), English (n=3,069, 25%), and other languages. Sufficiently trained large language models (LLMs) were unavailable for Kinyarwanda. Traditional machine learning (ML) models outperformed fine-tuned transformer architecture language models on the native untranslated language corpus; however, the reverse was observed for models trained on English-only data. The most frequently identified topics were symptoms (69%), diagnostics (38%), social issues (19%), prevention (18%), healthcare logistics (16%), and treatment (8·5%). Education, advice, and triage on these topics were provided to patients. Interactive text messaging can be used to remotely support isolated patients in pandemics at scale. NLP can help evaluate the medical and social factors that affect isolated patients, which could ultimately inform precision public health responses to future pandemics.
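
The abstract states that the topic classifiers were compared using F1 scores but does not say which averaging was used. The following is a minimal sketch, not the authors' code, of how per-topic, macro-averaged, and micro-averaged F1 could be computed when one conversation can carry several topics. The short topic names, the toy gold and predicted labels, and the choice to score a topic with no gold or predicted instances as 0.0 are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Topic names abbreviated from the abstract; the short forms are illustrative.
TOPICS = ["symptoms", "diagnostics", "social", "prevention", "logistics", "treatment"]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def topic_counts(gold: List[Set[str]], pred: List[Set[str]]) -> Dict[str, Dict[str, int]]:
    """Count true positives, false positives, and false negatives per topic."""
    counts = {t: {"tp": 0, "fp": 0, "fn": 0} for t in TOPICS}
    for g, p in zip(gold, pred):
        for t in TOPICS:
            if t in g and t in p:
                counts[t]["tp"] += 1
            elif t in p:
                counts[t]["fp"] += 1
            elif t in g:
                counts[t]["fn"] += 1
    return counts


def macro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """Unweighted mean of per-topic F1, so rare topics weigh as much as common ones."""
    return sum(f1(**c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Dict[str, int]]) -> float:
    """F1 over pooled counts, so frequent topics dominate the score."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical labels for four conversations (not data from the study).
gold = [{"symptoms"}, {"symptoms", "diagnostics"}, {"social"}, {"treatment", "symptoms"}]
pred = [{"symptoms"}, {"symptoms"}, {"social", "prevention"}, {"treatment"}]

counts = topic_counts(gold, pred)
macro = macro_f1(counts)
micro = micro_f1(counts)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

With skewed topic frequencies like those reported here (symptoms in 69% of conversations, treatment in 8·5%), macro and micro averages can diverge noticeably, so it is worth stating which one is reported when comparing models.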

