#ChronicPain: Automated Building of a Chronic Pain Cohort from Twitter Using Machine Learning.

Health Data Science. Pub Date: 2023-01-01. Epub Date: 2023-07-04. DOI: 10.34133/hds.0078
Abeed Sarker, Sahithi Lakamana, Yuting Guo, Yao Ge, Abimbola Leslie, Omolola Okunromade, Elena Gonzalez-Polledo, Jeanmarie Perrone, Anne Marie McKenzie-Brown

Abstract

Background: Due to the high burden of chronic pain, and the detrimental public health consequences of its treatment with opioids, there is a high-priority need to identify effective alternative therapies. Social media is a potentially valuable resource for knowledge about self-reported therapies by chronic pain sufferers.

Methods: We attempted to (a) verify the presence of large-scale chronic pain-related chatter on Twitter, (b) develop natural language processing and machine learning methods for automatically detecting self-disclosures of chronic pain, (c) collect longitudinal data posted by the users who made these self-disclosures, and (d) semiautomatically analyze the types of chronic pain-related information they reported. We collected data using chronic pain-related hashtags and keywords and manually annotated 4,998 posts to indicate whether they were self-reports of chronic pain experiences. We trained and evaluated several state-of-the-art supervised text classification models and deployed the best-performing classifier. We then collected all publicly available posts from detected cohort members and conducted manual and natural language processing-driven descriptive analyses.
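
The hashtag- and keyword-based collection step might look roughly like the sketch below. The abstract does not list the actual hashtags or keywords, so the terms in `SEED_TERMS` and the function name `collect_candidates` are hypothetical and serve only as illustration. The sketch covers only the initial filtering that produces posts for annotation, not the supervised classifier.

```python
# Hypothetical seed terms: the abstract does not state which hashtags
# or keywords were actually used for data collection.
SEED_TERMS = ("#chronicpain", "chronic pain")


def matches_seed_terms(post: str) -> bool:
    """Return True if the post contains any seed term (case-insensitive)."""
    text = post.lower()
    return any(term in text for term in SEED_TERMS)


def collect_candidates(posts):
    """Keep posts that mention a seed term, preserving their input order.

    The retained posts are only candidates; deciding whether a post is a
    self-report still needs manual annotation or a trained classifier.
    """
    return [post for post in posts if matches_seed_terms(post)]
```

Simple substring matching of this kind over-collects, for example news items or posts about someone else's pain, which is why a separate step to detect genuine self-reports is needed.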

Results: Interannotator agreement for the binary annotation was 0.82 (Cohen's kappa). The RoBERTa model performed best (F1 score: 0.84; 95% confidence interval: 0.80 to 0.89), and we used this model to classify all collected unlabeled posts. We discovered 22,795 self-reported chronic pain sufferers and collected over 3 million of their past posts. Further analyses revealed information about, but not limited to, alternative treatments, patient sentiments about treatments, side effects, and self-management strategies.
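
For readers unfamiliar with the reported metrics, the following minimal sketch shows how Cohen's kappa and the F1 score are computed from label lists. The abstract does not say how the 95% confidence interval was obtained; the percentile bootstrap shown here is an assumption, and all function names are illustrative rather than taken from the paper.

```python
import random
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's label proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


def f1_score(gold, predicted, positive=1):
    """F1 score for the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1_interval(gold, predicted, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for F1 (assumed method, not from the paper)."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([gold[i] for i in idx], [predicted[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Kappa corrects raw agreement for the agreement expected by chance, so 0.82 indicates that the annotators agreed far more often than their label distributions alone would predict.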

Conclusion: Our social media-based approach yields a large cohort that grows automatically over time, and the data can be leveraged to identify effective opioid-alternative therapies for diverse chronic pain types.
