HealthE: Recognizing Health Advice & Entities in Online Health Communities

Joseph Gatto, Parker Seegmiller, Garrett M Johnston, Madhusudan Basak, Sarah Masud Preum
Published in: International Conference on Web and Social Media, 2023-06-02
DOI: 10.1609/icwsm.v17i1.22210 (https://doi.org/10.1609/icwsm.v17i1.22210)
Citations: 0

Abstract

Extracting and classifying entities is at the core of important Health-NLP systems such as misinformation detection, medical dialogue modeling, and patient-centric information tools. Granular knowledge of textual entities allows these systems to use knowledge bases, retrieve relevant information, and build graph representations of texts. Unfortunately, most existing health entity recognizers are trained on clinical notes, which differ both lexically and semantically from the public health information found in online health resources or on social media. In other words, existing health entity recognizers vastly under-represent the entities relevant to public health data, such as the content provided by sites like WebMD. It is crucial that future Health-NLP systems be able to model such information, because people rely on online health advice for personal health management and clinically relevant decision making.

In this work, we release a new annotated dataset, HealthE, which facilitates the large-scale analysis of online textual health advice. HealthE consists of 3,400 health advice statements with token-level entity annotations. Additionally, we release 2,256 health statements that are not health advice, to facilitate health advice mining. HealthE is the first dataset with an entity-recognition label space designed for modeling online health advice. We motivate the need for HealthE by demonstrating the limitations of five widely used health entity recognizers on HealthE, including those offered by Google and Amazon. We additionally benchmark three pre-trained language models on our dataset as a reference for future research. All data is made publicly available.
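To make "token-level entity annotations" concrete, the following is a minimal sketch of how per-token tags can be grouped into entity spans. It assumes a BIO tagging scheme, which the abstract does not specify; the sentence, the label names `TYPE_A` and `TYPE_B`, and the helper `bio_to_spans` are all hypothetical and are not taken from HealthE.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (entity_text, label) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # An I- tag with the same label as the open entity extends it.
        if tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
            continue

        # Any other tag closes the open entity, if there is one.
        if current_label is not None:
            spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

        # B- opens a new entity; a stray I- is also treated as a new entity.
        if tag.startswith("B-") or tag.startswith("I-"):
            current_tokens, current_label = [token], tag[2:]

    # Flush an entity that runs to the end of the sentence.
    if current_label is not None:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Illustrative sentence and placeholder labels (not from the dataset).
tokens = ["Drink", "plenty", "of", "water", "to", "prevent", "kidney", "stones"]
tags = ["O", "O", "O", "B-TYPE_A", "O", "O", "B-TYPE_B", "I-TYPE_B"]
spans = bio_to_spans(tokens, tags)
```

Here `spans` holds one pair per entity, so multi-token entities such as "kidney stones" are recovered as a single unit. The released data may use a different tagging scheme or file layout, in which case only the grouping idea carries over.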