Clustering and topic modeling over tweets: A comparison over a health dataset.

Juan Antonio Lossio-Ventura, Juandiego Morzan, Hugo Alatrista-Salas, Tina Hernandez-Boussard, Jiang Bian
{"title":"Clustering and topic modeling over tweets: A comparison over a health dataset.","authors":"Juan Antonio Lossio-Ventura, Juandiego Morzan, Hugo Alatrista-Salas, Tina Hernandez-Boussard, Jiang Bian","doi":"10.1109/bibm47256.2019.8983167","DOIUrl":null,"url":null,"abstract":"Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.","PeriodicalId":74563,"journal":{"name":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","volume":"2019 ","pages":"1544-1547"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/bibm47256.2019.8983167","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Conference on Bioinformatics and Biomedicine","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/bibm47256.2019.8983167","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Twitter became the most popular form of social interactions in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare with the potential goal to improve their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets showing that the performances of the applications are negatively affected due to the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA get a better performance for extracting topics and grouping tweets. We want to provide health practitioners this comparison to select the most suitable application for their tasks.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
tweet上的聚类和主题建模:对健康数据集的比较。
Twitter成为医疗保健领域最流行的社交互动形式。因此,不同的团队已经将Twitter评估为一个额外的来源,患者可以在这里分享他们的医疗保健信息,潜在的目标是改善他们的结果。一些现有的主题建模和文档聚类应用程序已经被用于评估推文,表明由于推文的性质和特征,应用程序的性能受到负面影响。此外,由于缺乏现有应用程序之间的比较,Twitter的健康研究已经变得难以衡量。在本文中,我们对两个Twitter健康相关数据集进行了基于不同主题建模和文档聚类应用程序的内部索引的评估。结果表明,Online Twitter LDA和Gibbs LDA在提取主题和分组tweet方面具有更好的性能。我们希望为健康从业者提供这种比较,以选择最适合他们任务的应用程序。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Interpreting Lung Cancer Health Disparity between African American Males and European American Males. Causal Explanation from Mild Cognitive Impairment Progression using Graph Neural Networks. Predicting HIV Diagnosis Among Emerging Adults Using Electronic Health Records and Health Survey Data in All of Us Research Program. A generalizable physiological model for detection of Delayed Cerebral Ischemia using Federated Learning. Harnessing Transfer Learning for Dementia Prediction: Leveraging Sex-Different Mild Cognitive Impairment Prognosis.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1