Clustering and topic modeling over tweets: A comparison over a health dataset.

Proceedings. IEEE International Conference on Bioinformatics and Biomedicine. Pub Date: 2019-11-01. DOI: 10.1109/bibm47256.2019.8983167

Juan Antonio Lossio-Ventura, Juandiego Morzan, Hugo Alatrista-Salas, Tina Hernandez-Boussard, Jiang Bian

Volume 2019, pp. 1544-1547.

Citations: 7

Abstract

Twitter has become the most popular form of social interaction in the healthcare domain. Consequently, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to analyze tweets, and these studies show that the applications' performance is negatively affected by the nature and characteristics of tweets. Moreover, progress in Twitter health research is difficult to measure because the existing applications have not been compared with one another. In this paper, we evaluate different topic modeling and document clustering applications on two health-related Twitter datasets using internal evaluation indexes. Our results show that Online Twitter LDA and Gibbs LDA perform better than the other applications at extracting topics and grouping tweets. We aim to provide health practitioners with this comparison so that they can select the most suitable application for their tasks.
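Internal indexes score a grouping using only the data and the cluster assignments, without gold-standard labels, which suits unlabeled tweet collections. The abstract does not name the specific indexes used, so the sketch below is an illustration only: it assumes the silhouette coefficient over bag-of-words tweet vectors with cosine distance. The toy tweets, the two labelings, and the helper names `cosine_distance` and `silhouette_score` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(vectors, labels):
    """Mean silhouette coefficient (an internal index; no ground truth needed).

    For each point i: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Singleton clusters get s = 0 by convention.
    """
    labels = list(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (vec, lab) in enumerate(zip(vectors, labels)):
        same = [cosine_distance(vec, vectors[j])
                for j, l in enumerate(labels) if l == lab and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(cosine_distance(vec, vectors[j])
                for j, l in enumerate(labels) if l == other) / labels.count(other)
            for other in clusters if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical toy tweets: two about flu shots, two about diabetes diets.
tweets = [
    "flu shot today feeling sore",
    "got my flu shot sore arm",
    "new diabetes diet plan low sugar",
    "diabetes diet tips low sugar",
]
vectors = [Counter(t.split()) for t in tweets]

good_score = silhouette_score(vectors, [0, 0, 1, 1])  # topic-consistent grouping
bad_score = silhouette_score(vectors, [0, 1, 0, 1])   # topics mixed across clusters
print(round(good_score, 3), round(bad_score, 3))
```

A higher mean silhouette indicates tighter, better-separated groups, so the topic-consistent grouping scores above the mixed one. Any real comparison would substitute the paper's own indexes and preprocessing.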
