Pythia: A System for Online Topic Discovery of Social Media Posts

Juliana Litou, V. Kalogeraki
Published in: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Publication date: 2017-06-05
DOI: 10.1109/ICDCS.2017.289 (https://doi.org/10.1109/ICDCS.2017.289)
Citations: 6

Abstract

Social media are now among the most common communication mediums. Millions of users rely on them daily to share information with their community in the network via messages, referred to as posts. The massive volume of information shared is extremely diverse and covers a vast spectrum of topics and interests. Automatically identifying the topics of posts is of particular interest, as it can assist a variety of applications such as event detection, trend discovery, and expert finding. However, designing an automated system that identifies the topics covered in posts published in Online Social Networks (OSNs) without human participation presents manifold challenges. First, posts are unstructured and commonly short, limited to just a few characters. Because the text is so sparse, existing classification schemes cannot be applied directly. Second, new information emerges constantly, so a learning corpus built from past posts may fail to capture the ever-evolving information in OSNs. To overcome these limitations we have designed Pythia, an automated system for short-text classification that exploits the Wikipedia structure and articles to identify the topics of posts. Topic discovery is performed in two phases. In the first step, the system uses Wikipedia categories and the articles belonging to those categories to build the training corpus for supervised learning. In the second step, the text of a given post is augmented by a text enrichment mechanism that extends the post with relevant Wikipedia articles. After these initial steps, we deploy a k-NN classifier to determine the topic(s) covered in the original post.
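The abstract does not specify how the pipeline is implemented, so the following is only a minimal sketch of the general idea: build a labelled corpus, enrich a short post with its most similar article, and classify the enriched text with k-NN. Everything concrete here is an assumption made for illustration: the toy `TRAINING` corpus stands in for Wikipedia categories and articles, TF-IDF with cosine similarity is one plausible text representation, and the names `enrich` and `classify` are hypothetical rather than taken from Pythia.

```python
import math
from collections import Counter

# Toy stand-in for a Wikipedia-derived corpus: (category, article text).
# The texts are invented for illustration; they are not from the paper.
TRAINING = [
    ("Sports", "football match goal league team player score"),
    ("Sports", "tennis tournament player serve match court"),
    ("Politics", "election vote parliament government minister policy"),
    ("Politics", "president campaign vote senate law government"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms outside the training vocabulary are dropped."""
    tf = Counter(tok for tok in tokens if tok in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


LABELS = [label for label, _ in TRAINING]
DOCS = [tokenize(text) for _, text in TRAINING]
IDF = build_idf(DOCS)
VECTORS = [tfidf(doc, IDF) for doc in DOCS]


def enrich(post_tokens, top_n=1):
    """Hypothetical enrichment: append the top_n most similar articles to the post."""
    query = tfidf(post_tokens, IDF)
    scored = sorted(
        ((cosine(query, vec), doc) for vec, doc in zip(VECTORS, DOCS)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    extra = [tok for sim, doc in scored[:top_n] if sim > 0 for tok in doc]
    return post_tokens + extra


def classify(post, k=3):
    """Similarity-weighted k-NN vote over the enriched post; None if nothing matches."""
    query = tfidf(enrich(tokenize(post)), IDF)
    neighbours = sorted(
        ((cosine(query, vec), label) for vec, label in zip(VECTORS, LABELS)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in neighbours:
        votes[label] += sim
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify("great goal in the football match"))
    print(classify("government vote on new law"))
```

In this sketch the enrichment step and the classifier draw on the same small article set, which keeps the example self-contained; a real system would presumably retrieve enrichment articles from a much larger collection than the labelled training corpus.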