微博的维数和信息量:利用微博的结构和维数进行自组织检索

Jesus A. Rodriguez Perez, J. Jose
{"title":"微博的维数和信息量:利用微博的结构和维数进行自组织检索","authors":"Jesus A. Rodriguez Perez, J. Jose","doi":"10.1145/2808194.2809466","DOIUrl":null,"url":null,"abstract":"In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit its content. Microblog documents such as tweets differ in morphology with respect to more traditional documents such as web pages. Particularly, tweets are considerably shorter (140 characters) than web documents and contain contextual tags regarding the topic (hashtags), intended audience (mentions) of the document as well as links to external content(URLs). Traditional and state of the art retrieval models perform rather poorly in capturing the relevance of tweets, since they have been designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between those documents deemed relevant and those non-relevant. Secondly we experiment with enhancing the behaviour of the best observed performing retrieval model by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then utilised to produce scores which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval\",\"authors\":\"Jesus A. Rodriguez Perez, J. Jose\",\"doi\":\"10.1145/2808194.2809466\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit its content. Microblog documents such as tweets differ in morphology with respect to more traditional documents such as web pages. Particularly, tweets are considerably shorter (140 characters) than web documents and contain contextual tags regarding the topic (hashtags), intended audience (mentions) of the document as well as links to external content(URLs). Traditional and state of the art retrieval models perform rather poorly in capturing the relevance of tweets, since they have been designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between those documents deemed relevant and those non-relevant. Secondly we experiment with enhancing the behaviour of the best observed performing retrieval model by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then utilised to produce scores which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.\",\"PeriodicalId\":440325,\"journal\":{\"name\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2808194.2809466\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2808194.2809466","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

摘要

近年来,微博服务如Twitter越来越受欢迎,导致如何有效利用其内容的研究活跃。微博文档(如tweets)在形态上不同于更传统的文档(如网页)。特别是,tweet比web文档短得多(140个字符),并且包含关于主题的上下文标签(hashtags),文档的目标受众(提及)以及指向外部内容的链接(url)。传统的和最先进的检索模型在捕获tweet的相关性方面表现得相当差,因为它们是在非常不同的条件下设计的。在这项工作中,我们将微博文档定义为一个高维实体,并研究了相关文档和不相关文档之间的结构差异。其次,我们尝试通过重新排序方法来增强观察到的最佳执行检索模型的行为,该方法可以解释推文之间这些维度的相对差异。此外,我们通过将相关和不相关的tweet建模为状态机,根据文档中不同维度之间的顺序来研究它们之间的交互。然后使用这些状态机生成分数,这些分数反过来用于重新排名。我们的评估结果显示,两种方法在不同截止点的精度方面比基线有统计学上的显著改善。这些结果证实了文档中不同维度的相对存在及其排序与微博的相关性有关。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval
In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit its content. Microblog documents such as tweets differ in morphology with respect to more traditional documents such as web pages. Particularly, tweets are considerably shorter (140 characters) than web documents and contain contextual tags regarding the topic (hashtags), intended audience (mentions) of the document as well as links to external content(URLs). Traditional and state of the art retrieval models perform rather poorly in capturing the relevance of tweets, since they have been designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between those documents deemed relevant and those non-relevant. Secondly we experiment with enhancing the behaviour of the best observed performing retrieval model by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then utilised to produce scores which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Entity Linking in Queries: Tasks and Evaluation Using Part-of-Speech N-grams for Sensitive-Text Classification Query Expansion with Freebase Partially Labeled Supervised Topic Models for RetrievingSimilar Questions in CQA Forums Two Operators to Define and Manipulate Themes of a Document Collection
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1