Cross-Language Microblog Retrieval using Latent Semantic Modeling

Archana Godavarthy, Yi Fang
Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval
Published: 2016-09-12
DOI: 10.1145/2970398.2970436 (https://doi.org/10.1145/2970398.2970436)
Citations: 4

Abstract

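Microblogging has become one of the major tools for sharing real-time information around the world. Finding relevant information across languages on microblogs is highly desirable, especially for the large number of multilingual users. However, the characteristics of microblog content pose great challenges to existing cross-language information retrieval approaches. In this paper, we address the task of retrieving relevant tweets given a tweet in a different language. We build parallel corpora of tweets in different languages by bridging them via shared hashtags. We propose a latent semantic approach that models the parallel corpora by mapping parallel tweets to a low-dimensional shared semantic space. Relevance between tweets in different languages is measured in this shared latent space, and the model is trained with a pairwise loss function. Preliminary experiments on a Twitter dataset demonstrate the effectiveness of the proposed approach.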
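The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: bridging tweets through shared hashtags, and scoring tweet pairs in a shared low-dimensional space under a pairwise loss. The input format, the linear projections, the dot-product score, the hinge form of the loss, and every function name here are assumptions made for illustration, not the authors' actual model.

```python
from collections import defaultdict


def build_parallel_pairs(tweets_a, tweets_b):
    """Pair tweets from two languages that share at least one hashtag.

    Each tweet is a (text, hashtags) tuple; this input format is an assumption.
    Hashtags are compared case-insensitively.
    """
    by_tag = defaultdict(list)
    for idx, (_, tags) in enumerate(tweets_b):
        for tag in tags:
            by_tag[tag.lower()].append(idx)
    pairs = set()
    for i, (_, tags) in enumerate(tweets_a):
        for tag in tags:
            for j in by_tag.get(tag.lower(), []):
                pairs.add((i, j))
    return sorted(pairs)


def project(matrix, vector):
    """Map a term vector into the latent space (matrix times vector)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def score(proj_a, proj_b, vec_a, vec_b):
    """Relevance as the dot product of the two latent representations.

    One projection matrix per language is an assumed design, not the paper's.
    """
    za = project(proj_a, vec_a)
    zb = project(proj_b, vec_b)
    return sum(p * q for p, q in zip(za, zb))


def pairwise_hinge_loss(proj_a, proj_b, query, positive, negative, margin=1.0):
    """Hinge loss: the paired tweet should outscore an unpaired one by `margin`."""
    pos = score(proj_a, proj_b, query, positive)
    neg = score(proj_a, proj_b, query, negative)
    return max(0.0, margin - pos + neg)


# Toy data, invented for this example.
english = [("Great match tonight", ["#WorldCup"]), ("New phone launch", ["#tech"])]
spanish = [("Gran partido esta noche", ["#worldcup"]), ("Receta de paella", ["#cocina"])]
pairs = build_parallel_pairs(english, spanish)

# One-dimensional latent space over a two-term vocabulary.
proj_en = [[1.0, 0.0]]
proj_es = [[1.0, 0.0]]
loss = pairwise_hinge_loss(proj_en, proj_es, [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In a trained system the projection matrices would be learned by minimizing this loss over many (query, paired tweet, unpaired tweet) triples; the fixed matrices above only show how the loss is computed.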