Proceedings of the first international workshop on Social media retrieval and analysis: latest papers

The use of social media for music analysis and creation within the GiantSteps project
Peter Knees
GiantSteps is an EU-funded project that aims at developing the next generation of music composition tools for the creative industries by bridging the gap between music information research and end users' requirements. An important component of the project is the extraction of musical and application-targeted knowledge from social media and web resources. In this paper, we sketch potential ways to exploit social media and web data for the tasks of music analysis, creation, and algorithm evaluation.
DOI: https://doi.org/10.1145/2632188.2632212; published 2014-07-11
Citations: 3
#nowplaying the future Billboard: mining music listening behaviors of Twitter users for hit song prediction
Yekyung Kim, B. Suh, Kyogu Lee
Microblogs are rich sources of information because they provide platforms for users to share their thoughts, news, information, activities, and so on. Twitter is one of the most popular microblogs. Twitter users often use hashtags to mark specific topics and to link them with related tweets. In this study, we investigate the relationship between the music listening behaviors of Twitter users and a popular music ranking service by comparing information extracted from tweets with music-related hashtags and the Billboard chart. We collect users' music listening behavior from Twitter using music-related hashtags (e.g., #nowplaying). We then build a predictive model to forecast the Billboard rankings and hit music. The results show that the numbers of daily tweets about a specific song and artist can be effectively used to predict Billboard rankings and hits. This research suggests that users' music listening behavior on Twitter is highly correlated with general music trends and could play an important role in understanding consumers' music consumption patterns. In addition, we believe that Twitter users' music listening behavior can be applied in the field of Music Information Retrieval (MIR).
DOI: https://doi.org/10.1145/2632188.2632206; published 2014-07-11
Citations: 37
Finding selfies of users in microblogged photos
D. Joshi, Francine Chen, L. Wilcox
We examine the use of clustering to identify selfies in a social media user's photos. Faces are first detected within a user's photos followed by clustering using visual similarity. We define a cluster scoring scheme that uses a combination of within-cluster visual similarity and average face size in a cluster to rank potential selfie-clusters. Finally, we evaluate this ranking approach over a collection of Twitter users and discuss methods that can be used for improving performance in the future. An application of user selfies is estimating demographic information such as age, gender, and race in a more robust fashion.
DOI: https://doi.org/10.1145/2632188.2632209; published 2014-07-11
Citations: 1
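The abstract above describes ranking face clusters by a combination of within-cluster visual similarity and average face size, but does not give the formula. The following is only a minimal sketch under stated assumptions: faces are represented by feature vectors compared with cosine similarity, face size is expressed as a fraction of the photo area, and the two terms are combined by a weighted sum with a hypothetical weight `alpha`. None of these choices is taken from the paper.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_score(faces, alpha=0.5):
    """Score one non-empty face cluster.

    `faces` is a list of (feature_vector, relative_face_size) pairs, where
    relative_face_size is the face area divided by the photo area (0..1).
    The weighted sum and `alpha` are assumptions made for illustration.
    """
    pairs = list(combinations(faces, 2))
    # A single-face cluster has no pairs; treat its similarity term as 0.
    similarity = (
        sum(cosine(a[0], b[0]) for a, b in pairs) / len(pairs) if pairs else 0.0
    )
    avg_size = sum(size for _, size in faces) / len(faces)
    return alpha * similarity + (1 - alpha) * avg_size


def rank_clusters(clusters, alpha=0.5):
    """Return cluster ids ordered from most to least selfie-like."""
    return sorted(
        clusters, key=lambda cid: cluster_score(clusters[cid], alpha), reverse=True
    )
```

In this sketch a cluster of large, mutually similar faces ranks above a cluster of small, dissimilar ones, which matches the intuition that a user's own face recurs at close range in their photos.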
Proceedings of the first international workshop on Social media retrieval and analysis
M. Schedl, Peter Knees, Jialie Shen
It is our great pleasure to welcome you to SoMeRA 2014, the International Workshop on Social Media Retrieval and Analysis, co-located with SIGIR 2014 in Gold Coast, Australia.
The amount of user-generated data (including content and contextual information of the users) has been spiraling during the past few years. Social media are fundamentally changing the way we communicate. Nowadays, people create, share, and consume a huge amount of multimedia material on the web and in particular on social platforms. The faster these corpora grow, the harder it gets for the individual to find the media documents that satisfy a particular information need. When it comes to multimedia material in particular, users might also exhibit an entertainment need, which may involve aspects of novelty, serendipity, familiarity, or popularity. However, current retrieval, recommendation, and browsing techniques often fall short in dealing with user-generated data of various kinds (audio, image, video, text, contextual, etc.), especially on a larger scale. Satisfying the information or entertainment needs of users in social media data requires a comprehensive understanding of them, which can be gained to some extent by means of social media analysis and mining. Corresponding user models built from this knowledge will improve retrieval and recommendation in social media, going far beyond text-based search, which is still the most common paradigm. The gained knowledge also enables intelligently informed and enriched applications in various media domains.
The purpose of SoMeRA 2014 is to bring together researchers from different domains who are involved in social media analysis, mining, and retrieval, for instance, experts in multimedia, recommender systems, and user modeling. This is reflected by the 19 submissions received, which cover topics as diverse as multimedia retrieval and exploration, user-aware recommender systems, network analysis, event detection, and computational linguistics in social media. Out of these, we selected the most outstanding works to be presented at the workshop, which features 5 oral and 8 poster presentations. In addition, the program includes a keynote speech by Prof. Tat-Seng Chua, National University of Singapore, entitled "From Social Media Data to Actionable Analytics".
DOI: https://doi.org/10.1145/2632188; published 2014-07-11
Citations: 1
Social media and classical music?: a first analysis within the PHENICX project: "performances as highly enriched aNd interactive concert eXperiences"
M. Schedl
In the ongoing EU-FP7 project "Performances as Highly Enriched aNd Interactive Concert eXperiences" (PHENICX), one aim is to make Classical music appealing to new audiences, not least the typically younger generation of social media users. In the context of the "Social Media Retrieval and Analysis" (SoMeRA) workshop, this paper sheds light on the use of two social media platforms (Last.fm and Twitter) by fans of Classical music.
DOI: https://doi.org/10.1145/2632188.2632213; published 2014-07-11
Citations: 0
Evaluation of text-processing algorithms for adverse drug event extraction from social media
Alejandro Metke-Jimenez, Sarvnaz Karimi, Cécile Paris
The discovery of suspected adverse drug reactions is no longer restricted to mining reports that pharmaceutical companies and health professionals send to regulators for possible safety signals. Patient forums and other social media are being studied for additional sources of information to assist in expediting adverse reaction discovery. Extracting information on drugs, adverse drug reactions, diseases and symptoms, or patient demographics from such media is an essential step of this process, but it is not straightforward. While most studies in this area use a lexicon-based information extraction methodology, they do not explicitly evaluate the impact of text-processing steps on their final results. We experimentally quantify the value of the most popular techniques to establish whether or not they benefit the information extraction process.
DOI: https://doi.org/10.1145/2632188.2632200; published 2014-07-11
Citations: 12
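To illustrate why text-processing steps can change the outcome of lexicon-based extraction, as the abstract above argues, here is a small sketch. The lexicon entries, the function names, and the two processing steps (lowercasing and punctuation stripping) are assumptions chosen for illustration; the abstract does not say which steps or vocabularies the paper evaluates.

```python
import re

# Toy lexicon invented for illustration; real systems use curated vocabularies.
ADR_LEXICON = {"headache", "nausea", "dry mouth"}


def normalize(text, lowercase=True, strip_punct=True):
    """Apply optional text-processing steps before lexicon lookup."""
    if lowercase:
        text = text.lower()
    if strip_punct:
        # Replace punctuation with spaces so "dry-mouth" becomes "dry mouth".
        text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def extract(text, lexicon=ADR_LEXICON, **steps):
    """Return the lexicon terms that occur as whole phrases in the text."""
    cleaned = normalize(text, **steps)
    return {
        term
        for term in lexicon
        if re.search(r"\b" + re.escape(term) + r"\b", cleaned)
    }
```

Calling `extract` on the same post with different steps switched on or off yields different term sets, which is the kind of effect an evaluation of text-processing steps would need to measure.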
Short text categorization exploiting contextual enrichment and external knowledge
Stefano Mizzaro, M. Pavan, Ivan Scagnetto, Martino Valenti
We address the problem of categorizing short texts, like those posted by users on social networks and microblogging platforms. We specifically focus on Twitter. Since short texts do not provide sufficient word occurrences, and they often contain abbreviations and acronyms, traditional classification methods such as "Bag-of-Words" have limitations. Our proposed method enriches the original text with a new set of words, adding semantic value by using information extracted from webpages of the same temporal context. We then use those words to query Wikipedia, as an external knowledge base, with the final goal of categorizing the original text using a predefined set of Wikipedia categories. We also present a first experimental evaluation that confirms the effectiveness of the algorithm design and implementation choices, highlighting some critical issues with short texts.
DOI: https://doi.org/10.1145/2632188.2632205; published 2014-07-11
Citations: 15
Automatic identification of Arabic dialects in social media
F. Sadat, Farnaz Kazemi, Atefeh Farzindar
Modern Standard Arabic (MSA) is the formal language in most Arabic countries. Arabic dialects (AD), the language of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts have mixed forms and many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for AD classification using probabilistic models across social media datasets. We present a set of experiments using the character n-gram Markov language model and Naive Bayes classifiers, with a detailed examination of which models perform best under different conditions in a social media context. Experimental results show that a Naive Bayes classifier based on a character bi-gram model can identify the 18 different Arabic dialects with a considerable overall accuracy of 98%. This work is a first step towards the ultimate goal of a translation system from Arabic to English and French, within the ASMAT project.
DOI: https://doi.org/10.1145/2632188.2632207; published 2014-07-11
Citations: 55
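The classifier named in the abstract above, Naive Bayes over character bi-grams, can be sketched in a few lines. This is a generic multinomial Naive Bayes with add-one smoothing, not the authors' implementation; the class name, the space padding at word edges, and the smoothing choice are all assumptions, and the labels in any usage are placeholders rather than real dialect data.

```python
from collections import Counter, defaultdict
from math import log


def char_bigrams(text):
    """Return overlapping character bigrams, padded with a space at each edge."""
    padded = f" {text} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramNaiveBayes:
    """Multinomial Naive Bayes over character bigrams with add-one smoothing."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.docs = Counter(labels)         # label -> number of training texts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_bigrams(text))
        self.vocab = {bg for c in self.counts.values() for bg in c}
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, float("-inf")
        for label, counts in self.counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of the observed bigrams.
            score = log(self.docs[label] / total_docs)
            for bg in char_bigrams(text):
                score += log((counts[bg] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Character bigrams need no tokenizer or morphological analysis, which is one reason they are a common baseline for language and dialect identification on noisy text.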
Query performance prediction for microblog search: a preliminary study
Maram Hasanain, Rana Malhas, T. Elsayed
Microblogging has recently become an integral part of the daily life of millions of people around the world. With a continuous flood of posts, microblogging services (e.g., Twitter) have to effectively handle millions of user queries that aim to search and follow recent developments of news or events. While predicting the quality of retrieved documents against search queries has been extensively studied in domains such as the Web and news, the different nature of the data and search task in microblogs triggers the need to revisit the problem in that context. In this work, we re-examined several state-of-the-art query performance predictors in the domain of microblog ad-hoc search, using the two most commonly used tweet collections with three different retrieval models that are used in microblog search. Our experiments showed that a temporal predictor was generally the best fit for the prediction task in the context of microblog search, indicating the importance of the temporal aspect in this task. The results also highlighted the need to either re-design some of the existing predictors or propose new ones to function effectively with the different retrieval models used in our tested domain. Finally, our experiments on combining multiple predictors achieved considerable improvements in prediction quality over individual predictors, which confirmed results reported in the literature for other domains.
DOI: https://doi.org/10.1145/2632188.2632210; published 2014-07-11
Citations: 2
Ranking model selection and fusion for effective microblog search
Zhongyu Wei, Wei Gao, Tarek El-Ganainy, Walid Magdy, Kam-Fai Wong
Re-ranking was shown to have a positive impact on the effectiveness of microblog search. Yet existing approaches mostly focused on using a single ranker to learn a better ranking function with respect to various relevance features. Given the various available rank learners (such as learning-to-rank algorithms), in this work we mainly study an orthogonal problem, where multiple learned ranking models form an ensemble for re-ranking the retrieved tweets, rather than just using a single ranking model, in order to achieve higher search effectiveness. We explore the use of query-sensitive model selection and rank fusion methods based on the result lists produced by multiple rank learners. Based on the TREC microblog datasets, we found that our selection-based ensemble approach can significantly outperform the single best ranker, and it also has a clear advantage over rank fusion that combines the results of all the available models.
DOI: 10.1145/2632188.2632202 (https://doi.org/10.1145/2632188.2632202). Published 2014-07-11.
Citations: 14
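The abstract above contrasts selecting one ranker per query with rank fusion over the result lists of all rankers. As a rough illustration of the fusion side only, here is a minimal sketch of reciprocal rank fusion (RRF), a common generic fusion method. The abstract does not say which fusion methods the authors used, so RRF, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the tweet IDs are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are then ordered by their summed score.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from three learned rankers for one query.
ranker_a = ["t1", "t2", "t3"]
ranker_b = ["t2", "t1", "t4"]
ranker_c = ["t2", "t3", "t1"]

fused = reciprocal_rank_fusion([ranker_a, ranker_b, ranker_c])
print(fused)  # ['t2', 't1', 't3', 't4']
```

A selection-based ensemble, by contrast, would pick one of the input lists per query instead of merging all of them; how that choice is made is specific to the paper and is not sketched here.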
Journal
Proceedings of the first international workshop on Social media retrieval and analysis