
SMUC '11 Latest Publications

Improved answer ranking in social question-answering portals
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065030
F. Hieber, S. Riezler
Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality content challenging but all the more important. We present an approach to answer ranking and show the usefulness of features that explicitly model answer quality. Furthermore, we introduce the idea of leveraging snippets of web search results for query expansion in answer ranking. We present an evaluation setup that avoids spurious results reported in earlier work. Our results show the usefulness of our features and query expansion techniques, and point to the importance of regularization when learning from noisy data.
Citations: 24
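As a rough illustration of the kind of pointwise answer ranking described above, the sketch below scores answers with a few hand-crafted quality features and an L2-regularized logistic model. The features, toy data, and regularization strength are assumptions made for illustration, not the authors' system, and the query-expansion step over web search snippets is omitted.

```python
# Minimal pointwise answer-ranking sketch (not the authors' system): score answers
# with hand-crafted quality features and an L2-regularized logistic model, then
# rank the answers to a question by predicted relevance probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

def quality_features(question: str, answer: str) -> list[float]:
    """Hypothetical answer-quality features: question overlap, length, punctuation density."""
    q_terms, a_terms = set(question.lower().split()), set(answer.lower().split())
    overlap = len(q_terms & a_terms) / (len(q_terms) or 1)
    length = len(answer.split())
    punct = sum(answer.count(c) for c in "!?") / (len(answer) or 1)
    return [overlap, float(np.log1p(length)), punct]

# Toy training data: (question, answer, is_best_answer) triples, invented for illustration.
train = [
    ("how do I back up photos", "Use an external drive or a cloud service and sync weekly.", 1),
    ("how do I back up photos", "lol no idea!!!", 0),
    ("why is my laptop slow", "Check background processes and free disk space first.", 1),
    ("why is my laptop slow", "buy a new one", 0),
]
X = np.array([quality_features(q, a) for q, a, _ in train])
y = np.array([label for _, _, label in train])

# Fairly strong L2 regularization (small C) to cope with noisy user-generated labels.
model = LogisticRegression(C=0.5).fit(X, y)

def rank_answers(question: str, answers: list[str]) -> list[str]:
    scores = model.predict_proba(np.array([quality_features(question, a) for a in answers]))[:, 1]
    return [a for _, a in sorted(zip(scores, answers), reverse=True)]

print(rank_answers("how do I back up photos", ["lol no idea!!!", "Sync to a cloud service weekly."]))
```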
Analysis of communities in social media
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065033
M. Atzmüller
Social media have already woven themselves into the very fabric of everyday life. There are a variety of applications and associated computational social systems. Furthermore, we observe the emergence of ever more mobile and ubiquitous applications. Various social applications provide for a broad range of user interaction and communication. In this setting, data mining and analysis play a central role, e.g., for automatically detecting associations and relationships, and identifying interesting topics. In particular, in this talk I will consider the discovery and analysis of communities, e.g., concerning users and user-generated content. Such communities can be applied, for example, for personalization or generating recommendations. However, while there exists a range of community mining options, a thorough evaluation and assessment typically relies on existing gold-standard data or costly user studies. This talk presents approaches for the analysis of communities and descriptive patterns in social media. Methods for mining and assessing communities and descriptive patterns will be introduced. The proposed analysis methodology provides a cost-efficient approach for identifying descriptive and user-interpretable communities, since the assessment is performed using secondary data that is easy to acquire. In this talk, I will provide examples for the presented analysis techniques using social data from real-world systems. In particular, I will focus on data from the social bookmarking system BibSonomy (http://www.bibsonomy.org) and from the social conference guidance system Conferator (http://www.conferator.org).
Citations: 0
Detection of near-duplicate user generated contents: the SMS spam collection
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065031
Enrique Vallés, Paolo Rosso
Today, the number of spam text messages has grown, mainly because companies are looking for free advertising. For users it is very important to filter out these kinds of spam messages, which can be viewed as near-duplicate texts because they are mostly created from templates. Identifying spam text messages is a very hard and time-consuming task that involves carefully scanning hundreds of text messages. Therefore, since near-duplicate detection can be seen as a specific case of plagiarism detection, we investigated whether plagiarism detection tools could be used as filters for spam text messages. Moreover, we address the near-duplicate detection problem with a clustering approach based on the CLUTO framework. We carried out some preliminary experiments on the SMS Spam Collection, which was recently made available for research purposes. The results of the plagiarism detection tools were compared with those obtained with CLUTO. Although the plagiarism detection tools detect a good number of near-duplicate SMS spam messages, even better results are obtained with the CLUTO clustering tool.
Citations: 22
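The near-duplicate idea can be sketched with plain TF-IDF cosine similarity and a threshold; this stands in for, and does not reproduce, the paper's CLUTO-based clustering, and the sample messages and cut-off value below are invented.

```python
# Minimal near-duplicate grouping sketch: TF-IDF vectors plus a cosine-similarity
# threshold, used here instead of the CLUTO clustering tool described in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sms = [
    "WIN a FREE holiday! Reply YES to claim your prize now",
    "WIN a FREE holiday!! Reply YES to claim ur prize now!",
    "Hey, are we still meeting for lunch today?",
    "Congratulations, you have won a free holiday, reply YES to claim",
]

tfidf = TfidfVectorizer(lowercase=True, ngram_range=(1, 2))
sims = cosine_similarity(tfidf.fit_transform(sms))

THRESHOLD = 0.5  # assumed cut-off; a real value would need tuning on labelled data
groups, assigned = [], set()
for i in range(len(sms)):
    if i in assigned:
        continue
    group = [i] + [j for j in range(i + 1, len(sms)) if sims[i, j] >= THRESHOLD]
    assigned.update(group)
    groups.append(group)

for g in groups:
    print("near-duplicate group:", [sms[k] for k in g])
```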
Predicting age and gender in online social networks
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065035
Claudia Peersman, Walter Daelemans, L. V. Vaerenbergh
A common characteristic of communication on online social networks is that it happens via short messages, often using non-standard language variations. These characteristics make this type of text a challenging text genre for natural language processing. Moreover, in these digital communities it is easy to provide a false name, age, gender and location in order to hide one's true identity, providing criminals such as pedophiles with new possibilities to groom their victims. It would therefore be useful if user profiles could be checked on the basis of text analysis, and false profiles flagged for monitoring. This paper presents an exploratory study in which we apply a text categorization approach for the prediction of age and gender on a corpus of chat texts, which we collected from the Belgian social networking site Netlog. We examine which types of features are most informative for a reliable prediction of age and gender on this difficult text type and perform experiments with different data set sizes in order to acquire more insight into the minimum data size requirements for this task.
Citations: 302
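A minimal text-categorization sketch in the spirit of the paper: character n-gram features feed a linear SVM for gender prediction on invented chat snippets. The feature configuration, toy messages, and labels are assumptions for illustration, not the authors' Netlog setup.

```python
# Minimal gender-classification sketch over short chat messages: character n-grams
# (robust to non-standard spelling) plus a linear SVM. Data and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

chats = ["heyy whats up :)", "going to the gym later bro", "omg that dress is soooo cute",
         "anyone up for football tonight", "lol miss u, call me laterrr",
         "new gpu just arrived, benchmarks soon"]
labels = ["F", "M", "F", "M", "F", "M"]  # invented labels for illustration only

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # char n-grams handle chat spelling
    LinearSVC(),
)
clf.fit(chats, labels)
print(clf.predict(["sooo excited, love it!!", "match starts at 8, be there"]))
```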
"I'm eating a sandwich in Glasgow": modeling locations with tweets “我正在格拉斯哥吃三明治”:用推特为地点建模
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065039
Sheila Kinsella, Vanessa Murdock, Neil O'Hare
Social media such as Twitter generate large quantities of data about what a person is thinking and doing in a particular location. We leverage this data to build models of locations to improve our understanding of a user's geographic context. Understanding the user's geographic context can in turn enable a variety of services that allow us to present information, recommend businesses and services, and place advertisements that are relevant at a hyper-local level. In this paper we create language models of locations using coordinates extracted from geotagged Twitter data. We model locations at varying levels of granularity, from the zip code to the country level. We measure the accuracy of these models by the degree to which we can predict the location of an individual tweet, and further by the accuracy with which we can predict the location of a user. We find that we can meet the performance of the industry standard tool for predicting both the tweet and the user at the country, state and city levels, and far exceed its performance at the hyper-local level, achieving a three- to ten-fold increase in accuracy at the zip code level.
Citations: 262
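The per-location language models mentioned above can be sketched with unigram counts and add-one smoothing, picking the location whose model assigns the highest log-likelihood to a new tweet. The tweets and the smoothing choice below are assumptions, not the paper's data or estimator.

```python
# Minimal sketch of per-location unigram language models with add-one smoothing,
# scoring a new tweet by log-likelihood under each location's model.
import math
from collections import Counter

geo_tweets = {
    "Glasgow": ["eating a sandwich near the clyde", "rainy day in glasgow again"],
    "New York": ["bagel and coffee before the subway", "times square is packed tonight"],
}

models, vocab = {}, set()
for loc, tweets in geo_tweets.items():
    counts = Counter(w for t in tweets for w in t.lower().split())
    models[loc] = counts
    vocab.update(counts)

def log_likelihood(tweet: str, counts: Counter) -> float:
    total, V = sum(counts.values()), len(vocab)
    return sum(math.log((counts[w] + 1) / (total + V)) for w in tweet.lower().split())

tweet = "grabbing a sandwich before the rain starts"
print(max(models, key=lambda loc: log_likelihood(tweet, models[loc])))
```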
On the generation of rich content metadata from social media
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065042
Giacomo Inches, A. Basso, F. Crestani
This contribution proposes a framework to generate auxiliary rich TV content metadata by processing social network data. Based on simple criteria to identify authoritative social media sources, we have analysed Twitter short messages related to TV program content and devised a method to compute their informative value. We have extracted dozens of features and characterized such social data in terms of quality and relevancy. This is a first step towards integrating relevant social media information to enhance the description of TV content as well as for generating recommendations based on social data.
Citations: 6
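As a hypothetical illustration of scoring how informative a tweet is about a TV programme, the sketch below combines a few toy features (title overlap, link presence, length) with assumed weights; it is not the feature set or method of the paper above.

```python
# Toy "informative value" score for a tweet relative to a TV programme.
# Features and weights are assumptions for illustration only.
def informative_value(tweet: str, programme_title: str) -> float:
    tweet_terms = set(tweet.lower().split())
    title_terms = set(programme_title.lower().split())
    overlap = len(tweet_terms & title_terms) / (len(title_terms) or 1)
    has_link = 1.0 if "http" in tweet else 0.0
    length = min(len(tweet.split()) / 20.0, 1.0)           # saturate at ~20 words
    return 0.5 * overlap + 0.3 * has_link + 0.2 * length   # assumed weights

tweets = ["Tonight's Sherlock finale was brilliant, full recap: http://example.org/recap",
          "so bored lol"]
for t in tweets:
    print(round(informative_value(t, "Sherlock"), 2), t)
```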
Mining tweets for tag recommendation on social media
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065040
D. Correa, A. Sureka
Automatic tag recommendation or annotation can help in improving the efficiency of text-based information retrieval on online social media services like Blogger, Last.FM, Flickr and YouTube. In this work, we investigate alternate solutions for tag recommendations by employing a Wisdom of Crowd approach in a mashup framework. In particular, we mine tweets on Twitter and use their hashtag(s) and content to annotate videos on Flickr, Photobucket, YouTube, Dailymotion and SoundCloud. We crawl Twitter to collect a random sample of tweets containing Flickr, Photobucket, YouTube, Dailymotion and SoundCloud URLs. We then recommend tags for these services using hashtag(s) and content present in tweets. We use a hybrid technique (automated and manual) to validate our results on different subsets (presence / absence of hashtags, presence / absence of media tags) of data. Experimental results demonstrate that the proposed solution approach is effective and reliable.
Citations: 13
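The core mapping from tweet hashtags to media tags can be sketched as follows; the regular expressions and sample tweets are assumptions for illustration, and the paper's crawling and validation pipeline is not reproduced.

```python
# Minimal sketch of the hashtag-to-tag idea: collect hashtags from tweets that link
# to a media item and suggest them as tags for that item.
import re
from collections import defaultdict

tweets = [
    "Great live set! https://soundcloud.com/artist/track1 #electro #live",
    "Loving this mix https://soundcloud.com/artist/track1 #electro #chill",
    "Sunset timelapse https://youtube.com/watch?v=abc123 #sunset #timelapse",
]

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")

suggested_tags = defaultdict(set)
for tweet in tweets:
    urls = URL_RE.findall(tweet)
    tags = HASHTAG_RE.findall(tweet)
    for url in urls:
        suggested_tags[url].update(t.lower() for t in tags)

for url, tags in suggested_tags.items():
    print(url, "->", sorted(tags))
```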
Characterizing Wikipedia pages using edit network motif profiles
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065036
Guangyu Wu, Martin Harrigan, P. Cunningham
Good Wikipedia articles are authoritative sources due to the collaboration of a number of knowledgeable contributors. This is the many eyes idea. The edit network associated with a Wikipedia article can tell us something about its quality or authoritativeness. In this paper we explore the hypothesis that the characteristics of this edit network are predictive of the quality of the corresponding article's content. We characterize the edit network using a profile of network motifs and we show that this network motif profile is predictive of the Wikipedia quality classes assigned to articles by Wikipedia editors. We further show that the network motif profile can identify outlier articles particularly in the 'Featured Article' class, the highest Wikipedia quality class.
Citations: 52
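A minimal sketch of a network-motif profile: networkx's directed triad census over a toy editor-interaction graph, normalized into a feature vector. The graph construction and the choice of motifs here are assumptions, not the paper's edit-network definition.

```python
# Build a toy directed interaction graph among editors of one article (who edited
# after whom, invented here) and use the 16-type triad census as a motif profile.
import networkx as nx

edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice"),
         ("dave", "carol"), ("carol", "bob")]
G = nx.DiGraph(edges)

census = nx.triadic_census(G)          # counts of the 16 directed triad types
total = sum(census.values()) or 1
motif_profile = {k: v / total for k, v in census.items() if v > 0}
print(motif_profile)                   # normalized profile usable as a feature vector
```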
Trend-based and reputation-versed personalized news network
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065027
Olga Streibel, R. Alnemr
Web users, while collaborating over social networks and micro-blogging services, also contribute to news coverage worldwide. News feeds come from mainstream media as well as from social networks. Often, feeds from social networks are more up-to-date and, in the user's view, more credible than those that come from mainstream media. But the overwhelming amount of information requires users to personally filter through it until they find what is really needed. In this paper, we describe our idea of a personalized news network built on current Web technologies and our research projects, filtering Twitter and Facebook messages using both trend mining and reputation approaches. Based on the example of the Egyptian revolution, we explain the main idea of personalized news.
Citations: 10
A comparative evaluation of personality estimation algorithms for the twin recommender system
Pub Date : 2011-10-28 DOI: 10.1145/2065023.2065028
Alexandra Roshchina, J. Cardiff, Paolo Rosso
The appearance of the so-called recommender systems has led to the possibility of reducing the information overload experienced by individuals searching among online resources. One of the areas of application of recommender systems is the online tourism domain, where sites like TripAdvisor allow people to post reviews of various hotels to help others make a good choice when planning their trip. As the number of such reviews grows every day, it is clearly impractical for the individual to go through all of them. We propose the TWIN ("Tell me What I Need") Personality-based Recommender System that analyzes the textual content of the reviews and estimates the personality of the user according to the Big Five model, in order to suggest reviews written by "twin-minded" people. In this paper we compare a number of algorithms to select the best option for personality estimation in the task of user profile construction.
Citations: 33
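One simple way to estimate trait scores from review text, offered only as a hedged sketch rather than the TWIN system's method, is a separate L2-regularized regressor per Big Five trait over TF-IDF features; the reviews and trait scores below are invented.

```python
# Toy trait estimation from review text: one Ridge regressor per trait over TF-IDF
# features. Reviews and trait scores are invented; only two traits are shown.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

reviews = ["Friendly staff, loved chatting with everyone at breakfast!",
           "Quiet room, spotless bathroom, everything exactly as planned.",
           "The noise was unbearable and the staff were rude."]
# Invented scores for two traits (extraversion, agreeableness) in [0, 1].
traits = np.array([[0.9, 0.8],
                   [0.3, 0.6],
                   [0.4, 0.1]])

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(reviews)
models = [Ridge(alpha=1.0).fit(X, traits[:, i]) for i in range(traits.shape[1])]

new_review = ["Great vibe, met lovely people in the lobby bar."]
X_new = tfidf.transform(new_review)
print([round(float(m.predict(X_new)[0]), 2) for m in models])
```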