
Proceedings of the Ninth ACM International Conference on Web Search and Data Mining: Latest Papers

Beyond-Accuracy Goals, Again
M. de Rijke
DOI: https://doi.org/10.1145/3539597.3572332
Published: 2023-01-01
Pages: 2-3
Citations: 0
WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21-25, 2022
DOI: https://doi.org/10.1145/3488560
Published: 2022-01-01
Citations: 1
Multilingual and Multimodal Hate Speech Analysis in Twitter
Gretel Liz De la Peña Sarracén
DOI: https://doi.org/10.1145/3437963.3441668
Published: 2021-01-01
Pages: 1109-1110
Citations: 0
A Semantic Layer Querying Tool
Renato Stoffalette João
DOI: https://doi.org/10.1145/3437963.3441710
Published: 2021-01-01
Pages: 1101-1104
Citations: 0
Designing the Cogno-Web Observatory: To Characterize the Dynamics of Online Social Cognition
Raksha Pavagada Subbanarasimha
Our understanding of the web has been evolving from a large database of information to a Socio-Cognitive Space, where humans are not just using the web but participating in it. The World Wide Web has evolved into the largest source of information in history, and it continues to grow without any known agenda. The web needs to be observed and studied to understand its various impacts on society (both positive and negative) and to shape the future of the web and of society. This gave rise to the global grid of Web Observatories, which observe various aspects of the web. Web Observatories aim to share various data sets, analysis tools, and applications, and to collaborate on them, with all web observatories across the world. We plan to design and develop a Web Observatory to observe and understand online social cognition. We propose that social media on the web acts as a Marketplace of Opinions where multiple users with differing interests exchange opinions. For a given trending topic on social media, we propose a model to identify the Signature of the trending topic, which characterizes the discourse around the topic.
DOI: https://doi.org/10.1145/3289600.3291600
Published: 2019-01-30
Pages: 814-815
Citations: 0
Reinforcement Learning to Rank
M. de Rijke
Interactive systems such as search engines or recommender systems are increasingly moving away from single-turn exchanges with users. Instead, series of exchanges between the user and the system are becoming mainstream, especially when users have complex needs or when the system struggles to understand the user's intent. Standard machine learning has helped us a lot in the single-turn paradigm, where we use it to predict intent, relevance, user satisfaction, and so on. When we think of search or recommendation as a series of exchanges, we need to turn to bandit algorithms to determine which action the system should take next, or to reinforcement learning to determine not just the next action but also to plan future actions and estimate their potential pay-off. The use of reinforcement learning for search and recommendation comes with a number of challenges because of the very large action spaces, the large number of potential contexts, and the noisy feedback signals characteristic of this domain. This presentation will survey some recent success stories of reinforcement learning for search, recommendation, and conversations, and will identify promising future research directions for reinforcement learning for search and recommendation.
DOI: https://doi.org/10.1145/3289600.3291605
Published: 2019-01-30
Pages: 5
Citations: 8
Event Mining over Distributed Text Streams
John Calvo Martinez
This research presents a new set of techniques for event mining from different text sources, a complex set of NLP tasks that aim to extract events of interest and their components, including authors, targets, locations, and event categories. Our focus is on distributed text streams, such as tweets from different news agencies, in order to accurately retrieve events and their components by combining such sources in different ways using text stream mining. This research project therefore aims to fill the gap between batch event mining, text stream mining, and distributed data mining, which have been used separately to address related learning tasks. We propose a multi-task and multi-stream mining approach that combines information from multiple text streams to accurately extract and categorise events under the assumptions of stream mining. Our approach also incorporates ontology matching to boost accuracy under imbalanced distributions. In addition, we plan to address two relatively unexplored event mining tasks: event coreference and event synthesis. Preliminary results support our proposal, which yields an increase of around 20% on macro prequential metrics for the event classification task.
DOI: https://doi.org/10.1145/3159652.3170462
Published: 2018-02-02
Pages: 745-746
Citations: 3
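
The "macro prequential metrics" mentioned in the abstract of Event Mining over Distributed Text Streams are commonly computed with a test-then-train loop: each incoming example is first used to test the model and only afterwards to update it, and per-class scores are averaged so that rare classes count as much as frequent ones. The sketch below illustrates that loop with macro-averaged recall. The `MajorityClassLearner` class, the example stream, and the category names are hypothetical placeholders, not taken from the paper, and the paper's exact metrics may differ.

```python
from collections import Counter, defaultdict


class MajorityClassLearner:
    """Hypothetical stand-in for a stream classifier:
    it predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, text):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, text, label):
        self.counts[label] += 1


def prequential_macro_recall(stream, learner):
    """Test-then-train: predict each example before learning from it,
    then average the per-class recall over all classes seen."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for text, label in stream:
        prediction = learner.predict(text)  # test first
        totals[label] += 1
        if prediction == label:
            hits[label] += 1
        learner.learn(text, label)          # then train
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Made-up stream of (text, event category) pairs.
stream = [
    ("flood hits city", "disaster"),
    ("storm closes port", "disaster"),
    ("team wins final", "sport"),
    ("quake damages town", "disaster"),
]
score = prequential_macro_recall(stream, MajorityClassLearner())
```

Because the toy learner never predicts the rare "sport" class, its macro-averaged score stays low even though most individual predictions are correct, which is why macro averaging is informative under imbalanced distributions.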
New Probabilistic Models for Recommender Systems with Rich Contextual and Content Information
Eliezer de Souza da Silva
This project is focused on the design of probabilistic models for recommender systems and collaborative filtering by extending and creating new models to include rich contextual and content information (content, user social network, location, time, user intent, etc.), and on developing scalable approximate inference algorithms for these models. The working hypothesis is that big data analytics combined with probabilistic modelling, through automatic mining of various data sources and the combination of different latent factors explaining the user interaction with the items, can be used to better infer user behaviour and generate improved recommendations. Fundamentally, we are interested in the following questions: 1) Does additional contextual information improve the quality of recommender systems? 2) What factors (features, model, methods) are relevant in the design of personalized systems? 3) What is the relation between the social network structure, the user model, and the information need of the user? How does the social context interfere with user preferences? How can the evolution of the social network structure explain changes in the user preference model? 4) Does the choice of approximate inference method have a significant impact on the quality of the system (quality-efficiency trade-offs)? To address some of these questions, we started by proposing a model (Figure 1) based on Poisson factorization models [2], combining a social factorization model [1] and a topic-based factorization [3]. The main idea is to combine content latent factors (topics, tags, etc.) and trust between users (trust weights in a social graph) in such a way that both sources of information have additive effects in the observed ratings. In the case of Poisson models, this additive constraint will induce the non-negative latent factors to be more sparse and will avoid overfitting (in comparison with Gaussian-based models [2]). The main objective at this point is to compare models that incorporate both sources of information (content and social networks). The next steps will include empirical validation. In conclusion, we are interested in the interplay between large-scale data mining and probabilistic modeling in the design of recommender systems. One initial approach we are pursuing is to model content and social network features in a Poisson latent variable model. Our main objective in the future is the development of methods with competitive computational complexity to perform inference using heterogeneous data in dynamical probabilistic models, as well as exploring the scalability limits of the models we propose.
DOI: https://doi.org/10.1145/3018661.3022751
Published: 2017-02-02
Pages: 839
Citations: 4
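
The abstract of New Probabilistic Models for Recommender Systems with Rich Contextual and Content Information describes content factors and social trust as having additive effects on the observed ratings under a Poisson model. One way to read this, offered here only as an assumption since the abstract does not give the model's equations, is that a rating follows a Poisson distribution whose rate is the sum of a latent-factor term and a trust-weighted sum of neighbours' ratings. All names and numbers below are illustrative.

```python
import math


def poisson_rate(theta_u, beta_i, trust_u, friend_ratings_i):
    """Additive rate (assumed form): a latent-factor term plus
    trust-weighted ratings that the user's neighbours gave the item."""
    content_term = sum(t * b for t, b in zip(theta_u, beta_i))
    social_term = sum(trust_u[v] * friend_ratings_i.get(v, 0) for v in trust_u)
    return content_term + social_term


def poisson_log_likelihood(rating, rate):
    """log P(rating | rate) for a Poisson-distributed rating."""
    return rating * math.log(rate) - rate - math.lgamma(rating + 1)


# Made-up, non-negative values, as Poisson factorization requires.
theta_u = [0.5, 1.0]              # user u's preference for two latent topics
beta_i = [2.0, 0.5]               # item i's loading on those topics
trust_u = {"v1": 0.2, "v2": 0.1}  # trust that u places in neighbours v1 and v2
friend_ratings_i = {"v1": 5}      # v1 rated item i with 5; v2 did not rate it

rate = poisson_rate(theta_u, beta_i, trust_u, friend_ratings_i)
log_lik_unrated = poisson_log_likelihood(0, rate)
```

With these values the content term contributes 1.5 and the social term 1.0, so either source alone can raise the expected rating. That is what "additive" means in this reading.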
Feature Generation and Selection on the Heterogeneous Graph for Music Recommendation
Chun Guo
In the past decade, online music streaming services (MSS), e.g., Pandora and Spotify, experienced exponential growth. The sheer volume of music collections makes music recommendation increasingly important, and the related algorithms are well documented. In prior studies, most algorithms employed a content-based model (CBM) and/or collaborative filtering (CF) [3]. The former focuses on acoustic/signal features extracted from audio content, and the latter investigates music ratings and user listening history. However, MSS-generated user data present significant heterogeneity. Taking the user-music relationship as an example, comments, bookmarks, and listening history may contribute to music recommendation in very different ways. Furthermore, users and music can be implicitly related via more complex relationships, e.g., user-play-artist-perform-music. From this viewpoint, user-user, music-music, or user-music relationships can be much more complex than the classical CF approach assumes. For these reasons, we model music metadata and MSS-generated user data in the form of a heterogeneous graph, where 6 different types of nodes interact through 16 types of relationships. We can propose many recommendation hypotheses, in the form of meta paths, based on the ways users and songs are connected on this graph. The recommendation problem then becomes a (supervised) random walk problem on the heterogeneous graph [2]. Unlike previous heterogeneous graph mining studies, the constructed heterogeneous graph in our case is more complex, and manually formulated meta-path-based hypotheses cannot guarantee good performance. In the pilot study [2], we proposed to automatically extract all the potential meta paths within a given length on the heterogeneous graph scheme, evaluate their recommendation performance on the training data, and build a learning-to-rank model with the best ones. Results show that the new method can significantly enhance the recommendation performance. However, there are two problems with this approach: (1) including the individually best-performing meta paths in the learning-to-rank model neglects the dependency between features; (2) it is very time-consuming to calculate graph-based features. Traditional feature selection methods would only work if all feature values are readily available, which would make this recommendation approach highly inefficient. In this proposal, we try to address both problems by adopting the feature selection for ranking (FSR) method proposed by Geng, Liu, Qin, and Li [1]. This feature selection method was developed specifically for learning-to-rank tasks; it evaluates features based on their importance when used alone and on their similarity to one another. Applying this method to the full set of meta-path-based features would be very costly. Instead, we apply it to sub-meta paths, which are components shared by multiple full meta paths. We start with sub-meta paths of length 1, and only those selected by FSR get the chance to grow into sub-meta paths of length 2. We then repeat this process until the selected sub-meta paths grow into full paths. At each step, we drop some meta paths because they contain unselected sub-meta paths. In the end, we derive a subset of the original meta paths and save time by extracting values for fewer features. In our preliminary experiments, the proposed method outperforms the original FSR algorithm in both efficiency and effectiveness.
Venue: WSDM 2016, February 22-25, 2016, San Francisco, CA, USA
ACM ISBN: 978-1-4503-3716-8/16/02
DOI: https://doi.org/10.1145/2835776.2855088
Published: 2016-02-08
Citations: 6
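
The stepwise procedure described for Feature Generation and Selection on the Heterogeneous Graph for Music Recommendation (select short sub-meta paths, let only the selected ones grow, and drop full meta paths that contain an unselected sub-meta path) can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's implementation: sub-meta paths are modelled as prefixes of full meta paths, `greedy_select` is a simplified greedy selector in the spirit of FSR, and the relation names, `toy_importance`, and `toy_similarity` are hypothetical placeholders for scores that would be measured on training data.

```python
def greedy_select(candidates, importance, similarity, k, c=0.5):
    """Repeatedly take the candidate with the highest remaining score,
    then penalise the others in proportion to their similarity to it."""
    scores = {p: importance(p) for p in candidates}
    selected = []
    while scores and len(selected) < k:
        best = max(scores, key=scores.get)
        selected.append(best)
        del scores[best]
        for p in scores:
            scores[p] -= 2 * c * similarity(best, p)
    return selected


def grow_meta_paths(full_paths, importance, similarity, k):
    """Grow prefixes one step at a time and keep only the full paths
    whose current prefix survived selection."""
    surviving = list(full_paths)
    max_len = max(len(p) for p in full_paths)
    for length in range(1, max_len + 1):
        prefixes = sorted({p[:length] for p in surviving})
        kept = set(greedy_select(prefixes, importance, similarity, k))
        surviving = [p for p in surviving if p[:length] in kept]
    return surviving


# Hypothetical meta paths from a user to a song, written as relation sequences.
full_paths = [
    ("listen", "same_album"),
    ("listen", "same_genre"),
    ("bookmark", "same_album"),
    ("play_artist", "perform"),
    ("comment", "same_genre"),
]

# Placeholder scores; real ones would come from ranking performance on training data.
first_step_weight = {"listen": 0.9, "bookmark": 0.8, "play_artist": 0.6, "comment": 0.2}


def toy_importance(path):
    return first_step_weight[path[0]]


def toy_similarity(a, b):
    return 1.0 if a[-1] == b[-1] else 0.0


selected_paths = grow_meta_paths(full_paths, toy_importance, toy_similarity, k=3)
```

In this toy run, the path starting with "comment" is dropped at length 1 for low importance, and the "bookmark" path is dropped at length 2 because it is redundant with an already selected path. Feature values for the dropped paths never need to be computed in full, which is the source of the time savings described in the abstract.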
Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics
Dhruv Gupta
In this article, I present the questions that I seek to answer in my PhD research. I propose to analyze natural language text with the help of semantic annotations and to mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with a useful means to discover the knowledge locked in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: (i) identifying important events; (ii) semantic search; and (iii) event analytics.
DOI: https://doi.org/10.1145/2835776.2855083
Published: 2016-02-08
Citations: 6