Proceedings of the 13th International Conference on Web Search and Data Mining: Latest Articles

ConvERSe'20: The WSDM 2020 Workshop on Conversational Systems for E-Commerce Recommendations and Search
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371882
Eugene Agichtein, Dilek Z. Hakkani-Tür, S. Kallumadi, S. Malmasi
Conversational systems have improved dramatically in recent years and are receiving increasing attention in the academic literature. These systems are also being adopted in E-Commerce, owing to the growing integration of E-Commerce search and recommendation sources with virtual assistants such as Alexa, Siri, and Google Assistant. However, significant research challenges remain, spanning dialogue systems, spoken natural language processing, human-computer interaction, and search and recommender systems, all of which are exacerbated by the demanding requirements of E-Commerce. The purpose of this workshop is to bring together researchers and practitioners in the areas of conversational systems, human-computer interaction, information retrieval, and recommender systems. Bringing diverse research areas together in a single workshop should accelerate progress on adapting conversational systems to the E-Commerce domain and help to set a research agenda, examine how to build and share data sets, and establish common evaluation metrics and benchmarks to drive research progress.
Citations: 7
Investigating Examination Behavior in Mobile Search
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371797
Yukun Zheng, Jiaxin Mao, Yiqun Liu, M. Sanderson, Min Zhang, Shaoping Ma
Examination is one of the most important user interactions in Web search. A number of works have studied examination behavior in Web search and helped researchers better understand how users allocate their attention on search engine result pages (SERPs). Compared to desktop search, mobile search differs in a number of ways, such as showing fewer results on the screen. These differences introduce mobile-specific factors that affect users' examination behavior. However, research on how users allocate attention across viewports in mobile search is still lacking. Therefore, we design a lab-based study to collect users' rich interaction behavior in mobile search. Based on the collected data, we first analyze how users examine SERPs and allocate their attention to heterogeneous results. Then we investigate the effect of mobile-specific factors and other common factors on users' attention allocation. Finally, we apply the findings on user attention allocation from the user study to click model construction, which significantly improves the state-of-the-art click model. Our work offers insights into users' interaction patterns in mobile search and may benefit other mobile search-related research.
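As background for the click-model step mentioned in this abstract: click models commonly build on the examination hypothesis, under which a result is clicked only if the user examines it and finds it attractive. The sketch below illustrates that hypothesis in a simple position-based form. It is not the model proposed in the paper; the function name and all probability values are hypothetical and chosen only for illustration.

```python
def click_probabilities(examination, attractiveness):
    """Examination hypothesis: P(click) = P(examined) * P(attractive)."""
    if len(examination) != len(attractiveness):
        raise ValueError("need one examination and one attractiveness value per result")
    return [e * a for e, a in zip(examination, attractiveness)]


# Hypothetical values: examination decays with rank, all results equally attractive.
examination = [1.0, 0.5, 0.25]
attractiveness = [0.4, 0.4, 0.4]
clicks = click_probabilities(examination, attractiveness)
```

With equal attractiveness, the predicted click probability falls with rank purely because lower-ranked results are examined less often, which is why an accurate model of attention allocation matters for click modeling.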
Citations: 7
Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371788
S. Bandyopadhyay, N. Lokesh, Saley Vishal Vivek, M. Murty
Attributed network embedding is the task of learning a lower-dimensional vector representation of the nodes of an attributed network, which can then be used for downstream network mining tasks. Nodes in a network exhibit community structure, and most network embedding algorithms work well when the nodes, along with their attributes, adhere to the community structure of the network. But real-life networks come with community outlier nodes, which deviate significantly in terms of their link structure or attribute similarities from the other nodes of the community they belong to. These outlier nodes, if not processed carefully, can even affect the embeddings of the other nodes in the network. Thus, a node embedding framework that deals with both the link structure and the attributes in the presence of outliers in an unsupervised setting is practically important. In this work, we propose a deep unsupervised autoencoder-based solution which minimizes the effect of outlier nodes while generating the network embedding. We use both stochastic gradient descent and closed-form updates for faster optimization of the network parameters. We further explore the role of adversarial learning for this task, and propose a second unsupervised deep model which learns by discriminating between the structure-based and the attribute-based embeddings of the network and minimizes the effect of outliers in a coupled way. Our experiments show the merit of these deep models in detecting outliers and also the superiority of the generated network embeddings for different downstream mining tasks. To the best of our knowledge, these are the first unsupervised nonlinear approaches that reduce the effect of outlier nodes while generating network embeddings.
Citations: 63
Learning a Joint Search and Recommendation Model from User-Item Interactions
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371818
Hamed Zamani
Existing learning-to-rank models for information retrieval are trained on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model based on user-item interactions. Our model has potential applications in systems with rich user-item interaction data, such as browsing and recommendation, in which an accurate search engine is desired. This includes media streaming services and e-commerce websites, among others. Inspired by the neural approaches to collaborative filtering and the language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and reconstruct the item textual descriptions. In more detail, our model learns user and item representations such that they can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing comparable performance to state-of-the-art hybrid recommendation models.
Citations: 43
Jointly Optimized Neural Coreference Resolution with Mutual Attention
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371787
Jie Ma, Jun Liu, Yufei Li, Xin Hu, Yudai Pan, Shen Sun, Qika Lin
Coreference resolution aims at recognizing different forms in a document which refer to the same entity in the real world. Although many models have been proposed and have achieved success, some challenges remain. Recent models that use recurrent neural networks to obtain mention representations ignore dependencies between spans and their distant preceding spans, which leads to predicted clusters that are locally consistent but globally inconsistent. In addition, these models are trained only by maximizing the marginal likelihood of gold antecedent spans from coreference clusters, which makes some gold mentions undetectable and causes unsatisfactory coreference results. To address these challenges, we propose a neural coreference resolution model. It employs mutual attention to take the dependencies between spans and their preceding spans into account directly (that is, it uses an attention mechanism to capture global information between spans and their preceding spans). Our model is trained by jointly optimizing mention clustering and imbalanced mention detection, which enables it to detect more gold mentions in a document and so make more accurate coreference decisions. Experimental results on the CoNLL-2012 English dataset show that our model detects the most gold mentions and achieves state-of-the-art coreference performance compared with baselines.
Citations: 7
Can Deep Learning Only Be Neural Networks?
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372190
Zhi-Hua Zhou
The term "deep learning" is generally regarded as a synonym of "deep neural networks (DNNs)". In this talk, we will discuss the essentials of deep learning and claim that deep learning need not be realized through neural networks and differentiable modules. We will then present an exploration of non-NN style deep learning, where the building blocks are non-differentiable modules and the training process does not rely on backpropagation or gradient-based adjustment. We will also talk about some recent advances and challenges in this direction of research.
Citations: 0
Nearly Linear Time Algorithm for Mean Hitting Times of Random Walks on a Graph
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371777
Zuobai Zhang, Wanyue Xu, Zhongzhi Zhang
For random walks on a graph, the mean hitting time $H_j$ from a vertex $i$ chosen from the stationary distribution to the target vertex $j$ can be used as a measure of importance for vertex $j$, while the Kemeny constant $K$ is the mean hitting time from a vertex $i$ to a vertex $j$ selected randomly according to the stationary distribution. Both quantities have found a large variety of applications in different areas. However, their high computational complexity limits their applications, especially for large networks with millions of vertices. In this paper, we first establish a connection between the two quantities, representing $K$ in terms of $H_j$ for all vertices. We then express both quantities in terms of quadratic forms of the pseudoinverse of the graph Laplacian, based on which we develop an efficient algorithm that, with high probability, provides an approximation of $H_j$ for all vertices and of $K$ in nearly linear time with respect to the number of edges. Extensive experimental results on real-life and model networks validate both the efficiency and accuracy of the proposed algorithm.
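To make the two quantities concrete, the sketch below computes them exactly on a tiny graph by solving the standard first-step equations for hitting times. This is a brute-force illustration, not the nearly linear time algorithm of the paper. The function names and the adjacency-list input format are assumptions made for this example, and the relation used for $K$ (the stationary average of $H_j$, with the convention that the hitting time from $j$ to itself is zero) is one identity that follows from the two definitions in the abstract; the paper's own formula may be stated differently.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination over Fractions."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [x / pv for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def hitting_times_to(adj, j):
    """Expected number of steps to reach j from every start vertex (0 for j itself)."""
    n = len(adj)
    others = [i for i in range(n) if i != j]
    idx = {v: k for k, v in enumerate(others)}
    a = [[Fraction(0)] * len(others) for _ in others]
    b = [Fraction(1)] * len(others)
    for i in others:
        # First-step equation: h_i = 1 + sum_k P[i][k] * h_k, with h_j = 0.
        a[idx[i]][idx[i]] += 1
        for k in adj[i]:
            if k != j:
                a[idx[i]][idx[k]] -= Fraction(1, len(adj[i]))
    x = solve(a, b)
    h = [Fraction(0)] * n
    for i in others:
        h[i] = x[idx[i]]
    return h


def mean_hitting_times(adj):
    """Return (pi, H, K) for a simple random walk on a connected undirected graph."""
    n = len(adj)
    two_m = sum(len(nb) for nb in adj)
    pi = [Fraction(len(nb), two_m) for nb in adj]  # stationary distribution d_i / 2m
    H = []
    for j in range(n):
        h = hitting_times_to(adj, j)
        H.append(sum(pi[i] * h[i] for i in range(n)))
    K = sum(pi[j] * H[j] for j in range(n))
    return pi, H, K


# Path graph 0 - 1 - 2, given as adjacency lists.
pi, H, K = mean_hitting_times([[1], [0, 2], [1]])
# H == [5/2, 1/2, 5/2] and K == 3/2
```

The central vertex has the smallest mean hitting time, matching its intuitive importance. Solving one dense linear system per target vertex is far too expensive for large graphs, which is the cost the paper's approximation is designed to avoid.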
Citations: 5
From Missing Data to Boltzmann Distributions and Time Dynamics: The Statistical Physics of Recommendation
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372193
Ed H. Chi
The challenge of building a good recommendation system is deeply connected to missing data: unknown features and labels for suggesting the most "valuable" items to the user. The mysterious properties of the power law distributions that generally arise in recommender systems (and social systems in general) create skewed and long-tailed consumption patterns that are often still puzzling to many of us. Missing data and skewed distributions create not just accuracy and recall problems, but also capacity allocation problems, which are at the root of recent debates on inclusiveness and responsibility. So how do we move forward in the face of these immense conceptual and practical issues? In our work, we have been asking ourselves how to derive insights from first principles and draw inspiration from fields like statistical physics. One might ask, with some surprise: what does the field of physics have to do with missing data in ranking and recommendations? As we all know, in the field of information systems, concepts like information entropy and probability have a rich intellectual history. This history is deeply connected to the greatest discoveries of science in the 19th century: statistical mechanics, thermodynamics, and specific concepts like thermal equilibrium. In this talk, I will take us on a journey connecting the Boltzmann distribution and partition functions from statistical mechanics with importance weighting for learning better softmax functions, and then further to reinforcement learning, where we can plan better explorations using off-policy correction with policy gradient approaches. As I shall show, these techniques enable us to reason about missing data features, labels, and time dynamic patterns from our data.
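The link between the Boltzmann distribution and softmax that the talk draws on can be checked in a few lines: a Boltzmann distribution over energies at a given temperature is the softmax of the negated, temperature-scaled energies, and the partition function is the softmax normalizer. The sketch below is only an illustration of that textbook correspondence, not code from the talk; the function names and the example energies are assumptions.

```python
import math


def boltzmann(energies, temperature=1.0):
    """p_i = exp(-E_i / T) / Z, where Z is the partition function."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical energies; a lower energy means a more probable state.
energies = [0.0, 1.0, 2.0]
temperature = 0.5
p = boltzmann(energies, temperature)
q = softmax([-e / temperature for e in energies])
```

Here `p` and `q` agree up to floating-point error. Raising the temperature flattens the distribution and lowering it concentrates probability on the lowest-energy state.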
Citations: 1
Temporal Context-Aware Representation Learning for Question Routing
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371847
Xuchao Zhang, Wei Cheng, Bo Zong, Yuncong Chen, Jianwu Xu, Ding Li, Haifeng Chen
Question routing (QR) aims at recommending newly posted questions to the potential answerers who are most likely to answer them. Existing approaches that learn users' expertise from their past question-answering activities usually suffer from challenges in two aspects: 1) multi-faceted expertise and 2) temporal dynamics in the answering behavior. This paper proposes a novel temporal context-aware model that operates at multiple granularities of temporal dynamics and addresses both challenges concurrently. Specifically, the temporal context-aware attention characterizes the answerer's multi-faceted expertise in terms of the questions' semantic and temporal information simultaneously. Moreover, the design of the multi-shift and multi-resolution module enables our model to handle temporal impact at different time granularities. Extensive experiments on six datasets from different domains demonstrate that the proposed model significantly outperforms competitive baseline models.
Citations: 23
Automatic Speaker Recognition with Limited Data
Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371802
Ruirui Li, Jyun-Yu Jiang, Jiahao Liu, Chu-Cheng Hsieh, Wei Wang
Automatic speaker recognition (ASR) is a stepping-stone technology towards semantic multimedia understanding and benefits a variety of downstream applications. In recent years, neural network-based ASR methods have achieved excellent recognition performance when sufficient training data is available. However, it is impractical to collect sufficient training data for every user, especially for new users. Therefore, a large portion of users usually have only a very limited number of training instances. As a consequence, the lack of training data prevents ASR systems from accurately learning users' acoustic biometrics, jeopardizes downstream applications, and eventually impairs the user experience. In this work, we propose an adversarial few-shot learning-based speaker identification framework (AFEASI) to develop robust speaker identification models with only a limited number of training instances. We first employ metric learning-based few-shot learning to learn speaker acoustic representations, where the limited instances are comprehensively utilized to improve identification performance. In addition, adversarial learning with adversarial examples is applied to further enhance the generalization and robustness of speaker identification. Experiments conducted on a publicly available large-scale dataset demonstrate that the model significantly outperforms eleven baseline methods. An in-depth analysis further indicates both the effectiveness and the robustness of the proposed method.
Citations: 26
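The speaker recognition abstract mentions metric learning-based few-shot learning without describing the architecture. The following is a minimal sketch of one generic metric-learning approach (nearest-prototype classification), not the AFEASI framework itself. The function names, the two-dimensional embeddings, and the speaker labels are all hypothetical and chosen only for illustration; a real system would obtain embeddings from a trained acoustic encoder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """Map each speaker to the mean of that speaker's few support embeddings."""
    return {speaker: mean_vector(embs) for speaker, embs in support.items()}


def identify(query, prototypes):
    """Return the speaker whose prototype is closest to the query embedding."""
    return min(prototypes, key=lambda s: euclidean(query, prototypes[s]))


# Hypothetical 2-D embeddings: two enrolment utterances per speaker.
support = {
    "speaker_a": [[0.9, 0.1], [1.1, -0.1]],
    "speaker_b": [[-1.0, 0.8], [-0.8, 1.2]],
}
prototypes = build_prototypes(support)
print(identify([0.7, 0.2], prototypes))  # closest prototype is speaker_a
```

Averaging the support embeddings is one simple way to use every available instance of a speaker when only a handful exist; other distance functions or learned metrics could be substituted.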