Proceedings of the 13th International Conference on Web Search and Data Mining: latest papers

Impact of Online Job Search and Job Reviews on Job Decision

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372184

Faiz Ahamad

Online platforms such as LinkedIn and specialized platforms such as Glassdoor are widely used by job seekers before applying for a job. These web platforms host ratings and reviews of employers and jobs, so a job seeker typically searches online for the employer before applying. Job seekers try to find out whether the employer and the job suit them and what the pros and cons of working there are. These reviews and ratings therefore influence job seekers' decisions, because they portray the pros and cons of working at a particular firm. The main objective of this study is to find how job seekers search for online employer reviews and how these reviews affect employer attractiveness and job pursuit intention. The other objective is to find the most crucial job factors that employees prioritize. The study is proposed to be conducted in two stages: first, collecting data from the website Glassdoor, which has reviews of 600,000 companies; second, conducting an experimental study to examine the influence of job attributes (high vs. low) and employer rating (high vs. low) on job choice and employer attractiveness.


Citations: 2

Ad Close Mitigation for Improved User Experience in Native Advertisements

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371798

Natalia Silberstein, O. Somekh, Yair Koren, M. Aharon, Dror Porat, Avi Shahar, Tingyi Wu

Verizon Media native advertising (also known as Yahoo Gemini native) serves billions of ad impressions daily and generates several hundred million USD in revenue yearly. Although we strive to provide the best experience for our users, there will always be some users who dislike our ads in certain cases. To address these situations, the Gemini native platform provides an ad close mechanism that enables users to close ads that they dislike and to give a reason for their action. Surprisingly, users do care about their ad experience, and their engagement with the ad close mechanism is quite significant. While the ad close rate (ACR) is lower than the click-through rate (CTR), the two are of the same order of magnitude, especially on Yahoo mail properties. Since ad close events indicate a bad user experience caused mostly by poor ad quality, we would like to exploit the ad close signals to improve user experience and reduce the number of ad close events while keeping the total revenue loss within a predefined bound. In this work we present our ad close mitigation (ACM) solution, which penalizes ads with a high closing likelihood in our auctions. In particular, we use the ad close signal and other available features to predict the probability of an ad close event, and we calculate the expected loss due to such an event so that the true expected revenue is used in the auction. We show that this approach fundamentally changes the generalized second price (GSP) auction and gives advertisers an incentive to improve the quality of their ads. Our solution was tested in both offline and large-scale online settings, serving real Gemini native traffic. Results of the online experiment show that we are able to reduce the number of ad close events by more than 20% while decreasing revenue by less than 0.4%. In addition, we present a large-scale analysis of the ad close signal that supports various design decisions and sheds light on how the ad close mechanism affects different crowds.
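The idea of ranking by expected revenue net of the expected ad close loss can be illustrated with a minimal single-slot sketch. This is not the authors' implementation: the linear penalty, the `close_cost` parameter, the `Ad` fields, and the pricing rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float      # advertiser's bid per click (USD)
    p_click: float  # predicted click probability
    p_close: float  # predicted ad close probability


def rank_score(ad, close_cost):
    """Expected revenue per impression minus the expected loss from a close event.

    close_cost is a hypothetical tuning parameter (USD per close event).
    """
    return ad.bid * ad.p_click - close_cost * ad.p_close


def run_auction(ads, close_cost):
    """Pick the winner by penalized score and charge a second-price style cost per click."""
    ranked = sorted(ads, key=lambda a: rank_score(a, close_cost), reverse=True)
    winner = ranked[0]
    runner_up = rank_score(ranked[1], close_cost) if len(ranked) > 1 else 0.0
    # Smallest bid at which the winner's penalized score still matches the runner-up's.
    price = (max(runner_up, 0.0) + close_cost * winner.p_close) / winner.p_click
    return winner, price


ads = [Ad("A", 2.0, 0.05, 0.04), Ad("B", 1.8, 0.05, 0.005)]
for cost in (0.0, 1.0):
    winner, price = run_auction(ads, cost)
    print(cost, winner.name, round(price, 4))
```

With `close_cost = 0` the higher bidder A wins; with `close_cost = 1.0` the rarely closed ad B wins instead. Under this pricing rule an ad's own close probability raises the price it pays per click, which is one way a penalty of this kind could reward better ad quality.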


Citations: 5

Learning a Joint Search and Recommendation Model from User-Item Interactions

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371818

Hamed Zamani

Existing learning-to-rank models for information retrieval are trained on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model from user-item interactions. Our model has potential applications in systems with rich user-item interaction data, such as browsing and recommendation, in which an accurate search engine is desired. These include media streaming services and e-commerce websites, among others. Inspired by neural approaches to collaborative filtering and language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and to reconstruct the items' textual descriptions. In more detail, our model learns user and item representations that can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing performance comparable to state-of-the-art hybrid recommendation models.
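The joint objective described above can be sketched for a single user-item pair as follows. This is a toy illustration under assumptions, not the paper's architecture: the dot-product scorer, the binary cross-entropy interaction loss, the softmax-based unigram language model, and the mixing weight `alpha` are all hypothetical choices.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def joint_loss(user_vec, item_vec, interacted, word_vecs, description_ids, alpha=0.5):
    """Interaction loss plus alpha times the text reconstruction loss for one pair.

    word_vecs holds one (hypothetical) embedding per vocabulary word;
    description_ids are the vocabulary indices of the item's description.
    Returns the loss and the item's unigram language model.
    """
    # Recommendation term: binary cross-entropy on the predicted interaction.
    p = sigmoid(dot(user_vec, item_vec))
    interaction = -math.log(p) if interacted else -math.log(1.0 - p)
    # Retrieval term: the item induces a unigram distribution over the vocabulary,
    # scored by the average negative log-likelihood of its own description.
    language_model = softmax([dot(item_vec, w) for w in word_vecs])
    reconstruction = -sum(math.log(language_model[t]) for t in description_ids)
    reconstruction /= len(description_ids)
    return interaction + alpha * reconstruction, language_model


def query_log_likelihood(language_model, query_ids):
    """Search-time score: log-probability of the query words under an item's model."""
    return sum(math.log(language_model[t]) for t in query_ids)
```

At search time, items could then be ranked by `query_log_likelihood`, which mirrors the query-likelihood idea from language modeling approaches to retrieval.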


Citations: 43

Outlier Resistant Unsupervised Deep Architectures for Attributed Network Embedding

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371788

S. Bandyopadhyay, N. Lokesh, Saley Vishal Vivek, M. Murty

Attributed network embedding is the task of learning a lower-dimensional vector representation of the nodes of an attributed network, which can then be used for downstream network mining tasks. Nodes in a network exhibit community structure, and most network embedding algorithms work well when the nodes, along with their attributes, adhere to the community structure of the network. But real-life networks come with community outlier nodes, which deviate significantly in their link structure or attribute similarities from the other nodes of the community they belong to. These outlier nodes, if not processed carefully, can even affect the embeddings of the other nodes in the network. Thus, a node embedding framework that handles both link structure and attributes in the presence of outliers in an unsupervised setting is practically important. In this work, we propose a deep unsupervised autoencoder-based solution that minimizes the effect of outlier nodes while generating the network embedding. We use both stochastic gradient descent and closed-form updates for faster optimization of the network parameters. We further explore the role of adversarial learning for this task and propose a second unsupervised deep model, which learns by discriminating between the structure-based and attribute-based embeddings of the network and minimizes the effect of outliers in a coupled way. Our experiments show the merit of these deep models in detecting outliers and the superiority of the generated network embeddings for different downstream mining tasks. To the best of our knowledge, these are the first unsupervised nonlinear approaches that reduce the effect of outlier nodes while generating network embeddings.


Citations: 63

Jointly Optimized Neural Coreference Resolution with Mutual Attention

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371787

Jie Ma, Jun Liu, Yufei Li, Xin Hu, Yudai Pan, Shen Sun, Qika Lin

Coreference resolution aims at recognizing different forms in a document that refer to the same real-world entity. Although many models have been proposed and have achieved success, some challenges remain. Recent models that use recurrent neural networks to obtain mention representations ignore dependencies between spans and their distant preceding spans, which leads to predicted clusters that are locally consistent but globally inconsistent. In addition, these models are trained only by maximizing the marginal likelihood of gold antecedent spans from coreference clusters, which leaves some gold mentions undetected and causes unsatisfactory coreference results. To address these challenges, we propose a neural coreference resolution model. It employs mutual attention to take the dependencies between spans and their preceding spans into account directly, using an attention mechanism to capture global information between them. Our model is trained by jointly optimizing mention clustering and imbalanced mention detection, which enables it to detect more gold mentions in a document and make more accurate coreference decisions. Experimental results on the CoNLL-2012 English dataset show that our model detects the most gold mentions and achieves state-of-the-art coreference performance compared with baselines.


Citations: 7

Can Deep Learning Only Be Neural Networks?

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372190

Zhi-Hua Zhou

The term "deep learning" is generally regarded as a synonym of "deep neural networks (DNNs)". In this talk, we will discuss the essentials of deep learning and claim that deep learning need not be realized by neural networks and differentiable modules. We will then present an exploration of non-NN style deep learning, where the building blocks are non-differentiable modules and the training process does not rely on backpropagation or gradient-based adjustment. We will also talk about some recent advances and challenges in this direction of research.


Citations: 0

Nearly Linear Time Algorithm for Mean Hitting Times of Random Walks on a Graph

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371777

Zuobai Zhang, Wanyue Xu, Zhongzhi Zhang

For random walks on a graph, the mean hitting time $H_j$ from a vertex $i$ chosen from the stationary distribution to the target vertex $j$ can be used as a measure of importance for vertex $j$, while the Kemeny constant $K$ is the mean hitting time from a vertex $i$ to a vertex $j$ selected randomly according to the stationary distribution. Both quantities have found a wide variety of applications in different areas. However, their high computational cost limits their use, especially for large networks with millions of vertices. In this paper, we first establish a connection between the two quantities, representing $K$ in terms of $H_j$ for all vertices. We then express both quantities in terms of quadratic forms of the pseudoinverse of the graph Laplacian, based on which we develop an efficient algorithm that, with high probability, approximates $H_j$ for all vertices and $K$ in time nearly linear in the number of edges. Extensive experimental results on real-life and model networks validate both the efficiency and accuracy of the proposed algorithm.
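To make the two quantities concrete, the sketch below computes $H_j$ and $K$ exactly on a tiny graph by solving the standard first-step linear systems with rational arithmetic. It is a brute-force reference with roughly quartic cost in the number of vertices, not the nearly linear time algorithm of the paper. It assumes a connected, undirected, unweighted graph given as adjacency lists, and all function names are illustrative. Averaging the definition of $K$ over a start vertex drawn from the stationary distribution $\pi$ gives the identity $K = \sum_j \pi_j H_j$, which the last function uses.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def hitting_times_to(adj, j):
    """Expected number of steps to first reach j from every start vertex.

    Uses h(j) = 0 and h(v) = 1 + average of h over the neighbours of v.
    """
    n = len(adj)
    others = [v for v in range(n) if v != j]
    idx = {v: k for k, v in enumerate(others)}
    A = [[Fraction(0)] * len(others) for _ in others]
    for v in others:
        A[idx[v]][idx[v]] += 1
        for u in adj[v]:
            if u != j:
                A[idx[v]][idx[u]] -= Fraction(1, len(adj[v]))
    h = solve(A, [Fraction(1)] * len(others))
    out = [Fraction(0)] * n
    for v in others:
        out[v] = h[idx[v]]
    return out


def mean_hitting_times_and_kemeny(adj):
    """Return ([H_0, ..., H_{n-1}], K) for a simple random walk on adj."""
    n = len(adj)
    total_degree = sum(len(nb) for nb in adj)
    pi = [Fraction(len(nb), total_degree) for nb in adj]  # stationary distribution
    H = []
    for j in range(n):
        to_j = hitting_times_to(adj, j)
        H.append(sum(pi[i] * to_j[i] for i in range(n)))
    K = sum(pi[j] * H[j] for j in range(n))
    return H, K


# Path graph 0 - 1 - 2
H, K = mean_hitting_times_and_kemeny([[1], [0, 2], [1]])
print(H, K)  # H = [5/2, 1/2, 5/2] and K = 3/2 as Fractions
```

On the three-vertex path, the middle vertex has the smallest mean hitting time, which matches the reading of $H_j$ as an importance measure.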


Citations: 5

From Missing Data to Boltzmann Distributions and Time Dynamics: The Statistical Physics of Recommendation

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3372193

Ed H. Chi

The challenge of building a good recommendation system is deeply connected to missing data, that is, the unknown features and labels needed to suggest the most "valuable" items to the user. The mysterious properties of the power law distributions that generally arise out of recommender systems (and social systems in general) create skewed and long-tailed consumption patterns that are often still puzzling to many of us. Missing data and skewed distributions create not just accuracy and recall problems, but also capacity allocation problems, which are at the root of the recent debate on inclusiveness and responsibility. So how do we move forward in the face of these immense conceptual and practical issues? In our work, we have been asking ourselves how to derive insights from first principles and draw inspiration from fields like statistical physics. One might ask in surprise: what does the field of physics have to do with missing data in ranking and recommendations? As we all know, in the field of information systems, concepts like information entropy and probability have a rich intellectual history. This history is deeply connected to the greatest discoveries of science in the 19th century: statistical mechanics, thermodynamics, and specific concepts like thermal equilibrium. In this talk, I will take us on a journey connecting the Boltzmann distribution and partition functions from statistical mechanics with importance weighting for learning better softmax functions, and then further to reinforcement learning, where we can plan better exploration using off-policy correction with policy gradient approaches. As I shall show, these techniques enable us to reason about missing data features, labels, and time dynamic patterns from our data.
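The link between the Boltzmann distribution and the softmax can be shown in a few lines: a softmax over logits is a Boltzmann distribution with energies equal to the negated logits, and its normalizer is the partition function. The sketch below also shows a generic importance-weighted estimate of that partition function from a sample of items. It is a minimal illustration and not the method presented in the talk; the function names and the proposal distribution are assumptions.

```python
import math


def boltzmann(energies, temperature=1.0):
    """p_i = exp(-E_i / T) / Z, i.e. a softmax over the logits -E_i / T."""
    logits = [-e / temperature for e in energies]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


def partition_estimate(logits, proposal, sampled_ids):
    """Importance-weighted estimate of Z = sum_i exp(logit_i).

    sampled_ids are item indices drawn from the proposal distribution;
    each sampled term is reweighted by 1 / proposal[i].
    """
    total = sum(math.exp(logits[i]) / proposal[i] for i in sampled_ids)
    return total / len(sampled_ids)
```

Lower-energy items receive higher probability, and raising `temperature` flattens the distribution. The estimate is unbiased for any proposal that gives every item nonzero probability, which is why a large item catalogue can be sampled rather than summed in full.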


Citations: 1

Temporal Context-Aware Representation Learning for Question Routing

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371847

Xuchao Zhang, Wei Cheng, Bo Zong, Yuncong Chen, Jianwu Xu, Ding Li, Haifeng Chen

Question routing (QR) aims at recommending newly posted questions to the potential answerers who are most likely to answer them. Existing approaches that learn users' expertise from their past question-answering activities usually suffer from challenges in two aspects: 1) multi-faceted expertise and 2) temporal dynamics in the answering behavior. This paper proposes a novel temporal context-aware model that works at multiple granularities of temporal dynamics and addresses both challenges concurrently. Specifically, the temporal context-aware attention characterizes the answerer's multi-faceted expertise in terms of the questions' semantic and temporal information simultaneously. Moreover, the design of the multi-shift and multi-resolution module enables our model to handle temporal impact at different time granularities. Extensive experiments on six datasets from different domains demonstrate that the proposed model significantly outperforms competitive baseline models.

Citations: 23

Automatic Speaker Recognition with Limited Data

Proceedings of the 13th International Conference on Web Search and Data Mining

Pub Date : 2020-01-20 DOI: 10.1145/3336191.3371802

Ruirui Li, Jyun-Yu Jiang, Jiahao Liu, Chu-Cheng Hsieh, Wei Wang

Automatic speaker recognition (ASR) is a stepping-stone technology towards semantic multimedia understanding and benefits a wide range of downstream applications. In recent years, neural network-based ASR methods have achieved excellent recognition performance when given sufficient training data. However, it is impractical to collect sufficient training data for every user, especially for new users. Therefore, a large portion of users usually have a very limited number of training instances. As a consequence, the lack of training data prevents ASR systems from accurately learning users' acoustic biometrics, jeopardizes the downstream applications, and eventually impairs the user experience. In this work, we propose an adversarial few-shot learning-based speaker identification framework (AFEASI) to develop robust speaker identification models with only a limited number of training instances. We first employ metric learning-based few-shot learning to learn speaker acoustic representations, where the limited instances are comprehensively utilized to improve the identification performance. In addition, adversarial learning is applied to further enhance the generalization and robustness of speaker identification with adversarial examples. Experiments conducted on a publicly available large-scale dataset demonstrate that the model significantly outperforms eleven baseline methods. An in-depth analysis further indicates both the effectiveness and the robustness of the proposed method.
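To make the idea of metric learning-based few-shot identification concrete, the sketch below uses a generic nearest-prototype classifier: each speaker is represented by the mean of a few embedding vectors, and a query is assigned to the closest prototype. This is a common few-shot pattern, not AFEASI itself. The hand-written 2-D embeddings stand in for the output of a learned acoustic encoder, the adversarial component is omitted, and all names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_prototypes(support):
    """support: dict mapping speaker id -> a few embedding vectors."""
    return {spk: mean_vector(embs) for spk, embs in support.items()}


def identify(query, prototypes):
    """Return the speaker whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda spk: euclidean(query, prototypes[spk]))


if __name__ == "__main__":
    # Two enrolled speakers with only two (made-up) embeddings each.
    support = {
        "alice": [[0.0, 0.0], [0.2, 0.1]],
        "bob": [[1.0, 1.0], [0.9, 1.1]],
    }
    protos = build_prototypes(support)
    print(identify([0.1, 0.0], protos))
```

Averaging the support embeddings keeps the per-speaker model to a single vector, which is one reason this style of classifier is often used when only a handful of instances per class are available.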

Citations: 26