Efficient Non-Sampling Factorization Machines for Optimal Context-Aware Recommendation
C. Chen, Min Zhang, Weizhi Ma, Yiqun Liu, Shaoping Ma
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380303
To provide more accurate recommendations, it has become a trending topic to go beyond modeling user-item interactions and take context features into account. Factorization Machines (FM) with negative sampling are a popular solution for context-aware recommendation. However, this approach is not robust, as sampling may lose important information and usually leads to suboptimal performance in practice. Several recent efforts have enhanced FM with deep learning architectures for modeling high-order feature interactions, but they either focus on the rating prediction task only or adopt the negative sampling strategy to optimize ranking performance. Due to the dramatic fluctuation of sampling, it is reasonable to argue that these sampling-based FM methods remain suboptimal for context-aware recommendation. In this paper, we propose to learn FM without sampling for ranking tasks, which particularly benefits context-aware recommendation. Despite its effectiveness, such a non-sampling strategy poses a strong challenge to training efficiency. Accordingly, we further design a new framework named Efficient Non-Sampling Factorization Machines (ENSFM). ENSFM not only seamlessly bridges the relationship between FM and Matrix Factorization (MF), but also resolves the challenging efficiency issue through novel memorization strategies. Through extensive experiments on three real-world public datasets, we show that 1) the proposed ENSFM consistently and significantly outperforms state-of-the-art methods on context-aware Top-K recommendation, and 2) ENSFM achieves significant advantages in training efficiency, which makes it more applicable to real-world large-scale systems. Moreover, the empirical results indicate that a proper learning method is even more important than advanced neural network structures for the Top-K recommendation task. Our implementation has been released to facilitate further development of efficient non-sampling methods.
{"title":"Efficient Non-Sampling Factorization Machines for Optimal Context-Aware Recommendation","authors":"C. Chen, Min Zhang, Weizhi Ma, Yiqun Liu, Shaoping Ma","doi":"10.1145/3366423.3380303","DOIUrl":"https://doi.org/10.1145/3366423.3380303","url":null,"abstract":"To provide more accurate recommendation, it is a trending topic to go beyond modeling user-item interactions and take context features into account. Factorization Machines (FM) with negative sampling is a popular solution for context-aware recommendation. However, it is not robust as sampling may lost important information and usually leads to non-optimal performances in practical. Several recent efforts have enhanced FM with deep learning architectures for modelling high-order feature interactions. While they either focus on rating prediction task only, or typically adopt the negative sampling strategy for optimizing the ranking performance. Due to the dramatic fluctuation of sampling, it is reasonable to argue that these sampling-based FM methods are still suboptimal for context-aware recommendation. In this paper, we propose to learn FM without sampling for ranking tasks that helps context-aware recommendation particularly. Despite effectiveness, such a non-sampling strategy presents strong challenge in learning efficiency of the model. Accordingly, we further design a new ideal framework named Efficient Non-Sampling Factorization Machines (ENSFM). ENSFM not only seamlessly connects the relationship between FM and Matrix Factorization (MF), but also resolves the challenging efficiency issue via novel memorization strategies. Through extensive experiments on three real-world public datasets, we show that 1) the proposed ENSFM consistently and significantly outperforms the state-of-the-art methods on context-aware Top-K recommendation, and 2) ENSFM achieves significant advantages in training efficiency, which makes it more applicable to real-world large-scale systems. Moreover, the empirical results indicate that a proper learning method is even more important than advanced neural network structures for Top-K recommendation task. Our implementation has been released 1 to facilitate further developments on efficient non-sampling methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"0 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75973456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intention Modeling from Ordered and Unordered Facets for Sequential Recommendation
Xueliang Guo, Chongyang Shi, Chuanming Liu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380190
Recently, sequential recommendation has attracted substantial attention from researchers due to its status as an essential service for e-commerce. Accurately understanding user intention is an important factor in improving the performance of recommendation systems. However, user intention is highly time-dependent and flexible, so it is very challenging to learn the latent dynamic intention of users for sequential recommendation. To this end, we propose a novel intention modeling approach from ordered and unordered facets (IMfOU) for sequential recommendation. Specifically, the proposed global and local item embedding (GLIE) comprehensively captures the sequential context information in the sequences and highlights the important features that users care about. We further design ordered preference drift learning (OPDL) and unordered purchase motivation learning (UPML) to capture the user's preference drift process and purchase motivation, respectively. By combining users' dynamic preferences and current motivations, our approach considers not only sequential dependencies between items but also flexible dependencies, and models user purchase intention more accurately from both ordered and unordered facets. Evaluation results on three real-world datasets demonstrate that our approach outperforms state-of-the-art sequential recommendation methods, improving AUC by an average of 2.26%.
{"title":"Intention Modeling from Ordered and Unordered Facets for Sequential Recommendation","authors":"Xueliang Guo, Chongyang Shi, Chuanming Liu","doi":"10.1145/3366423.3380190","DOIUrl":"https://doi.org/10.1145/3366423.3380190","url":null,"abstract":"Recently, sequential recommendation has attracted substantial attention from researchers due to its status as an essential service for e-commerce. Accurately understanding user intention is an important factor to improve the performance of recommendation system. However, user intention is highly time-dependent and flexible, so it is very challenging to learn the latent dynamic intention of users for sequential recommendation. To this end, in this paper, we propose a novel intention modeling from ordered and unordered facets (IMfOU) for sequential recommendation. Specifically, the global and local item embedding (GLIE) we proposed can comprehensively capture the sequential context information in the sequences and highlight the important features that users care about. We further design ordered preference drift learning (OPDL) and unordered purchase motivation learning (UPML) to obtain user’s the process of preference drift and purchase motivation respectively. With combining the users’ dynamic preference and current motivation, it considers not only sequential dependencies between items but also flexible dependencies and models the user purchase intention more accurately from ordered and unordered facets respectively. Evaluation results on three real-world datasets demonstrate that our proposed approach achieves better performance than the state-of-the-art sequential recommendation methods achieving improvement of AUC by an average of 2.26%.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78489000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wanyue Xu, Y. Sheng, Zuobai Zhang, Haibin Kan, Zhongzhi Zhang
The mean hitting time from a node i to a node j selected randomly according to the stationary distribution of random walks is called the Kemeny constant, which has found various applications. It was proved that, over all graphs with N vertices, complete graphs have the exact minimum Kemeny constant, which grows linearly with N. Here we study, numerically or analytically, the Kemeny constant of many sparse real-world and model networks with scale-free, small-world topology, and show that their Kemeny constant also grows linearly with N. Thus, sparse networks with scale-free and small-world topology are favorable architectures with optimal scaling of the Kemeny constant. We then present a theoretically guaranteed estimation algorithm, which approximates the Kemeny constant of a graph in nearly linear time with respect to the number of edges. Extensive numerical experiments on model and real networks show that our approximation algorithm is both efficient and accurate.
{"title":"Power-Law Graphs Have Minimal Scaling of Kemeny Constant for Random Walks","authors":"Wanyue Xu, Y. Sheng, Zuobai Zhang, Haibin Kan, Zhongzhi Zhang","doi":"10.1145/3366423.3380093","DOIUrl":"https://doi.org/10.1145/3366423.3380093","url":null,"abstract":"The mean hitting time from a node i to a node j selected randomly according to the stationary distribution of random walks is called the Kemeny constant, which has found various applications. It was proved that over all graphs with N vertices, complete graphs have the exact minimum Kemeny constant, growing linearly with N. Here we study numerically or analytically the Kemeny constant on many sparse real-world and model networks with scale-free small-world topology, and show that their Kemeny constant also behaves linearly with N. Thus, sparse networks with scale-free and small-world topology are favorable architectures with optimal scaling of Kemeny constant. We then present a theoretically guaranteed estimation algorithm, which approximates the Kemeny constant for a graph in nearly linear time with respect to the number of edges. Extensive numerical experiments on model and real networks show that our approximation algorithm is both efficient and accurate.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78786104","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning from Cross-Modal Behavior Dynamics with Graph-Regularized Neural Contextual Bandit
Xian Wu, Suleyman Cetintas, Deguang Kong, Miaoyu Lu, Jian Yang, N. Chawla
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380178
Contextual multi-armed bandit algorithms have received significant attention for modeling users' preferences in online personalized recommender systems in a timely manner. While significant progress has been made along this direction, a few major challenges have not been well addressed yet: (i) a vast majority of the literature is based on linear models that cannot capture complex non-linear inter-dependencies of user-item interactions; (ii) existing work largely ignores the latent relations among users and non-recommended items, and hence may not properly reflect users' preferences in the real world; (iii) current solutions are mainly based on historical data and are prone to cold-start problems for new users who have no interaction history. To address the above challenges, we develop a Graph Regularized Cross-modal (GRC) learning model, a general framework that exploits transferable knowledge learned from user-item interactions as well as the external features of users and items in online personalized recommendation. In particular, the GRC framework leverages the non-linearity of neural networks to model the complex inherent structure of user-item interactions. We further augment GRC with a metric learning technique and a graph-constrained embedding module to map units from different dimensions (temporal, social, and semantic) into the same latent space. An extensive set of experiments conducted on two benchmark datasets, as well as a large-scale proprietary dataset from a major search engine, demonstrates the power of the proposed GRC model in effectively capturing users' dynamic preferences under different settings, outperforming all baselines by a large margin.
{"title":"Learning from Cross-Modal Behavior Dynamics with Graph-Regularized Neural Contextual Bandit","authors":"Xian Wu, Suleyman Cetintas, Deguang Kong, Miaoyu Lu, Jian Yang, N. Chawla","doi":"10.1145/3366423.3380178","DOIUrl":"https://doi.org/10.1145/3366423.3380178","url":null,"abstract":"Contextual multi-armed bandit algorithms have received significant attention in modeling users’ preferences for online personalized recommender systems in a timely manner. While significant progress has been made along this direction, a few major challenges have not been well addressed yet: (i) a vast majority of the literature is based on linear models that cannot capture complex non-linear inter-dependencies of user-item interactions; (ii) existing literature mainly ignores the latent relations among users and non-recommended items: hence may not properly reflect users’ preferences in the real-world; (iii) current solutions are mainly based on historical data and are prone to cold-start problems for new users who have no interaction history. To address the above challenges, we develop a Graph Regularized Cross-modal (GRC) learning model, a general framework to exploit transferable knowledge learned from user-item interactions as well as the external features of users and items in online personalized recommendations. In particular, the GRC framework leverage a non-linearity of neural network to model complex inherent structure of user-item interactions. We further augment GRC with the cooperation of the metric learning technique and a graph-constrained embedding module, to map the units from different dimensions (temporal, social and semantic) into the same latent space. An extensive set of experiments are conducted on two benchmark datasets as well as a large scale proprietary dataset from a major search engine demonstrates the power of the proposed GRC model in effectively capturing users’ dynamic preferences under different settings by outperforming all baselines by a large margin.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79906162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
#Outage: Detecting Power and Communication Outages from Social Networks
Udit Paul, Alexander Ermakov, Michael Nekrasov, V. Adarsh, E. Belding-Royer
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380251
Natural disasters are increasing worldwide at an alarming rate. To aid relief operations during and after a disaster, humanitarian organizations rely on various types of situational information, such as missing, trapped, or injured people and damaged infrastructure in an area. Crucial and timely identification of infrastructure and utility damage is critical to properly plan and execute search and rescue operations. However, in the wake of natural disasters, real-time identification of this information becomes challenging. In this research, we investigate the use of tweets posted on the Twitter social media platform to detect power and communication outages during natural disasters. We first curate a dataset of 18,097 tweets based on domain-specific keywords obtained using Latent Dirichlet Allocation. We annotate the gathered dataset to separate the tweets into different types of outage-related events: power outage, communication outage, and combined power and communication outage. We analyze the tweets to identify information such as popular words, length of words, and hashtags, as well as sentiments associated with tweets in these outage-related categories. Furthermore, we apply machine learning algorithms to classify these tweets into their respective categories. Our results show that simple classifiers, such as the boosting algorithm, are able to separate outage-related tweets from unrelated tweets with an F1 score close to 100%. Additionally, we observe that the transfer learning model BERT is able to classify different categories of outage-related tweets with close to 90% accuracy in less than 90 seconds of training and testing time, demonstrating that tweets can be mined in real time to assist first responders during natural disasters.
{"title":"#Outage: Detecting Power and Communication Outages from Social Networks","authors":"Udit Paul, Alexander Ermakov, Michael Nekrasov, V. Adarsh, E. Belding-Royer","doi":"10.1145/3366423.3380251","DOIUrl":"https://doi.org/10.1145/3366423.3380251","url":null,"abstract":"Natural disasters are increasing worldwide at an alarming rate. To aid relief operations during and post disaster, humanitarian organizations rely on various types of situational information such as missing, trapped or injured people and damaged infrastructure in an area. Crucial and timely identification of infrastructure and utility damage is critical to properly plan and execute search and rescue operations. However, in the wake of natural disasters, real-time identification of this information becomes challenging. In this research, we investigate the use of tweets posted on the Twitter social media platform to detect power and communication outages during natural disasters. We first curate a data set of 18,097 tweets based on domain-specific keywords obtained using Latent Dirichlet Allocation. We annotate the gathered data set to separate the tweets into different types of outage-related events: power outage, communication outage and both power-communication outage. We analyze the tweets to identify information such as popular words, length of words and hashtags as well as sentiments that are associated with tweets in these outage-related categories. Furthermore, we apply machine learning algorithms to classify these tweets into their respective categories. Our results show that simple classifiers such as the boosting algorithm are able to classify outage related tweets from unrelated tweets with close to 100% f1-score. Additionally, we observe that the transfer learning model, BERT, is able to classify different categories of outage-related tweets with close to 90% accuracy in less than 90 seconds of training and testing time, demonstrating that tweets can be mined in real-time to assist first responders during natural disasters.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83573510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter
Rhys Biddle, Aditya Joshi, Shaowu Liu, Cécile Paris, Guandong Xu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380198
Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as ‘health mention classification’) amongst tweets that mention disease words. Prior work shows that figurative usage of disease words may prove challenging for health mention classification. Since the experience of a disease is associated with negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier for health mention classification combines pre-trained contextual word representations with sentiment distributions of the words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We additionally annotate each tweet with a label indicating whether the disease words are used in a figurative sense. Our classifier outperforms current state-of-the-art approaches in detecting both health-related and figurative tweets that mention disease words. We also show that disease words in tweets are used figuratively more often than in a health-related context, which proves challenging for classifiers targeting health-related tweets.
{"title":"Leveraging Sentiment Distributions to Distinguish Figurative From Literal Health Reports on Twitter","authors":"Rhys Biddle, Aditya Joshi, Shaowu Liu, Cécile Paris, Guandong Xu","doi":"10.1145/3366423.3380198","DOIUrl":"https://doi.org/10.1145/3366423.3380198","url":null,"abstract":"Harnessing data from social media to monitor health events is a promising avenue for public health surveillance. A key step is the detection of reports of a disease (referred to as ‘health mention classification’) amongst tweets that mention disease words. Prior work shows that figurative usage of disease words may prove to be challenging for health mention classification. Since the experience of a disease is associated with a negative sentiment, we present a method that utilises sentiment information to improve health mention classification. Specifically, our classifier for health mention classification combines pre-trained contextual word representations with sentiment distributions of words in the tweet. For our experiments, we extend a benchmark dataset of tweets for health mention classification, adding over 14k manually annotated tweets across diseases. We also additionally annotate each tweet with a label that indicates if the disease words are used in a figurative sense. Our classifier outperforms current SOTA approaches in detecting both health-related and figurative tweets that mention disease words. We also show that tweets containing disease words are mentioned figuratively more often than in a health-related context, proving to be challenging for classifiers targeting health-related tweets.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81958135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Rating Elicitation for New Users in Collaborative Filtering
Wonbin Kweon, SeongKu Kang, Junyoung Hwang, Hwanjo Yu
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380042
Recent recommender systems have started to use rating elicitation, which asks new users to rate a small seed itemset so that their preferences can be inferred, to improve the quality of initial recommendations. The key challenge of rating elicitation is to choose the seed items from which new users' preferences can best be inferred. This paper proposes a novel end-to-end Deep learning framework for Rating Elicitation (DRE), which chooses all the seed items at once while taking non-linear interactions into account. To this end, it first defines categorical distributions to sample seed items from the entire itemset, then trains both the categorical distributions and a neural reconstruction network to infer users' preferences on the remaining items from the CF information of the sampled seed items. Through end-to-end training, the categorical distributions learn to select the most representative seed items while reflecting the complex non-linear interactions. Experimental results show that DRE outperforms state-of-the-art approaches in recommendation quality by accurately inferring new users' preferences, and that its seed itemset represents the latent space better than the seed itemsets obtained by other methods.
{"title":"Deep Rating Elicitation for New Users in Collaborative Filtering","authors":"Wonbin Kweon, SeongKu Kang, Junyoung Hwang, Hwanjo Yu","doi":"10.1145/3366423.3380042","DOIUrl":"https://doi.org/10.1145/3366423.3380042","url":null,"abstract":"Recent recommender systems started to use rating elicitation, which asks new users to rate a small seed itemset for inferring their preferences, to improve the quality of initial recommendations. The key challenge of the rating elicitation is to choose the seed items which can best infer the new users’ preference. This paper proposes a novel end-to-end Deep learning framework for Rating Elicitation (DRE), that chooses all the seed items at a time with consideration of the non-linear interactions. To this end, it first defines categorical distributions to sample seed items from the entire itemset, then it trains both the categorical distributions and a neural reconstruction network to infer users’ preferences on the remaining items from CF information of the sampled seed items. Through the end-to-end training, the categorical distributions are learned to select the most representative seed items while reflecting the complex non-linear interactions. Experimental results show that DRE outperforms the state-of-the-art approaches in the recommendation quality by accurately inferring the new users’ preferences and its seed itemset better represents the latent space than the seed itemset obtained by the other methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"49 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75353863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards IP-based Geolocation via Fine-grained and Stable Webcam Landmarks
Zhihao Wang, Qiang Li, Jinke Song, Haining Wang, Limin Sun
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380216
IP-based geolocation is essential for various location-aware Internet applications, such as online advertisement, content delivery, and online fraud prevention. Achieving accurate geolocation relies heavily on the number of high-quality (i.e., fine-grained and stable over time) landmarks. However, previous efforts to garner landmarks have been impeded by the limited number of visible landmarks on the Internet and the cost of manual effort. In this paper, we leverage the availability of numerous online webcams used to monitor physical surroundings as a rich source of promising high-quality landmarks for serving IP-based geolocation. In particular, we present a new framework called GeoCAM, which is designed to automatically generate qualified landmarks from online webcams, providing IP-based geolocation services with high accuracy and wide coverage. GeoCAM periodically monitors websites hosting live webcams and uses natural language processing techniques to extract the IP addresses and latitude/longitude of webcams, generating landmarks at large scale. We develop a prototype of GeoCAM and conduct real-world experiments to validate its efficacy. Our results show that GeoCAM can detect 282,902 live webcams hosted in webpages with 94.2% precision and 90.4% recall, and then generate 16,863 stable and fine-grained landmarks, two orders of magnitude more than the landmarks used in prior work. Thus, by correlating a large set of landmarks, GeoCAM is able to provide a geolocation service with high accuracy and wide coverage.
{"title":"Towards IP-based Geolocation via Fine-grained and Stable Webcam Landmarks","authors":"Zhihao Wang, Qiang Li, Jinke Song, Haining Wang, Limin Sun","doi":"10.1145/3366423.3380216","DOIUrl":"https://doi.org/10.1145/3366423.3380216","url":null,"abstract":"IP-based geolocation is essential for various location-aware Internet applications, such as online advertisement, content delivery, and online fraud prevention. Achieving accurate geolocation enormously relies on the number of high-quality (i.e., the fine-grained and stable over time) landmarks. However, the previous efforts of garnering landmarks have been impeded by the limited visible landmarks on the Internet and manual time cost. In this paper, we leverage the availability of numerous online webcams that are used to monitor physical surroundings as a rich source of promising high-quality landmarks for serving IP-based geolocation. In particular, we present a new framework called GeoCAM, which is designed to automatically generate qualified landmarks from online webcams, providing IP-based geolocation services with high accuracy and wide coverage. GeoCAM periodically monitors websites that are hosting live webcams and uses the natural language processing technique to extract the IP addresses and latitude/longitude of webcams for generating landmarks at large-scale. We develop a prototype of GeoCAM and conduct real-world experiments for validating its efficacy. Our results show that GeoCam can detect 282,902 live webcams hosted in webpages with 94.2% precision and 90.4% recall, and then generate 16,863 stable and fine-grained landmarks, which are two orders of magnitude more than the landmarks used in prior works. Thus, by correlating a large scale of landmarks, GeoCAM is able to provide a geolocation service with high accuracy and wide coverage.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89543717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Referential Intention with Heterogeneous Contexts
W. Yu, Mengxia Yu, Tong Zhao, Meng Jiang
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380175
Citing, quoting, and forwarding & commenting behaviors are widely seen in academia, news media, and social media. Existing behavior modeling approaches have focused on mining content and describing the preferences of authors, speakers, and users. However, behavioral intention plays an important role in generating content on these platforms. In this work, we propose to identify the referential intention that motivates the action of using referred (e.g., cited, quoted, and retweeted) sources and content to support one's claims. We adopt a theory from sociology to develop a schema of four types of intentions. The challenge lies in the heterogeneity of the observed contextual information surrounding the referential behavior, such as the referred content (e.g., a cited paper), local context (e.g., the sentence citing the paper), neighboring context (e.g., the preceding and following sentences), and network context (e.g., the academic network of authors, affiliations, and keywords). We propose a new neural framework with Interactive Hierarchical Attention (IHA) to identify the intention of referential behavior by properly aggregating these heterogeneous contexts. Experiments demonstrate that the proposed method can effectively identify the type of intention of citing behaviors (on academic data) and retweeting behaviors (on Twitter), and that learning the heterogeneous contexts collectively improves performance. This work opens a door to understanding content generation from a fundamental perspective of behavioral science.
{"title":"Identifying Referential Intention with Heterogeneous Contexts","authors":"W. Yu, Mengxia Yu, Tong Zhao, Meng Jiang","doi":"10.1145/3366423.3380175","DOIUrl":"https://doi.org/10.1145/3366423.3380175","url":null,"abstract":"Citing, quoting, and forwarding & commenting behaviors are widely seen in academia, news media, and social media. Existing behavior modeling approaches focused on mining content and describing preferences of authors, speakers, and users. However, behavioral intention plays an important role in generating content on the platforms. In this work, we propose to identify the referential intention which motivates the action of using the referred (e.g., cited, quoted, and retweeted) source and content to support their claims. We adopt a theory in sociology to develop a schema of four types of intentions. The challenge lies in the heterogeneity of observed contextual information surrounding the referential behavior, such as referred content (e.g., a cited paper), local context (e.g., the sentence citing the paper), neighboring context (e.g., the former and latter sentences), and network context (e.g., the academic network of authors, affiliations, and keywords). We propose a new neural framework with Interactive Hierarchical Attention (IHA) to identify the intention of referential behavior by properly aggregating the heterogeneous contexts. Experiments demonstrate that the proposed method can effectively identify the type of intention of citing behaviors (on academic data) and retweeting behaviors (on Twitter). And learning the heterogeneous contexts collectively can improve the performance. This work opens a door for understanding content generation from a fundamental perspective of behavior sciences.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"144 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77583268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical Visual-aware Minimax Ranking Based on Co-purchase Data for Personalized Recommendation
Xiaoya Chong, Qing Li, Howard Leung, Qianhui Men, Xianjin Chao
Proceedings of The Web Conference 2020. https://doi.org/10.1145/3366423.3380007
Personalized recommendation aims at ranking a set of items according to the learnt preferences of the user. Existing methods optimize the ranking function by treating an item that the user has not bought yet as a negative item and assuming that the user prefers the positive items they have bought to the negative items. The strategy is to exclude irrelevant items from the dataset to narrow down the set of potential positive items and thereby improve ranking accuracy. This conflicts with the goal of recommendation from the seller's point of view, which is to enlarge that set for each user. In this paper, we address this limitation by proposing a novel learning method called Hierarchical Visual-aware Minimax Ranking (H-VMMR), which introduces a new concept of predictive sampling to sample items that are in a close relationship with the positive items (e.g., substitutes and complements). We set up the problem by maximizing the preference discrepancy between positive and negative items, while minimizing the gap between positive and predictive items based on visual features. We also build a hierarchical learning model based on co-purchase data to solve the data sparsity problem. Our method is able to enlarge the set of potential positive items as well as true negative items during ranking. Experimental results show that H-VMMR outperforms state-of-the-art learning methods.
{"title":"Hierarchical Visual-aware Minimax Ranking Based on Co-purchase Data for Personalized Recommendation","authors":"Xiaoya Chong, Qing Li, Howard Leung, Qianhui Men, Xianjin Chao","doi":"10.1145/3366423.3380007","DOIUrl":"https://doi.org/10.1145/3366423.3380007","url":null,"abstract":"Personalized recommendation aims at ranking a set of items according to the learnt preferences of the user. Existing methods optimize the ranking function by considering an item that the user has not bought yet as a negative item and assuming that the user prefers the positive item that he has bought to the negative item. The strategy is to exclude irrelevant items from the dataset to narrow down the set of potential positive items to improve ranking accuracy. It conflicts with the goal of recommendation from the seller’s point of view, which aims to enlarge that set for each user. In this paper, we diminish this limitation by proposing a novel learning method called Hierarchical Visual-aware Minimax Ranking (H-VMMR), in which a new concept of predictive sampling is proposed to sample items in a close relationship with the positive items (e.g., substitutes, compliments). We set up the problem by maximizing the preference discrepancy between positive and negative items, as well as minimizing the gap between positive and predictive items based on visual features. We also build a hierarchical learning model based on co-purchase data to solve the data sparsity problem. Our method is able to enlarge the set of potential positive items as well as true negative items during ranking. The experimental results show that our H-VMMR outperforms the state-of-the-art learning methods.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77685321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}