The World Wide Web Conference: Latest Papers (Page 4)

Product-Aware Helpfulness Prediction of Online Reviews

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313523

M. Fan, Chao Feng, Lin Guo, Mingming Sun, Ping Li

Helpful reviews are essential for e-commerce and review websites, as they help customers make quick purchase decisions and help merchants increase profits. Because a great number of online reviews have unknown helpfulness, building automatic mechanisms to assess review helpfulness has recently become a promising research direction. Mainstream methods generally extract various linguistic and embedding features solely from the text of a review as the evidence for helpfulness prediction. We argue, however, that helpfulness prediction should also be aware of the metadata of the review's target product (such as the title, the brand, the category, and the description), in addition to the textual content of the review itself. Hence, in this paper we propose an end-to-end deep neural architecture that is fed directly with both the metadata of a product and the raw text of its reviews to acquire product-aware review representations for helpfulness prediction. The learned representations do not require tedious feature engineering and are expected to be more informative as target-aware evidence for assessing the helpfulness of online reviews. We also construct two large-scale datasets, drawn from real-world web data on Amazon and Yelp respectively, to train and test our approach. Experiments are conducted on two tasks, helpfulness identification and helpfulness regression of online reviews, and the results demonstrate that our approach achieves state-of-the-art performance with substantial improvements.

Citations: 46

Quality Effects on User Preferences and Behaviors in Mobile News Streaming

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313751

Hongyu Lu, Min Zhang, Weizhi Ma, Yunqiu Shao, Yiqun Liu, Shaoping Ma

User behaviors are widely used as implicit feedback on user preferences in personalized information systems. In previous work and online applications, users' click signals are used as positive feedback for ranking, recommendation, evaluation, etc. However, when users click on a piece of low-quality news, they are more likely to have negative experiences and different reading behaviors. Hence, ignoring the quality effects of news may lead to misinterpretation of user behaviors and affect subsequent studies. To address these issues, we conducted an in-depth user study in a mobile news streaming scenario to investigate whether and how the quality of news affects user preferences and user behaviors. First, we verify that quality does affect user preferences, and that low-quality news results in lower preference. We further find that this effect varies with both the interaction phase and the user's interest in the topic of the news. Second, we inspect how users interact with low-quality news. Surprisingly, we find that users are more likely to click on low-quality news because of its highly persuasive titles. Moreover, users read less and more slowly, with fewer revisits and examinations, when reading low-quality news. Based on these quality effects, we propose the Preference Behavior Quality (PBQ) probability model, which incorporates quality into traditional behavior-only implicit feedback. The significant improvement demonstrates that incorporating quality can help build implicit feedback. Given the importance of news quality and the difficulty of collecting it, we further investigate how to identify it automatically. Based on point-wise and pair-wise distinguishing experiments, we show that user behaviors, especially reading ratio and dwell time, are highly effective at identifying news quality. Our research comprehensively analyzes the effects of quality on user preferences and behaviors, and raises awareness of item quality in interpreting user behaviors and estimating user preferences.

Citations: 11

Recurrent Convolutional Neural Network for Sequential Recommendation

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313408

Chengfeng Xu, Pengpeng Zhao, Yanchi Liu, Jiajie Xu, V. Sheng, Zhiming Cui, Xiaofang Zhou, Hui Xiong

Sequential recommendation, which models sequential behavioral patterns among users, plays a critical role in recommender systems. However, state-of-the-art Recurrent Neural Network (RNN) solutions rarely consider non-linear feature interactions and non-monotone short-term sequential patterns, which are essential for user behavior modeling in sparse sequence data. In this paper, we propose a novel Recurrent Convolutional Neural Network model (RCNN). It not only utilizes the recurrent architecture of an RNN to capture complex long-term dependencies, but also leverages the convolutional operation of a Convolutional Neural Network (CNN) to extract short-term sequential patterns among recurrent hidden states. Specifically, we first generate a hidden state at each time step with the recurrent layer. The recent hidden states are then regarded as an “image”, and RCNN searches for non-linear feature interactions and non-monotone local patterns via intra-step horizontal and inter-step vertical convolutional filters, respectively. Moreover, the output of the convolutional filters and the hidden state are concatenated and fed into a fully-connected layer to generate the recommendation. Finally, we evaluate the proposed model using four real-world datasets from various application scenarios. The experimental results show that RCNN significantly outperforms state-of-the-art approaches on sequential recommendation.
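
The pipeline described in this abstract (recurrent states, then an "image" of recent states, then horizontal and vertical filters, then concatenation) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration: the scalar recurrent weights, the kernel values, the ReLU activation, and the function names are hypothetical and not taken from the paper, and the final fully-connected layer is omitted.

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One toy recurrent step; scalar weights are shared across dimensions (assumption)."""
    return [math.tanh(w_h * h_j + w_x * x_j) for h_j, x_j in zip(h, x)]

def horizontal_conv(image, kernel):
    """Intra-step filter: slide a 1 x w kernel along the dimensions of each hidden state."""
    w = len(kernel)
    return [
        [sum(row[j + t] * kernel[t] for t in range(w)) for j in range(len(row) - w + 1)]
        for row in image
    ]

def vertical_conv(image, kernel):
    """Inter-step filter: a k x 1 kernel spanning the k recent steps, one output per dimension."""
    return [
        sum(image[s][j] * kernel[s] for s in range(len(kernel)))
        for j in range(len(image[0]))
    ]

def rcnn_features(inputs, k=3, h_kernel=(1.0, -1.0), v_kernel=(0.2, 0.3, 0.5)):
    """Return [horizontal outputs | vertical outputs | last hidden state].

    Assumes len(inputs) >= k and len(v_kernel) == k.
    """
    h = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        h = rnn_step(h, x)
        states.append(h)
    image = states[-k:]  # the k most recent hidden states, one row per time step
    # ReLU is an illustrative choice of non-linearity, not the paper's.
    horizontal = [max(0.0, v) for row in horizontal_conv(image, h_kernel) for v in row]
    vertical = [max(0.0, v) for v in vertical_conv(image, v_kernel)]
    return horizontal + vertical + h

# Three one-hot "item embeddings" of dimension 4 (hypothetical session).
session = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
features = rcnn_features(session)
```

In a full model, the concatenated `features` vector would be passed to a learned fully-connected layer that scores candidate items.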

Citations: 94

Semantic Text Matching for Long-Form Documents

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313707

Jyun-Yu Jiang, Mingyang Zhang, Cheng Li, Michael Bendersky, Nadav Golbandi, Marc Najork

Semantic text matching is one of the most important research problems in many domains, including, but not limited to, information retrieval, question answering, and recommendation. Among the different types of semantic text matching, long-document-to-long-document text matching has many applications but has rarely been studied. Most existing approaches for semantic text matching have limited success in this setting, due to their inability to capture and distill the main ideas and topics from long-form text. In this paper, we propose a novel Siamese multi-depth attention-based hierarchical recurrent neural network (SMASH RNN) that learns long-form semantics and enables semantic text matching based on long-form documents. In addition to word information, SMASH RNN uses the document structure to improve the representation of long-form documents. Specifically, SMASH RNN synthesizes information from different document structure levels, including paragraphs, sentences, and words. An attention-based hierarchical RNN derives a representation for each document structure level. Then, the representations learned from the different levels are aggregated to learn a more comprehensive semantic representation of the entire document. For semantic text matching, a Siamese structure couples the representations of a pair of documents and infers a probabilistic score as their similarity. We conduct an extensive empirical evaluation of SMASH RNN with three practical applications: email attachment suggestion, related article recommendation, and citation recommendation. Experimental results on public data sets demonstrate that SMASH RNN significantly outperforms competitive baseline methods across various classification and ranking scenarios in the context of semantic matching of long-form documents.
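
As a rough illustration of the hierarchy this abstract describes (words to sentences to paragraphs to document, with both documents passed through the same encoder), the sketch below uses plain attention pooling. It is a simplification under stated assumptions: the recurrent layers and the multi-depth aggregation are left out, a single fixed `query` vector stands in for learned attention parameters, and all function names are hypothetical.

```python
import math

def attention_pool(vectors, query):
    """Softmax-weighted average of vectors, scored by dot product with a query."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, vectors)) / total
        for d in range(len(vectors[0]))
    ]

def encode_document(doc, query):
    """doc is a list of paragraphs; a paragraph is a list of sentences;
    a sentence is a list of word vectors."""
    paragraph_vectors = []
    for paragraph in doc:
        sentence_vectors = [attention_pool(sentence, query) for sentence in paragraph]
        paragraph_vectors.append(attention_pool(sentence_vectors, query))
    return attention_pool(paragraph_vectors, query)

def match_score(doc_a, doc_b, query):
    """Siamese scoring: one shared encoder, then a sigmoid over the dot product."""
    vec_a = encode_document(doc_a, query)
    vec_b = encode_document(doc_b, query)
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(vec_a, vec_b))))

# Toy 2-D word vectors (hypothetical values).
query = [1.0, 0.0]
doc_a = [[[[1.0, 0.0], [0.0, 1.0]]], [[[0.5, 0.5]]]]
doc_b = [[[[0.0, 1.0]]]]
score = match_score(doc_a, doc_b, query)
```

Sharing `encode_document` between the two inputs is what makes the structure Siamese: both documents are mapped into the same representation space before they are compared.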

Citations: 73

Classifying Extremely Short Texts by Exploiting Semantic Centroids in Word Mover's Distance Space

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313397

C. Li, Jihong Ouyang, Ximing Li

Automatically classifying extremely short texts, such as social media posts and web page titles, plays an important role in a wide range of content analysis applications. However, traditional classifiers based on bag-of-words (BoW) representations often fail in this task. The underlying reason is that document similarity cannot be accurately measured under BoW representations due to the extreme sparseness of short texts. This makes it significantly difficult to capture the generality of short texts. To address this problem, we use a better regularized word mover's distance (RWMD), which can measure distances among short texts at the semantic level. We then propose an RWMD-based centroid classifier for short texts, named RWMD-CC. Basically, RWMD-CC computes a representative semantic centroid for each category under the RWMD measure, and predicts test documents by finding the closest semantic centroid. Testing is much more efficient than with the prior art, the K-nearest-neighbor classifier based on WMD. Experimental results indicate that RWMD-CC can achieve very competitive classification performance on extremely short texts.
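
The idea of predicting by the closest class representative in a word-mover's-style distance space can be sketched as follows. Two substitutions are made and should be read as assumptions: the distance is the well-known relaxed lower bound of the word mover's distance rather than the paper's regularized distance, and each class is represented by its medoid (the training document with the smallest total distance to its classmates) rather than a learned semantic centroid. The embeddings and names are hypothetical toy values.

```python
import math
from collections import Counter

# Toy 2-D word vectors; hypothetical values chosen only for illustration.
EMB = {
    "goal": (1.0, 0.0), "match": (0.9, 0.1), "striker": (0.8, 0.2),
    "stock": (0.0, 1.0), "market": (0.1, 0.9), "shares": (0.2, 0.8),
}

def nbow(tokens):
    """Normalized bag of words over in-vocabulary tokens (assumes at least one)."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def one_sided(a, b):
    """Move each word of a entirely to its nearest word in b."""
    return sum(
        weight * min(math.dist(EMB[word], EMB[other]) for other in b)
        for word, weight in a.items()
    )

def relaxed_wmd(doc_a, doc_b):
    """Relaxed lower bound of the word mover's distance (stand-in distance)."""
    a, b = nbow(doc_a), nbow(doc_b)
    return max(one_sided(a, b), one_sided(b, a))

def fit_medoids(labelled_docs):
    """Pick, per class, the document closest in total to its classmates."""
    by_class = {}
    for tokens, label in labelled_docs:
        by_class.setdefault(label, []).append(tokens)
    return {
        label: min(docs, key=lambda d: sum(relaxed_wmd(d, other) for other in docs))
        for label, docs in by_class.items()
    }

def predict(medoids, tokens):
    """Assign the class whose representative is nearest."""
    return min(medoids, key=lambda label: relaxed_wmd(tokens, medoids[label]))

train = [
    (["goal", "match"], "sports"), (["striker", "goal"], "sports"),
    (["stock", "market"], "finance"), (["shares", "market"], "finance"),
]
medoids = fit_medoids(train)
```

Prediction needs one distance computation per class instead of one per training document, which is the efficiency argument the abstract makes against K-nearest-neighbor classification.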

Citations: 15

RecBoard: A Web-based Platform for Recommendation System Research and Development

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3314133

M. Chawla, Kriti Singh, Longqi Yang, D. Estrin

This paper introduces RecBoard, a unified web-based platform that helps researchers and practitioners train, test, deploy, and monitor recommendation systems. RecBoard streamlines the end-to-end process of building recommendation systems by providing a collaborative user interface that automates repetitive tasks related to dataset management, model training, visualization, deployment, and monitoring. Our demo prototype demonstrates how RecBoard can support common tasks in research and development. RecBoard will be open-sourced and publicly available upon publication.

Citations: 0

InfraNodus: Generating Insight Using Text Network Analysis

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3314123

Dmitry Paranyushkin

In this paper we present a web-based open source tool and a method for generating insight from any text or discourse using text network analysis. The tool (InfraNodus) can be used by researchers and writers to organize and better understand their notes, to measure the level of bias in discourse, and to identify the parts of the discourse where there is potential for insight and new ideas. The method is based on a text network analysis algorithm, which represents any text as a network and identifies the most influential words in a discourse based on the terms' co-occurrence. A graph community detection algorithm is then applied to identify the different topical clusters, which represent the main topics in the text as well as the relations between them. The community structure is used in conjunction with other measures to identify the level of bias or cognitive diversity of the discourse. Finally, the structural gaps in the graph can indicate the parts of the discourse where connections are lacking, thereby highlighting the areas where there is potential for new ideas. The tool can be used as stand-alone software by end users, or integrated into other tools via an API. Another interesting application is in the field of recommendation systems: structural gaps could indicate potentially interesting non-trivial connections to any connected datasets.
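
Representing a text as a network of co-occurring terms can be shown in a few lines. This is a minimal sketch rather than the tool's implementation: the window size, the tokenization, and the use of weighted degree as the influence measure are all assumptions, and the community detection and structural-gap steps are not covered.

```python
from collections import defaultdict

def cooccurrence_graph(tokens, window=1):
    """Link each word to the next `window` words; edge weights count co-occurrences."""
    graph = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:  # skip self-loops
                graph[word][other] += 1
                graph[other][word] += 1
    return graph

def weighted_degree(graph):
    """Sum of edge weights per word (a simple stand-in for an influence score)."""
    return {word: sum(neighbours.values()) for word, neighbours in graph.items()}

def top_words(graph, n=3):
    """Highest-scoring words, ties broken alphabetically."""
    degree = weighted_degree(graph)
    return sorted(degree, key=lambda w: (-degree[w], w))[:n]

tokens = "network analysis text network graph network".split()
graph = cooccurrence_graph(tokens)
```

Words that sit on many co-occurrence edges surface first in `top_words`; a graph library could replace the nested dictionaries if other centrality measures or community detection were needed.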

Citations: 60

From Stances' Imbalance to Their Hierarchical Representation and Detection

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313724

Qiang Zhang, Shangsong Liang, Aldo Lipani, Z. Ren, Emine Yilmaz

Stance detection has gained increasing interest from the research community due to its importance for fake news detection. The goal of stance detection is to categorize the overall position of a subject towards an object into one of four classes: agree, disagree, discuss, and unrelated. One of the major problems faced by current machine learning models for stance detection is a severe imbalance among these classes. Hence, most models fail to correctly classify instances that fall into minority classes. In this paper, we address this problem by proposing a hierarchical representation of these classes, which combines the agree, disagree, and discuss classes under a new related class. Further, we propose a two-layer neural network that learns from this hierarchical representation and controls the error propagation between the two layers using the Maximum Mean Discrepancy regularizer. Compared with conventional four-way classifiers, this model has two advantages: (1) the hierarchical architecture mitigates the class imbalance problem; (2) the regularization helps the model better discern between the related and unrelated stances. Extensive experiments demonstrate the state-of-the-art accuracy of the proposed model for stance detection.
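
The class hierarchy in this abstract (agree, disagree, and discuss grouped under a new related class) can be written down directly. The sketch below only shows the label mapping and how two-level probabilities combine into four-way probabilities by the chain rule. The function names and the example probabilities are hypothetical, and the neural network and the Maximum Mean Discrepancy regularizer are not modeled.

```python
RELATED = ("agree", "disagree", "discuss")

def to_hierarchy(label):
    """Map a four-way label to (top-level class, fine-grained class or None)."""
    return ("related", label) if label in RELATED else ("unrelated", None)

def four_way_probs(p_related, p_given_related):
    """Combine P(related) with P(stance | related) into four-way probabilities."""
    probs = {stance: p_related * p_given_related[stance] for stance in RELATED}
    probs["unrelated"] = 1.0 - p_related
    return probs

# Hypothetical outputs of a top-level and a second-level classifier.
probs = four_way_probs(0.8, {"agree": 0.5, "disagree": 0.2, "discuss": 0.3})
prediction = max(probs, key=probs.get)
```

Splitting the decision this way means the first level sees a two-class problem (related versus unrelated) and the second level sees only the three related stances, which is how the hierarchy eases the imbalance among the original four classes.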

Citations: 25

Fuzzy Multi-task Learning for Hate Speech Type Identification

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313546

Han Liu, P. Burnap, Wafa Alorainy, M. Williams

In traditional machine learning, classifier training is typically undertaken in a single-task setting, so that the trained classifier can discriminate between different classes. This, however, rests on the assumption that the classes are mutually exclusive, which does not always hold in real applications; for example, the same book may belong to multiple subjects. This observation motivated researchers to formulate multi-label learning problems, in which each instance can be assigned multiple labels, although classifier training is still typically undertaken in a single-task setting. When probabilistic approaches are adopted for classifier training, multi-task learning can be enabled by transforming a multi-labelled data set into several binary data sets, but this transformation usually leads to class imbalance. Without the transformation, multi-labelling results in an exponential increase in the number of classes, leaving fewer instances per class and making each class harder to identify. In addition, multi-labelling of data is very time-consuming and expensive in some application areas, such as hate speech detection. In this paper, we introduce a novel formulation of the hate speech type identification problem in a multi-task learning setting through our proposed fuzzy ensemble approach. In this setting, single-labelled data can be used for semi-supervised multi-label learning, and two new metrics (detection rate and irrelevance rate) are proposed to measure performance on this kind of learning task more effectively. We report an experimental study on the identification of four types of hate speech: religion, race, disability and sexual orientation. The experimental results show that our proposed fuzzy ensemble approach outperforms other popular probabilistic approaches, with an overall detection rate of 0.93.
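The data transformation mentioned in the abstract, turning one multi-labelled data set into several binary data sets, can be illustrated with a minimal sketch. This is not the authors' code: the toy texts, the `to_binary_datasets` helper and the variable names are hypothetical, and only the four label names come from the abstract. The sketch also counts the classes that arise if every non-empty label combination is treated as its own class, which is one plausible reading of the "exponential increase" noted above.

```python
from collections import Counter

# Label names taken from the abstract; the texts and tags are made-up toy data.
LABELS = ["religion", "race", "disability", "sexual orientation"]
dataset = [
    ("text a", {"religion"}),
    ("text b", {"race", "religion"}),
    ("text c", {"disability"}),
    ("text d", {"race"}),
    ("text e", {"sexual orientation"}),
]


def to_binary_datasets(data, labels):
    """Build one binary data set per label (1 = label present, 0 = absent)."""
    return {
        label: [(text, int(label in tags)) for text, tags in data]
        for label in labels
    }


binary = to_binary_datasets(dataset, LABELS)

# Class balance of each binary task: positives are the minority in every task here.
balance = {label: Counter(y for _, y in rows) for label, rows in binary.items()}

# Assumption: without the transformation, each non-empty label combination
# becomes a separate class, so the class count grows as 2^k - 1 for k labels.
num_combined_classes = 2 ** len(LABELS) - 1

for label in LABELS:
    print(label, dict(balance[label]))
print("combined classes:", num_combined_classes)
```

Even on five toy instances, each binary task has more negatives than positives, which is the class imbalance issue the abstract refers to.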



Citations: 29

CityFlow: A Multi-Agent Reinforcement Learning Environment for Large Scale City Traffic Scenario

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3314139

Huichu Zhang, Siyuan Feng, Chang Liu, Yaoyao Ding, Yichen Zhu, Zihan Zhou, Weinan Zhang, Yong Yu, Haiming Jin, Z. Li

Traffic signal control is an emerging application scenario for reinforcement learning. Besides being an important problem that affects people's daily commutes, traffic signal control poses unique challenges for reinforcement learning: adapting to a dynamic traffic environment and coordinating thousands of agents, including vehicles and pedestrians. A key factor in the success of modern reinforcement learning is a good simulator that can generate a large number of data samples for learning. However, SUMO, the most commonly used open-source traffic simulator, does not scale to large road networks and heavy traffic flow, which hinders the study of reinforcement learning in traffic scenarios. This motivates us to create a new traffic simulator, CityFlow, with fundamentally optimized data structures and efficient algorithms. CityFlow supports flexible definitions of road networks and traffic flow based on synthetic and real-world data, and it provides a user-friendly interface for reinforcement learning. Most importantly, CityFlow is more than twenty times faster than SUMO and can support city-wide traffic simulation with an interactive renderer for monitoring. Besides traffic signal control, CityFlow could serve as the basis for other transportation studies and create new possibilities for testing machine learning methods in the intelligent transportation domain.



Citations: 171