The World Wide Web Conference: Latest Papers, Page 10

Detect Rumors on Twitter by Promoting Information Campaigns with Generative Adversarial Learning

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313741

Jing Ma, Wei Gao, Kam-Fai Wong

Rumors can have devastating consequences for individuals and/or society. Analysis shows that the wide spread of rumors typically results from deliberately promoted information campaigns that aim to shape collective opinion on the news events concerned. In this paper, we attempt to fight such chaos with itself to make automatic rumor detection more robust and effective. Our idea is inspired by the adversarial learning method that originated with Generative Adversarial Networks (GANs). We propose a GAN-style approach in which a generator is designed to produce uncertain or conflicting voices, complicating the original conversational threads so as to pressure the discriminator into learning stronger rumor-indicative representations from the augmented, more challenging examples. Unlike traditional data-driven approaches to rumor detection, our method can capture low-frequency but stronger non-trivial patterns through such adversarial training. Extensive experiments on two Twitter benchmark datasets demonstrate that our rumor detection method achieves much better results than state-of-the-art methods.

Citations: 174

Semi-supervised Multi-view Individual and Sharable Feature Learning for Webpage Classification

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313492

Fei Wu, Xiaoyuan Jing, Jun Zhou, Yi-mu Ji, Chao Lan, Qinghua Huang, Ruchuan Wang

Semi-supervised multi-view feature learning (SMFL) is a feasible solution for webpage classification. However, how to fully and effectively extract complementarity and correlation information in the semi-supervised setting has not been well studied. In this paper, we propose a semi-supervised multi-view individual and sharable feature learning (SMISFL) approach, which jointly learns multiple view-individual transformations and one sharable transformation to explore the view-specific property of each view and the common property across views. We design a semi-supervised multi-view similarity-preserving term, which fully utilizes the label information of labeled samples and the similarity information of unlabeled samples from both intra-view and inter-view aspects. To promote the learning of diversity, we impose a constraint on the view-individual transformations so that the learned view-specific features are statistically uncorrelated. Furthermore, we train a linear classifier such that view-specific and shared features can be effectively combined for classification. Experiments on widely used webpage datasets demonstrate that SMISFL can significantly outperform state-of-the-art SMFL and webpage classification methods.

Citations: 23

To Return or to Explore: Modelling Human Mobility and Dynamics in Cyberspace

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313686

Tianran Hu, Yinglong Xia, Jiebo Luo

With the wide adoption of multi-community structure in many popular online platforms, human mobility across online communities has drawn increasing attention from both academia and industry. In this work, we study the statistical patterns that characterize human movements in cyberspace. Inspired by previous work on human mobility in physical space, we decompose human online activities into return and exploration, two complementary types of movement. We then study how people perform each of these two movements. We first propose a preferential return model that uncovers the preferential properties of people returning to multiple online communities. Interestingly, this model echoes previous findings on human mobility in physical space. We then present a preferential exploration model that characterizes exploration movements from a novel online community-group perspective. Our experiments quantitatively reveal the patterns of people exploring new communities, which share striking similarities with online return movements in terms of underlying principles. By combining the mechanisms of return and exploration, we are able to obtain an overall model that characterizes human mobility patterns in cyberspace at the individual level. We further investigate human online activities using our models, and discover valuable insights into the mobility patterns across online communities. Our models explain the empirically observed human online movement trajectories remarkably well and, more importantly, shed better light on human cyberspace dynamics.
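
The return/exploration decomposition described in this abstract can be illustrated with a minimal sketch. It assumes that a visit counts as exploration when the community has not been seen before and as a return otherwise, and that the return probability is proportional to past visit counts, which is the classic preferential-return form from physical-mobility studies. The abstract does not state the exact form the authors use, so the function names, the community names, and the proportionality rule are all illustrative assumptions.

```python
from collections import Counter


def label_moves(visits):
    """Label each visit 'explore' (first time in a community) or 'return'."""
    seen = set()
    labels = []
    for community in visits:
        labels.append("return" if community in seen else "explore")
        seen.add(community)
    return labels


def return_probabilities(history):
    """Assumed preferential return: probability proportional to past visit counts."""
    counts = Counter(history)
    total = sum(counts.values())
    return {community: n / total for community, n in counts.items()}


# Hypothetical trajectory of one user across online communities.
visits = ["news", "gaming", "news", "music", "news"]
labels = label_moves(visits)
# Return probabilities estimated from the first three visits only.
probs = return_probabilities(visits[:3])
```

Under these assumptions, a community visited twice as often is twice as likely to be the destination of the next return move.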

Citations: 6

Evaluating User Actions as a Proxy for Email Significance

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313624

Tarfah Alrashed, Chia-Jung Lee, P. Bailey, Christopher E. Lin, Milad Shokouhi, S. Dumais

Email remains a critical channel for communicating information in both personal and work accounts. The number of emails people receive every day can be overwhelming, which in turn creates challenges for efficient information management and consumption. Having a good estimate of the significance of emails forms the foundation for many downstream tasks (e.g. email prioritization); but determining significance at scale is expensive and challenging. In this work, we hypothesize that the cumulative set of actions on any individual email can be considered as a proxy for the perceived significance of that email. We propose two approaches to summarize observed actions on emails, which we then evaluate against the perceived significance. The first approach is a fixed-form utility function parameterized on a set of weights, and we study the impact of different weight assignment strategies. In the second approach, we build machine learning models to capture users' significance directly based on the observed actions. For evaluation, we collect human judgments on email significance for both personal and work emails. Our analysis suggests that there is a positive correlation between actions and significance of emails and that actions performed on personal and work emails are different. We also find that the degree of correlation varies across people, which may reflect the individualized nature of email activity patterns or significance. Subsequently, we develop an example of real-time email significance prediction by using action summaries as implicit feedback at scale. Evaluation results suggest that the resulting significance predictions have positive agreement with human assessments, albeit not at statistically strong levels. We speculate that we may require personalized significance prediction to improve agreement levels.
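
The first approach in this abstract, a fixed-form utility function parameterized on a set of weights, could look roughly like the weighted sum below. This is only a sketch: the action names, the two weight assignments, and the choice of a linear form are hypothetical, since the abstract does not list the actions or the weighting strategies that were studied.

```python
def utility(action_counts, weights):
    """Weighted sum of observed actions; actions without a weight contribute 0."""
    return sum(weights.get(action, 0.0) * count
               for action, count in action_counts.items())


# Two hypothetical weight-assignment strategies (not taken from the paper).
UNIFORM = {"reply": 1.0, "forward": 1.0, "flag": 1.0, "delete": 1.0}
HAND_TUNED = {"reply": 3.0, "forward": 2.0, "flag": 2.0, "delete": -1.0}

# Hypothetical cumulative actions observed on a single email.
email_actions = {"reply": 1, "forward": 1, "delete": 0, "open": 2}

uniform_score = utility(email_actions, UNIFORM)
tuned_score = utility(email_actions, HAND_TUNED)
```

Comparing the two scores for the same email shows how strongly the resulting significance estimate depends on the weight assignment, which is the effect the abstract says the authors study.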

Citations: 4

Web Experience in Mobile Networks: Lessons from Two Million Page Visits

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313606

Mohammad Rajiullah, Andra Lutu, Ali Safari Khatouni, Mah-Rukh Fida, M. Mellia, A. Brunström, Özgü Alay, Stefan Alfredsson, V. Mancuso

Measuring and characterizing web page performance is a challenging task. When it comes to the mobile world, the highly varying technology characteristics coupled with the opaque network configuration make it even more difficult. Aiming at reproducibility, we present a large scale empirical study of web page performance collected in eleven commercial mobile networks spanning four countries. By digging into measurement from nearly two million web browsing sessions, we shed light on the impact of different web protocols, browsers, and mobile technologies on the web performance. We find that the impact of mobile broadband access is sizeable. For example, the median page load time using mobile broadband increases by a third compared to wired access. Mobility clearly stresses the system, with handover causing the most evident performance penalties. Contrariwise, our measurements show that the adoption of HTTP/2 and QUIC has practically negligible impact. To understand the intertwining of all parameters, we adopt state-of-the-art statistical methods to identify the significance of different factors on the web performance. Our analysis confirms the importance of access technology and mobility context as well as webpage composition and browser. Our work highlights the importance of large-scale measurements. Even with our controlled setup, the complexity of the mobile web ecosystem is challenging to untangle. For this, we are releasing the dataset as open data for validation and further research.

Citations: 44

CnGAN: Generative Adversarial Networks for Cross-network user preference generation for non-overlapped users

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313733

Dilruk Perera, Roger Zimmermann

A major drawback of cross-network recommender solutions is that they can only be applied to users who overlap across networks. Thus, the non-overlapped users, who form the majority of users, are ignored. As a solution, we propose CnGAN, a novel multi-task learning based, encoder-GAN-recommender architecture. The proposed model synthetically generates source network user preferences for non-overlapped users by learning the mapping from target to source network preference manifolds. The resultant user preferences are used in a Siamese network based neural recommender architecture. Furthermore, we propose a novel user-based pairwise loss function for recommendations using implicit interactions to better guide the generation process in the multi-task learning environment. We illustrate our solution by generating user preferences on the Twitter source network for recommendations on the YouTube target network. Extensive experiments show that the generated preferences can be used to improve recommendations for non-overlapped users. The resultant recommendations achieve superior performance compared to state-of-the-art cross-network recommender solutions in terms of accuracy, novelty and diversity.
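
The abstract does not give the form of its user-based pairwise loss. As a stand-in, the sketch below shows the generic pairwise ranking loss commonly used with implicit feedback (the BPR-style loss, the negative log-sigmoid of the score difference). It is not the loss proposed by the CnGAN authors, and the function and variable names are illustrative.

```python
import math


def pairwise_loss(pos_score, neg_score):
    """Generic BPR-style loss: -log(sigmoid(pos_score - neg_score)).

    Computed as softplus(neg_score - pos_score) in a numerically stable form.
    """
    diff = neg_score - pos_score
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# An item the user interacted with should outscore one they did not.
well_ranked = pairwise_loss(pos_score=2.0, neg_score=0.0)
badly_ranked = pairwise_loss(pos_score=0.0, neg_score=2.0)
tied = pairwise_loss(pos_score=1.0, neg_score=1.0)
```

The loss shrinks toward zero as the interacted item is ranked further above the non-interacted one, and equals log 2 when the two scores tie.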

Citations: 23

Detection and Analysis of Self-Disclosure in Online News Commentaries

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313669

Prasanna Umar, A. Squicciarini, S. Rajtmajer

Online users engage in self-disclosure - revealing personal information to others - in pursuit of social rewards. However, there are associated costs of disclosure to users' privacy. User profiling techniques support the use of contributed content for a number of purposes, e.g., micro-targeting advertisements. In this paper, we study self-disclosure as it occurs in newspaper comment forums. We explore a longitudinal dataset of about 60,000 comments on 2202 news articles from four major English news websites. We start with detection of language indicative of various types of self-disclosure, leveraging both syntactic and semantic information present in texts. Specifically, we use dependency parsing for subject, verb, and object extraction from sentences, in conjunction with named entity recognition to extract linguistic indicators of self-disclosure. We then use these indicators to examine the effects of anonymity and topic of discussion on self-disclosure. We find that anonymous users are more likely to self-disclose than identifiable users, and that self-disclosure varies across topics of discussion. Finally, we discuss the implications of our findings for user privacy.

Citations: 26

Event Detection using Hierarchical Multi-Aspect Attention

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313659

Sneha Mehta, Mohammad Raihanul Islam, H. Rangwala, Naren Ramakrishnan

Classical event encoding and extraction methods rely on fixed dictionaries of keywords and templates or require ground truth labels for phrase/sentences. This hinders widespread application of information encoding approaches to large-scale free form (unstructured) text available on the web. Event encoding can be viewed as a hierarchical task where the coarser level task is event detection, i.e., identification of documents containing a specific event, and where the fine-grained task is one of event encoding, i.e., identifying key phrases, key sentences. Hierarchical models with attention seem like a natural choice for this problem, given their ability to differentially attend to more or less important features when constructing document representations. In this work we present a novel factorized bilinear multi-aspect attention mechanism (FBMA) that attends to different aspects of text while constructing its representation. We find that our approach outperforms state-of-the-art baselines for detecting civil unrest, military action, and non-state actor events from corpora in two different languages.

Citations: 18

Hack for Hire: Exploring the Emerging Market for Account Hijacking

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313489

A. Mirian, Joe DeBlasio, S. Savage, G. Voelker, Kurt Thomas

Email accounts represent an enticing target for attackers, both for the information they contain and the root of trust they provide to other connected web services. While defense-in-depth approaches such as phishing detection, risk analysis, and two-factor authentication help to stem large-scale hijackings, targeted attacks remain a potent threat due to the customization and effort involved. In this paper, we study a segment of targeted attackers known as “hack for hire” services to understand the playbook that attackers use to gain access to victim accounts. Posing as buyers, we interacted with 27 English, Russian, and Chinese blackmarket services, only five of which succeeded in attacking synthetic (though realistic) identities we controlled. Attackers primarily relied on tailored phishing messages, with enough sophistication to bypass SMS two-factor authentication. However, despite the ability to successfully deliver account access, the market exhibited low volume, poor customer service, and had multiple scammers. As such, we surmise that retail email hijacking has yet to mature to the level of other criminal market segments.

Citations: 35

Learning Intent to Book Metrics for Airbnb Search

The World Wide Web Conference

Pub Date : 2019-05-13 DOI: 10.1145/3308558.3313648

B. Turnbull

Airbnb is a two-sided rental marketplace offering a variety of unique and more traditional accommodation options. Similar to other online marketplaces, we invest in optimizing the content surfaced on the search UI and ranking relevance to improve the guest online search experience. The unique Airbnb inventory, however, surfaces some major data challenges. Given the high stakes of booking less traditional accommodations, users can spend many days to weeks searching and scanning the description pages of many accommodation "listings" before making a decision to book. Moreover, much of the information about a listing is unstructured and can only be found by the user after they go through the details on the listing page. As a result, we have found that traditional search metrics do not work well in the context of our platform. Basic metrics of single user actions, such as click-through rates, number of listings viewed, or dwell time, are not consistently directionally correlated with our downstream business metrics. To address these issues, we leverage machine learning to isolate signals of intent from rich behavioral data. These signals have key applications including analytical insights, ranking modeling inputs, and experimentation velocity. In this paper, we describe the development of a model-based user intent metric, "intentful listing view", which combines the signals of a variety of user micro-actions on the listing description page. We demonstrate that this learned metric is directionally correlated with downstream conversion metrics and sensitive across a variety of historical search experiments.
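
One simple way to picture a model-based metric that combines micro-action signals is a logistic score per listing view, thresholded to decide whether the view counts as "intentful". The sketch below assumes that form. The micro-action names, weights, bias, and threshold are hypothetical, and the abstract does not say which model family Airbnb actually uses.

```python
import math


def intent_probability(micro_actions, weights, bias):
    """Logistic combination of micro-action signals for one listing view."""
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in micro_actions.items())
    return 1.0 / (1.0 + math.exp(-z))


def is_intentful(micro_actions, weights, bias, threshold=0.5):
    """Count the view as intentful when the modeled probability clears the threshold."""
    return intent_probability(micro_actions, weights, bias) >= threshold


# Hypothetical learned parameters and two hypothetical listing views.
WEIGHTS = {"viewed_photos": 0.8, "opened_calendar": 1.5, "read_reviews": 1.0}
BIAS = -2.0

engaged_view = {"viewed_photos": 1, "opened_calendar": 1, "read_reviews": 1}
bounce_view = {}

engaged = is_intentful(engaged_view, WEIGHTS, BIAS)
bounced = is_intentful(bounce_view, WEIGHTS, BIAS)
```

Counting intentful views per search session, rather than raw clicks, is the kind of aggregate that could then be compared against downstream booking metrics.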

Citations: 7