{"title":"Detecting Evolution of Concepts based on Cause-Effect Relationships in Online Reviews","authors":"Yating Zhang, A. Jatowt, Katsumi Tanaka","doi":"10.1145/2872427.2883013","DOIUrl":"https://doi.org/10.1145/2872427.2883013","url":null,"abstract":"Analyzing how technology evolves is important for understanding technological progress and its impact on society. Although the concept of evolution has been explored in many domains (e.g., the evolution of topics, events, terminology, or species), little research has been done on automatically analyzing the evolution of products and technology in general. In this paper, we propose a novel approach for investigating technology evolution based on collections of product reviews. We are particularly interested in understanding the social impact of technology and in discovering how changes in product features influence changes in our social lives. We address this challenge by first distinguishing two kinds of product-related terms: physical product features and terms describing the situations in which products are used. We then detect changes in both types of terms over time by tracking fluctuations in their popularity and usage. Finally, we discover cases in which changes in physical product features trigger changes in a product's use. 
We experimentally demonstrate the effectiveness of our approach on the Amazon Product Review Dataset that spans over 18 years.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80894323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socialized Language Model Smoothing via Bi-directional Influence Propagation on Social Networks","authors":"Rui Yan, Cheng-te Li, Hsun-Ping Hsieh, P. Hu, Xiaohua Hu, Tingting He","doi":"10.1145/2872427.2874811","DOIUrl":"https://doi.org/10.1145/2872427.2874811","url":null,"abstract":"In recent years, online social networks have been among the most popular and most heavily viewed websites in the world, as they have renewed the way information is discovered and distributed. Millions of registered users generate a formidable amount of user-generated content on these websites every day. These social networks have become \"giants\", seemingly capable of supporting any research task. However, as we have pointed out, these giants still suffer from an \"Achilles Heel\": extreme sparsity. Compared with the extremely large collection as a whole, individual posting documents such as microblogs are so sparse that they appear to make little difference in many research scenarios, even though these postings do in fact differ. In this paper we propose to tackle the Achilles Heel of social networks by smoothing the language model via influence propagation. Extending our previously proposed work on the sparsity issue, we augment socialized language model smoothing with bi-directional influence learned from propagation: intuitively, it is insufficient to model the influence propagated between an information source and a target without distinguishing its direction. Hence, we formulate a bi-directional socialized factor graph model, which utilizes both the textual correlations between document pairs and the socialized augmentation networks behind the documents, such as user relationships and social interactions. These factors are modeled as attributes of, and dependencies among, documents and their corresponding users, and are then distinguished by direction. 
We propose an effective algorithm to learn the factor graph model with directions, and finally propagate term counts to smooth documents based on the estimated influence. We run experiments on two datasets, from Twitter and Weibo, and the results validate the effectiveness of the proposed model. By incorporating direction information into socialized language model smoothing, our approach improves over several alternative methods on both intrinsic and extrinsic evaluations, measured in terms of perplexity, nDCG, and MAP.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85230782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
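The core smoothing idea described above, augmenting a sparse document's term counts with counts propagated from socially connected documents before applying standard language-model smoothing, can be sketched as follows. This is a minimal illustration, not the authors' factor graph model: the directed influence weights are assumed to be given (in the paper they are learned), and Dirichlet prior smoothing stands in for the final estimator.

```python
from collections import Counter

def smoothed_lm(doc_tokens, neighbor_docs, influence, collection_counts, mu=100.0):
    """Dirichlet-smoothed unigram language model whose term counts are
    augmented by counts propagated from socially connected documents.

    influence maps a neighbor's index to a directed influence weight
    (assumed given here; the paper estimates these from a factor graph).
    """
    counts = Counter(doc_tokens)
    # Propagate term counts from each neighbor, scaled by its influence.
    for i, neigh in enumerate(neighbor_docs):
        w = influence.get(i, 0.0)
        for term, c in Counter(neigh).items():
            counts[term] += w * c
    total = sum(counts.values())
    coll_total = sum(collection_counts.values())

    def p(term):
        # Dirichlet prior smoothing with the collection model as background.
        prior = collection_counts.get(term, 0) / coll_total
        return (counts.get(term, 0.0) + mu * prior) / (total + mu)
    return p
```

A term that never occurs in the sparse document but occurs in an influential neighbor's document receives more probability mass than one supported only by the background collection.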
{"title":"Detecting Good Abandonment in Mobile Search","authors":"Kyle Williams, Julia Kiseleva, Aidan C. Crook, I. Zitouni, Ahmed Hassan Awadallah, Madian Khabsa","doi":"10.1145/2872427.2883074","DOIUrl":"https://doi.org/10.1145/2872427.2883074","url":null,"abstract":"Web search queries for which there are no clicks are referred to as abandoned queries and are usually considered to indicate user dissatisfaction. However, there are many cases where a user may not click on anything on the search engine result page (SERP) but still be satisfied. This scenario is referred to as good abandonment and presents a challenge for most approaches to measuring search satisfaction, which are usually based on clicks and dwell time. The problem is further exacerbated on mobile devices, where search providers try to increase the likelihood of users being satisfied directly by the SERP. This paper proposes a solution to this problem using gesture interactions, such as reading times and touch actions, as signals for differentiating between good and bad abandonment. These signals go beyond clicks and characterize user behavior in cases where clicks are not needed to achieve satisfaction. We study different good abandonment scenarios and investigate the different elements on a SERP that may lead to good abandonment. We also present an analysis of the correlation between user gesture features and satisfaction. Finally, we use this analysis to build models that automatically identify good abandonment in mobile search, achieving an accuracy of 75%, significantly better than considering query and session signals alone. 
Our findings have implications for the study and application of user satisfaction in search systems.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79331819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Social Machines to Social Protocols: Software Engineering Foundations for Sociotechnical Systems","authors":"A. Chopra, Munindar P. Singh","doi":"10.1145/2872427.2883018","DOIUrl":"https://doi.org/10.1145/2872427.2883018","url":null,"abstract":"The overarching vision of social machines is to facilitate social processes by having computers provide administrative support. We conceive of a social machine as a sociotechnical system (STS): a software-supported system in which autonomous principals such as humans and organizations interact to exchange information and services. Existing approaches for social machines emphasize the technical aspects and inadequately support the meanings of social processes, leaving them informally realized in human interactions. We posit that a fundamental rethinking is needed to incorporate accountability, essential for addressing the openness of the Web and the autonomy of its principals. We introduce Interaction-Oriented Software Engineering (IOSE) as a paradigm expressly suited to capturing the social basis of STSs. Motivated by promoting openness and autonomy, IOSE focuses not on implementation but on social protocols, specifying how social relationships, which characterize the accountability of the parties concerned, progress as those parties interact. Motivated by providing computational support, IOSE adopts the accountability representation to capture the meaning of a social machine's states and transitions. We demonstrate IOSE via examples drawn from healthcare. We reinterpret the classical software engineering (SE) principles for the STS setting and show how IOSE is better suited than traditional software engineering for supporting social processes. 
The contribution of this paper is a new paradigm for STSs, evaluated via conceptual analysis.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79581016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work?","authors":"Shomir Wilson, F. Schaub, R. Ramanath, N. Sadeh, Fei Liu, Noah A. Smith, Frederick Liu","doi":"10.1145/2872427.2883035","DOIUrl":"https://doi.org/10.1145/2872427.2883035","url":null,"abstract":"Website privacy policies are often long and difficult to understand. While research shows that Internet users care about their privacy, they do not have time to understand the policies of every website they visit, and most users hardly ever read privacy policies. Several recent efforts aim to crowdsource the interpretation of privacy policies and use the resulting annotations to build more effective user interfaces that provide users with salient policy summaries. However, very little attention has been devoted to studying the accuracy and scalability of crowdsourced privacy policy annotations, the types of questions crowdworkers can effectively answer, and the ways in which their productivity can be enhanced. Prior research indicates that most Internet users often have great difficulty understanding privacy policies, suggesting limits to the effectiveness of crowdsourcing approaches. In this paper, we assess the viability of crowdsourcing privacy policy annotations. Our results suggest that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. 
We further introduce and evaluate a method to improve the annotation process by predicting and highlighting paragraphs relevant to specific data practices.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82693862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach","authors":"Aston Zhang, Amit Goyal, R. Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng","doi":"10.1145/2872427.2882977","DOIUrl":"https://doi.org/10.1145/2872427.2882977","url":null,"abstract":"We study the new mobile query auto-completion (QAC) problem to exploit mobile devices' exclusive signals, such as those related to mobile applications (apps). We propose AppAware, a novel QAC model using installed app and recently opened app signals to suggest queries for matching input prefixes on mobile devices. To overcome the challenge of noisy and voluminous signals, AppAware optimizes composite objectives with a lighter processing cost at a linear rate of convergence. We conduct experiments on a large commercial data set of mobile queries and apps. Installed app and recently opened app signals consistently and significantly boost the accuracy of various baseline QAC models on mobile devices.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84305926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The QWERTY Effect on the Web: How Typing Shapes the Meaning of Words in Online Human-Computer Interaction","authors":"David García, M. Strohmaier","doi":"10.1145/2872427.2883019","DOIUrl":"https://doi.org/10.1145/2872427.2883019","url":null,"abstract":"The QWERTY effect postulates that the keyboard layout influences word meanings by linking positivity to the use of the right hand and negativity to the use of the left hand. For example, previous research has established that words with more right-hand letters are rated more positively than words with more left-hand letters by human subjects in small-scale experiments. In this paper, we perform large-scale investigations of the QWERTY effect on the web. Using data from eleven web platforms related to products, movies, books, and videos, we conduct observational tests of whether a hand-meaning relationship can be found in text interpretations by web users. Furthermore, we investigate whether writing text on the web exhibits the QWERTY effect as well, by analyzing the relationship between the text of online reviews and their star ratings in four additional datasets. Overall, we find robust evidence for the QWERTY effect both at the point of text interpretation (decoding) and at the point of text creation (encoding). We also find conditions under which the effect might not hold. Our findings have implications for any algorithmic method aiming to evaluate the meaning of words on the web, including, for example, semantic or sentiment analysis, and show the existence of \"dactilar onomatopoeias\" that shape the dynamics of word-meaning associations. 
To the best of our knowledge, this is the first work to reveal the extent to which the QWERTY effect exists in large scale human-computer interaction on the web.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78533530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
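The quantity underlying such analyses can be made concrete. One common operationalization of a word's "handedness" (an assumption here, not necessarily the exact measure the paper uses) is the difference between right-hand and left-hand letter counts on a standard QWERTY layout, normalized by word length:

```python
# Letters typed by each hand on a standard QWERTY layout.
RIGHT_HAND = set("yuiophjklnm")
LEFT_HAND = set("qwertasdfgzxcvb")

def right_side_advantage(word):
    """Right-side advantage: (right-hand letters - left-hand letters)
    divided by the number of letters. Positive values indicate a word
    that leans on the right hand; non-letters are ignored."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    r = sum(ch in RIGHT_HAND for ch in letters)
    l = sum(ch in LEFT_HAND for ch in letters)
    return (r - l) / len(letters)
```

Under the QWERTY-effect hypothesis, scores from such a measure would correlate positively with, say, star ratings of the reviews containing the words.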
{"title":"Growing Wikipedia Across Languages via Recommendation.","authors":"Ellery Wulczyn, Robert West, Leila Zia, Jure Leskovec","doi":"10.1145/2872427.2883077","DOIUrl":"10.1145/2872427.2883077","url":null,"abstract":"<p><p>The different Wikipedia language editions vary dramatically in how comprehensive they are. As a result, most language editions contain only a small fraction of the sum of information that exists across all Wikipedias. In this paper, we present an approach to filling gaps in article coverage across different Wikipedia editions. Our main contribution is an end-to-end system for recommending articles for creation that exist in one language but are missing in another. The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests. We empirically validate our models in a controlled experiment involving 12,000 French Wikipedia editors. We find that personalizing recommendations increases editor engagement by a factor of two. Moreover, recommending articles increases their chance of being created by a factor of 3.2. Finally, articles created as a result of our recommendations are of comparable quality to organically created articles. 
Overall, our system leads to more engaged editors and faster growth of Wikipedia with no effect on its quality.</p>","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5092237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83220586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
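The first two stages of the pipeline, finding articles missing from a target edition and ranking them by importance, can be sketched with a crude proxy. Here, page views in the source edition stand in for importance (an assumption for illustration only; the paper learns a richer importance model and further personalizes recommendations by editor interests):

```python
def recommend_missing(source_views, target_titles, top_k=3):
    """Identify articles present in the source language edition but missing
    from the target edition, and rank them by source-edition page views,
    used here as a crude importance proxy (illustrative assumption)."""
    missing = {title: views for title, views in source_views.items()
               if title not in target_titles}
    return sorted(missing, key=missing.get, reverse=True)[:top_k]
```

For example, an article heavily viewed in English but absent from French Wikipedia would surface near the top of the recommendation list.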
{"title":"Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior","authors":"Samuel Barbosa, D. Cosley, Amit Sharma, R. Cesar","doi":"10.1145/2872427.2883083","DOIUrl":"https://doi.org/10.1145/2872427.2883083","url":null,"abstract":"Online communities provide a fertile ground for analyzing people's behavior and improving our understanding of social processes. Because both people and communities change over time, we argue that analyses of these communities that take time into account will lead to deeper and more accurate results. Using Reddit as an example, we study the evolution of users based on comment and submission data from 2007 to 2014. Even using one of the simplest temporal differences between users---yearly cohorts---we find wide differences in people's behavior, including comment activity, effort, and survival. Further, not accounting for time can lead us to misinterpret important phenomena. For instance, we observe that average comment length decreases over any fixed period of time, but comment length in each cohort of users steadily increases during the same period after an abrupt initial drop, an example of Simpson's Paradox. Dividing cohorts into sub-cohorts based on the survival time in the community provides further insights; in particular, longer-lived users start at a higher activity level and make more and shorter comments than those who leave earlier. 
These findings both give more insight into user evolution in Reddit in particular, and raise a number of interesting questions around studying online behavior going forward.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84874775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
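The Simpson's Paradox observation above — pooled comment length falling while every cohort's length rises — can be reproduced with a minimal synthetic sketch. All numbers below are illustrative, not taken from the paper's Reddit data.

```python
# Simpson's Paradox in cohort data: each yearly cohort's average comment
# length rises over time, yet the community-wide average falls, because
# newer (shorter-writing) cohorts keep joining. Numbers are invented.

# cohort join-year -> {calendar year: average comment length (chars)}
cohorts = {
    2012: {2012: 200, 2013: 210, 2014: 220},
    2013: {2013: 100, 2014: 110},
    2014: {2014: 50},
}

# Within every cohort, length strictly increases year over year.
for lengths in cohorts.values():
    years = sorted(lengths)
    assert all(lengths[a] < lengths[b] for a, b in zip(years, years[1:]))

def pooled_average(year):
    """Average over all cohorts active in the given calendar year."""
    vals = [l[year] for l in cohorts.values() if year in l]
    return sum(vals) / len(vals)

# Yet the pooled average across active cohorts decreases each year.
averages = [pooled_average(y) for y in (2012, 2013, 2014)]
print(averages)  # [200.0, 155.0, 126.66...]
```

The trend reversal comes entirely from the changing cohort mix, which is why the paper argues for time-aware (cohort-level) analyses over pooled averages.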
Marco De Nadai, Jacopo Staiano, Roberto Larcher, N. Sebe, D. Quercia, B. Lepri
The Death and Life of Great American Cities was written in 1961 and is now one of the most influential books in city planning. In it, Jane Jacobs proposed four conditions that promote life in a city. However, these conditions were not empirically tested until recently. This is mainly because it is hard to collect data about "city life". The city of Seoul recently collected pedestrian activity through surveys at an unprecedented scale, with an effort spanning more than a decade, allowing researchers to conduct the first study successfully testing Jacobs's conditions. In this paper, we identify a valuable alternative to the lengthy and costly collection of activity survey data: mobile phone data. We extract human activity from such data, collect land use and socio-demographic information from the Italian Census and Open Street Map, and test the four conditions in six Italian cities. Although these cities are very different from the places for which Jacobs's conditions were spelled out (i.e., great American cities) and from the places in which they were recently tested (i.e., the Asian city of Seoul), we find those conditions to be indeed associated with urban life in Italy as well. Our methodology promises to have a great impact on urban studies, not least because, if replicated, it will make it possible to test Jacobs's theories at scale.
{"title":"The Death and Life of Great Italian Cities: A Mobile Phone Data Perspective","authors":"Marco De Nadai, Jacopo Staiano, Roberto Larcher, N. Sebe, D. Quercia, B. Lepri","doi":"10.1145/2872427.2883084","DOIUrl":"https://doi.org/10.1145/2872427.2883084","url":null,"abstract":"The Death and Life of Great American Cities was written in 1961 and is now one of the most influential books in city planning. In it, Jane Jacobs proposed four conditions that promote life in a city. However, these conditions were not empirically tested until recently. This is mainly because it is hard to collect data about \"city life\". The city of Seoul recently collected pedestrian activity through surveys at an unprecedented scale, with an effort spanning more than a decade, allowing researchers to conduct the first study successfully testing Jacobs's conditions. In this paper, we identify a valuable alternative to the lengthy and costly collection of activity survey data: mobile phone data. We extract human activity from such data, collect land use and socio-demographic information from the Italian Census and Open Street Map, and test the four conditions in six Italian cities. Although these cities are very different from the places for which Jacobs's conditions were spelled out (i.e., great American cities) and from the places in which they were recently tested (i.e., the Asian city of Seoul), we find those conditions to be indeed associated with urban life in Italy as well. 
Our methodology promises to have a great impact on urban studies, not least because, if replicated, it will make it possible to test Jacobs's theories at scale.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90327345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
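Jacobs's four conditions (mixed primary uses, small blocks, buildings of varied age, and sufficient density) lend themselves to simple data proxies. The sketch below is purely illustrative — the variables, thresholds, and weights are invented for exposition and are not the paper's actual model; only the land-use entropy measure is a standard urban-diversity metric.

```python
import math

# Illustrative scoring of a district on proxies for Jacobs's four
# conditions. All thresholds and the district's values are hypothetical.

def landuse_entropy(shares):
    """Shannon entropy of land-use shares; higher means more mixed uses."""
    return -sum(p * math.log(p) for p in shares if p > 0)

district = {
    "landuse_shares": [0.4, 0.3, 0.2, 0.1],  # e.g. residential/retail/office/other
    "avg_block_side_m": 90,                  # smaller blocks favor pedestrian life
    "building_age_std": 25,                  # spread of construction years
    "density_per_km2": 9000,                 # population concentration
}

score = (
    landuse_entropy(district["landuse_shares"])   # condition 1: mixed primary uses
    + (district["avg_block_side_m"] < 120)        # condition 2: small blocks
    + (district["building_age_std"] > 15)         # condition 3: aged buildings
    + (district["density_per_km2"] > 7000)        # condition 4: concentration
)
print(round(score, 3))
```

In the paper itself, such land-use and building features come from the Italian Census and OpenStreetMap, and the "city life" outcome they are tested against is human activity extracted from mobile phone data.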