{"title":"Detecting Evolution of Concepts based on Cause-Effect Relationships in Online Reviews","authors":"Yating Zhang, A. Jatowt, Katsumi Tanaka","doi":"10.1145/2872427.2883013","DOIUrl":"https://doi.org/10.1145/2872427.2883013","url":null,"abstract":"Analyzing how technology evolves is important for understanding technological progress and its impact on society. Although the concept of evolution has been explored in many domains (e.g., the evolution of topics, events, terminology, or species), little research has been done on automatically analyzing the evolution of products and technology in general. In this paper, we propose a novel approach for investigating technology evolution based on collections of product reviews. We are particularly interested in understanding the social impact of technology and in discovering how changes in product features influence changes in our social lives. We address this challenge by first distinguishing two kinds of product-related terms: physical product features and terms describing the situations in which products are used. We then detect changes in both types of terms over time by tracking fluctuations in their popularity and usage. Finally, we discover cases in which changes in physical product features trigger changes in a product's use. 
We experimentally demonstrate the effectiveness of our approach on the Amazon Product Review Dataset that spans over 18 years.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80894323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socialized Language Model Smoothing via Bi-directional Influence Propagation on Social Networks","authors":"Rui Yan, Cheng-te Li, Hsun-Ping Hsieh, P. Hu, Xiaohua Hu, Tingting He","doi":"10.1145/2872427.2874811","DOIUrl":"https://doi.org/10.1145/2872427.2874811","url":null,"abstract":"In recent years, online social networks have been among the most popular and most heavily viewed websites in the world, as they have renewed the way information is discovered and distributed. Millions of registered users generate a formidable amount of user-generated content on these websites every day. These social networks have become \"giants\", seemingly capable of supporting any research task. However, as we have pointed out, these giants still suffer from an \"Achilles Heel\": extreme sparsity. Compared with the extremely large collection as a whole, individual posting documents such as microblogs are so sparse that they appear to make little difference in many research scenarios, even though these postings do in fact differ. In this paper we propose to tackle the Achilles Heel of social networks by smoothing the language model via influence propagation. Extending our previously proposed work on the sparsity issue, we augment socialized language model smoothing with bi-directional influence learned from propagation: intuitively, it is insufficient to model the influence propagated between an information source and a target without distinguishing its direction. Hence, we formulate a bi-directional socialized factor graph model, which utilizes both the textual correlations between document pairs and the socialized augmentation networks behind the documents, such as user relationships and social interactions. These factors are modeled as attributes of, and dependencies among, documents and their corresponding users, and are then distinguished by direction. 
We propose an effective algorithm to learn the factor graph model with directions, and finally propagate term counts to smooth documents based on the estimated influence. We run experiments on two datasets, from Twitter and Weibo, and the results validate the effectiveness of the proposed model. By incorporating direction information into socialized language model smoothing, our approach improves over several alternative methods on both intrinsic and extrinsic evaluations, measured in terms of perplexity, nDCG, and MAP.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85230782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
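The core smoothing idea described above, augmenting a sparse document's term counts with counts propagated from socially connected documents before applying standard language-model smoothing, can be sketched as follows. This is a minimal illustration, not the authors' factor graph model: the directed influence weights are assumed to be given (in the paper they are learned), and Dirichlet prior smoothing stands in for the final estimator.

```python
from collections import Counter

def smoothed_lm(doc_tokens, neighbor_docs, influence, collection_counts, mu=100.0):
    """Dirichlet-smoothed unigram language model whose term counts are
    augmented by counts propagated from socially connected documents.

    influence maps a neighbor's index to a directed influence weight
    (assumed given here; the paper estimates these from a factor graph).
    """
    counts = Counter(doc_tokens)
    # Propagate term counts from each neighbor, scaled by its influence.
    for i, neigh in enumerate(neighbor_docs):
        w = influence.get(i, 0.0)
        for term, c in Counter(neigh).items():
            counts[term] += w * c
    total = sum(counts.values())
    coll_total = sum(collection_counts.values())

    def p(term):
        # Dirichlet prior smoothing with the collection model as background.
        prior = collection_counts.get(term, 0) / coll_total
        return (counts.get(term, 0.0) + mu * prior) / (total + mu)
    return p
```

A term that never occurs in the sparse document but occurs in an influential neighbor's document receives more probability mass than one supported only by the background collection.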
{"title":"Detecting Good Abandonment in Mobile Search","authors":"Kyle Williams, Julia Kiseleva, Aidan C. Crook, I. Zitouni, Ahmed Hassan Awadallah, Madian Khabsa","doi":"10.1145/2872427.2883074","DOIUrl":"https://doi.org/10.1145/2872427.2883074","url":null,"abstract":"Web search queries for which there are no clicks are referred to as abandoned queries and are usually considered to indicate user dissatisfaction. However, there are many cases where a user may not click on anything on the search engine result page (SERP) but still be satisfied. This scenario is referred to as good abandonment and presents a challenge for most approaches to measuring search satisfaction, which are usually based on clicks and dwell time. The problem is further exacerbated on mobile devices, where search providers try to increase the likelihood of users being satisfied directly by the SERP. This paper proposes a solution to this problem using gesture interactions, such as reading times and touch actions, as signals for differentiating between good and bad abandonment. These signals go beyond clicks and characterize user behavior in cases where clicks are not needed to achieve satisfaction. We study different good abandonment scenarios and investigate the different elements on a SERP that may lead to good abandonment. We also present an analysis of the correlation between user gesture features and satisfaction. Finally, we use this analysis to build models that automatically identify good abandonment in mobile search, achieving an accuracy of 75%, significantly better than considering query and session signals alone. 
Our findings have implications for the study and application of user satisfaction in search systems.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79331819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Social Machines to Social Protocols: Software Engineering Foundations for Sociotechnical Systems","authors":"A. Chopra, Munindar P. Singh","doi":"10.1145/2872427.2883018","DOIUrl":"https://doi.org/10.1145/2872427.2883018","url":null,"abstract":"The overarching vision of social machines is to facilitate social processes by having computers provide administrative support. We conceive of a social machine as a sociotechnical system (STS): a software-supported system in which autonomous principals such as humans and organizations interact to exchange information and services. Existing approaches for social machines emphasize the technical aspects and inadequately support the meanings of social processes, leaving them informally realized in human interactions. We posit that a fundamental rethinking is needed to incorporate accountability, essential for addressing the openness of the Web and the autonomy of its principals. We introduce Interaction-Oriented Software Engineering (IOSE) as a paradigm expressly suited to capturing the social basis of STSs. Motivated by promoting openness and autonomy, IOSE focuses not on implementation but on social protocols, specifying how social relationships, which characterize the accountability of the parties concerned, progress as those parties interact. Motivated by providing computational support, IOSE adopts the accountability representation to capture the meaning of a social machine's states and transitions. We demonstrate IOSE via examples drawn from healthcare. We reinterpret the classical software engineering (SE) principles for the STS setting and show how IOSE is better suited than traditional software engineering for supporting social processes. 
The contribution of this paper is a new paradigm for STSs, evaluated via conceptual analysis.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79581016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowdsourcing Annotations for Websites' Privacy Policies: Can It Really Work?","authors":"Shomir Wilson, F. Schaub, R. Ramanath, N. Sadeh, Fei Liu, Noah A. Smith, Frederick Liu","doi":"10.1145/2872427.2883035","DOIUrl":"https://doi.org/10.1145/2872427.2883035","url":null,"abstract":"Website privacy policies are often long and difficult to understand. While research shows that Internet users care about their privacy, they do not have time to understand the policies of every website they visit, and most users hardly ever read privacy policies. Several recent efforts aim to crowdsource the interpretation of privacy policies and use the resulting annotations to build more effective user interfaces that provide users with salient policy summaries. However, very little attention has been devoted to studying the accuracy and scalability of crowdsourced privacy policy annotations, the types of questions crowdworkers can effectively answer, and the ways in which their productivity can be enhanced. Prior research indicates that most Internet users often have great difficulty understanding privacy policies, suggesting limits to the effectiveness of crowdsourcing approaches. In this paper, we assess the viability of crowdsourcing privacy policy annotations. Our results suggest that, if carefully deployed, crowdsourcing can indeed result in the generation of non-trivial annotations and can also help identify elements of ambiguity in policies. 
We further introduce and evaluate a method to improve the annotation process by predicting and highlighting paragraphs relevant to specific data practices.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82693862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Mobile Query Auto-Completion: An Efficient Mobile Application-Aware Approach","authors":"Aston Zhang, Amit Goyal, R. Baeza-Yates, Yi Chang, Jiawei Han, Carl A. Gunter, Hongbo Deng","doi":"10.1145/2872427.2882977","DOIUrl":"https://doi.org/10.1145/2872427.2882977","url":null,"abstract":"We study the new mobile query auto-completion (QAC) problem to exploit mobile devices' exclusive signals, such as those related to mobile applications (apps). We propose AppAware, a novel QAC model using installed app and recently opened app signals to suggest queries for matching input prefixes on mobile devices. To overcome the challenge of noisy and voluminous signals, AppAware optimizes composite objectives with a lighter processing cost at a linear rate of convergence. We conduct experiments on a large commercial data set of mobile queries and apps. Installed app and recently opened app signals consistently and significantly boost the accuracy of various baseline QAC models on mobile devices.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84305926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The QWERTY Effect on the Web: How Typing Shapes the Meaning of Words in Online Human-Computer Interaction","authors":"David García, M. Strohmaier","doi":"10.1145/2872427.2883019","DOIUrl":"https://doi.org/10.1145/2872427.2883019","url":null,"abstract":"The QWERTY effect postulates that the keyboard layout influences word meanings by linking positivity to the use of the right hand and negativity to the use of the left hand. For example, previous research has established that words with more right-hand letters are rated more positively than words with more left-hand letters by human subjects in small-scale experiments. In this paper, we perform large-scale investigations of the QWERTY effect on the web. Using data from eleven web platforms related to products, movies, books, and videos, we conduct observational tests of whether a hand-meaning relationship can be found in text interpretations by web users. Furthermore, we investigate whether writing text on the web exhibits the QWERTY effect as well, by analyzing the relationship between the text of online reviews and their star ratings in four additional datasets. Overall, we find robust evidence for the QWERTY effect both at the point of text interpretation (decoding) and at the point of text creation (encoding). We also find conditions under which the effect might not hold. Our findings have implications for any algorithmic method aiming to evaluate the meaning of words on the web, including, for example, semantic or sentiment analysis, and show the existence of \"dactilar onomatopoeias\" that shape the dynamics of word-meaning associations. 
To the best of our knowledge, this is the first work to reveal the extent to which the QWERTY effect exists in large scale human-computer interaction on the web.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78533530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
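The quantity underlying such analyses can be made concrete. One common operationalization of a word's "handedness" (an assumption here, not necessarily the exact measure the paper uses) is the difference between right-hand and left-hand letter counts on a standard QWERTY layout, normalized by word length:

```python
# Letters typed by each hand on a standard QWERTY layout.
RIGHT_HAND = set("yuiophjklnm")
LEFT_HAND = set("qwertasdfgzxcvb")

def right_side_advantage(word):
    """Right-side advantage: (right-hand letters - left-hand letters)
    divided by the number of letters. Positive values indicate a word
    that leans on the right hand; non-letters are ignored."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    r = sum(ch in RIGHT_HAND for ch in letters)
    l = sum(ch in LEFT_HAND for ch in letters)
    return (r - l) / len(letters)
```

Under the QWERTY-effect hypothesis, scores from such a measure would correlate positively with, say, star ratings of the reviews containing the words.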
{"title":"Growing Wikipedia Across Languages via Recommendation.","authors":"Ellery Wulczyn, Robert West, Leila Zia, Jure Leskovec","doi":"10.1145/2872427.2883077","DOIUrl":"10.1145/2872427.2883077","url":null,"abstract":"<p><p>The different Wikipedia language editions vary dramatically in how comprehensive they are. As a result, most language editions contain only a small fraction of the sum of information that exists across all Wikipedias. In this paper, we present an approach to filling gaps in article coverage across different Wikipedia editions. Our main contribution is an end-to-end system for recommending articles for creation that exist in one language but are missing in another. The system involves identifying missing articles, ranking the missing articles according to their importance, and recommending important missing articles to editors based on their interests. We empirically validate our models in a controlled experiment involving 12,000 French Wikipedia editors. We find that personalizing recommendations increases editor engagement by a factor of two. Moreover, recommending articles increases their chance of being created by a factor of 3.2. Finally, articles created as a result of our recommendations are of comparable quality to organically created articles. 
Overall, our system leads to more engaged editors and faster growth of Wikipedia with no effect on its quality.</p>","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5092237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83220586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
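The first two stages of the pipeline, finding articles missing from a target edition and ranking them by importance, can be sketched with a crude proxy. Here, page views in the source edition stand in for importance (an assumption for illustration only; the paper learns a richer importance model and further personalizes recommendations by editor interests):

```python
def recommend_missing(source_views, target_titles, top_k=3):
    """Identify articles present in the source language edition but missing
    from the target edition, and rank them by source-edition page views,
    used here as a crude importance proxy (illustrative assumption)."""
    missing = {title: views for title, views in source_views.items()
               if title not in target_titles}
    return sorted(missing, key=missing.get, reverse=True)[:top_k]
```

For example, an article heavily viewed in English but absent from French Wikipedia would surface near the top of the recommendation list.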
{"title":"Averaging Gone Wrong: Using Time-Aware Analyses to Better Understand Behavior","authors":"Samuel Barbosa, D. Cosley, Amit Sharma, R. Cesar","doi":"10.1145/2872427.2883083","DOIUrl":"https://doi.org/10.1145/2872427.2883083","url":null,"abstract":"Online communities provide a fertile ground for analyzing people's behavior and improving our understanding of social processes. Because both people and communities change over time, we argue that analyses of these communities that take time into account will lead to deeper and more accurate results. Using Reddit as an example, we study the evolution of users based on comment and submission data from 2007 to 2014. Even using one of the simplest temporal differences between users---yearly cohorts---we find wide differences in people's behavior, including comment activity, effort, and survival. Further, not accounting for time can lead us to misinterpret important phenomena. For instance, we observe that average comment length decreases over any fixed period of time, but comment length in each cohort of users steadily increases during the same period after an abrupt initial drop, an example of Simpson's Paradox. Dividing cohorts into sub-cohorts based on the survival time in the community provides further insights; in particular, longer-lived users start at a higher activity level and make more and shorter comments than those who leave earlier. 
These findings both give more insight into user evolution in Reddit in particular, and raise a number of interesting questions around studying online behavior going forward.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84874775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
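The Simpson's Paradox observation above — pooled comment length falling while every cohort's length rises — can be reproduced with a minimal synthetic sketch. All numbers below are illustrative, not taken from the paper's Reddit data.

```python
# Simpson's Paradox in cohort data: each yearly cohort's average comment
# length rises over time, yet the community-wide average falls, because
# newer (shorter-writing) cohorts keep joining. Numbers are invented.

# cohort join-year -> {calendar year: average comment length (chars)}
cohorts = {
    2012: {2012: 200, 2013: 210, 2014: 220},
    2013: {2013: 100, 2014: 110},
    2014: {2014: 50},
}

# Within every cohort, length strictly increases year over year.
for lengths in cohorts.values():
    years = sorted(lengths)
    assert all(lengths[a] < lengths[b] for a, b in zip(years, years[1:]))

def pooled_average(year):
    """Average over all cohorts active in the given calendar year."""
    vals = [l[year] for l in cohorts.values() if year in l]
    return sum(vals) / len(vals)

# Yet the pooled average across active cohorts decreases each year.
averages = [pooled_average(y) for y in (2012, 2013, 2014)]
print(averages)  # [200.0, 155.0, 126.66...]
```

The trend reversal comes entirely from the changing cohort mix, which is why the paper argues for time-aware (cohort-level) analyses over pooled averages.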
Marco De Nadai, Jacopo Staiano, Roberto Larcher, N. Sebe, D. Quercia, B. Lepri
The Death and Life of Great American Cities was written in 1961 and is now one of the most influential books in city planning. In it, Jane Jacobs proposed four conditions that promote life in a city. However, these conditions were not empirically tested until recently. This is mainly because it is hard to collect data about "city life". The city of Seoul recently collected pedestrian activity through surveys at an unprecedented scale, with an effort spanning more than a decade, allowing researchers to conduct the first study successfully testing Jacobs's conditions. In this paper, we identify a valuable alternative to the lengthy and costly collection of activity survey data: mobile phone data. We extract human activity from such data, collect land use and socio-demographic information from the Italian Census and Open Street Map, and test the four conditions in six Italian cities. Although these cities are very different from the places for which Jacobs's conditions were spelled out (i.e., great American cities) and from the places in which they were recently tested (i.e., the Asian city of Seoul), we find those conditions to be indeed associated with urban life in Italy as well. Our methodology promises to have a great impact on urban studies, not least because, if replicated, it will make it possible to test Jacobs's theories at scale.
{"title":"The Death and Life of Great Italian Cities: A Mobile Phone Data Perspective","authors":"Marco De Nadai, Jacopo Staiano, Roberto Larcher, N. Sebe, D. Quercia, B. Lepri","doi":"10.1145/2872427.2883084","DOIUrl":"https://doi.org/10.1145/2872427.2883084","url":null,"abstract":"The Death and Life of Great American Cities was written in 1961 and is now one of the most influential books in city planning. In it, Jane Jacobs proposed four conditions that promote life in a city. However, these conditions were not empirically tested until recently. This is mainly because it is hard to collect data about \"city life\". The city of Seoul recently collected pedestrian activity through surveys at an unprecedented scale, with an effort spanning more than a decade, allowing researchers to conduct the first study successfully testing Jacobs's conditions. In this paper, we identify a valuable alternative to the lengthy and costly collection of activity survey data: mobile phone data. We extract human activity from such data, collect land use and socio-demographic information from the Italian Census and Open Street Map, and test the four conditions in six Italian cities. Although these cities are very different from the places for which Jacobs's conditions were spelled out (i.e., great American cities) and from the places in which they were recently tested (i.e., the Asian city of Seoul), we find those conditions to be indeed associated with urban life in Italy as well. 
Our methodology promises to have a great impact on urban studies, not least because, if replicated, it will make it possible to test Jacobs's theories at scale.","PeriodicalId":20455,"journal":{"name":"Proceedings of the 25th International Conference on World Wide Web","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90327345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
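Jacobs's four conditions (mixed primary uses, small blocks, buildings of varied age, and sufficient density) lend themselves to simple data proxies. The sketch below is purely illustrative — the variables, thresholds, and weights are invented for exposition and are not the paper's actual model; only the land-use entropy measure is a standard urban-diversity metric.

```python
import math

# Illustrative scoring of a district on proxies for Jacobs's four
# conditions. All thresholds and the district's values are hypothetical.

def landuse_entropy(shares):
    """Shannon entropy of land-use shares; higher means more mixed uses."""
    return -sum(p * math.log(p) for p in shares if p > 0)

district = {
    "landuse_shares": [0.4, 0.3, 0.2, 0.1],  # e.g. residential/retail/office/other
    "avg_block_side_m": 90,                  # smaller blocks favor pedestrian life
    "building_age_std": 25,                  # spread of construction years
    "density_per_km2": 9000,                 # population concentration
}

score = (
    landuse_entropy(district["landuse_shares"])   # condition 1: mixed primary uses
    + (district["avg_block_side_m"] < 120)        # condition 2: small blocks
    + (district["building_age_std"] > 15)         # condition 3: aged buildings
    + (district["density_per_km2"] > 7000)        # condition 4: concentration
)
print(round(score, 3))
```

In the paper itself, such land-use and building features come from the Italian Census and OpenStreetMap, and the "city life" outcome they are tested against is human activity extracted from mobile phone data.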