Many users casually reveal their locations, such as restaurants, landmarks, and shops, in their tweets. Recognizing such fine-grained locations in tweets and then linking the location mentions to well-defined location profiles (e.g., with formal name, detailed address, and geo-coordinates) offers a tremendous opportunity for many applications. Unlike existing solutions, which perform location recognition and linking as two sequential sub-tasks in a pipeline, in this paper we propose a novel joint framework that performs location recognition and location linking simultaneously in a joint search space. We formulate this end-to-end location linking problem as a structured prediction problem and propose a beam-search based algorithm. Based on the concept of multi-view learning, we further enable the algorithm to learn from unlabeled data to alleviate the dearth of labeled data. Extensive experiments are conducted to recognize locations mentioned in tweets and link them to location profiles in Foursquare. Experimental results show that the proposed joint learning algorithm outperforms the state-of-the-art solutions, and that learning from unlabeled data improves both recognition and linking accuracy.
Zongcheng Ji, Aixin Sun, G. Cong, Jialong Han. "Joint Recognition and Linking of Fine-Grained Locations from Tweets." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883067
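The joint search described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: the toy gazetteer, its scores, and the function name `joint_beam_search` are all hypothetical, and a lookup table stands in for the learned structured-prediction model. The sketch only shows the idea of deciding span boundaries and linked profiles in one beam search rather than in two pipeline stages.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: surface phrase -> [(profile id, linking score)].
# In a real system these scores would come from a learned model.
GAZETTEER: Dict[str, List[Tuple[str, float]]] = {
    "starbucks": [("starbucks_orchard_rd", 1.0), ("starbucks_changi", 0.8)],
    "orchard road": [("orchard_road_sg", 1.5)],
    "starbucks orchard road": [("starbucks_orchard_rd", 3.0)],
}

Annotation = Tuple[int, int, str]          # (start, end exclusive, profile id)
Hypothesis = Tuple[float, List[Annotation]]


def joint_beam_search(tokens: List[str], beam_size: int = 3,
                      max_len: int = 3) -> Hypothesis:
    """Jointly pick mention spans and their linked profiles."""
    n = len(tokens)
    # beams[i] holds partial hypotheses that cover tokens[:i].
    beams: List[List[Hypothesis]] = [[] for _ in range(n + 1)]
    beams[0] = [(0.0, [])]
    for i in range(n):
        # Keep only the best partial hypotheses ending at position i.
        beams[i] = sorted(beams[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, anns in beams[i]:
            # Option 1: token i is not part of any location mention.
            beams[i + 1].append((score, anns))
            # Option 2: a mention starts at i and is linked to a profile.
            for j in range(i + 1, min(n, i + max_len) + 1):
                phrase = " ".join(tokens[i:j]).lower()
                for profile_id, link_score in GAZETTEER.get(phrase, []):
                    beams[j].append((score + link_score,
                                     anns + [(i, j, profile_id)]))
    return max(beams[n], key=lambda h: h[0])


best_score, best_annotations = joint_beam_search(
    "coffee at Starbucks Orchard Road".split())
print(best_score, best_annotations)
```

In this toy input the three-word span outscores the combination of "Starbucks" and "Orchard Road" as separate mentions, which is the kind of decision a sequential recognize-then-link pipeline cannot revisit once the spans are fixed.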
Inappropriate tweets can cause severe damage to their authors' reputation or privacy. However, many users do not realize the negative consequences until after they publish these tweets. Published tweets have lasting effects that may not be eliminated by simple deletion, because other users may have read them or third-party tweet analysis platforms may have cached them. Regrettable tweets, i.e., tweets with identifiable regrettable content, cause the most damage to their authors because other users can easily notice them. In this paper, we study how to identify the regrettable tweets published by normal individual users via the content and the users' historical deletion patterns. We identify normal individual users based on their publishing, deleting, follower, and friend statistics. We manually examine a set of randomly sampled deleted tweets from these users to identify regrettable tweets and understand the reasons for regret. By applying content-based features and personalized history-based features, we develop classifiers that can effectively predict regrettable tweets.
Lu Zhou, Wenbo Wang, Keke Chen. "Tweet Properly: Analyzing Deleted Tweets to Understand and Identify Regrettable Ones." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883052
Huoran Li, W. Ai, Xuanzhe Liu, Jian Tang, Gang Huang, Feng Feng, Q. Mei
Smartphone users have adopted an explosive number of mobile applications (a.k.a. apps) in recent years. App marketplaces for the iOS, Android, and Windows Phone platforms host millions of apps, which have been downloaded more than 100 billion times. Investigating how people manage mobile apps in their everyday lives creates a unique opportunity to understand the behavior and preferences of mobile users, to infer the quality of apps, and to improve the user experience. Existing literature provides very limited knowledge about app management activities, due to the lack of user behavioral data at scale. This paper takes the initiative to analyze a very large app management log collected through a leading Android app marketplace. The data set covers five months of detailed downloading, updating, and uninstallation activities, involving 17 million anonymized users and one million apps. We present a surprising finding that the metrics commonly used by app stores to rank apps do not truly reflect users' real attitudes towards the apps. We then identify useful patterns in the app management activities that much more accurately predict user preferences for an app, even when no user rating is available.
Huoran Li, W. Ai, Xuanzhe Liu, Jian Tang, Gang Huang, Feng Feng, Q. Mei. "Voting with Their Feet: Inferring User Preferences from App Management Activities." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2874814
Native advertising is a specific form of online advertising where ads replicate the look-and-feel of their serving platform. In such a context, providing a good user experience with the served ads is crucial to ensure long-term user engagement. In this work, we explore the notion of ad quality, namely the effectiveness of advertising from a user experience perspective. We design a learning framework to predict the pre-click quality of native ads. More specifically, we look at detecting offensive native ads, showing that, to quantify ad quality, offensive ad feedback rates are more reliable than the commonly used click-through rate metrics. We then conduct a crowd-sourcing study to identify which criteria drive user preferences in native advertising. We translate these criteria into a set of ad quality features that we extract from the ad text, image, and advertiser, and then use them to train a model able to identify offensive ads. We show that our model is very effective in detecting offensive ads, and provide in-depth insights on how different features affect ad quality. Finally, we deploy a preliminary version of such a model and show its effectiveness in reducing the offensive ad feedback rate.
K. Zhou, Miriam Redi, Andrew Haines, M. Lalmas. "Predicting Pre-click Quality for Native Advertisements." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883053
Open has meant a lot of things in the web thus far. The openness of the web has had profound implications for web security, from the beginning through to today. Each time the underlying web technology changes, we do a reset on the security it provides. Patterns and differences emerge in each round of security responses and challenges. What has that brought us as web users, technologists, researchers, and as a global community? What can we expect going forward? And what should we work towards as web technologists and caretakers?
M. Zurko. "La Sécurité Ouverte How We Doin? So Far?" Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883583
Sociologists have long been interested in the ways that identities, or labels for people, are created, used, and applied across various social contexts. The present work makes two contributions to the study of identity, in particular the study of identity in text. We first consider the following novel NLP task: given a set of text data (here, from Twitter), label each word in the text as being representative of a (possibly multi-word) identity. To address this task, we develop a comprehensive feature set that leverages several avenues of recent NLP work on Twitter and use these features to train a supervised classifier. Our model outperforms a surprisingly strong rule-based baseline by 33%. We then use our model for a case study, applying it to a large corpus of Twitter data from users who actively discussed the Eric Garner and Michael Brown cases. Among other findings, we observe that the identities used by individuals differ in interesting ways based on social context measures derived from census data.
K. Joseph, Wei Wei, Kathleen M. Carley. "Exploring Patterns of Identity Usage in Tweets: A New Problem, Solution and Case Study." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883027
People nowadays usually participate in multiple online social networks simultaneously to enjoy more social network services. Besides common users, social networks providing similar services can also share many other kinds of information entities, e.g., locations, videos, and products. However, these shared information entities in different networks are mostly isolated, without any known corresponding connections. In this paper, we aim to infer such potential corresponding connections, linking multiple kinds of shared entities across networks simultaneously. Formally, the problem is referred to as the network "Partial Co-alignmenT" (PCT) problem. PCT is an important problem and can be the prerequisite for many concrete cross-network applications, like social network fusion and mutual information exchange and transfer. Meanwhile, the PCT problem is also very challenging to address for various reasons, like (1) the heterogeneity of social networks, (2) the lack of training instances to build models, and (3) the one-to-one constraint on the correspondence connections. To resolve these challenges, a novel unsupervised network alignment framework, UNICOAT (UNsupervIsed COncurrent AlignmenT), is introduced in this paper. Based on the heterogeneous information, UNICOAT transforms the PCT problem into a joint optimization problem. To solve the objective function, the one-to-one constraint on the corresponding relationships is relaxed, and the redundant non-existing corresponding connections introduced by this relaxation are pruned with a novel network co-matching algorithm proposed in this paper. Extensive experiments conducted on real-world co-aligned social network datasets demonstrate the effectiveness of UNICOAT in addressing the PCT problem.
Jiawei Zhang, Philip S. Yu. "PCT: Partial Co-Alignment of Social Networks." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883038
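The relax-then-prune step mentioned in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: a simple greedy matching stands in for the paper's co-matching algorithm, which the abstract does not specify, and the function name `prune_one_to_one`, the `threshold` parameter, and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def prune_one_to_one(scores: Dict[Tuple[str, str], float],
                     threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Greedily restore a one-to-one constraint on relaxed alignment scores.

    `scores` maps (entity in network A, entity in network B) to a confidence
    produced by some relaxed optimization; pairs are accepted from the most
    confident down, skipping any pair whose endpoint is already matched.
    """
    kept: List[Tuple[str, str, float]] = []
    used_a, used_b = set(), set()
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), score in ranked:
        if score < threshold:
            break  # all remaining pairs are too weak to keep
        if a in used_a or b in used_b:
            continue  # would violate the one-to-one constraint
        kept.append((a, b, score))
        used_a.add(a)
        used_b.add(b)
    return kept


# Hypothetical relaxed scores: a1 and a2 both have two candidate partners.
relaxed = {
    ("a1", "b1"): 0.9, ("a1", "b2"): 0.8,
    ("a2", "b2"): 0.7, ("a2", "b1"): 0.6,
    ("a3", "b3"): 0.2,
}
aligned = prune_one_to_one(relaxed)
print(aligned)
```

The threshold is what keeps the alignment partial: an entity such as `a3`, whose best candidate is weak, is left unmatched instead of being forced into a pair.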
Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, Xifeng Yan
Tables are pervasive on the Web. Informative web tables range across a large variety of topics and can naturally serve as a significant resource to satisfy user information needs. Driven by these observations, in this paper we investigate an important yet largely under-addressed problem: given millions of tables, how can we precisely retrieve table cells to answer a user question? This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain, which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries, it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance can obtain a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist or is difficult to identify in existing KBs.
Huan Sun, Hao Ma, Xiaodong He, Wen-tau Yih, Yu Su, Xifeng Yan. "Table Cell Search for Question Answering." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883080
Recent advances in web tracking technologies have raised many privacy concerns. To combat users' fear of privacy invasion, online vendors have taken measures such as being more transparent with users about their data use and providing options for users to manage their online activities. Such efforts gain users' trust in online vendors and improve their willingness to share their digital footprints. However, there is still a significant number of users who actively limit involuntary sharing of data, because vendor-provided management tools only restrict the use of collected data and users worry that vendors do not have enough measures in place to protect their privacy-sensitive information. In this paper, we propose TrackMeOrNot, a new anti-tracking mechanism. It allows users to selectively share their online footprints with vendors. With TrackMeOrNot, users are no longer concerned with privacy. Using it, users can specify their privacy-sensitive activities and selectively disclose their activities to vendors based on their specified privacy demands. We implemented TrackMeOrNot on the Chromium browser and systematically evaluated its performance using a large set of test cases. We show that TrackMeOrNot can efficiently and effectively shield privacy-sensitive browsing activities.
W. Meng, Byoungyoung Lee, Xinyu Xing, Wenke Lee. "TrackMeOrNot: Enabling Flexible Control on Web Tracking." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883034
Recently, the problem of opinion spam has become widespread and has attracted a lot of research attention. While the problem has been approached on a variety of dimensions, the temporal dynamics under which opinion spamming operates are unclear. Are there specific spamming policies that spammers employ? What kinds of changes happen with respect to the dynamics of the truthful ratings on entities? How does buffered spamming operate for entities that need spamming to retain threshold popularity, and reduced spamming for entities achieving better success? We analyze these questions in the light of time-series analysis on Yelp. Our analyses discover various temporal patterns and their relationships with the rate at which fake reviews are posted. Building on our analyses, we employ vector autoregression to predict the rate of deception across different spamming policies. Next, we explore the effect of filtered reviews on (long-term and imminent) future rating and popularity prediction of entities. Our results discover novel temporal dynamics of spamming which are intuitive, arguable, and also lend confidence to Yelp's filtering. Lastly, we leverage our discovered temporal patterns in deception detection. Experimental results on large-scale reviews show the effectiveness of our approach, which significantly improves on existing approaches.
C. SantoshK., Arjun Mukherjee. "On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp." Proceedings of the 25th International Conference on World Wide Web, 2016-04-11. https://doi.org/10.1145/2872427.2883087