When Recommender Systems Meet Fleet Management: Practical Study in Online Driver Repositioning System
Zhe Xu, Chang Men, Pengbo Li, Bicheng Jin, Ge Li, Yue Yang, Chunyang Liu, Ben Wang, X. Qie. DOI: https://doi.org/10.1145/3366423.3380287
E-hailing platforms have become an important component of public transportation in recent years. Supply (online drivers) and demand (passenger requests) are intrinsically imbalanced because of patterns of human behavior, especially at particular times and locations such as peak hours and train stations. Balancing supply and demand is therefore one of the key problems in satisfying passengers and drivers and increasing social welfare. Driver repositioning is an intuitive and effective approach to this problem and has been employed by several real-world e-hailing platforms. In this paper, we describe a novel framework for a driver repositioning system that meets various practical requirements, including a robust driver experience and multi-driver collaboration. We introduce an effective and user-friendly driver interaction design called the "driver repositioning task". A novel modularized algorithm is developed to generate repositioning tasks in real time. To our knowledge, this is the first industry-level application of driver repositioning. We evaluate the proposed method in real-world experiments, achieving a 2% improvement in driver income. Our framework has been fully deployed in the online system of DiDi Chuxing and serves millions of drivers on a daily basis.
{"title":"When Recommender Systems Meet Fleet Management: Practical Study in Online Driver Repositioning System","authors":"Zhe Xu, Chang Men, Pengbo Li, Bicheng Jin, Ge Li, Yue Yang, Chunyang Liu, Ben Wang, X. Qie","doi":"10.1145/3366423.3380287","DOIUrl":"https://doi.org/10.1145/3366423.3380287","url":null,"abstract":"E-hailing platforms have become an important component of public transportation in recent years. The supply (online drivers) and demand (passenger requests) are intrinsically imbalanced because of the pattern of human behavior, especially in time and locations such as peak hours and train stations. Hence, how to balance supply and demand is one of the key problems to satisfy passengers and drivers and increase social welfare. As an intuitive and effective approach to address this problem, driver repositioning has been employed by some real-world e-hailing platforms. In this paper, we describe a novel framework of driver repositioning system, which meets various requirements in practical situations, including robust driver experience satisfaction and multi-driver collaboration. We introduce an effective and user-friendly driver interaction design called “driver repositioning task”. A novel modularized algorithm is developed to generate the repositioning tasks in real time. To our knowledge, this is the first industry-level application of driver repositioning. We evaluate the proposed method in real-world experiments, achieving a 2% improvement of driver income. Our framework has been fully deployed in the online system of DiDi Chuxing and serves millions of drivers on a daily basis.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"25 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80903514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative Purchase Intent Identification in Twitter
Samed Atouati, Xiao Lu, Mauro Sozio. DOI: https://doi.org/10.1145/3366423.3380040
Social network users often express their discontent with a company's product or service on social media. Such reactions are more pronounced in the aftermath of a corporate scandal, such as a corruption case or food poisoning at a chain restaurant. In our work, we focus on identifying negative purchase intent in a tweet, i.e., a user's stated intent not to purchase any product or consume any service from a company. We develop a binary classifier for this task, based on a generalization of logistic regression that leverages the locality of purchase intent in Twitter posts. We conduct an extensive experimental evaluation against state-of-the-art approaches on a large collection of tweets, showing the effectiveness of our approach in terms of F1 score. We also provide preliminary results on which kinds of corporate scandals affect customers' purchase intent the most.
Examining Protest as An Intervention to Reduce Online Prejudice: A Case Study of Prejudice Against Immigrants
Kai Wei, Y. Lin, Muheng Yan. DOI: https://doi.org/10.1145/3366423.3380307
There has been growing concern about online users employing social media to incite prejudice and hatred against other individuals or groups. While there has been research on automated techniques for identifying online prejudice and hate speech, how to effectively counter online prejudice remains a societal challenge. Social protests, on the other hand, have frequently been used as an intervention against prejudice. However, research to date has not examined the relationship between protests and online prejudice. Using large-scale panel data collected from Twitter, we examine changes in users' tweeting behaviors relating to prejudice against immigrants following recent U.S. protests on immigration-related topics. This is the first empirical study examining the effect of protests on reducing online prejudice. Our results show both negative and positive changes in measured prejudice after a protest, suggesting that protests may have mixed effects on reducing prejudice. We further identify users who are likely to change (or resist change) after a protest. This work contributes to the understanding of online prejudice and its intervention, and the findings have implications for designing targeted interventions.
Designing Fairly Fair Classifiers Via Economic Fairness Notions
Safwan Hossain, Andjela Mladenovic, Nisarg Shah. DOI: https://doi.org/10.1145/3366423.3380228
The past decade has witnessed rapid growth in research on fairness in machine learning. In contrast, fairness has been formally studied for almost a century in microeconomics in the context of resource allocation, during which many general-purpose notions of fairness have been proposed. This paper explores the applicability of two such notions, envy-freeness and equitability, in machine learning. We propose novel relaxations of these fairness notions that apply to groups rather than individuals and are compelling in a broad range of settings. Our approach provides a unifying framework by incorporating several recently proposed fairness definitions as special cases. We provide generalization bounds for our approach, and we theoretically and experimentally evaluate the tradeoff between loss minimization and our fairness guarantees.
Private Data Manipulation in Optimal Sponsored Search Auction
Xiaotie Deng, Tao Lin, Tao Xiao. DOI: https://doi.org/10.1145/3366423.3380023
In this paper, we revisit the sponsored search auction as a repeated auction, viewing it as a task in which the seller learns and then exploits the buyers' private value distributions. We model the resulting game between the seller and the buyers as a Private Data Manipulation (PDM) game: the seller first announces an auction whose allocation and payment rules are based on the value distributions submitted by the buyers. The seller's expected revenue depends both on the design of the protocol and on the game played among the buyers in choosing the (possibly fake) value distributions they submit. Under the PDM game, we re-evaluate the theory, methodology, and techniques of sponsored search auctions, which have been among the most intensively studied topics in Internet economics.
Architectures for Autonomy: Towards an Equitable Web of Data in the Age of AI
Sir Nigel Shadbolt. DOI: https://doi.org/10.1145/3366423.3382668
Today, the Web connects over half the world's population, many of whom use it to stay connected to a multiplicity of vital digital public and private services, impacting every aspect of their lives. Access to the Web and the underlying Internet is seen as essential for all, even a fundamental human right [7]. However, many contend that the power structure on large swaths of the Web has become inverted; they argue that instead of being run for and by users, it has been made to serve the platforms themselves, and the powerful actors that sponsor such platforms to run targeted advertising on their behalf. In such an ad-driven platform ecosystem, users, including their beliefs, data, and attention, have become traded commodities [13]. There is concern that the emergence of powerful data analytics and AI techniques threatens to further entrench the power of these same platforms, by putting control of powerful and valuable new capabilities in their hands rather than in the hands of the users who produce the data [10]. The fear is that this is giving rise to data and AI monopolies [2, 6]. Individuals have no long-term control or agency over their personal data or over many of the decisions made using it. This may be one reason we are witnessing a so-called Renaissance of Ethics: a plethora of initiatives and activities that call out the range of threats to individual autonomy, self-determination, and privacy, the lack of transparency and accountability, and concerns around bias, fairness, equity, and access in our data-driven ecosystem. This keynote will argue that, as the remaining half of the world's population comes online, we need digital infrastructures that promote a plurality of methods of data sovereignty and governance rather than imposing a "single policy fits all" platform governance model, which has strained and undermined the ability of governments to protect and support their citizens' digital rights. This is an opportunity to re-imagine and re-architect elements of the Web, data, algorithms, and institutions so as to ensure a more equitable distribution of these new digital potentialities. Based on our existing research, we have been developing methods and technologies pertaining to the following core principles: informational self-determination and autonomy, balanced and equitable access to AI and data, accountability and redress of AI/algorithmic decisions, and new models of ethical participation and contribution. The technology that underpins the modern Web has seen exponential rates of change that have continuously improved the capabilities of the processors, memory, and communications upon which it depends. This has enabled huge amounts of data to be linked and stored, as well as providing for increasing use of AI. A variety of projects will be described in which we sought to unlock the potential of this increasingly powerful infrastructure [1, 4, 5, 9]. The lessons learnt through various efforts to develop the Semantic Web [8] and the insights gained through the rel…
De-Kodi: Understanding the Kodi Ecosystem
Marc Anthony Warrior, Yunming Xiao, Matteo Varvello, A. Kuzmanovic. DOI: https://doi.org/10.1145/3366423.3380194
Free and open-source media centers are currently experiencing a boom in popularity for the convenience and flexibility they offer users seeking to remotely consume digital content. This newfound fame is matched by increasing notoriety, given their potential to serve as hubs for illegal content, and a presumably ever-increasing network footprint. It is fair to say that a complex ecosystem has developed around Kodi, composed of millions of users, thousands of "add-ons" (Kodi extensions from third-party developers), and content providers. Motivated by these observations, this paper conducts the first analysis of the Kodi ecosystem. Our approach is to build "crawling" software around Kodi that can automatically install an add-on, explore its menu, and locate (video) content. This is challenging for several reasons. First, Kodi largely relies on visual information and user input, which intrinsically complicates automation. Second, no central aggregators for Kodi add-ons exist. Third, the potential sheer size of the ecosystem requires a highly scalable crawling solution. We address these challenges with de-Kodi, a full-fledged crawling system capable of discovering and crawling large cross-sections of Kodi's decentralized ecosystem. With de-Kodi, we discovered and tested over 9,000 distinct Kodi add-ons. Our results demonstrate that de-Kodi, which we make available to the general public, is an essential asset for studying one of the largest multimedia platforms in the world. Our work further serves as the first transparent and repeatable analysis of the Kodi ecosystem at large.
{"title":"De-Kodi: Understanding the Kodi Ecosystem","authors":"Marc Anthony Warrior, Yunming Xiao, Matteo Varvello, A. Kuzmanovic","doi":"10.1145/3366423.3380194","DOIUrl":"https://doi.org/10.1145/3366423.3380194","url":null,"abstract":"Free and open source media centers are currently experiencing a boom in popularity for the convenience and flexibility they offer users seeking to remotely consume digital content. This newfound fame is matched by increasing notoriety—for their potential to serve as hubs for illegal content—and a presumably ever-increasing network footprint. It is fair to say that a complex ecosystem has developed around Kodi, composed of millions of users, thousands of “add-ons”—Kodi extensions from 3rd-party developers—and content providers. Motivated by these observations, this paper conducts the first analysis of the Kodi ecosystem. Our approach is to build “crawling” software around Kodi which can automatically install an addon, explore its menu, and locate (video) content. This is challenging for many reasons. First, Kodi largely relies on visual information and user input which intrinsically complicates automation. Second, no central aggregators for Kodi addons exist. Third, the potential sheer size of this ecosystem requires a highly scalable crawling solution. We address these challenges with de-Kodi, a full fledged crawling system capable of discovering and crawling large cross-sections of Kodi’s decentralized ecosystem. With de-Kodi, we discovered and tested over 9,000 distinct Kodi addons. Our results demonstrate de-Kodi, which we make available to the general public, to be an essential asset in studying one of the largest multimedia platforms in the world. Our work further serves as the first ever transparent and repeatable analysis of the Kodi ecosystem at large.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"390 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80313481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning the Structure of Auto-Encoding Recommenders
Farhan Khawar, Leonard K. M. Poon, N. Zhang. DOI: https://doi.org/10.1145/3366423.3380135
Autoencoder recommenders have recently shown state-of-the-art performance in the recommendation task due to their ability to model non-linear item relationships effectively. However, existing autoencoder recommenders use fully connected neural network layers and do not employ structure learning. This can lead to inefficient training, especially when the data is sparse, as is common in collaborative filtering, resulting in lower generalization ability and reduced performance. In this paper, we introduce structure learning for autoencoder recommenders by taking advantage of the inherent item groups present in the collaborative filtering domain. By the nature of items in general, certain items are more related to each other than to others. Based on this, we propose a method that first learns groups of related items and then uses this information to determine the connectivity structure of an auto-encoding neural network. The result is a sparsely connected network, whose sparse structure can be viewed as a prior that guides network training. Empirically, we demonstrate that the proposed structure learning enables the autoencoder to converge to a local optimum with a much smaller spectral norm and generalization error bound than the fully connected network. The resulting sparse network considerably outperforms state-of-the-art methods such as Mult-VAE/Mult-DAE on multiple benchmark datasets, even when the same number of parameters and FLOPs are used. It also has better cold-start performance.
Client Insourcing: Bringing Ops In-House for Seamless Re-engineering of Full-Stack JavaScript Applications
Kijin An, E. Tilevich. DOI: https://doi.org/10.1145/3366423.3380105
Modern web applications are distributed across a browser-based client and a cloud-based server. Distribution provides access to remote resources that are accessed over the web and shared by clients, and much of the complexity of inspecting and evolving web applications lies in this distributed nature. Moreover, the majority of mature program analysis and transformation tools work only with centralized software. Inspired by business process re-engineering, in which remote operations can be insourced back in house to be restructured and outsourced anew, we bring an analogous approach to the re-engineering of web applications. Our target domain is full-stack JavaScript applications that implement both the client and server code in this language. Our approach is enabled by Client Insourcing, a novel automatic refactoring that creates a semantically equivalent centralized version of a distributed application. This centralized version is then inspected, modified, and redistributed to meet new requirements. After describing the design and implementation of Client Insourcing, we demonstrate its utility and value in addressing changes in security, reliability, and performance requirements. By reducing the complexity of the non-trivial program inspection and evolution tasks performed to meet these requirements, our approach can become a helpful aid in the re-engineering of web applications in this domain.
I've Got Your Packages: Harvesting Customers' Delivery Order Information using Package Tracking Number Enumeration Attacks
Simon S. Woo, Hanbin Jang, Woojung Ji, Hyoungshick Kim. DOI: https://doi.org/10.1145/3366423.3380062
A package tracking number (PTN) is widely used to monitor and track a shipment. Through the lenses of security and privacy, however, a package tracking number can reveal certain personal information, leading to security and privacy breaches. In this work, we examine the privacy issues associated with the online package tracking systems of the world's three most popular package delivery service providers (FedEx, DHL, and UPS) and find that their websites inadvertently leak users' personal data given a PTN. Moreover, we discover that PTNs are highly structured and predictable, so customers' personal data can be collected at scale via PTN enumeration attacks. We analyzed more than one million package tracking records obtained from FedEx, DHL, and UPS, and showed that within 5 attempts an attacker can efficiently guess more than 90% of PTNs for FedEx and DHL, and close to 50% of PTNs for UPS. In addition, we present two practical attack scenarios: 1) inferring business transaction information and 2) uniquely identifying recipients. We also found that more than 109 recipients can be uniquely identified with fewer than 10 comparisons by linking PTN information with the online people search service Whitepages.
{"title":"I’ve Got Your Packages: Harvesting Customers’ Delivery Order Information using Package Tracking Number Enumeration Attacks","authors":"Simon S. Woo, Hanbin Jang, Woojung Ji, Hyoungshick Kim","doi":"10.1145/3366423.3380062","DOIUrl":"https://doi.org/10.1145/3366423.3380062","url":null,"abstract":"A package tracking number (PTN) is widely used to monitor and track a shipment. Through the lenses of security and privacy, however, a package tracking number can possibly reveal certain personal information, leading to security and privacy breaches. In this work, we examine the privacy issues associated with online package tracking systems used in the top three most popular package delivery service providers (FedEx, DHL, and UPS) in the world and found that those websites inadvertently leak users’ personal data with a PTN. Moreover, we discovered that PTNs are highly structured and predictable. Therefore, customers’ personal data can be massively collected via PTN enumeration attacks. We analyzed more than one million package tracking records obtained from Fedex, DHL, and UPS, and showed that within 5 attempts, an attacker can efficiently guess more than 90% of PTNs for FedEx and DHL, and close to 50% of PTNs for UPS. In addition, we present two practical attack scenarios: 1) to infer business transactions information and 2) to uniquely identify recipients. Also, we found that more than 109 recipients can be uniquely identified with less than 10 comparisons by linking the PTN information with the online people search service, Whitepages.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84133541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}