Elastically Augmenting the Control-path Throughput in SDN to Deal with Internet DDoS Attacks
Yuanjun Dai, An Wang, Yang Guo, Songqing Chen
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3559759
Distributed denial of service (DDoS) attacks have been prevalent on the Internet for decades. Despite various defenses, they keep growing in size, frequency, and duration. Software-defined networking (SDN), a new network paradigm, is also vulnerable to DDoS attacks. SDN uses logically centralized control, which brings the advantages of maintaining a global network view and simplifying programmability. When attacks happen, the control path between the switches and their associated controllers may become congested due to its limited capacity. However, the data-plane visibility of SDN provides new opportunities to defend against DDoS attacks in the cloud computing environment. To this end, we conduct measurements to evaluate the throughput of the software control agents on several hardware switches under attack. We then design a new mechanism, called Scotch, that enables the network to scale up its capability and handle DDoS attack traffic. In our design, congestion serves as the indicator that triggers the mitigation mechanism. Scotch elastically scales up the control-plane capacity by using an Open vSwitch-based overlay. Scotch takes advantage of both the high control-plane capacity of a large number of vSwitches and the high data-plane capacity of commodity physical switches to increase SDN network scalability and resiliency under abnormal traffic surges (e.g., DDoS attacks). We have implemented a prototype and experimentally evaluated Scotch. Our experiments in a small-scale lab environment and on the large-scale GENI testbed demonstrate that Scotch can elastically scale up the control-channel bandwidth upon attacks.
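The congestion-triggered, elastic scale-up described in the abstract above can be pictured as a simple control loop. The sketch below is illustrative only, not Scotch's implementation: the function name, the queue-length signal, and all thresholds are assumptions chosen for the example.

```python
# Hypothetical sketch of congestion-triggered elastic scaling of an
# Open vSwitch overlay pool. The thresholds and the queue-length signal
# are illustrative assumptions, not values from the Scotch prototype.

def scale_vswitch_pool(control_queue_len: int, pool_size: int,
                       high_mark: int = 800, low_mark: int = 200,
                       max_pool: int = 64) -> int:
    """Return the new overlay vSwitch pool size for the observed load."""
    if control_queue_len > high_mark and pool_size < max_pool:
        return min(max(pool_size * 2, 1), max_pool)  # congestion: scale up
    if control_queue_len < low_mark and pool_size > 1:
        return max(pool_size // 2, 1)                # surge over: scale down
    return pool_size
```

The key property is elasticity: capacity doubles while the control path is congested and shrinks back once the surge subsides, so the overlay costs nothing during normal operation.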
Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy
Jianwei Hao, Piyush Subedi, Lakshmish Ramaswamy, In Kee Kim
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3546192
The wide adoption of smart devices and Internet-of-Things (IoT) sensors has led to massive growth in data generation at the edge of the Internet over the past decade. Intelligent real-time analysis of such a high volume of data, particularly with highly accurate deep learning (DL) models, often requires the data to be processed close to the data sources (at the edge of the Internet) to minimize network and processing latency. The advent of specialized, low-cost, and power-efficient edge devices has greatly facilitated DL inference tasks at the edge. However, limited research has been done on improving the inference throughput (e.g., the number of inferences per second) by exploiting system techniques. This study investigates system techniques, such as batched inferencing, AI multi-tenancy, and clusters of AI accelerators, that can significantly enhance the overall inference throughput of DL models for image classification tasks on edge devices. In particular, AI multi-tenancy enables collective utilization of an edge device's system resources (CPU, GPU) and AI accelerators (e.g., Edge Tensor Processing Units, or EdgeTPUs). The evaluation results show that batched inferencing yields more than 2.4× throughput improvement on devices equipped with high-performance GPUs like the Jetson Xavier NX. Moreover, with multi-tenancy approaches such as concurrent model executions (CME) and dynamic model placements (DMP), the DL inference throughput on edge devices (with GPUs) and EdgeTPUs can be further improved by up to 3× and 10×, respectively. Furthermore, we present a detailed analysis of the hardware and software factors that affect DL inference throughput on edge devices and EdgeTPUs, shedding light on areas that could be further improved to achieve high-performance DL inference at the edge.
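The batching gains described above have a simple amortization intuition: every inference call pays a fixed dispatch overhead, which a larger batch spreads over more images. The toy cost model below illustrates this; all numbers are made-up assumptions, not measurements from the paper.

```python
# Toy model of why batching raises inference throughput: each call pays a
# fixed dispatch overhead, so larger batches amortize it. The overhead and
# per-image costs here are illustrative assumptions only.

def throughput(batch_size: int, dispatch_ms: float = 8.0,
               per_image_ms: float = 2.0) -> float:
    """Images per second for one batched inference call."""
    latency_ms = dispatch_ms + batch_size * per_image_ms
    return batch_size / (latency_ms / 1000.0)

gain = throughput(16) / throughput(1)   # speedup of batch 16 over batch 1
```

Under these assumed costs, batch size 16 quadruples throughput, at the price of higher per-request latency, which is the usual trade-off batched inferencing has to manage.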
White Box: On the Prediction of Collaborative Filtering Recommendation Systems' Performance
Iulia Paun, Yashar Moshfeghi, Nikos Ntarmos
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3554979
Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.
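The core idea above, predicting training cost from complexity equations plus input characteristics, can be sketched in miniature. Assuming (purely for illustration, this is not the paper's model) that matrix-factorization training time scales with the product of rating count and latent-factor count, a single proportionality constant can be fitted from a few sample runs:

```python
# Hypothetical sketch: fit a proportionality constant for a complexity-
# equation-based cost model, time ~ c * (n_ratings * n_factors).
# The sample timings below are invented for illustration.

def fit_cost_model(samples):
    """samples: list of ((n_ratings, n_factors), seconds) pairs."""
    ratios = [t / (n * k) for (n, k), t in samples]
    return sum(ratios) / len(ratios)        # average per-unit cost

def predict_seconds(coef, n_ratings, n_factors):
    """Predicted training time from the fitted constant."""
    return coef * n_ratings * n_factors

coef = fit_cost_model([((1_000_000, 50), 5.0),
                       ((2_000_000, 50), 10.0)])
```

A real predictor would use the derived complexity equations per algorithm and a proper regression, but the principle is the same: a handful of cheap sample runs calibrates a model that prices the full training phase.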
Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web
Luigi Asprino, Enrico Daga, Aldo Gangemi, Paul Mulholland
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3555312
Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or in formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines, which mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2].

In this article, we study a unified method for accessing heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.
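The facade idea above, exposing an arbitrary source format as triples so it can be queried uniformly, can be illustrated with a tiny converter. Facade-X itself maps sources onto RDF containers and blank nodes; the simplified sketch below uses string node identifiers and numbered slots, purely to show the shape of the mapping.

```python
# Simplified illustration of a facade over tree-shaped data: expose any
# JSON-like document as (subject, predicate, object) triples. This is an
# assumption-laden stand-in for Facade-X, not its actual mapping.

def to_triples(node, subject="root"):
    """Flatten nested dicts/lists into triples; lists get numbered slots."""
    triples = []
    if isinstance(node, dict):
        items = [(key, value) for key, value in node.items()]
    elif isinstance(node, list):
        items = [(f"_{i}", value) for i, value in enumerate(node, start=1)]
    else:
        return triples
    for pred, value in items:
        if isinstance(value, (dict, list)):
            child = f"{subject}/{pred}"
            triples.append((subject, pred, child))
            triples.extend(to_triples(value, child))
        else:
            triples.append((subject, pred, value))
    return triples

triples = to_triples({"name": "Facade-X", "tags": ["rdf", "sparql"]})
```

Once every format lands in this one triple shape, a single query language (SPARQL, in SPARQL Anything's case) suffices for inspection, mapping, and triple generation alike.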
Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters
Weiwei Fang, Wenyuan Xu, Chongchong Yu, Neal N. Xiong
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3551638
The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Due to the high computational intensity of DNN models and the resource-constrained nature of Industrial Internet-of-Things (IIoT) devices, it is generally very challenging to deploy and execute DNNs efficiently in industrial scenarios. Substantial research has focused on model compression or edge-cloud offloading, which trade off accuracy for efficiency or depend on high-quality infrastructure support, respectively. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements for IIoT devices without sacrificing accuracy; and (2) distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI based on PyTorch and evaluated its performance with the NEU-CLS defect classification task and two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform existing solutions in their respective domains. When they are combined, EdgeDI provides scalable DNN inference speedups that are very close to, or even much higher than, the theoretical speedup bounds, while still maintaining the desired accuracy.
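The second knob above, adaptive workload partitioning across heterogeneous devices, amounts to splitting work in proportion to each device's measured speed so that all devices finish at roughly the same time. A minimal sketch, with invented device speeds standing in for whatever profiling EdgeDI actually performs:

```python
# Illustrative proportional partitioning of an inference workload across
# heterogeneous devices. Speeds are hypothetical profiler outputs, not
# numbers from EdgeDI; remainders go to the fastest device.

def partition(total_items: int, speeds: list) -> list:
    """Split total_items across devices in proportion to their speeds."""
    total_speed = sum(speeds)
    shares = [int(total_items * s / total_speed) for s in speeds]
    shares[speeds.index(max(speeds))] += total_items - sum(shares)
    return shares

shares = partition(100, [4.0, 2.0, 2.0])   # one fast Pi, two slower ones
```

Balancing by speed rather than evenly is what prevents the slowest device from becoming the straggler that caps the whole cluster's throughput.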
Attacking DoH and ECH: Does Server Name Encryption Protect Users' Privacy?
Martino Trevisan, Francesca Soro, Marco Mellia, Idilio Drago, Ricardo Morla
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3570726
Privacy on the Internet has become a priority, and several efforts have been devoted to limiting the leakage of personal information. Domain names, in both the TLS Client Hello and DNS traffic, are among the last pieces of information still visible to an observer in the network. The Encrypted Client Hello (ECH) extension for TLS and the DNS-over-HTTPS (DoH) and DNS-over-QUIC protocols aim to further increase network confidentiality by encrypting the domain names of the visited servers.

In this article, we check whether an attacker able to passively observe users' traffic could still recover the domain names of the websites they visit even if those names are encrypted. By relying on large-scale network traces, we show that simplistic features and off-the-shelf machine learning models are sufficient to achieve surprisingly high precision and recall when recovering encrypted domain names. We consider three attack scenarios: recovering the per-flow name, rebuilding the set of websites visited by a user, and checking which users visit a given target website. We then evaluate the efficacy of padding-based mitigations, finding that all three attacks remain effective despite the resources wasted on padding. We conclude that current proposals for domain encryption may produce a false sense of privacy, and that more robust techniques should be envisioned to offer protection to end users.
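The per-flow attack described above works because coarse traffic features survive encryption: how many bytes a site transfers and how long the flow lasts already narrow down the destination. The toy nearest-centroid "classifier" below stands in for the paper's off-the-shelf ML models; the site names and fingerprint values are invented for illustration.

```python
# Toy illustration of encrypted-domain recovery from flow features.
# Per-site fingerprints (mean KB transferred, mean duration in seconds)
# are made-up values, not derived from the paper's traces.

SITE_FINGERPRINTS = {
    "video-site.example": (5000.0, 120.0),
    "news-site.example": (300.0, 15.0),
    "bank-site.example": (80.0, 40.0),
}

def guess_site(kb: float, seconds: float) -> str:
    """Return the fingerprint closest to the observed flow features."""
    def dist(fp):
        return (fp[0] - kb) ** 2 + (fp[1] - seconds) ** 2
    return min(SITE_FINGERPRINTS, key=lambda s: dist(SITE_FINGERPRINTS[s]))
```

This also makes the padding result intuitive: padding perturbs these features but does not erase the gap between, say, a video stream and a banking session, so classification keeps working.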
The Tip of the Buyer: Extracting Product Tips from Reviews
Sharon Hirsch, Slava Novgorodov, Ido Guy, Alexander Nus
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3547140
Product reviews play a key role in e-commerce platforms. Studies show that many users read product reviews before a purchase and trust them to the same extent as personal recommendations. However, in many cases, the number of reviews per product is large and extracting useful information becomes a challenging task. Several websites have recently added an option to post tips—short, concise, practical, and self-contained pieces of advice about the products. These tips are complementary to the reviews and usually add a new non-trivial insight about the product, beyond its title, attributes, and description. Yet, most, if not all, major e-commerce platforms lack the notion of a tip as a first-class citizen, and customers typically express their advice through other means, such as reviews.

In this work, we propose an extractive method for tip generation from product reviews. We focus on five popular e-commerce domains whose reviews tend to contain useful non-trivial tips that are beneficial for potential customers. We formally define the task of tip extraction in e-commerce by providing the list of tip types, tip timing (before and/or after the purchase), and connection to the surrounding context sentences. To extract the tips, we propose a supervised approach and leverage a publicly available dataset, annotated by human editors, containing 14,000 product reviews. To demonstrate the potential of our approach, we compare different tip generation methods and evaluate them both manually and over the labeled set. Our approach demonstrates particularly high performance for popular products in the Baby, Home Improvement, and Sports & Outdoors domains, with a precision of over 95% for the top three tips per product. In addition, we evaluate the performance of our methods on previously unseen domains. Finally, we discuss the practical usage of our approach in real-world applications. Concretely, we explain how tips generated from user reviews can be integrated in various use cases within e-commerce platforms and benefit both buyers and sellers.
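The extractive setup described above, scoring review sentences and surfacing the top few as tips, can be caricatured with a keyword heuristic. This cue list is an illustrative assumption and deliberately crude; the paper's method is a supervised model trained on editor-annotated reviews, not a rule list.

```python
# Crude stand-in for an extractive tip generator: score review sentences
# by advice-like cues and return the top candidates. The cue list is an
# invented assumption, not the paper's trained classifier.

ADVICE_CUES = ("make sure", "be sure", "don't", "do not",
               "remember to", "avoid ", "wash ", "charge ")

def extract_tips(sentences, top_k=3):
    """Rank sentences by cue hits; keep only those with at least one."""
    scored = [(sum(cue in s.lower() for cue in ADVICE_CUES), s)
              for s in sentences]
    tips = [s for score, s in sorted(scored, key=lambda x: -x[0])
            if score > 0]
    return tips[:top_k]
```

Even this caricature shows why tips need their own extraction step: advice sentences are a small, stylistically distinct minority inside long opinion-heavy reviews.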
Facilitating Serverless Match-based Online Games with Novel Blockchain Technologies
Feijie Wu, Ho Yin Yuen, Henry Chan, Victor C. M. Leung, Wei Cai
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3565884
Applying peer-to-peer (P2P) architecture to online video games has attracted both academic and industrial interest, since it removes the need for expensive server maintenance. However, two major issues prevent the use of a P2P architecture: how to provide an effective distributed data storage solution, and how to tackle potential cheating behaviors. Inspired by emerging blockchain techniques, we propose a novel consensus model called Proof-of-Play (PoP) that provides a decentralized data storage system incorporating an anti-cheating mechanism for P2P games by rewarding players who interact with the game as intended, together with security measures that address the nothing-at-stake problem and long-range attacks. To validate our design, we use a game-theoretic model to show that, under certain assumptions, undermining the integrity of the PoP system is against the best interests of any user. Then, as a proof of concept, we developed a P2P game (Infinity Battle) to demonstrate how a game can be integrated with PoP in practice. Finally, we conducted experiments comparing PoP with Proof-of-Work (PoW) to show its advantages in various aspects.
Pub Date : 2023-02-23 DOI: https://dl.acm.org/doi/10.1145/3561051
Man Zeng, Dandan Li, Pei Zhang, Kun Xie, Xiaohong Huang
In the inter-domain network, route leaks can disrupt Internet traffic and cause large outages. Accurate detection of route leaks requires the sharing of AS business relationship information. However, business relationship information between ASes is confidential; ASes are usually unwilling to reveal it to other ASes, especially their competitors. In this paper, we propose a method named FL-RLD that detects route leaks while preserving the privacy of business relationships between ASes by using a blockchain-based federated learning framework, in which ASes collaboratively train a global detection model without directly disclosing their specific business relationships. To mitigate the lack of ground-truth validation data for route leaks, FL-RLD provides a self-validation scheme that labels AS triples with local routing policies. We evaluate FL-RLD on a variety of datasets, both balanced and imbalanced, and examine different deployment strategies of FL-RLD under different topologies. According to the results, FL-RLD detects route leaks better than single-AS detection, whether the datasets are balanced or imbalanced. Additionally, the results indicate that first deploying FL-RLD at the ASes with the most peers brings more significant benefits in detecting route leaks than selecting the ASes with the most providers and customers.
{"title":"Federated Route Leak Detection in Inter-domain Routing with Privacy Guarantee","authors":"Man Zeng, Dandan Li, Pei Zhang, Kun Xie, Xiaohong Huang","doi":"https://dl.acm.org/doi/10.1145/3561051","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3561051","url":null,"abstract":"<p>In the inter-domain network, route leaks can disrupt the Internet traffic and cause large outages. The accurate detection of route leaks requires the sharing of AS business relationship information. However, the business relationship information between ASes is confidential. ASes are usually unwilling to reveal this information to the other ASes, especially their competitors. In this paper, we propose a method named FL-RLD to detect route leaks while maintaining the privacy of business relationships between ASes by using a blockchain-based federated learning framework, where ASes can collaboratively train a global detection model without directly disclosing their specific business relationships. To mitigate the lack of ground-truth validation data in route leaks, FL-RLD provides a self-validation scheme by labeling AS triples with local routing policies. We evaluate FL-RLD under a variety of datasets including imbalanced and balanced datasets, and examine different deployment strategies of FL-RLD under different topologies. According to the results, FL-RLD performs better in detecting route leaks than the single AS detection, whether the datasets are balanced or imbalanced. 
Additionally, the results indicate that selecting ASes with the most peers to first deploy FL-RLD brings more significant benefits in detecting route leaks than selecting ASes with the most providers and customers.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"9 1","pages":""},"PeriodicalIF":5.3,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138533438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-23 DOI: https://dl.acm.org/doi/10.1145/3561300
Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa
Cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these assets, and nowadays cryptocurrency exchanges process transactions worth over 100 billion US dollars per month. Despite this, many cryptocurrencies have low liquidity and are highly prone to market manipulation. This paper performs an in-depth analysis of two market manipulations organized by communities over the Internet: the pump and dump and the crowd pump. The pump and dump scheme is a fraud as old as the stock market, and it has found new vitality in the loosely regulated market of cryptocurrencies. Groups of highly coordinated people systematically arrange this scam, usually on Telegram and Discord. We monitored these groups for more than three years, detecting around 900 individual events. We report on three case studies related to pump and dump groups. We leverage our unique dataset of verified pump and dumps to build a machine learning model able to detect a pump and dump within 25 seconds of the moment it starts, achieving an F1-score of 94.5%. Then, we move on to the crowd pump, a new phenomenon that hit the news in the first months of 2021, when a Reddit community inflated the price of GameStop stock (GME) by over 1,900% on Wall Street, the world’s largest stock exchange. Later, other Reddit communities replicated the operation on the cryptocurrency markets, targeting DogeCoin (DOGE) and Ripple (XRP). We reconstruct how these operations developed and discuss differences and analogies with the standard pump and dump. We believe this study helps in understanding a widespread phenomenon affecting cryptocurrency markets. The detection algorithms we develop effectively detect these events in real time and help investors stay out of the market when these frauds are in action.
{"title":"The Doge of Wall Street: Analysis and Detection of Pump and Dump Cryptocurrency Manipulations","authors":"Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa","doi":"https://dl.acm.org/doi/10.1145/3561300","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3561300","url":null,"abstract":"<p>Cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these assets, and nowadays, cryptocurrency exchanges process transactions for over 100 billion US dollars per month. Despite this, many cryptocurrencies have low liquidity and are highly prone to market manipulation. This paper performs an in-depth analysis of two market manipulations organized by communities over the Internet: The pump and dump and the crowd pump. The pump and dump scheme is a fraud as old as the stock market. Now, it has new vitality in the loosely regulated market of cryptocurrencies. Groups of highly coordinated people systematically arrange this scam, usually on Telegram and Discord. We monitored these groups for more than 3 years, detecting around 900 individual events. We report on three case studies related to pump and dump groups. We leverage our unique dataset of the verified pump and dumps to build a machine learning model able to detect a pump and dump in 25 seconds from the moment it starts, achieving the results of 94.5% of F1-score. Then, we move on to the crowd pump, a new phenomenon that hit the news in the first months of 2021, when a Reddit community inflated the price of the GameStop stocks (GME) by over 1,900% on Wall Street, the world’s largest stock exchange. Later, other Reddit communities replicated the operation on the cryptocurrency markets. The targets were DogeCoin (DOGE) and Ripple (XRP). We reconstruct how these operations developed and discuss differences and analogies with the standard pump and dump. We believe this study helps understand a widespread phenomenon affecting cryptocurrency markets. 
The detection algorithms we develop effectively detect these events in real time and help investors stay out of the market when these frauds are in action.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"19 1","pages":""},"PeriodicalIF":5.3,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138533451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}