Elastically Augmenting the Control-path Throughput in SDN to Deal with Internet DDoS Attacks
Yuanjun Dai, An Wang, Yang Guo, Songqing Chen
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3559759
Distributed denial of service (DDoS) attacks have been prevalent on the Internet for decades. Despite various defenses, they keep growing in size, frequency, and duration. Software-defined networking (SDN), a new network paradigm, is also vulnerable to DDoS attacks. SDN uses logically centralized control, which brings the advantages of maintaining a global network view and simplifying programmability. When attacks happen, the control path between the switches and their associated controllers may become congested due to its limited capacity. However, the data-plane visibility of SDN provides new opportunities to defend against DDoS attacks in the cloud computing environment. To this end, we conduct measurements to evaluate the throughput of the software control agents on several hardware switches under attack. We then design a new mechanism, called Scotch, that enables the network to scale up its capability and handle DDoS attack traffic. In our design, congestion serves as the indicator that triggers the mitigation mechanism. Scotch elastically scales up the control-plane capacity by using an Open vSwitch-based overlay. Scotch takes advantage of both the high control-plane capacity of a large number of vSwitches and the high data-plane capacity of commodity physical switches to increase SDN network scalability and resiliency under abnormal traffic surges (e.g., DDoS attacks). We have implemented a prototype and experimentally evaluated Scotch. Our experiments in a small-scale lab environment and on the large-scale GENI testbed demonstrate that Scotch can elastically scale up the control-channel bandwidth upon attacks.
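The congestion-triggered, elastic scale-up described in the abstract above can be pictured as a simple control loop. The sketch below is illustrative only, not Scotch's implementation: the function name, the queue-length signal, and all thresholds are assumptions chosen for the example.

```python
# Hypothetical sketch of congestion-triggered elastic scaling of an
# Open vSwitch overlay pool. The thresholds and the queue-length signal
# are illustrative assumptions, not values from the Scotch prototype.

def scale_vswitch_pool(control_queue_len: int, pool_size: int,
                       high_mark: int = 800, low_mark: int = 200,
                       max_pool: int = 64) -> int:
    """Return the new overlay vSwitch pool size for the observed load."""
    if control_queue_len > high_mark and pool_size < max_pool:
        return min(max(pool_size * 2, 1), max_pool)  # congestion: scale up
    if control_queue_len < low_mark and pool_size > 1:
        return max(pool_size // 2, 1)                # surge over: scale down
    return pool_size
```

The key property is elasticity: capacity doubles while the control path is congested and shrinks back once the surge subsides, so the overlay costs nothing during normal operation.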
Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy
Jianwei Hao, Piyush Subedi, Lakshmish Ramaswamy, In Kee Kim
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3546192
The wide adoption of smart devices and Internet-of-Things (IoT) sensors has led to massive growth in data generation at the edge of the Internet over the past decade. Intelligent real-time analysis of such a high volume of data, particularly with highly accurate deep learning (DL) models, often requires the data to be processed close to the data sources (at the edge of the Internet) to minimize network and processing latency. The advent of specialized, low-cost, and power-efficient edge devices has greatly facilitated DL inference tasks at the edge. However, limited research has been done on improving the inference throughput (e.g., the number of inferences per second) by exploiting system techniques. This study investigates system techniques, such as batched inferencing, AI multi-tenancy, and clusters of AI accelerators, that can significantly enhance the overall inference throughput of DL models for image classification tasks on edge devices. In particular, AI multi-tenancy enables collective utilization of an edge device's system resources (CPU, GPU) and AI accelerators (e.g., Edge Tensor Processing Units, or EdgeTPUs). The evaluation results show that batched inferencing yields more than 2.4× throughput improvement on devices equipped with high-performance GPUs like the Jetson Xavier NX. Moreover, with multi-tenancy approaches such as concurrent model executions (CME) and dynamic model placements (DMP), the DL inference throughput on edge devices (with GPUs) and EdgeTPUs can be further improved by up to 3× and 10×, respectively. Furthermore, we present a detailed analysis of the hardware and software factors that affect DL inference throughput on edge devices and EdgeTPUs, shedding light on areas that could be further improved to achieve high-performance DL inference at the edge.
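The batching gains described above have a simple amortization intuition: every inference call pays a fixed dispatch overhead, which a larger batch spreads over more images. The toy cost model below illustrates this; all numbers are made-up assumptions, not measurements from the paper.

```python
# Toy model of why batching raises inference throughput: each call pays a
# fixed dispatch overhead, so larger batches amortize it. The overhead and
# per-image costs here are illustrative assumptions only.

def throughput(batch_size: int, dispatch_ms: float = 8.0,
               per_image_ms: float = 2.0) -> float:
    """Images per second for one batched inference call."""
    latency_ms = dispatch_ms + batch_size * per_image_ms
    return batch_size / (latency_ms / 1000.0)

gain = throughput(16) / throughput(1)   # speedup of batch 16 over batch 1
```

Under these assumed costs, batch size 16 quadruples throughput, at the price of higher per-request latency, which is the usual trade-off batched inferencing has to manage.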
White Box: On the Prediction of Collaborative Filtering Recommendation Systems' Performance
Iulia Paun, Yashar Moshfeghi, Nikos Ntarmos
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3554979
Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions—especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase—has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.
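The core idea above, predicting training cost from complexity equations plus input characteristics, can be sketched in miniature. Assuming (purely for illustration, this is not the paper's model) that matrix-factorization training time scales with the product of rating count and latent-factor count, a single proportionality constant can be fitted from a few sample runs:

```python
# Hypothetical sketch: fit a proportionality constant for a complexity-
# equation-based cost model, time ~ c * (n_ratings * n_factors).
# The sample timings below are invented for illustration.

def fit_cost_model(samples):
    """samples: list of ((n_ratings, n_factors), seconds) pairs."""
    ratios = [t / (n * k) for (n, k), t in samples]
    return sum(ratios) / len(ratios)        # average per-unit cost

def predict_seconds(coef, n_ratings, n_factors):
    """Predicted training time from the fitted constant."""
    return coef * n_ratings * n_factors

coef = fit_cost_model([((1_000_000, 50), 5.0),
                       ((2_000_000, 50), 10.0)])
```

A real predictor would use the derived complexity equations per algorithm and a proper regression, but the principle is the same: a handful of cheap sample runs calibrates a model that prices the full training phase.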
Knowledge Graph Construction with a Façade: A Unified Method to Access Heterogeneous Data Sources on the Web
Luigi Asprino, Enrico Daga, Aldo Gangemi, Paul Mulholland
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3555312
Data integration is the dominant use case for RDF Knowledge Graphs. However, Web resources come in formats with weak semantics (for example, CSV and JSON), or in formats specific to a given application (for example, BibTeX, HTML, and Markdown). To solve this problem, Knowledge Graph Construction (KGC) is gaining momentum due to its focus on supporting users in transforming data into RDF. However, using existing KGC frameworks results in complex data processing pipelines, which mix structural and semantic mappings, and whose development and maintenance constitute a significant bottleneck for KG engineers. Such frameworks force users to rely on different tools, sometimes based on heterogeneous languages, for inspecting sources, designing mappings, and generating triples, making the process unnecessarily complicated. We argue that it is possible and desirable to equip KG engineers with the ability to interact with Web data formats by relying on their expertise in RDF and the well-established SPARQL query language [2].

In this article, we study a unified method for accessing heterogeneous data sources with Facade-X, a meta-model implemented in a new data integration system called SPARQL Anything. We demonstrate that our approach is theoretically sound, since it allows a single meta-model, based on RDF, to represent data from (a) any file format expressible in BNF syntax, as well as (b) any relational database. We compare our method to state-of-the-art approaches in terms of usability (cognitive complexity of the mappings) and general performance. Finally, we discuss the benefits and challenges of this novel approach by engaging with the reference user community.
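The facade idea above, exposing an arbitrary source format as triples so it can be queried uniformly, can be illustrated with a tiny converter. Facade-X itself maps sources onto RDF containers and blank nodes; the simplified sketch below uses string node identifiers and numbered slots, purely to show the shape of the mapping.

```python
# Simplified illustration of a facade over tree-shaped data: expose any
# JSON-like document as (subject, predicate, object) triples. This is an
# assumption-laden stand-in for Facade-X, not its actual mapping.

def to_triples(node, subject="root"):
    """Flatten nested dicts/lists into triples; lists get numbered slots."""
    triples = []
    if isinstance(node, dict):
        items = [(key, value) for key, value in node.items()]
    elif isinstance(node, list):
        items = [(f"_{i}", value) for i, value in enumerate(node, start=1)]
    else:
        return triples
    for pred, value in items:
        if isinstance(value, (dict, list)):
            child = f"{subject}/{pred}"
            triples.append((subject, pred, child))
            triples.extend(to_triples(value, child))
        else:
            triples.append((subject, pred, value))
    return triples

triples = to_triples({"name": "Facade-X", "tags": ["rdf", "sparql"]})
```

Once every format lands in this one triple shape, a single query language (SPARQL, in SPARQL Anything's case) suffices for inspection, mapping, and triple generation alike.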
Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters
Weiwei Fang, Wenyuan Xu, Chongchong Yu, Neal N. Xiong
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3551638
The advent of Deep Neural Networks (DNNs) has empowered numerous computer-vision applications. Due to the high computational intensity of DNN models and the resource-constrained nature of Industrial Internet-of-Things (IIoT) devices, it is generally very challenging to deploy and execute DNNs efficiently in industrial scenarios. Substantial research has focused on model compression or edge-cloud offloading, which trade off accuracy for efficiency or depend on high-quality infrastructure support, respectively. In this article, we present EdgeDI, a framework for executing DNN inference in a partitioned, distributed manner on a cluster of IIoT devices. To improve inference performance, EdgeDI exploits two key optimization knobs: (1) model compression based on deep architecture design, which transforms the target DNN model into a compact one that reduces the resource requirements for IIoT devices without sacrificing accuracy; and (2) distributed inference based on adaptive workload partitioning, which achieves high parallelism by adaptively balancing the workload distribution among IIoT devices under heterogeneous resource conditions. We have implemented EdgeDI based on PyTorch and evaluated its performance with the NEU-CLS defect classification task and two typical DNN models (i.e., VGG and ResNet) on a cluster of heterogeneous Raspberry Pi devices. The results indicate that the two proposed optimization approaches significantly outperform existing solutions in their respective domains. When they are combined, EdgeDI provides scalable DNN inference speedups that are very close to, or even much higher than, the theoretical speedup bounds, while still maintaining the desired accuracy.
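The second knob above, adaptive workload partitioning across heterogeneous devices, amounts to splitting work in proportion to each device's measured speed so that all devices finish at roughly the same time. A minimal sketch, with invented device speeds standing in for whatever profiling EdgeDI actually performs:

```python
# Illustrative proportional partitioning of an inference workload across
# heterogeneous devices. Speeds are hypothetical profiler outputs, not
# numbers from EdgeDI; remainders go to the fastest device.

def partition(total_items: int, speeds: list) -> list:
    """Split total_items across devices in proportion to their speeds."""
    total_speed = sum(speeds)
    shares = [int(total_items * s / total_speed) for s in speeds]
    shares[speeds.index(max(speeds))] += total_items - sum(shares)
    return shares

shares = partition(100, [4.0, 2.0, 2.0])   # one fast Pi, two slower ones
```

Balancing by speed rather than evenly is what prevents the slowest device from becoming the straggler that caps the whole cluster's throughput.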
Attacking DoH and ECH: Does Server Name Encryption Protect Users' Privacy?
Martino Trevisan, Francesca Soro, Marco Mellia, Idilio Drago, Ricardo Morla
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3570726
Privacy on the Internet has become a priority, and several efforts have been devoted to limiting the leakage of personal information. Domain names, in both the TLS Client Hello and DNS traffic, are among the last pieces of information still visible to an observer in the network. The Encrypted Client Hello (ECH) extension for TLS and the DNS-over-HTTPS (DoH) and DNS-over-QUIC protocols aim to further increase network confidentiality by encrypting the domain names of the visited servers.

In this article, we check whether an attacker able to passively observe users' traffic could still recover the domain names of the websites they visit even if those names are encrypted. By relying on large-scale network traces, we show that simplistic features and off-the-shelf machine learning models are sufficient to achieve surprisingly high precision and recall when recovering encrypted domain names. We consider three attack scenarios: recovering the per-flow name, rebuilding the set of websites visited by a user, and checking which users visit a given target website. We then evaluate the efficacy of padding-based mitigations, finding that all three attacks remain effective despite the resources wasted on padding. We conclude that current proposals for domain encryption may produce a false sense of privacy, and that more robust techniques should be envisioned to offer protection to end users.
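The per-flow attack described above works because coarse traffic features survive encryption: how many bytes a site transfers and how long the flow lasts already narrow down the destination. The toy nearest-centroid "classifier" below stands in for the paper's off-the-shelf ML models; the site names and fingerprint values are invented for illustration.

```python
# Toy illustration of encrypted-domain recovery from flow features.
# Per-site fingerprints (mean KB transferred, mean duration in seconds)
# are made-up values, not derived from the paper's traces.

SITE_FINGERPRINTS = {
    "video-site.example": (5000.0, 120.0),
    "news-site.example": (300.0, 15.0),
    "bank-site.example": (80.0, 40.0),
}

def guess_site(kb: float, seconds: float) -> str:
    """Return the fingerprint closest to the observed flow features."""
    def dist(fp):
        return (fp[0] - kb) ** 2 + (fp[1] - seconds) ** 2
    return min(SITE_FINGERPRINTS, key=lambda s: dist(SITE_FINGERPRINTS[s]))
```

This also makes the padding result intuitive: padding perturbs these features but does not erase the gap between, say, a video stream and a banking session, so classification keeps working.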
The Tip of the Buyer: Extracting Product Tips from Reviews
Sharon Hirsch, Slava Novgorodov, Ido Guy, Alexander Nus
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3547140
Product reviews play a key role in e-commerce platforms. Studies show that many users read product reviews before a purchase and trust them to the same extent as personal recommendations. However, in many cases, the number of reviews per product is large and extracting useful information becomes a challenging task. Several websites have recently added an option to post tips—short, concise, practical, and self-contained pieces of advice about the products. These tips are complementary to the reviews and usually add a new non-trivial insight about the product, beyond its title, attributes, and description. Yet, most, if not all, major e-commerce platforms lack the notion of a tip as a first-class citizen, and customers typically express their advice through other means, such as reviews.

In this work, we propose an extractive method for tip generation from product reviews. We focus on five popular e-commerce domains whose reviews tend to contain useful non-trivial tips that are beneficial for potential customers. We formally define the task of tip extraction in e-commerce by providing the list of tip types, tip timing (before and/or after the purchase), and connection to the surrounding context sentences. To extract the tips, we propose a supervised approach and leverage a publicly available dataset, annotated by human editors, containing 14,000 product reviews. To demonstrate the potential of our approach, we compare different tip generation methods and evaluate them both manually and over the labeled set. Our approach demonstrates particularly high performance for popular products in the Baby, Home Improvement, and Sports & Outdoors domains, with a precision of over 95% for the top three tips per product. In addition, we evaluate the performance of our methods on previously unseen domains. Finally, we discuss the practical usage of our approach in real-world applications. Concretely, we explain how tips generated from user reviews can be integrated in various use cases within e-commerce platforms and benefit both buyers and sellers.
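The extractive setup described above, scoring review sentences and surfacing the top few as tips, can be caricatured with a keyword heuristic. This cue list is an illustrative assumption and deliberately crude; the paper's method is a supervised model trained on editor-annotated reviews, not a rule list.

```python
# Crude stand-in for an extractive tip generator: score review sentences
# by advice-like cues and return the top candidates. The cue list is an
# invented assumption, not the paper's trained classifier.

ADVICE_CUES = ("make sure", "be sure", "don't", "do not",
               "remember to", "avoid ", "wash ", "charge ")

def extract_tips(sentences, top_k=3):
    """Rank sentences by cue hits; keep only those with at least one."""
    scored = [(sum(cue in s.lower() for cue in ADVICE_CUES), s)
              for s in sentences]
    tips = [s for score, s in sorted(scored, key=lambda x: -x[0])
            if score > 0]
    return tips[:top_k]
```

Even this caricature shows why tips need their own extraction step: advice sentences are a small, stylistically distinct minority inside long opinion-heavy reviews.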
Facilitating Serverless Match-based Online Games with Novel Blockchain Technologies
Feijie Wu, Ho Yin Yuen, Henry Chan, Victor C. M. Leung, Wei Cai
Pub Date: 2023-02-23 | DOI: https://dl.acm.org/doi/10.1145/3565884
Applying peer-to-peer (P2P) architecture to online video games has attracted both academic and industrial interest, since it removes the need for expensive server maintenance. However, two major issues prevent the use of a P2P architecture: how to provide an effective distributed data storage solution, and how to tackle potential cheating behaviors. Inspired by emerging blockchain techniques, we propose a novel consensus model called Proof-of-Play (PoP) that provides a decentralized data storage system incorporating an anti-cheating mechanism for P2P games by rewarding players who interact with the game as intended, together with security measures that address the nothing-at-stake problem and long-range attacks. To validate our design, we use a game-theoretic model to show that, under certain assumptions, undermining the integrity of the PoP system is against the best interests of any user. Then, as a proof of concept, we developed a P2P game (Infinity Battle) to demonstrate how a game can be integrated with PoP in practice. Finally, we conducted experiments comparing PoP with Proof-of-Work (PoW) to show its advantages in various aspects.
Pub Date : 2023-02-23 DOI: https://dl.acm.org/doi/10.1145/3561051
Man Zeng, Dandan Li, Pei Zhang, Kun Xie, Xiaohong Huang
In the inter-domain network, route leaks can disrupt Internet traffic and cause large outages. Accurate detection of route leaks requires the sharing of AS business relationship information. However, business relationship information between ASes is confidential; ASes are usually unwilling to reveal it to other ASes, especially their competitors. In this paper, we propose a method named FL-RLD that detects route leaks while preserving the privacy of business relationships between ASes by using a blockchain-based federated learning framework, in which ASes collaboratively train a global detection model without directly disclosing their specific business relationships. To mitigate the lack of ground-truth validation data for route leaks, FL-RLD provides a self-validation scheme that labels AS triples with local routing policies. We evaluate FL-RLD on a variety of datasets, both balanced and imbalanced, and examine different deployment strategies of FL-RLD under different topologies. According to the results, FL-RLD detects route leaks better than single-AS detection, whether the datasets are balanced or imbalanced. Additionally, the results indicate that first deploying FL-RLD at the ASes with the most peers brings more significant benefits in detecting route leaks than selecting the ASes with the most providers and customers.
{"title":"Federated Route Leak Detection in Inter-domain Routing with Privacy Guarantee","authors":"Man Zeng, Dandan Li, Pei Zhang, Kun Xie, Xiaohong Huang","doi":"https://dl.acm.org/doi/10.1145/3561051","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3561051","url":null,"abstract":"<p>In the inter-domain network, route leaks can disrupt the Internet traffic and cause large outages. The accurate detection of route leaks requires the sharing of AS business relationship information. However, the business relationship information between ASes is confidential. ASes are usually unwilling to reveal this information to the other ASes, especially their competitors. In this paper, we propose a method named FL-RLD to detect route leaks while maintaining the privacy of business relationships between ASes by using a blockchain-based federated learning framework, where ASes can collaboratively train a global detection model without directly disclosing their specific business relationships. To mitigate the lack of ground-truth validation data in route leaks, FL-RLD provides a self-validation scheme by labeling AS triples with local routing policies. We evaluate FL-RLD under a variety of datasets including imbalanced and balanced datasets, and examine different deployment strategies of FL-RLD under different topologies. According to the results, FL-RLD performs better in detecting route leaks than the single AS detection, whether the datasets are balanced or imbalanced. 
Additionally, the results indicate that selecting ASes with the most peers to first deploy FL-RLD brings more significant benefits in detecting route leaks than selecting ASes with the most providers and customers.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"9 1","pages":""},"PeriodicalIF":5.3,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138533438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-02-23 DOI: https://dl.acm.org/doi/10.1145/3561300
Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa
Cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these assets, and nowadays cryptocurrency exchanges process transactions worth over 100 billion US dollars per month. Despite this, many cryptocurrencies have low liquidity and are highly prone to market manipulation. This paper performs an in-depth analysis of two market manipulations organized by communities over the Internet: the pump and dump and the crowd pump. The pump and dump scheme is a fraud as old as the stock market, and it has found new vitality in the loosely regulated market of cryptocurrencies. Groups of highly coordinated people systematically arrange this scam, usually on Telegram and Discord. We monitored these groups for more than three years, detecting around 900 individual events. We report on three case studies related to pump and dump groups. We leverage our unique dataset of verified pump and dumps to build a machine learning model able to detect a pump and dump within 25 seconds of the moment it starts, achieving an F1-score of 94.5%. Then, we move on to the crowd pump, a new phenomenon that hit the news in the first months of 2021, when a Reddit community inflated the price of GameStop stock (GME) by over 1,900% on Wall Street, the world’s largest stock exchange. Later, other Reddit communities replicated the operation on the cryptocurrency markets, targeting DogeCoin (DOGE) and Ripple (XRP). We reconstruct how these operations developed and discuss differences and analogies with the standard pump and dump. We believe this study helps in understanding a widespread phenomenon affecting cryptocurrency markets. The detection algorithms we develop effectively detect these events in real time and help investors stay out of the market when these frauds are in action.
{"title":"The Doge of Wall Street: Analysis and Detection of Pump and Dump Cryptocurrency Manipulations","authors":"Massimo La Morgia, Alessandro Mei, Francesco Sassi, Julinda Stefa","doi":"https://dl.acm.org/doi/10.1145/3561300","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3561300","url":null,"abstract":"<p>Cryptocurrencies are increasingly popular. Even people who are not experts have started to invest in these assets, and nowadays, cryptocurrency exchanges process transactions for over 100 billion US dollars per month. Despite this, many cryptocurrencies have low liquidity and are highly prone to market manipulation. This paper performs an in-depth analysis of two market manipulations organized by communities over the Internet: The pump and dump and the crowd pump. The pump and dump scheme is a fraud as old as the stock market. Now, it has new vitality in the loosely regulated market of cryptocurrencies. Groups of highly coordinated people systematically arrange this scam, usually on Telegram and Discord. We monitored these groups for more than 3 years, detecting around 900 individual events. We report on three case studies related to pump and dump groups. We leverage our unique dataset of the verified pump and dumps to build a machine learning model able to detect a pump and dump in 25 seconds from the moment it starts, achieving the results of 94.5% of F1-score. Then, we move on to the crowd pump, a new phenomenon that hit the news in the first months of 2021, when a Reddit community inflated the price of the GameStop stocks (GME) by over 1,900% on Wall Street, the world’s largest stock exchange. Later, other Reddit communities replicated the operation on the cryptocurrency markets. The targets were DogeCoin (DOGE) and Ripple (XRP). We reconstruct how these operations developed and discuss differences and analogies with the standard pump and dump. We believe this study helps understand a widespread phenomenon affecting cryptocurrency markets. 
The detection algorithms we develop effectively detect these events in real time and help investors stay out of the market when these frauds are in action.</p>","PeriodicalId":50911,"journal":{"name":"ACM Transactions on Internet Technology","volume":"19 1","pages":""},"PeriodicalIF":5.3,"publicationDate":"2023-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138533451","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}