ACM Transactions on Internet Technology: Latest Articles, Page 6

Stance-level Sarcasm Detection with BERT and Stance-centered Graph Attention Networks

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-18 DOI: https://dl.acm.org/doi/10.1145/3533430

Yazhou Zhang, Dan Ma, Prayag Tiwari, Chen Zhang, Mehedi Masud, Mohammad Shorfuzzaman, Dawei Song

Computational Linguistics (CL) associated with Internet of Multimedia Things (IoMT)-enabled multimedia computing applications brings several research challenges, such as real-time speech understanding, deep fake video detection, emotion recognition, and home automation. Since the emergence of machine translation, CL solutions for different natural language processing (NLP) applications have increased tremendously, and NLP support is now essential to the success of the IoMT. Sarcasm detection, a recently emerging artificial intelligence (AI) and NLP task, aims at discovering sarcastic, ironic, and metaphoric information implied in texts generated in the IoMT. It has drawn much attention from the AI and IoMT research communities. Advances in sarcasm detection and NLP techniques will provide a cost-effective, intelligent way to work with machine devices and to support high-level human-to-device interactions. However, existing sarcasm detection approaches neglect the hidden stance behind texts and thus fail to exploit the full potential of the task. Indeed, the stance, i.e., whether the author of a text is in favor of, against, or neutral toward the proposition or target discussed in the text, largely determines the text’s actual sarcasm orientation. To fill this gap, we propose a new task, stance-level sarcasm detection (SLSD), where the goal is to uncover the author’s latent stance and, based on it, to identify the sarcasm polarity expressed in the text. We then propose an integrated framework that consists of Bidirectional Encoder Representations from Transformers (BERT) and a novel stance-centered graph attention network (SCGAT). Specifically, BERT captures the sentence representation, and SCGAT captures the stance information on a specific target. Extensive experiments are conducted on a Chinese sarcasm sentiment dataset we created and on the SemEval-2018 Task 3 English sarcasm dataset. The experimental results show that the SCGAT framework outperforms state-of-the-art baselines by a large margin.

Citations: 0

Using Deep Learning Models to Detect Fake News about COVID-19

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-18 DOI: https://dl.acm.org/doi/10.1145/3533431

Mu-Yen Chen, Yi-Wei Lai, Jiunn-Woei Lian

The proliferation of mobile networked devices has made it easier and faster than ever for people to obtain and share information. However, this occasionally results in the propagation of erroneous information, which may be difficult to distinguish from the truth. The widespread diffusion of such information can result in irrational and poor decision making on potentially important issues. In 2020, this coincided with the global outbreak of Coronavirus Disease (COVID-19), a highly contagious and deadly disease. The proliferation of misinformation about COVID-19 on social media has already been identified as an “infodemic” by the World Health Organization (WHO), posing significant challenges for governments worldwide seeking to manage the pandemic. This has driven an urgent need for methods to automatically detect and identify such misinformation. This research uses multiple deep learning model frameworks to detect misinformation in Chinese and English, and compares them based on different text feature selections. Each model learns the textual characteristics of true information and of misinformation for subsequent true/false prediction. The long short-term memory (LSTM) model, the gated recurrent unit (GRU) model, and the bidirectional long short-term memory (BiLSTM) model were selected for fake news detection. BiLSTM produces the best detection result, with detection accuracy reaching 94% for short-sentence English texts and 99% for long-sentence English texts, while the accuracy for Chinese texts was 82%.

Citations: 0

Exploring the Potential of Cyber Manufacturing Systems in the Digital Age

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-08 DOI: 10.1145/3596602

Usman Ahmed, Jerry Chun-Wei Lin, Gautam Srivastava

Cyber-manufacturing Systems (CMS) have been growing in popularity. They mark a transition from conventional manufacturing to a new paradigm that emphasizes innovation, automation, better customer service, and intelligent systems. This manufacturing model can improve efficiency and productivity, and provide better customer service and response times. In addition, it may revolutionize the way products are produced, from design to completion. Thus, it is likely that this new manufacturing model will become increasingly popular in the near future. By building new technologies on top of existing CMS, these systems ensure that data exchange and integration between decentralized systems are reliable and secure. Recently published case studies from industry and the literature support this claim. There are still some challenges, such as ensuring data reliability, but these can be overcome with further research and development. In summary, the use of CMS can revolutionize the manufacturing industry. This paper comprehensively analyses these systems and their potential applications and implications. The article gives an overview of the field and then explores the various aspects of CMS in greater detail. A taxonomy of the most common current approaches to cyber-manufacturing systems is presented, including networked cyber-manufacturing systems, distributed cyber-manufacturing systems, cloud-based cyber-manufacturing systems, and cyber-physical systems (CPS). Furthermore, the paper identifies several popular open-source software packages and datasets and discusses how these resources can reduce barriers to CMS research. In addition, the paper identifies several important issues and research opportunities associated with CMS, including better integration between hardware and software, improved security and privacy protocols, communication protocols, and improved data management systems. The paper provides a comprehensive overview of current technology and valuable insights into the potential impact of the technology on society and industry.

Citations: 0

i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-03 DOI: 10.1145/3595378

L. Gioacchini, L. Vassio, M. Mellia, I. Drago, Z. B. Houidi, Dario Rossi

Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets, and possibly misconfigured hosts. Such peculiar nature of the darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour. We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities. We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of services, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.
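The co-occurrence idea in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: packets are represented as hypothetical `(timestamp, source IP, destination port)` tuples, a "service" is a hand-written port-to-name mapping (`SERVICES`), and a "sentence" is the ordered list of senders seen for one service in one time window. Such sentences could then be passed to a Word2Vec implementation; that training step, and the incremental update the paper describes, are not shown here.

```python
from collections import defaultdict

# Hypothetical port-to-service mapping, for illustration only.
# The abstract does not say how the authors define services.
SERVICES = {23: "telnet", 2323: "telnet", 22: "ssh", 445: "smb"}


def build_sentences(packets, window_seconds=3600):
    """Group sender IPs by (time window, service), in arrival order."""
    sentences = defaultdict(list)
    for timestamp, src_ip, dst_port in sorted(packets):
        service = SERVICES.get(dst_port, "other")
        window = int(timestamp // window_seconds)
        sentences[(window, service)].append(src_ip)
    return dict(sentences)


def cooccurrence_counts(sentences, context=2):
    """Count how often two distinct senders appear within `context` positions."""
    counts = defaultdict(int)
    for senders in sentences.values():
        for i, left in enumerate(senders):
            for right in senders[i + 1 : i + 1 + context]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return dict(counts)


# Toy trace: two senders probing telnet ports in the same hour.
packets = [
    (10.0, "10.0.0.1", 23),
    (11.0, "10.0.0.2", 23),
    (12.0, "10.0.0.1", 2323),
    (13.0, "10.0.0.9", 22),
    (4000.0, "10.0.0.2", 23),
]
sentences = build_sentences(packets)
pairs = cooccurrence_counts(sentences)
```

In this toy trace the two telnet senders end up in the same sentence and are counted as co-occurring, while the lone SSH sender is not paired with anyone. An embedding model trained on such sentences would tend to place the two telnet senders close together, which is the property the clustering and classification tasks rely on.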

Citations: 3

i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-03 DOI: https://dl.acm.org/doi/10.1145/3595378

Luca Gioacchini, Luca Vassio, Marco Mellia, Idilio Drago, Zied Ben Houidi

Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets and possibly misconfigured hosts. Such peculiar nature of the darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour.

We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities.

We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of services, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors, and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.

Citations: 0

Unpaired Self-supervised Learning for Industrial Cyber-Manufacturing Spectrum Blind Deconvolution

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-05-03 DOI: 10.1145/3590963

Lizhen Deng, Guoxia Xu, Jiaqi Pi, Hu Zhu, Xiaokang Zhou

Cyber-Manufacturing combines industrial big data with intelligent analysis to find and understand the intangible problems in decision-making, which requires a systematic method for handling rich signal data. With the development of spectral detection and photoelectric imaging technology, spectral blind deconvolution has achieved remarkable results. However, spectral processing is limited by the one-dimensional nature of the signal: little structural information is available, and training samples are few. Moreover, in the majority of practical applications, it is entirely feasible to gather unpaired noisy and clean spectra for training, which makes unpaired learning practical and valuable. Therefore, this paper proposes a two-stage deconvolution scheme combining self-supervised learning and feature extraction; it generates two complementary paired sets through self-supervised learning to obtain the final deconvolution network. In addition, a new deconvolution network is designed for feature extraction. Pre-training is performed through spectral feature extraction and a noise estimation network to improve training efficiency and meet the assumed noise characteristics. Experimental results show that the method is effective in dealing with different types of synthetic noise.
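For orientation, blind deconvolution is commonly framed as inverting a degradation model in which the measured spectrum is the true spectrum blurred by an unknown instrument response, plus noise. The sketch below simulates that forward model only. The Gaussian kernel, its width, the noise level, and the function names are assumptions chosen for illustration; this is not the two-stage network the abstract describes.

```python
import math
import random


def gaussian_kernel(half_width, sigma):
    """Return a normalized Gaussian kernel with 2 * half_width + 1 taps."""
    taps = [
        math.exp(-(k * k) / (2.0 * sigma * sigma))
        for k in range(-half_width, half_width + 1)
    ]
    total = sum(taps)
    return [t / total for t in taps]


def degrade(spectrum, kernel, noise_std=0.0, seed=0):
    """Blur a 1-D spectrum with `kernel` and add Gaussian noise.

    The output has the same length as the input; samples outside the
    spectrum are treated as zero. The kernel is assumed symmetric, so
    convolution and correlation coincide.
    """
    rng = random.Random(seed)
    half = len(kernel) // 2
    out = []
    for i in range(len(spectrum)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(spectrum):
                acc += w * spectrum[idx]
        out.append(acc + rng.gauss(0.0, noise_std))
    return out


# A single sharp spectral line, blurred by the assumed instrument response.
clean = [0.0] * 21
clean[10] = 1.0
kernel = gaussian_kernel(half_width=3, sigma=1.0)
observed = degrade(clean, kernel, noise_std=0.0)
```

With noise disabled, the sharp line is spread over its neighbours while the total intensity is preserved. A deconvolution method has to recover `clean` from `observed` without being told `kernel`, which is what makes the problem blind.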

Citations: 0

IRGA: An Intelligent Implicit Real-time Gait Authentication System in Heterogeneous Complex Scenarios

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-04-26 DOI: 10.1145/3594538

Li Yang, Xi Li, Zhuoru Ma, Lu Li, Neal Xiong, J. Ma

Gait authentication, a technique that can continuously provide identity recognition on mobile devices for security, has been investigated by the research community for decades. However, most existing work generalizes poorly to complex real-world environments because real-world gait data are noisy and complex. To address this limitation, we propose an intelligent Implicit Real-time Gait Authentication (IRGA) system based on Deep Neural Networks (DNNs) to enhance the adaptability of gait authentication in practice. In the proposed system, the gait data (with or without complex interference signals) are first processed sequentially by the imperceptible collection module and the data preprocessing module to improve data quality. To illustrate and verify the suitability of our proposal, we analyze the impact of individual gait changes on the data feature distribution. Finally, a fusion neural network composed of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) is designed to perform feature extraction and user authentication. We evaluate the proposed IRGA system in heterogeneous complex scenarios and present state-of-the-art comparisons on three datasets. Extensive experiments demonstrate that the IRGA system achieves improved performance on several different metrics simultaneously.
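The abstract does not spell out what the data preprocessing module does. As a generic illustration of how raw inertial readings are often prepared before a CNN/LSTM model, the sketch below cuts a one-axis signal into overlapping fixed-length windows and standardizes each window. The window size, the step, the toy signal, and the function names are hypothetical and are not taken from the IRGA paper.

```python
import statistics


def zscore(samples):
    """Standardize a sequence to zero mean and unit variance (population std)."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0.0:
        # A constant window carries no shape information; map it to zeros.
        return [0.0 for _ in samples]
    return [(s - mean) / std for s in samples]


def sliding_windows(samples, size, step):
    """Cut a signal into fixed-length windows, dropping any incomplete tail."""
    return [
        samples[start : start + size]
        for start in range(0, len(samples) - size + 1, step)
    ]


# Toy one-axis accelerometer trace with a regular, step-like rhythm.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
windows = [zscore(w) for w in sliding_windows(signal, size=4, step=2)]
```

Each standardized window could then be fed to a feature extractor. Overlapping windows (a step smaller than the window size) are a common choice because they make the result less sensitive to where a gait cycle happens to start.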

Citations: 1

Securing Scalable Real-time Multiparty Communications with Hybrid Information-centric Networking

IF 5.3; Zone 3 (Computer Science); Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Internet Technology

Pub Date : 2023-04-19 DOI: 10.1145/3593585

L. Muscariello, M. Papalini, Olivier Roques, M. Sardara, A. Tran Van

In this article, we consider security aspects of online meeting applications based on protocols such as WebRTC that leverage the Information-centric Networking (ICN) architecture to make the system fundamentally more scalable. While the scalability properties provided by ICN have been demonstrated in recent literature, the security challenges and implications for real-time applications have not been reviewed. We show that this class of applications can benefit from strong security and scalability jointly, without any major tradeoff and with significant performance improvements over traditional WebRTC systems. To achieve this goal, the way the current ICN architecture verifies integrity and authentication must be modified. Extensive performance analysis of the architecture, based on the open source implementation of Hybrid-ICN, shows that real-time applications can greatly benefit from this novel network architecture in terms of strong security and scalable communications.
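The abstract does not say what the modifications to integrity and authentication checking are. One technique used in information-centric designs to keep per-packet verification cheap is a manifest: the producer hashes each data packet, authenticates the list of hashes once, and receivers then check each packet against the verified list. The sketch below illustrates that general idea under loud assumptions: the function names and packet contents are hypothetical, and an HMAC with a shared key stands in for the public-key signature a real deployment would use. It is not the Hybrid-ICN implementation.

```python
import hashlib
import hmac


def build_manifest(packets, key):
    """Hash every packet and authenticate the whole list of digests once."""
    digests = [hashlib.sha256(p).hexdigest() for p in packets]
    tag = hmac.new(key, "".join(digests).encode("ascii"), hashlib.sha256).hexdigest()
    return {"digests": digests, "tag": tag}


def verify_manifest(manifest, key):
    """Check the single authentication tag covering all packet digests."""
    expected = hmac.new(
        key, "".join(manifest["digests"]).encode("ascii"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, manifest["tag"])


def verify_packet(manifest, index, payload):
    """Check one received payload against its digest in an already-verified manifest."""
    digest = hashlib.sha256(payload).hexdigest()
    return hmac.compare_digest(digest, manifest["digests"][index])


# Hypothetical media frames and a demo key (HMAC stands in for a signature).
key = b"demo-shared-key"
packets = [b"frame-0", b"frame-1", b"frame-2"]
manifest = build_manifest(packets, key)

manifest_ok = verify_manifest(manifest, key)
packet_ok = verify_packet(manifest, 1, b"frame-1")
tampered_ok = verify_packet(manifest, 1, b"frame-X")
```

The design point is that the expensive authentication step runs once per manifest rather than once per packet, while each packet still gets an integrity check through a cheap hash comparison. That matters for real-time media, where packets are small and frequent.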

Citations: 1