In this paper, we propose an LLM-empowered RM-API misuse detection solution, ChatDetector, which fully automates LLM-driven documentation understanding for RM-API constraint retrieval and RM-API misuse detection. To correctly retrieve the RM-API constraints, ChatDetector draws on the ReAct framework, an optimization of Chain-of-Thought (CoT) prompting, to decompose the complex task into allocation-API identification, RM-object extraction (the object allocated or released by RM APIs), and RM-API pairing (RM APIs usually exist in pairs). It first verifies the semantics of allocation APIs through LLMs, based on the RM sentences retrieved from the API documentation. Inspired by the LLMs' performance under various prompting methods, ChatDetector adopts a two-dimensional prompting approach for cross-validation. At the same time, an inconsistency check between the LLMs' output and their reasoning process, performed with an off-the-shelf Natural Language Processing (NLP) tool, confirms the allocation APIs. To accurately pair the RM-APIs, ChatDetector decomposes the task again and identifies the RM-object type first; with this type it can accurately pair the releasing APIs and construct the RM-API constraints for misuse detection. With hallucinations thus diminished, ChatDetector identifies 165 pairs of RM-APIs with a precision of 98.21%, outperforming state-of-the-art API detectors. By employing the static detector CodeQL, we ethically reported 115 security bugs in applications that integrate six popular libraries to the developers; these bugs may result in severe issues such as Denial-of-Service (DoS) and memory corruption. Compared with an end-to-end benchmark method, the results show that ChatDetector retrieves at least 47% more RM sentences and 80.85% more RM-API constraints.
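The decomposition described above (allocation-API identification, RM-object extraction, RM-API pairing) can be sketched as follows. The paper drives each step with LLM prompts over real API documentation; this illustrative Python sketch substitutes a naive keyword heuristic and made-up documentation sentences for both, purely to show how pairing by RM-object type works:

```python
# Hypothetical API -> documentation sentence (illustrative, not from the paper).
DOCS = {
    "png_create_read_struct": "Allocates a png_struct for reading.",
    "png_destroy_read_struct": "Frees a png_struct used for reading.",
    "curl_easy_init": "Allocates a CURL handle.",
    "curl_easy_cleanup": "Frees a CURL handle.",
}

ALLOC_VERBS = ("allocates", "creates", "opens")
RELEASE_VERBS = ("frees", "destroys", "closes", "releases")

def rm_object(sentence):
    """Step 2: extract the RM-object, the noun right after the RM verb."""
    words = sentence.lower().replace(".", "").split()
    for i, w in enumerate(words):
        if w in ALLOC_VERBS or w in RELEASE_VERBS:
            return words[i + 2] if words[i + 1] in ("a", "an", "the") else words[i + 1]
    return None

def pair_rm_apis(docs):
    """Steps 1 and 3: classify APIs as allocating/releasing,
    then pair them by matching RM-object type."""
    allocs = {api: rm_object(s) for api, s in docs.items()
              if any(v in s.lower() for v in ALLOC_VERBS)}
    releases = {api: rm_object(s) for api, s in docs.items()
                if any(v in s.lower() for v in RELEASE_VERBS)}
    return {(a, r) for a, ta in allocs.items()
            for r, tr in releases.items() if ta == tr}
```

The resulting pairs are the RM-API constraints a checker such as CodeQL can then enforce (every allocation must reach a matching release).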
"The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection". Yi Yang, Jinghua Liu, Kai Chen, Miaoqian Lin. arXiv:2409.09380, arXiv - CS - Cryptography and Security, 2024-09-14.
Portable Document Format (PDF) is a file format used worldwide as the de-facto standard for exchanging documents; in fact, the document you are currently reading was uploaded as a PDF. Confidential information is also exchanged through PDFs. According to the PDF standard ISO 32000-2:2020, PDF supports encryption to provide confidentiality of the information contained in it, along with digital signatures to ensure authenticity. At present, PDF encryption only supports the Advanced Encryption Standard (AES) to encrypt and decrypt information. However, Lightweight Cryptography, i.e., cryptography for resource-constrained environments, has gained a lot of popularity, especially due to the NIST Lightweight Cryptography (LWC) competition announced in 2018, for which ASCON was announced as the winner in February 2023. The current work constitutes the first attempt to benchmark Java implementations of the NIST LWC winner ASCON and finalist XOODYAK against the current PDF encryption standard AES. Our research reveals that ASCON emerges as a clear winner with regard to throughput when profiled using two state-of-the-art benchmarking tools, YourKit and JMH.
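A throughput comparison of this kind can be sketched with a simple timing harness. The paper profiles Java implementations with YourKit and JMH; the Python sketch below is only an assumed analogue, with a trivial XOR transform standing in for a real cipher (it is NOT AES or ASCON):

```python
import time

def throughput_mbps(encrypt, payload, seconds=0.1):
    """Run `encrypt` on `payload` repeatedly for `seconds`; report MB/s."""
    deadline = time.perf_counter() + seconds
    processed = 0
    while time.perf_counter() < deadline:
        encrypt(payload)
        processed += len(payload)
    return processed / seconds / 1e6

def xor_cipher(data):
    # Stand-in "cipher" used only to exercise the harness.
    return bytes(b ^ 0x5A for b in data)

rate = throughput_mbps(xor_cipher, bytes(64 * 1024))
```

Real benchmarks (as JMH does) would add warm-up iterations and multiple forks to exclude JIT and caching effects from the measurement.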
"Harnessing Lightweight Ciphers for PDF Encryption". Aastha Chauhan, Deepa Verma. arXiv:2409.09428, arXiv - CS - Cryptography and Security, 2024-09-14.
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes must work in different environments under different operational constraints, such as memory footprint, silicon area, efficiency, and power requirements, such variations in the underlying hard problem are very useful for designers constructing different cryptographic schemes. In this work, we explore various design choices of lattice-based cryptography and their impact on performance in the real world. In particular, we propose a suite of key-encapsulation mechanisms (KEMs) based on the learning with rounding problem, with a focus on improving different performance aspects of lattice-based cryptography. Our suite consists of three schemes. The first, Florete, is designed for efficiency. The second, Espada, is aimed at improving parallelization, flexibility, and memory footprint. The last, Sable, can be considered an improved version, in terms of key sizes and parameters, of the Saber key-encapsulation mechanism, one of the finalists in the National Institute of Standards and Technology's post-quantum standardization procedure. We describe the design rationale behind each scheme and, to justify our design decisions, provide software and hardware implementations. Our results show that Florete is faster than most state-of-the-art KEMs on both software and hardware platforms. Espada requires less memory and area than the implementations of most state-of-the-art schemes. The implementations of Sable maintain a trade-off between Florete and Espada regarding performance and memory requirements on hardware and software platforms.
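The learning-with-rounding (LWR) operation at the heart of such schemes deterministically rounds an element of Z_q down to Z_p, so that, unlike LWE, the "noise" comes from the rounding itself rather than an explicit error sample. A minimal sketch, using Saber-style power-of-two moduli (q = 2^13, p = 2^10) purely for illustration:

```python
def lwr_round(x, q, p):
    """Round x in Z_q to Z_p: floor((p/q)*x + 1/2) mod p, computed
    exactly in integers (q and p are powers of two, as in Saber)."""
    return ((p * x + q // 2) // q) % p

def lwr_sample(a_row, s, q, p):
    """One LWR sample: b = round_{q->p}(<a, s> mod q)."""
    return lwr_round(sum(ai * si for ai, si in zip(a_row, s)) % q, q, p)

Q, P = 2**13, 2**10          # Saber-style moduli, for illustration
b = lwr_sample([1, 2, 3], [1, 0, 1], Q, P)
```

Power-of-two moduli make the rounding a cheap shift in hardware, which is one of the performance levers this design space explores.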
"Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms". Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede. arXiv:2409.09481, arXiv - CS - Cryptography and Security, 2024-09-14.
Jamal Al-Karaki, Muhammad Al-Zafar Khan, Mostafa Mohamad, Dababrata Chowdhury
With the wholesale adoption of Deep Learning (DL) models in nearly all aspects of society, a unique set of challenges has emerged. These risks, primarily centered on the architectures of the models themselves, pose a significant challenge, and addressing them is key to the successful implementation and use of DL in the future. In this research, we present the security challenges associated with the current DL models deployed in production, and anticipate the challenges of future DL technologies based on advancements in computing, AI, and hardware. In addition, we propose risk mitigation techniques to counter these challenges and provide metrics for measuring the effectiveness of those techniques.
"Deep Learning Under Siege: Identifying Security Vulnerabilities and Risk Mitigation Strategies". Jamal Al-Karaki, Muhammad Al-Zafar Khan, Mostafa Mohamad, Dababrata Chowdhury. arXiv:2409.09517, arXiv - CS - Cryptography and Security, 2024-09-14.
Federated learning can solve the privacy protection problem in distributed data mining and machine learning; how to protect the ownership, usage, and income rights of all parties involved in federated learning is an important open issue. This paper proposes a federated learning data ownership confirmation mechanism based on blockchain and smart contracts, which uses decentralized blockchain technology to record each participant's contribution on the chain and distributes the benefits of the federated learning results through the blockchain. The relevant smart contracts and data structures are simulated and implemented in a local blockchain simulation environment, preliminarily demonstrating the feasibility of the scheme.
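The contribution accounting and proportional benefit distribution can be sketched off-chain. This toy Python ledger is a hypothetical stand-in for the paper's smart contract: it mirrors only the arithmetic, not the blockchain machinery, and all names are illustrative:

```python
class ContributionLedger:
    """Toy in-memory stand-in for the on-chain contribution record."""

    def __init__(self):
        self.contributions = {}

    def record(self, participant, amount):
        # In the paper, this write would be a blockchain transaction.
        self.contributions[participant] = (
            self.contributions.get(participant, 0) + amount)

    def distribute(self, total_reward):
        """Split the reward in proportion to recorded contributions."""
        total = sum(self.contributions.values())
        return {p: total_reward * c / total
                for p, c in self.contributions.items()}

ledger = ContributionLedger()
ledger.record("alice", 3)   # e.g. three rounds of useful model updates
ledger.record("bob", 1)
shares = ledger.distribute(100)
```

Putting this accounting in a smart contract makes the contribution record tamper-evident and the payout rule auditable by all participants.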
"Research on Data Right Confirmation Mechanism of Federated Learning based on Blockchain". Xiaogang Cheng, Ren Guo. arXiv:2409.08476, arXiv - CS - Cryptography and Security, 2024-09-13.
Muhammad Arslan, Muhammad Mubeen, Muhammad Bilal, Saadullah Farooq Abbasi
Demand for the Internet of Things (IoT) has grown exponentially. This progress has been made possible by technological advancements in artificial intelligence, cloud computing, and edge computing. However, these advancements also bring multiple challenges, including cyber threats, security and privacy concerns, and the risk of potential financial losses. For this reason, this study developed a computationally inexpensive one-dimensional convolutional neural network (1D-CNN) algorithm for cyber-attack classification. The proposed approach achieved an accuracy of 99.90% when classifying nine cyber-attacks. Multiple other performance metrics were evaluated to validate the efficacy of the proposed scheme, and a comparison was made with existing state-of-the-art schemes. The findings of this study can significantly contribute to the development of secure intrusion detection for IIoT systems.
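The core building block of a 1D-CNN, a one-dimensional convolution followed by an activation and pooling, fits in a few lines. A minimal pure-Python sketch with a toy input sequence and hypothetical filter weights (a real IDS would learn these from traffic data):

```python
def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1D convolution (cross-correlation, as CNN layers use)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

# One filter sliding over a toy sequence of six traffic features;
# global max pooling reduces the feature map to a single activation.
feature_map = relu(conv1d([1, 2, 3, 0, 1, 2], [0.5, -0.5]))
pooled = max(feature_map)
```

Stacking such filters and feeding the pooled activations into a small dense classifier is what keeps 1D-CNN intrusion detectors computationally inexpensive compared with 2D-CNN or recurrent alternatives.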
"1D-CNN-IDS: 1D CNN-based Intrusion Detection System for IIoT". Muhammad Arslan, Muhammad Mubeen, Muhammad Bilal, Saadullah Farooq Abbasi. arXiv:2409.08529, arXiv - CS - Cryptography and Security, 2024-09-13.
Abdelkader El Mahdaouy, Salima Lamsiyah, Meryem Janati Idrissi, Hamza Alami, Zakaria Yartaoui, Ismail Berrada
Detecting and classifying suspicious or malicious domain names and URLs is a fundamental task in cybersecurity. To leverage such indicators of compromise, cybersecurity vendors and practitioners often maintain and update blacklists of known malicious domains and URLs. However, blacklists frequently fail to identify emerging and obfuscated threats. Over the past few decades, there has therefore been significant interest in developing machine learning models that automatically detect malicious domains and URLs, addressing the limitations of blacklist maintenance and updating. In this paper, we introduce DomURLs_BERT, a pre-trained BERT-based encoder adapted for detecting and classifying suspicious/malicious domains and URLs. DomURLs_BERT is pre-trained using the Masked Language Modeling (MLM) objective on a large multilingual corpus of URLs, domain names, and a Domain Generation Algorithm (DGA) dataset. To assess the performance of DomURLs_BERT, we conducted experiments on several binary and multi-class classification tasks involving domain names and URLs, covering phishing, malware, DGA, and DNS tunneling. The evaluation results show that the proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets. The pre-training dataset, the pre-trained DomURLs_BERT encoder, and the source code of the experiments are publicly available.
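The MLM pre-training objective mentioned above masks a fraction of input tokens and trains the model to recover them. A simplified sketch of the masking step, assuming the standard 15% rate (the paper's exact setup may differ, and full BERT masking also substitutes random tokens for a fraction of masked positions):

```python
import random

def mask_for_mlm(tokens, mask_token="[MASK]", rate=0.15, seed=0):
    """Replace ~15% of tokens with [MASK]; labels keep the original
    token at masked positions (None elsewhere) as prediction targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Character-level tokens of a hypothetical suspicious URL.
tokens = list("login.example-bank.com/verify")
masked, labels = mask_for_mlm(tokens)
```

Training the encoder to fill in these blanks over millions of URLs is what gives it URL-specific structure awareness that generic BERT checkpoints lack.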
"DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification". Abdelkader El Mahdaouy, Salima Lamsiyah, Meryem Janati Idrissi, Hamza Alami, Zakaria Yartaoui, Ismail Berrada. arXiv:2409.09143, arXiv - CS - Cryptography and Security, 2024-09-13.
The software for operations and network attack results review (SONARR) and the autonomous penetration testing system (APTS) use facts and common properties in digital twin networks to represent real-world entities. However, in some cases, fact values change regularly, making it difficult for objects in SONARR and APTS to consistently and accurately represent their real-world counterparts. This paper proposes and evaluates the addition to SONARR of verifiers, which check real-world conditions and update network facts. This inclusion allows SONARR to retrieve fact values from its executing environment and update its network, providing a consistent method of ensuring that operations, and therefore results, align with the real-world systems being assessed. Verifiers allow arbitrary scripts and dynamic arguments to be added to normal SONARR operations, providing a layer of flexibility and consistency that results in more reliable output from the software.
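A verifier, as described, is essentially a named check that reads a condition from the executing environment and writes the result back into the network's facts before an operation runs. The following minimal Python sketch is hypothetical throughout; SONARR's actual verifier interface, fact names, and script mechanism are not specified in this abstract:

```python
class Verifier:
    """Hypothetical verifier: runs an arbitrary check callable with
    dynamic arguments and refreshes one fact with the result."""

    def __init__(self, fact_name, check, *args):
        self.fact_name = fact_name
        self.check = check        # arbitrary script/callable
        self.args = args          # dynamic arguments

    def run(self, facts):
        facts[self.fact_name] = self.check(*self.args)
        return facts

# The digital twin holds a stale fact; the verifier refreshes it from
# the (simulated) environment before the operation executes.
network_facts = {"host.ssh_open": True}
refresh = Verifier("host.ssh_open", lambda port: port == 22, 2222)
refresh.run(network_facts)
```

Running such checks immediately before each operation is what keeps the digital twin's facts, and therefore the assessment results, aligned with the live system.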
"Incorporation of Verifier Functionality in the Software for Operations and Network Attack Results Review and the Autonomous Penetration Testing System". Jordan Milbrath, Jeremy Straub. arXiv:2409.09174, arXiv - CS - Cryptography and Security, 2024-09-13.
Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead
In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed to train defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under cyber-attack. This paper extends the use of IPMSRL to enhance realism, adding the dynamics of false-positive alerts and alert delay. In the most difficult environment tested, applying curriculum learning increased the episode reward mean from a baseline of -2.791 to -0.569, and applying action masking increased it from -2.791 to -0.743. Importantly, this level of performance was reached in under 1 million timesteps, far more data-efficient than vanilla PPO, which reached a lower level of performance after 2.5 million timesteps. The best-performing training method observed in this paper combined curriculum learning with action masking, achieving a mean episode reward of 0.137. This paper also introduces a basic hardcoded defensive agent encoding a representation of cyber-security best practice, which provides context for the episode reward means reached by the RL agents; the hardcoded agent achieved an episode reward mean of -1.895. This paper therefore shows that curriculum learning and action masking, both independently and in tandem, offer a way to overcome the complex real-world dynamics present in operational technology cyber-security threat remediation.
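Action masking, one of the two techniques applied here, prevents the agent from ever sampling actions that are invalid in the current state by zeroing their probability before sampling. A minimal sketch with toy logits and a toy mask (not the IPMSRL implementation):

```python
import math

def apply_action_mask(logits, mask):
    """Set invalid actions' logits to -inf so softmax assigns them zero
    probability; the agent can then never sample them."""
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    exps = [0.0 if l == float("-inf") else math.exp(l) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Action 1 is invalid in the current state and gets probability zero;
# the remaining probability mass is renormalized over valid actions.
probs = apply_action_mask([1.0, 2.0, 0.5], [True, False, True])
```

Because the agent never wastes exploration on invalid actions, masking typically improves data efficiency, which matches the sub-1-million-timestep results reported above.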
"Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning". Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead. arXiv:2409.10563, arXiv - CS - Cryptography and Security, 2024-09-13.
Training Large Language Models (LLMs) requires immense computational power and vast amounts of data. As a result, protecting the intellectual property of these models through fingerprinting is essential for ownership authentication. While adding fingerprints to LLMs through fine-tuning has been attempted, it remains costly and unscalable. In this paper, we introduce FP-VEC, a pilot study on using fingerprint vectors as an efficient fingerprinting method for LLMs. Our approach generates a fingerprint vector that represents a confidential signature embedded in the model, allowing the same fingerprint to be seamlessly incorporated into an unlimited number of LLMs via vector addition. Results on several LLMs show that FP-VEC is lightweight (fingerprinting runs on CPU-only devices), scalable (a single training run supports an unlimited number of stampings), and preserves the model's normal behavior. The project page is available at https://fingerprintvector.github.io .
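The core idea of incorporating a fingerprint via vector addition can be sketched as simple weight arithmetic: the fingerprint vector is the difference between a fingerprinted model's weights and its base model's weights, and it can then be added onto any model derived from the same base. A toy sketch with flat weight lists (real LLM weights are per-layer tensors, and this is only an assumed reading of the abstract, not the paper's exact procedure):

```python
def fingerprint_vector(fingerprinted, base):
    """Fingerprint vector: element-wise difference between a
    fingerprinted model's weights and the base model's weights."""
    return [f - b for f, b in zip(fingerprinted, base)]

def stamp(weights, fp_vec):
    """Incorporate the fingerprint into another model by vector addition."""
    return [w + d for w, d in zip(weights, fp_vec)]

base = [0.1, 0.2, 0.3]             # toy base-model weights
fingerprinted = [0.1, 0.25, 0.28]  # base after fingerprint fine-tuning
fp_vec = fingerprint_vector(fingerprinted, base)

downstream = [0.4, 0.1, 0.9]       # another model derived from the base
stamped = stamp(downstream, fp_vec)
```

This is why a single fine-tuning run suffices: once the difference vector exists, stamping each additional model is a CPU-cheap addition rather than another training run.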
"FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition". Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han. arXiv:2409.08846, arXiv - CS - Cryptography and Security, 2024-09-13.