In this paper, we propose an LLM-empowered RM-API misuse detection solution, ChatDetector, which fully automates LLM-driven documentation understanding for RM-API constraint retrieval and RM-API misuse detection. To correctly retrieve the RM-API constraints, ChatDetector draws on the ReAct framework, an optimization of Chain-of-Thought (CoT) prompting, to decompose the complex task into allocation-API identification, RM-object extraction (the object allocated or released by RM APIs), and RM-API pairing (RM APIs usually exist in pairs). It first verifies the semantics of allocation APIs through LLMs, based on the RM sentences retrieved from the API documentation. Inspired by the LLMs' performance under various prompting methods, ChatDetector adopts a two-dimensional prompting approach for cross-validation. At the same time, an inconsistency check between the LLMs' output and their reasoning process, performed with an off-the-shelf Natural Language Processing (NLP) tool, confirms the allocation APIs. To accurately pair the RM-APIs, ChatDetector decomposes the task again and identifies the RM-object type first; with this type it can accurately pair the releasing APIs and construct the RM-API constraints for misuse detection. With hallucinations thus diminished, ChatDetector identifies 165 pairs of RM-APIs with a precision of 98.21%, outperforming state-of-the-art API detectors. By employing the static detector CodeQL, we ethically reported 115 security bugs in applications that integrate six popular libraries to the developers; these bugs may result in severe issues such as Denial-of-Service (DoS) and memory corruption. Compared with an end-to-end benchmark method, the results show that ChatDetector retrieves at least 47% more RM sentences and 80.85% more RM-API constraints.
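The decomposition described above (allocation-API identification, RM-object extraction, RM-API pairing) can be sketched as follows. The paper drives each step with LLM prompts over real API documentation; this illustrative Python sketch substitutes a naive keyword heuristic and made-up documentation sentences for both, purely to show how pairing by RM-object type works:

```python
# Hypothetical API -> documentation sentence (illustrative, not from the paper).
DOCS = {
    "png_create_read_struct": "Allocates a png_struct for reading.",
    "png_destroy_read_struct": "Frees a png_struct used for reading.",
    "curl_easy_init": "Allocates a CURL handle.",
    "curl_easy_cleanup": "Frees a CURL handle.",
}

ALLOC_VERBS = ("allocates", "creates", "opens")
RELEASE_VERBS = ("frees", "destroys", "closes", "releases")

def rm_object(sentence):
    """Step 2: extract the RM-object, the noun right after the RM verb."""
    words = sentence.lower().replace(".", "").split()
    for i, w in enumerate(words):
        if w in ALLOC_VERBS or w in RELEASE_VERBS:
            return words[i + 2] if words[i + 1] in ("a", "an", "the") else words[i + 1]
    return None

def pair_rm_apis(docs):
    """Steps 1 and 3: classify APIs as allocating/releasing,
    then pair them by matching RM-object type."""
    allocs = {api: rm_object(s) for api, s in docs.items()
              if any(v in s.lower() for v in ALLOC_VERBS)}
    releases = {api: rm_object(s) for api, s in docs.items()
                if any(v in s.lower() for v in RELEASE_VERBS)}
    return {(a, r) for a, ta in allocs.items()
            for r, tr in releases.items() if ta == tr}
```

The resulting pairs are the RM-API constraints a checker such as CodeQL can then enforce (every allocation must reach a matching release).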
"The Midas Touch: Triggering the Capability of LLMs for RM-API Misuse Detection". Yi Yang, Jinghua Liu, Kai Chen, Miaoqian Lin. arXiv:2409.09380, arXiv - CS - Cryptography and Security, 2024-09-14.
Portable Document Format (PDF) is a file format used worldwide as the de-facto standard for exchanging documents; in fact, the document you are currently reading was uploaded as a PDF. Confidential information is also exchanged through PDFs. According to the PDF standard ISO 32000-2:2020, PDF supports encryption to provide confidentiality of the information contained in it, along with digital signatures to ensure authenticity. At present, PDF encryption only supports the Advanced Encryption Standard (AES) to encrypt and decrypt information. However, Lightweight Cryptography, i.e., cryptography for resource-constrained environments, has gained a lot of popularity, especially due to the NIST Lightweight Cryptography (LWC) competition announced in 2018, for which ASCON was announced as the winner in February 2023. The current work constitutes the first attempt to benchmark Java implementations of the NIST LWC winner ASCON and finalist XOODYAK against the current PDF encryption standard AES. Our research reveals that ASCON emerges as a clear winner with regard to throughput when profiled using two state-of-the-art benchmarking tools, YourKit and JMH.
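A throughput comparison of this kind can be sketched with a simple timing harness. The paper profiles Java implementations with YourKit and JMH; the Python sketch below is only an assumed analogue, with a trivial XOR transform standing in for a real cipher (it is NOT AES or ASCON):

```python
import time

def throughput_mbps(encrypt, payload, seconds=0.1):
    """Run `encrypt` on `payload` repeatedly for `seconds`; report MB/s."""
    deadline = time.perf_counter() + seconds
    processed = 0
    while time.perf_counter() < deadline:
        encrypt(payload)
        processed += len(payload)
    return processed / seconds / 1e6

def xor_cipher(data):
    # Stand-in "cipher" used only to exercise the harness.
    return bytes(b ^ 0x5A for b in data)

rate = throughput_mbps(xor_cipher, bytes(64 * 1024))
```

Real benchmarks (as JMH does) would add warm-up iterations and multiple forks to exclude JIT and caching effects from the measurement.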
"Harnessing Lightweight Ciphers for PDF Encryption". Aastha Chauhan, Deepa Verma. arXiv:2409.09428, arXiv - CS - Cryptography and Security, 2024-09-14.
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes must work in different environments under different operational constraints, such as memory footprint, silicon area, efficiency, and power requirements, such variations in the underlying hard problem are very useful for designers constructing different cryptographic schemes. In this work, we explore various design choices of lattice-based cryptography and their impact on performance in the real world. In particular, we propose a suite of key-encapsulation mechanisms (KEMs) based on the learning with rounding problem, with a focus on improving different performance aspects of lattice-based cryptography. Our suite consists of three schemes. The first, Florete, is designed for efficiency. The second, Espada, is aimed at improving parallelization, flexibility, and memory footprint. The last, Sable, can be considered an improved version, in terms of key sizes and parameters, of the Saber key-encapsulation mechanism, one of the finalists in the National Institute of Standards and Technology's post-quantum standardization procedure. We describe the design rationale behind each scheme and, to justify our design decisions, provide software and hardware implementations. Our results show that Florete is faster than most state-of-the-art KEMs on both software and hardware platforms. Espada requires less memory and area than the implementations of most state-of-the-art schemes. The implementations of Sable maintain a trade-off between Florete and Espada regarding performance and memory requirements on hardware and software platforms.
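The learning-with-rounding (LWR) operation at the heart of such schemes deterministically rounds an element of Z_q down to Z_p, so that, unlike LWE, the "noise" comes from the rounding itself rather than an explicit error sample. A minimal sketch, using Saber-style power-of-two moduli (q = 2^13, p = 2^10) purely for illustration:

```python
def lwr_round(x, q, p):
    """Round x in Z_q to Z_p: floor((p/q)*x + 1/2) mod p, computed
    exactly in integers (q and p are powers of two, as in Saber)."""
    return ((p * x + q // 2) // q) % p

def lwr_sample(a_row, s, q, p):
    """One LWR sample: b = round_{q->p}(<a, s> mod q)."""
    return lwr_round(sum(ai * si for ai, si in zip(a_row, s)) % q, q, p)

Q, P = 2**13, 2**10          # Saber-style moduli, for illustration
b = lwr_sample([1, 2, 3], [1, 0, 1], Q, P)
```

Power-of-two moduli make the rounding a cheap shift in hardware, which is one of the performance levers this design space explores.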
"Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms". Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede. arXiv:2409.09481, arXiv - CS - Cryptography and Security, 2024-09-14.
Jamal Al-Karaki, Muhammad Al-Zafar Khan, Mostafa Mohamad, Dababrata Chowdhury
With the wholesale adoption of Deep Learning (DL) models in nearly all aspects of society, a unique set of challenges has emerged. These risks, primarily centered on the architectures of the models themselves, pose a significant challenge, and addressing them is key to the successful implementation and use of DL in the future. In this research, we present the security challenges associated with the current DL models deployed in production, and anticipate the challenges of future DL technologies based on advancements in computing, AI, and hardware. In addition, we propose risk mitigation techniques to counter these challenges and provide metrics for measuring the effectiveness of those techniques.
"Deep Learning Under Siege: Identifying Security Vulnerabilities and Risk Mitigation Strategies". Jamal Al-Karaki, Muhammad Al-Zafar Khan, Mostafa Mohamad, Dababrata Chowdhury. arXiv:2409.09517, arXiv - CS - Cryptography and Security, 2024-09-14.
Federated learning can solve the privacy protection problem in distributed data mining and machine learning; how to protect the ownership, usage, and income rights of all parties involved in federated learning is an important open issue. This paper proposes a federated learning data ownership confirmation mechanism based on blockchain and smart contracts, which uses decentralized blockchain technology to record each participant's contribution on the chain and distributes the benefits of the federated learning results through the blockchain. The relevant smart contracts and data structures are simulated and implemented in a local blockchain simulation environment, preliminarily demonstrating the feasibility of the scheme.
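The contribution accounting and proportional benefit distribution can be sketched off-chain. This toy Python ledger is a hypothetical stand-in for the paper's smart contract: it mirrors only the arithmetic, not the blockchain machinery, and all names are illustrative:

```python
class ContributionLedger:
    """Toy in-memory stand-in for the on-chain contribution record."""

    def __init__(self):
        self.contributions = {}

    def record(self, participant, amount):
        # In the paper, this write would be a blockchain transaction.
        self.contributions[participant] = (
            self.contributions.get(participant, 0) + amount)

    def distribute(self, total_reward):
        """Split the reward in proportion to recorded contributions."""
        total = sum(self.contributions.values())
        return {p: total_reward * c / total
                for p, c in self.contributions.items()}

ledger = ContributionLedger()
ledger.record("alice", 3)   # e.g. three rounds of useful model updates
ledger.record("bob", 1)
shares = ledger.distribute(100)
```

Putting this accounting in a smart contract makes the contribution record tamper-evident and the payout rule auditable by all participants.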
"Research on Data Right Confirmation Mechanism of Federated Learning based on Blockchain". Xiaogang Cheng, Ren Guo. arXiv:2409.08476, arXiv - CS - Cryptography and Security, 2024-09-13.
Muhammad Arslan, Muhammad Mubeen, Muhammad Bilal, Saadullah Farooq Abbasi
Demand for the Internet of Things (IoT) has grown exponentially. This progress has been made possible by technological advancements in artificial intelligence, cloud computing, and edge computing. However, these advancements also bring multiple challenges, including cyber threats, security and privacy concerns, and the risk of potential financial losses. For this reason, this study developed a computationally inexpensive one-dimensional convolutional neural network (1D-CNN) algorithm for cyber-attack classification. The proposed approach achieved an accuracy of 99.90% when classifying nine cyber-attacks. Multiple other performance metrics were evaluated to validate the efficacy of the proposed scheme, and a comparison was made with existing state-of-the-art schemes. The findings of this study can significantly contribute to the development of secure intrusion detection for IIoT systems.
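The core building block of a 1D-CNN, a one-dimensional convolution followed by an activation and pooling, fits in a few lines. A minimal pure-Python sketch with a toy input sequence and hypothetical filter weights (a real IDS would learn these from traffic data):

```python
def conv1d(x, kernel, bias=0.0):
    """Valid-mode 1D convolution (cross-correlation, as CNN layers use)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, a) for a in v]

# One filter sliding over a toy sequence of six traffic features;
# global max pooling reduces the feature map to a single activation.
feature_map = relu(conv1d([1, 2, 3, 0, 1, 2], [0.5, -0.5]))
pooled = max(feature_map)
```

Stacking such filters and feeding the pooled activations into a small dense classifier is what keeps 1D-CNN intrusion detectors computationally inexpensive compared with 2D-CNN or recurrent alternatives.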
"1D-CNN-IDS: 1D CNN-based Intrusion Detection System for IIoT". Muhammad Arslan, Muhammad Mubeen, Muhammad Bilal, Saadullah Farooq Abbasi. arXiv:2409.08529, arXiv - CS - Cryptography and Security, 2024-09-13.
Abdelkader El Mahdaouy, Salima Lamsiyah, Meryem Janati Idrissi, Hamza Alami, Zakaria Yartaoui, Ismail Berrada
Detecting and classifying suspicious or malicious domain names and URLs is a fundamental task in cybersecurity. To leverage such indicators of compromise, cybersecurity vendors and practitioners often maintain and update blacklists of known malicious domains and URLs. However, blacklists frequently fail to identify emerging and obfuscated threats. Over the past few decades, there has therefore been significant interest in developing machine learning models that automatically detect malicious domains and URLs, addressing the limitations of blacklist maintenance and updating. In this paper, we introduce DomURLs_BERT, a pre-trained BERT-based encoder adapted for detecting and classifying suspicious/malicious domains and URLs. DomURLs_BERT is pre-trained using the Masked Language Modeling (MLM) objective on a large multilingual corpus of URLs, domain names, and a Domain Generation Algorithm (DGA) dataset. To assess the performance of DomURLs_BERT, we conducted experiments on several binary and multi-class classification tasks involving domain names and URLs, covering phishing, malware, DGA, and DNS tunneling. The evaluation results show that the proposed encoder outperforms state-of-the-art character-based deep learning models and cybersecurity-focused BERT models across multiple tasks and datasets. The pre-training dataset, the pre-trained DomURLs_BERT encoder, and the source code of the experiments are publicly available.
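The MLM pre-training objective mentioned above masks a fraction of input tokens and trains the model to recover them. A simplified sketch of the masking step, assuming the standard 15% rate (the paper's exact setup may differ, and full BERT masking also substitutes random tokens for a fraction of masked positions):

```python
import random

def mask_for_mlm(tokens, mask_token="[MASK]", rate=0.15, seed=0):
    """Replace ~15% of tokens with [MASK]; labels keep the original
    token at masked positions (None elsewhere) as prediction targets."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

# Character-level tokens of a hypothetical suspicious URL.
tokens = list("login.example-bank.com/verify")
masked, labels = mask_for_mlm(tokens)
```

Training the encoder to fill in these blanks over millions of URLs is what gives it URL-specific structure awareness that generic BERT checkpoints lack.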
"DomURLs_BERT: Pre-trained BERT-based Model for Malicious Domains and URLs Detection and Classification". Abdelkader El Mahdaouy, Salima Lamsiyah, Meryem Janati Idrissi, Hamza Alami, Zakaria Yartaoui, Ismail Berrada. arXiv:2409.09143, arXiv - CS - Cryptography and Security, 2024-09-13.
The software for operations and network attack results review (SONARR) and the autonomous penetration testing system (APTS) use facts and common properties in digital twin networks to represent real-world entities. However, in some cases, fact values change regularly, making it difficult for objects in SONARR and APTS to consistently and accurately represent their real-world counterparts. This paper proposes and evaluates the addition to SONARR of verifiers, which check real-world conditions and update network facts. This inclusion allows SONARR to retrieve fact values from its executing environment and update its network, providing a consistent method of ensuring that operations, and therefore results, align with the real-world systems being assessed. Verifiers allow arbitrary scripts and dynamic arguments to be added to normal SONARR operations, providing a layer of flexibility and consistency that results in more reliable output from the software.
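A verifier, as described, is essentially a named check that reads a condition from the executing environment and writes the result back into the network's facts before an operation runs. The following minimal Python sketch is hypothetical throughout; SONARR's actual verifier interface, fact names, and script mechanism are not specified in this abstract:

```python
class Verifier:
    """Hypothetical verifier: runs an arbitrary check callable with
    dynamic arguments and refreshes one fact with the result."""

    def __init__(self, fact_name, check, *args):
        self.fact_name = fact_name
        self.check = check        # arbitrary script/callable
        self.args = args          # dynamic arguments

    def run(self, facts):
        facts[self.fact_name] = self.check(*self.args)
        return facts

# The digital twin holds a stale fact; the verifier refreshes it from
# the (simulated) environment before the operation executes.
network_facts = {"host.ssh_open": True}
refresh = Verifier("host.ssh_open", lambda port: port == 22, 2222)
refresh.run(network_facts)
```

Running such checks immediately before each operation is what keeps the digital twin's facts, and therefore the assessment results, aligned with the live system.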
"Incorporation of Verifier Functionality in the Software for Operations and Network Attack Results Review and the Autonomous Penetration Testing System". Jordan Milbrath, Jeremy Straub. arXiv:2409.09174, arXiv - CS - Cryptography and Security, 2024-09-13.
Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead
In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed to train defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under cyber-attack. This paper extends the use of IPMSRL to enhance realism, adding the dynamics of false-positive alerts and alert delay. In the most difficult environment tested, applying curriculum learning increased the episode reward mean from a baseline of -2.791 to -0.569, and applying action masking increased it from -2.791 to -0.743. Importantly, this level of performance was reached in under 1 million timesteps, far more data-efficient than vanilla PPO, which reached a lower level of performance after 2.5 million timesteps. The best-performing training method observed in this paper combined curriculum learning with action masking, achieving a mean episode reward of 0.137. This paper also introduces a basic hardcoded defensive agent encoding a representation of cyber-security best practice, which provides context for the episode reward means reached by the RL agents; the hardcoded agent achieved an episode reward mean of -1.895. This paper therefore shows that curriculum learning and action masking, both independently and in tandem, offer a way to overcome the complex real-world dynamics present in operational technology cyber-security threat remediation.
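Action masking, one of the two techniques applied here, prevents the agent from ever sampling actions that are invalid in the current state by zeroing their probability before sampling. A minimal sketch with toy logits and a toy mask (not the IPMSRL implementation):

```python
import math

def apply_action_mask(logits, mask):
    """Set invalid actions' logits to -inf so softmax assigns them zero
    probability; the agent can then never sample them."""
    masked = [l if m else float("-inf") for l, m in zip(logits, mask)]
    exps = [0.0 if l == float("-inf") else math.exp(l) for l in masked]
    z = sum(exps)
    return [e / z for e in exps]

# Action 1 is invalid in the current state and gets probability zero;
# the remaining probability mass is renormalized over valid actions.
probs = apply_action_mask([1.0, 2.0, 0.5], [True, False, True])
```

Because the agent never wastes exploration on invalid actions, masking typically improves data efficiency, which matches the sub-1-million-timestep results reported above.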
"Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning". Alec Wilson, William Holmes, Ryan Menzies, Kez Smithson Whitehead. arXiv:2409.10563, arXiv - CS - Cryptography and Security, 2024-09-13.
Training Large Language Models (LLMs) requires immense computational power and vast amounts of data. As a result, protecting the intellectual property of these models through fingerprinting is essential for ownership authentication. While adding fingerprints to LLMs through fine-tuning has been attempted, it remains costly and unscalable. In this paper, we introduce FP-VEC, a pilot study on using fingerprint vectors as an efficient fingerprinting method for LLMs. Our approach generates a fingerprint vector that represents a confidential signature embedded in the model, allowing the same fingerprint to be seamlessly incorporated into an unlimited number of LLMs via vector addition. Results on several LLMs show that FP-VEC is lightweight (fingerprinting runs on CPU-only devices), scalable (a single training run supports an unlimited number of stampings), and preserves the model's normal behavior. The project page is available at https://fingerprintvector.github.io .
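The core idea of incorporating a fingerprint via vector addition can be sketched as simple weight arithmetic: the fingerprint vector is the difference between a fingerprinted model's weights and its base model's weights, and it can then be added onto any model derived from the same base. A toy sketch with flat weight lists (real LLM weights are per-layer tensors, and this is only an assumed reading of the abstract, not the paper's exact procedure):

```python
def fingerprint_vector(fingerprinted, base):
    """Fingerprint vector: element-wise difference between a
    fingerprinted model's weights and the base model's weights."""
    return [f - b for f, b in zip(fingerprinted, base)]

def stamp(weights, fp_vec):
    """Incorporate the fingerprint into another model by vector addition."""
    return [w + d for w, d in zip(weights, fp_vec)]

base = [0.1, 0.2, 0.3]             # toy base-model weights
fingerprinted = [0.1, 0.25, 0.28]  # base after fingerprint fine-tuning
fp_vec = fingerprint_vector(fingerprinted, base)

downstream = [0.4, 0.1, 0.9]       # another model derived from the base
stamped = stamp(downstream, fp_vec)
```

This is why a single fine-tuning run suffices: once the difference vector exists, stamping each additional model is a CPU-cheap addition rather than another training run.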
"FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition". Zhenhua Xu, Wenpeng Xing, Zhebo Wang, Chang Hu, Chen Jie, Meng Han. arXiv:2409.08846, arXiv - CS - Cryptography and Security, 2024-09-13.