Software Supply Chain (SSC) security is a critical concern for both users and developers. Recent incidents, such as the SolarWinds Orion compromise, demonstrated the widespread impact that distributing compromised software can have. The reliance on open-source components, which constitute a significant portion of modern software, further exacerbates this risk. To enhance SSC security, the Software Bill of Materials (SBOM) has been promoted as a tool to increase transparency and verifiability in software composition. However, despite its promise, SBOMs are not without limitations. Current SBOM generation tools often suffer from inaccuracies in identifying components and dependencies, leading to erroneous or incomplete representations of the SSC. Although existing studies expose these limitations, their impact on the vulnerability detection capabilities of security tools is still unknown. In this paper, we perform the first security analysis of the vulnerability detection capabilities of tools that receive SBOMs as input. We comprehensively evaluate SBOM generation tools by providing their outputs to vulnerability identification software. Based on our results, we identify the root causes of these tools' ineffectiveness and propose PIP-sbom, a novel pip-inspired solution that addresses their shortcomings. PIP-sbom provides improved accuracy in component identification and dependency resolution. Compared to the best-performing state-of-the-art tools, PIP-sbom increases average precision and recall by 60% and reduces the number of false positives tenfold.
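The pip-metadata-driven approach that PIP-sbom's name alludes to can be illustrated with a short sketch: enumerate installed distributions through Python's packaging metadata and emit a CycloneDX-style component list with package URLs (purls), the identifiers that vulnerability scanners typically match against advisory databases. This is an illustration of the general idea, not the paper's implementation; `generate_sbom` and the field selection are assumptions.

```python
import json
from importlib import metadata

def generate_sbom():
    components = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue
        components.append({
            "type": "library",
            "name": name,
            "version": dist.version,
            # purl (package URL): the identifier vulnerability scanners
            # typically match against advisory databases
            "purl": f"pkg:pypi/{name.lower()}@{dist.version}",
        })
    return {"bomFormat": "CycloneDX", "specVersion": "1.5",
            "components": sorted(components, key=lambda c: c["name"])}

if __name__ == "__main__":
    print(json.dumps(generate_sbom(), indent=2)[:400])
```

Note that this sketch only sees what is installed in the current environment; the accuracy problems the paper studies arise precisely when tools instead guess components from manifest files or file heuristics.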
"The Impact of SBOM Generators on Vulnerability Assessment in Python: A Comparison and a Novel Approach" by Giacomo Benedetti, Serena Cofano, Alessandro Brighente, Mauro Conti. arXiv:2409.06390, arXiv - CS - Cryptography and Security, 2024-09-10.
Adrian Brodzik, Tomasz Malec-Kruszyński, Wojciech Niewolski, Mikołaj Tkaczyk, Krzysztof Bocianiak, Sok-Yen Loui
Linux-based cloud environments have become lucrative targets for ransomware attacks, which employ various encryption schemes at unprecedented speeds. Addressing the urgency of real-time ransomware protection, we propose leveraging the extended Berkeley Packet Filter (eBPF) to collect system call information about active processes and to perform inference directly at the kernel level. In this study, we implement two Machine Learning (ML) models in eBPF: a decision tree and a multilayer perceptron. We benchmark their latency and accuracy against user-space counterparts, and our findings underscore the efficacy of this approach.
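To see why ML inference inside eBPF is feasible at all, note that a trained decision tree compiles down to branches on integer thresholds, with no floating point (which eBPF programs cannot use). A minimal sketch, with an illustrative tree and feature layout rather than the one trained in the paper:

```python
# Feature vector: syscall counts observed for a process in a time window,
# e.g. [n_open, n_read, n_write, n_rename, n_unlink]. All-integer logic
# like this translates directly into eBPF bytecode.
def classify(f):
    # A read-encrypt-overwrite loop shows up as heavy writes combined
    # with many renames/unlinks of the original files.
    if f[2] > 500:              # write count
        if f[3] + f[4] > 50:    # rename + unlink count
            return 1            # ransomware-like
        return 0
    return 0

assert classify([10, 900, 800, 60, 10]) == 1   # aggressive encryptor pattern
assert classify([10, 20, 5, 0, 0]) == 0        # idle process
```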
"Ransomware Detection Using Machine Learning in the Linux Kernel". arXiv:2409.06452, arXiv - CS - Cryptography and Security, 2024-09-10.
The rise of deep learning (DL) has led to a surging demand for training data, which incentivizes the creators of DL models to trawl through the Internet for training materials. Meanwhile, users often have limited control over whether their data (e.g., facial images) are used to train DL models without their consent, which has engendered pressing concerns. This work proposes MembershipTracker, a practical data provenance tool that can empower ordinary users to take agency in detecting the unauthorized use of their data in training DL models. We view tracing data provenance through the lens of membership inference (MI). MembershipTracker consists of a lightweight data marking component to mark the target data with small and targeted changes, which can be strongly memorized by the model trained on them; and a specialized MI-based verification process to audit whether the model exhibits strong memorization on the target samples. Overall, MembershipTracker only requires the users to mark a small fraction of data (0.005% to 0.1% in proportion to the training set), and it enables the users to reliably detect the unauthorized use of their data (average 0% FPR@100% TPR). We show that MembershipTracker is highly effective across various settings, including industry-scale training on the full-size ImageNet-1k dataset. We finally evaluate MembershipTracker under multiple classes of countermeasures.
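The verification step can be sketched as a loss-comparison audit: a model that strongly memorized the marked samples shows much lower loss on them than on fresh reference samples from the same distribution. The function names, perturbation form, and decision margin below are illustrative assumptions, not the paper's exact procedure:

```python
import random

def mark(sample, strength=0.05, seed=0):
    # "Data marking": a small, targeted, user-specific perturbation that a
    # training run can strongly memorize.
    rng = random.Random(seed)
    return [x + strength * rng.uniform(-1, 1) for x in sample]

def audit(loss_on_targets, loss_on_references, margin=0.5):
    # Flag unauthorized use when the model's average loss on marked targets
    # is far below its loss on fresh reference samples (memorization signal).
    avg_t = sum(loss_on_targets) / len(loss_on_targets)
    avg_r = sum(loss_on_references) / len(loss_on_references)
    return (avg_r - avg_t) > margin

# A model that memorized the marked data: near-zero target loss.
assert audit([0.01, 0.02, 0.01], [1.2, 0.9, 1.1]) is True
# A model never trained on the data: comparable losses, no flag.
assert audit([1.0, 1.1], [1.05, 0.95]) is False
```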
"Catch Me if You Can: Detecting Unauthorized Data Use in Deep Learning Models" by Zitao Chen, Karthik Pattabiraman. arXiv:2409.06280, arXiv - CS - Cryptography and Security, 2024-09-10.
Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security. Model watermarking is a powerful technique for protecting ownership of machine learning models, yet its reliability has recently been challenged by watermark removal attacks. In this work, we investigate why existing watermark embedding techniques, particularly those based on backdooring, are vulnerable. Through an information-theoretic analysis, we show that the resilience of watermarking against erasure attacks hinges on the choice of trigger-set samples, and that the current use of out-of-distribution trigger sets is inherently vulnerable to white-box adversaries. Based on this discovery, we propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods. To further minimize the gap to clean models, we analyze the role of logits as watermark information carriers and propose a new approach to better conceal watermark information within the logits. Experiments on real-world datasets including CIFAR-100 and Caltech-101 demonstrate that our method robustly defends against various adversaries with negligible accuracy loss (< 0.1%).
"On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective" by Aoting Hu, Yanzhi Chen, Renjie Xie, Adrian Weller. arXiv:2409.06130, arXiv - CS - Cryptography and Security, 2024-09-10.
We introduce the notion of a conditional encryption scheme as an extension of public key encryption. In addition to the standard public key algorithms ($\mathsf{KG}$, $\mathsf{Enc}$, $\mathsf{Dec}$) for key generation, encryption and decryption, a conditional encryption scheme for a binary predicate $P$ adds a new conditional encryption algorithm $\mathsf{CEnc}$. The conditional encryption algorithm $c=\mathsf{CEnc}_{pk}(c_1,m_2,m_3)$ takes as input the public encryption key $pk$, a ciphertext $c_1 = \mathsf{Enc}_{pk}(m_1)$ for an unknown message $m_1$, a control message $m_2$ and a payload message $m_3$, and outputs a conditional ciphertext $c$. Intuitively, if $P(m_1,m_2)=1$ then the conditional ciphertext $c$ should decrypt to the payload message $m_3$. On the other hand, if $P(m_1,m_2) = 0$ then the ciphertext should not leak any information about the control message $m_2$ or the payload message $m_3$, even if the attacker already has the secret decryption key $sk$. We formalize the notion of conditional encryption secrecy and provide concretely efficient constructions for a set of predicates relevant to password typo correction. Our practical constructions utilize the Paillier partially homomorphic encryption scheme as well as Shamir Secret Sharing. We prove that our constructions are secure and demonstrate how to use conditional encryption to improve the security of personalized password typo correction systems such as TypTop. We implement a C++ library for our practically efficient conditional encryption schemes and evaluate the performance empirically. We also update the implementation of TypTop to utilize conditional encryption for enhanced security guarantees and evaluate the performance of the updated implementation.
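For the equality predicate, a folklore construction from Paillier's additive homomorphism gives exactly the intended behavior: $\mathsf{CEnc}$ outputs an encryption of $r(m_1-m_2)+m_3$ for a random blinding scalar $r$, which decrypts to $m_3$ when the messages match and to a blinded value otherwise. The sketch below uses toy parameters and is not claimed to be the paper's exact construction:

```python
import math, random

p, q = 1789, 1931                  # toy primes; real deployments use ~1024-bit
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)               # valid decryption helper because g = n + 1

def rand_unit():
    # Random element of Z_n^*
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            return r

def enc(m):
    # Paillier: Enc(m) = (n+1)^m * r^n mod n^2
    return (pow(n + 1, m, n2) * pow(rand_unit(), n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def cenc_eq(c1, m2, m3):
    # Homomorphically form Enc(m1 - m2), blind it by a random scalar r,
    # then add the payload: the result encrypts r*(m1 - m2) + m3.
    diff = (c1 * enc(n - m2)) % n2
    return (pow(diff, rand_unit(), n2) * enc(m3)) % n2

c1 = enc(42)
assert dec(cenc_eq(c1, 42, 777)) == 777   # predicate holds: payload revealed
assert dec(cenc_eq(c1, 41, 777)) != 777   # predicate fails: blinded result
```

Note the party running `cenc_eq` never learns $m_1$; everything is computed on the ciphertext.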
"Conditional Encryption with Applications to Secure Personalized Password Typo Correction" by Mohammad Hassan Ameri, Jeremiah Blocki. arXiv:2409.06128, arXiv - CS - Cryptography and Security, 2024-09-10.
As statistical analyses become more central to science, industry, and society, there is a growing need to ensure the correctness of their results. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? Building on a recent line of work, we study proof-systems that allow a probabilistic verifier to ascertain that the results of an analysis are approximately correct, while drawing fewer samples and using less computational resources than would be needed to replicate the analysis. We focus on distribution testing problems: verifying that an unknown distribution is close to having a claimed property. Our main contribution is an interactive protocol between a verifier and an untrusted prover, which can be used to verify any distribution property that can be decided in polynomial time given a full and explicit description of the distribution. If the distribution is at statistical distance $\varepsilon$ from having the property, then the verifier rejects with high probability. This soundness property holds against any polynomial-time strategy that a cheating prover might follow, assuming the existence of collision-resistant hash functions (a standard assumption in cryptography). For distributions over a domain of size $N$, the protocol consists of $4$ messages and the communication complexity and verifier runtime are roughly $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$. The verifier's sample complexity is $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$, and this is optimal up to $\mathrm{polylog}(N)$ factors (for any protocol, regardless of its communication complexity). Even for simple properties, approximately deciding whether an unknown distribution has the property can require quasi-linear sample complexity and running time. For any such property, our protocol provides a quadratic speedup over replicating the analysis.
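For contrast, the replication baseline the protocol is compared against amounts to estimating the distribution empirically and measuring its distance to the claim, which is what drives the quasi-linear sample cost. A toy sketch of that baseline for total variation (statistical) distance, illustrative only and not the interactive protocol:

```python
from collections import Counter
import random

def empirical_tv(samples, claimed):
    # claimed: dict mapping each domain element to its claimed probability.
    # TV distance = (1/2) * sum of absolute probability differences.
    counts = Counter(samples)
    m = len(samples)
    support = set(counts) | set(claimed)
    return 0.5 * sum(abs(counts[x] / m - claimed.get(x, 0.0)) for x in support)

random.seed(1)
uniform = {i: 0.25 for i in range(4)}                 # claimed distribution, N = 4
close = [random.randrange(4) for _ in range(20000)]   # genuinely uniform samples
far = [0] * 20000                                     # point mass, TV = 0.75
assert empirical_tv(close, uniform) < 0.05
assert empirical_tv(far, uniform) == 0.75
```

Getting a trustworthy estimate this way needs roughly linear-in-$N$ samples; the paper's verifier gets away with roughly $\sqrt{N}$ by outsourcing the heavy work to the untrusted prover.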
"How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions" by Tal Herman, Guy Rothblum. arXiv:2409.06594, arXiv - CS - Cryptography and Security, 2024-09-10.
Non-Fungible Tokens (NFTs) have emerged as a revolutionary method for managing digital assets, providing transparency and secure ownership records on a blockchain. In this paper, we present a theoretical framework for leveraging NFTs to manage UAV (Unmanned Aerial Vehicle) flight data. Our approach focuses on ensuring data integrity, ownership transfer, and secure data sharing among stakeholders. This framework utilizes cryptographic methods, smart contracts, and access control mechanisms to enable a tamper-proof and privacy-preserving management system for UAV flight data.
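The tamper evidence such a framework needs from the data layer can be sketched as a hash chain over flight-log records, with the head hash being the value one would anchor in the NFT's on-chain metadata. Field names and the JSON encoding are illustrative assumptions:

```python
import hashlib, json

def append_record(chain, record):
    # Each entry commits to the previous entry's hash, so altering any
    # record invalidates every later hash.
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"prev": prev, "record": record}, sort_keys=True)
    chain.append({"prev": prev, "record": record,
                  "hash": hashlib.sha256(body.encode()).hexdigest()})
    return chain

def verify_chain(chain):
    prev = "0" * 64
    for entry in chain:
        body = json.dumps({"prev": prev, "record": entry["record"]},
                          sort_keys=True)
        if entry["prev"] != prev or \
           entry["hash"] != hashlib.sha256(body.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

log = []
append_record(log, {"t": 0, "lat": 52.1, "lon": 21.0, "alt": 80})
append_record(log, {"t": 1, "lat": 52.2, "lon": 21.1, "alt": 95})
assert verify_chain(log)
log[0]["record"]["alt"] = 500          # tampering breaks the chain
assert not verify_chain(log)
```

On-chain, only the final `log[-1]["hash"]` would need to live in the token's metadata; the bulky flight data can stay off-chain.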
"DroneXNFT: An NFT-Driven Framework for Secure Autonomous UAV Operations and Flight Data Management" by Khaoula Hidawi. arXiv:2409.06507, arXiv - CS - Cryptography and Security, 2024-09-10.
Jinhong Yu, Yi Chen, Di Tang, Xiaozhong Liu, XiaoFeng Wang, Chen Wu, Haixu Tang
Open source software (OSS) is integral to modern product development, and any vulnerability within it potentially compromises numerous products. While developers strive to apply security patches, pinpointing these patches among extensive OSS updates remains a challenge. Security patch localization (SPL) recommendation methods are the leading approaches to address this. However, existing SPL models often falter when a commit lacks a clear association with its corresponding CVE, and they do not consider scenarios in which a vulnerability has multiple patches proposed over time before it is fully resolved. To address these challenges, we introduce LLM-SPL, a recommendation-based SPL approach that leverages the capabilities of a Large Language Model (LLM) to locate the security patch commit for a given CVE. More specifically, we propose a joint learning framework in which the outputs of the LLM serve as additional features to aid our recommendation model in prioritizing security patches. Our evaluation on a dataset of 1,915 CVEs associated with 2,461 patches demonstrates that LLM-SPL excels in ranking patch commits, surpassing the state-of-the-art method in terms of Recall while significantly reducing manual effort. Notably, for vulnerabilities requiring multiple patches, LLM-SPL improves Recall by 22.83%, improves NDCG by 19.41%, and reduces manual effort by over 25% when checking up to the top 10 rankings. The dataset and source code are available at https://anonymous.4open.science/r/LLM-SPL-91F8.
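The use of LLM outputs as extra recommendation features can be sketched as follows; the feature names, weights, and `llm_score` interface are hypothetical stand-ins (LLM-SPL learns the combination jointly rather than fixing weights):

```python
def rank_commits(cve_id, commits, llm_score):
    # llm_score: callable (cve_id, commit) -> relevance in [0, 1],
    # standing in for the LLM component (hypothetical interface).
    def score(c):
        features = [
            c["msg_mentions_cve"],       # commit message references the CVE
            c["touches_vuln_file"],      # commit modifies the flagged file
            llm_score(cve_id, c),        # LLM output as an additional feature
        ]
        weights = [0.3, 0.2, 0.5]        # illustrative fixed weights
        return sum(w * f for w, f in zip(weights, features))
    return sorted(commits, key=score, reverse=True)

commits = [
    {"id": "a1", "msg_mentions_cve": 0, "touches_vuln_file": 1},
    {"id": "b2", "msg_mentions_cve": 1, "touches_vuln_file": 1},
]
ranked = rank_commits("CVE-2024-0001", commits,
                      llm_score=lambda cve, c: 0.9 if c["id"] == "b2" else 0.1)
assert [c["id"] for c in ranked] == ["b2", "a1"]
```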
"LLM-Enhanced Software Patch Localization". arXiv:2409.06816, arXiv - CS - Cryptography and Security, 2024-09-10.
The correct adoption of cryptography APIs is challenging for mainstream developers, often resulting in widespread API misuse. Meanwhile, cryptography misuse detectors have demonstrated inconsistent performance and remain largely inaccessible to most developers. We investigated the extent to which ChatGPT can detect cryptography misuses and compared its performance with that of state-of-the-art static analysis tools. Our investigation, mainly based on the CryptoAPI-Bench benchmark, demonstrated that ChatGPT is effective in identifying cryptography API misuses and, with the use of prompt engineering, can even outperform leading static cryptography misuse detectors.
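A concrete instance of the misuse class under study: ECB mode and hard-coded keys are canonical CryptoAPI-Bench cases. The checker below is a deliberately naive pattern matcher for illustration, far weaker than either ChatGPT or the static analyzers compared in the paper:

```python
import re

# Illustrative misuse signatures; real detectors use data-flow analysis,
# not regexes.
MISUSE_PATTERNS = {
    "ECB mode": re.compile(r"MODE_ECB|ECBBlockCipher|/ECB/"),
    "hard-coded key": re.compile(r"key\s*=\s*b?[\"'][0-9A-Za-z+/=]{8,}[\"']"),
}

def scan(source: str):
    return sorted(name for name, pat in MISUSE_PATTERNS.items()
                  if pat.search(source))

snippet = '''
key = b"0123456789abcdef"
cipher = AES.new(key, AES.MODE_ECB)
'''
assert scan(snippet) == ["ECB mode", "hard-coded key"]
```

The gap between such syntactic matching and true misuse detection (keys flowing from config files, modes chosen at runtime) is exactly where both LLMs and serious static analyzers have to earn their keep.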
"ChatGPT's Potential in Cryptography Misuse Detection: A Comparative Analysis with Static Analysis Tools" by Ehsan Firouzi, Mohammad Ghafari, Mike Ebrahimi. arXiv:2409.06561, arXiv - CS - Cryptography and Security, 2024-09-10.
Android apps that collect data from users must comply with legal frameworks to ensure data protection. This requirement has become even more important since the European Union's General Data Protection Regulation (GDPR) took effect in 2018. Moreover, with the proposed Cyber Resilience Act on the horizon, stakeholders will soon need to assess software against even more stringent security and privacy standards. Effective privacy assessments require groups with diverse expertise to collaborate as a cohesive unit. This paper motivates the need for an automated approach that enhances understanding of data protection in Android apps and improves communication between the various parties involved in privacy assessments. We propose the Assessor View, a tool designed to bridge the knowledge gap between these parties, facilitating more effective privacy assessments of Android applications.
"Advancing Android Privacy Assessments with Automation" by Mugdha Khedkar, Michael Schlichtig, Eric Bodden. arXiv:2409.06564, arXiv - CS - Cryptography and Security, 2024-09-10.