
Latest arXiv Papers

Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09880
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, M. Halgamuge
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of functionality and security. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.
Citations: 0
Why are Sensitive Functions Hard for Transformers?
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09963
Michael Hahn, Mark Rofin
Empirical studies have identified a range of learnability biases and limitations of transformers, such as a persistent difficulty in learning to compute simple formal languages such as PARITY, and a bias towards low-degree functions. However, theoretical understanding remains limited, with existing expressiveness theory either overpredicting or underpredicting realistic learning abilities. We prove that, under the transformer architecture, the loss landscape is constrained by the input-space sensitivity: Transformers whose output is sensitive to many parts of the input string inhabit isolated points in parameter space, leading to a low-sensitivity bias in generalization. We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and difficulty in length generalization for PARITY. This shows that understanding transformers' inductive biases requires studying not just their in-principle expressivity, but also their loss landscape.
Citations: 0
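The notion of input-space sensitivity in the Hahn and Rofin abstract can be made concrete with a small brute-force computation. The sketch below is illustrative only and is not taken from the paper: it assumes the standard definition of average sensitivity (the expected number of single-bit flips that change the output), and the function names are hypothetical. It shows why PARITY is an extreme case: every bit flip changes its output, whereas a function such as majority is far less sensitive.

```python
from itertools import product


def parity(bits):
    """Return 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2


def majority(bits):
    """Return 1 if more than half of the bits are one, else 0."""
    return int(sum(bits) > len(bits) / 2)


def average_sensitivity(f, n):
    """Average, over all n-bit inputs, of how many single-bit flips change f."""
    total = 0
    for bits in product((0, 1), repeat=n):
        base = f(bits)
        for i in range(n):
            # Flip bit i and check whether the output changes.
            flipped = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            total += f(flipped) != base
    return total / 2 ** n


print(average_sensitivity(parity, 5))    # 5.0: every flip changes PARITY
print(average_sensitivity(majority, 5))  # 1.875
```

The enumeration is exponential in `n`, so it is only meant for building intuition on very short inputs.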
Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10175
Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier
Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. When it comes to long-form text generation, there has been a growing interest in generation from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture discourse coherence. The development of discourse-specific automatic evaluation methods for assessing the output of LLMs warrants greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
Citations: 0
Construction of CCC and ZCCS Through Additive Characters Over Galois Field
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09757
Gobinda Ghosh, S. Majhi, Subhabrata Paul
With the rapid progression of wireless communication technologies, especially multicarrier code-division multiple access (MC-CDMA), there is a need for advanced code construction methods. Traditional approaches, mainly based on generalized Boolean functions, have limitations in code length versatility. This paper introduces a novel approach to constructing complete complementary codes (CCC) and Z-complementary code sets (ZCCS) for reducing interference in MC-CDMA systems. The proposed construction, distinct from Boolean function-based approaches, employs additive characters over Galois fields GF($p^{r}$), where $p$ is prime and $r$ is a positive integer. First, we develop CCCs with lengths of $p^{r}$, which are then extended to construct ZCCS with both unreported lengths and sizes of $np^{r}$, where $n$ is an arbitrary positive integer. The versatility of this method is further highlighted as it includes the lengths of ZCCS reported in prior studies as special cases, underscoring the method's comprehensive nature and superiority.
Citations: 0
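The Ghosh, Majhi and Paul abstract builds on additive characters over GF($p^{r}$). As a hedged illustration of that building block only (not the paper's CCC or ZCCS construction), the sketch below assumes the simplest case $r = 1$, where the additive characters of GF($p$) are $\chi_a(x) = e^{2\pi i\, a x / p}$. The function names are hypothetical. It checks the orthogonality of distinct characters, which is the property that makes them attractive for low-interference code design.

```python
import cmath


def additive_character(a, x, p):
    """chi_a(x) = exp(2*pi*i*a*x/p) over the prime field GF(p) (assumes r = 1)."""
    return cmath.exp(2j * cmath.pi * ((a * x) % p) / p)


def character_matrix(p):
    """Row a holds the values of chi_a on 0, 1, ..., p-1."""
    return [[additive_character(a, x, p) for x in range(p)] for a in range(p)]


def inner_product(u, v):
    """Hermitian inner product of two complex sequences."""
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))


rows = character_matrix(5)
# Distinct characters are orthogonal; each character has squared norm p.
print(abs(inner_product(rows[1], rows[2])) < 1e-9)       # True
print(abs(inner_product(rows[3], rows[3]) - 5) < 1e-9)   # True
```

For $r > 1$ the exponent would use the field trace of $ax$, which requires full extension-field arithmetic and is omitted here.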
Parameterized Algorithms for Steiner Forest in Bounded Width Graphs
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09835
A. Feldmann, M. Lampis
In this paper we reassess the parameterized complexity and approximability of the well-studied Steiner Forest problem in several graph classes of bounded width. The problem takes an edge-weighted graph and pairs of vertices as input, and the aim is to find a minimum cost subgraph in which each given vertex pair lies in the same connected component. It is known that this problem is APX-hard in general, and NP-hard on graphs of treewidth 3, treedepth 4, and feedback vertex set size 2. However, Bateni, Hajiaghayi and Marx [JACM, 2011] gave an approximation scheme with a runtime of $n^{O(\frac{k^2}{\varepsilon})}$ on graphs of treewidth $k$. Our main result is a much faster efficient parameterized approximation scheme (EPAS) with a runtime of $2^{O(\frac{k^2}{\varepsilon} \log \frac{k^2}{\varepsilon})} \cdot n^{O(1)}$. If $k$ instead is the vertex cover number of the input graph, we show how to compute the optimum solution in $2^{O(k \log k)} \cdot n^{O(1)}$ time, and we also prove that this runtime dependence on $k$ is asymptotically best possible, under ETH. Furthermore, if $k$ is the size of a feedback edge set, then we obtain a faster $2^{O(k)} \cdot n^{O(1)}$ time algorithm, which again cannot be improved under ETH.
Citations: 0
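The Steiner Forest problem defined in the Feldmann and Lampis abstract can be illustrated on a toy instance. The sketch below is a naive exponential-time brute force over edge subsets, written only to make the problem statement concrete; it is not one of the paper's parameterized algorithms, and the function names and example graph are hypothetical.

```python
from itertools import combinations


def component_labels(n, edges):
    """Label each of the n vertices with a representative of its component (union-find)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _ in edges:
        parent[find(u)] = find(v)
    return [find(v) for v in range(n)]


def steiner_forest_brute_force(n, edges, pairs):
    """Try every edge subset; keep the cheapest one that connects each terminal pair."""
    best_cost, best_edges = float("inf"), None
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            labels = component_labels(n, subset)
            if all(labels[s] == labels[t] for s, t in pairs):
                cost = sum(w for _, _, w in subset)
                if cost < best_cost:
                    best_cost, best_edges = cost, list(subset)
    return best_cost, best_edges


# Edges are (u, v, weight). The pairs (0, 2) and (3, 4) must each be connected.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 3), (3, 4, 2), (2, 3, 5)]
cost, chosen = steiner_forest_brute_force(5, edges, [(0, 2), (3, 4)])
print(cost, chosen)  # 4 [(0, 1, 1), (1, 2, 1), (3, 4, 2)]
```

Note that the optimum here is a forest with two trees: joining the two pairs through the expensive edge `(2, 3, 5)` is unnecessary, which is what distinguishes Steiner Forest from Steiner Tree.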
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09812
Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang
The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods representing the reference concepts using unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution may be explicitly conditioning the reference images into the target denoising process, known as key-value replacement. However, prior works are constrained to local editing since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models for generating diverse structures. We also introduce a semantic-consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. Compatible with existing T2I models, DreamMatcher shows significant improvements in complex scenarios. Intensive analyses demonstrate the effectiveness of our approach.
Citations: 0
TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09973
Prasasthy Balasubramanian, Sadaf Nazari, Danial Khosh Kholgh, A. Mahmoodi, Justin Seby, Panos Kostakos
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy that enhances the resilience of both Information Technology (IT) and Operational Technology (OT) environments against large-scale cyber-attacks. While previous research has focused on improving individual components of the extraction process, the community lacks open-source platforms for deploying streaming CTI data pipelines in the wild. To address this gap, the study describes the implementation of an efficient and well-performing platform capable of processing compute-intensive data pipelines based on the cloud computing paradigm for real-time detection, collection, and sharing of CTI from different online sources. We developed a prototype platform (TSTEM), a containerized microservice architecture that uses Tweepy, Scrapy, Terraform, ELK, Kafka, and MLOps to autonomously search, extract, and index IOCs in the wild. Moreover, the provisioning, monitoring, and management of the TSTEM platform are achieved through infrastructure as code (IaC). Custom focused crawlers collect web content, which is then processed by a first-level classifier to identify potential indicators of compromise (IOCs). If deemed relevant, the content advances to a second level of extraction for further examination. Throughout this process, state-of-the-art NLP models are utilized for classification and entity extraction, enhancing the overall IOC extraction methodology. Our experimental results indicate that these models exhibit high accuracy (exceeding 98%) in the classification and extraction tasks, achieving this performance within a time frame of less than a minute. The effectiveness of our system can be attributed to a finely-tuned IOC extraction method that operates at multiple stages, ensuring precise identification of relevant information with low false positives.
Citations: 0
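The two-stage flow in the TSTEM abstract (a first-level relevance check followed by a second-level extraction step) can be sketched in a few lines. This is a deliberately simplified stand-in, not the platform's implementation: the abstract says TSTEM uses NLP models for both stages, whereas the sketch below assumes a keyword heuristic for stage one and regular expressions for stage two. The keyword list, patterns, and function names are all hypothetical.

```python
import re

# Assumed, illustrative patterns; real IOC extraction covers many more types.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
}

# Assumed keyword list standing in for the first-level classifier.
KEYWORDS = ("malware", "phishing", "ransomware", "botnet", "exploit")


def looks_relevant(text):
    """Stage 1: cheap relevance filter (a trained classifier in the real system)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def extract_iocs(text):
    """Stage 2: pull candidate indicators out of text that passed stage 1."""
    return {name: sorted(set(p.findall(text))) for name, p in IOC_PATTERNS.items()}


def process(documents):
    """Run both stages and return one result per relevant document."""
    return [extract_iocs(doc) for doc in documents if looks_relevant(doc)]


docs = [
    "New ransomware beacons to 203.0.113.7 and exploits CVE-2024-12345.",
    "Our cafeteria menu changed; contact 10.0.0.1 for details.",
]
print(process(docs))
```

Filtering before extraction keeps the more expensive second stage off irrelevant pages, which is one way a multi-stage design can help hold false positives down; here the second document is dropped even though it contains an IP address.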
LLM-based Federated Recommendation
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09959
Jujia Zhao, Wenjie Wang, Chen Xu, Zhaochun Ren, See-kiong Ng, Tat-seng Chua
Large Language Models (LLMs), with their advanced contextual understanding abilities, have demonstrated considerable potential in enhancing recommendation systems via fine-tuning methods. However, fine-tuning requires users' behavior data, which poses considerable privacy risks due to the incorporation of sensitive user information. The unintended disclosure of such data could infringe upon data protection laws and give rise to ethical issues. To mitigate these privacy issues, Federated Learning for Recommendation (Fed4Rec) has emerged as a promising approach. Nevertheless, applying Fed4Rec to LLM-based recommendation presents two main challenges: first, an increase in the imbalance of performance across clients, affecting the system's efficiency over time, and second, a high demand on clients' computational and storage resources for local training and inference of LLMs. To address these challenges, we introduce a Privacy-Preserving LLM-based Recommendation (PPLR) framework. The PPLR framework employs two primary strategies. First, it implements a dynamic balance strategy, which involves the design of dynamic parameter aggregation and adjustment of learning speed for different clients during the training phase, to ensure relatively balanced performance across all clients. Second, PPLR adopts a flexible storage strategy, selectively retaining certain sensitive layers of the language model on the client side while offloading non-sensitive layers to the server. This approach aims to preserve user privacy while efficiently saving computational and storage resources. Experimental results demonstrate that PPLR not only achieves a balanced performance among clients but also enhances overall system performance in a manner that is both computationally and storage-efficient, while effectively protecting user privacy.
Citations: 0
Recovering the Pre-Fine-Tuning Weights of Generative Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10208
Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. In contrast to previous attacks that attempt to recover pre-fine-tuning capabilities, our method aims to recover the exact pre-fine-tuning weights. Our approach exploits this new vulnerability against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
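The abstract does not spell out the algorithm, so the following is only a sketch of one way the stated idea could work, under the assumption that each fine-tuned weight matrix equals a shared base matrix plus a rank-`rank` update. It alternates between fitting a low-rank update to each model (truncated SVD) and re-estimating the shared base as the mean of what remains. The function names, toy dimensions, and random data are hypothetical, and NumPy is assumed to be available; this is not presented as the authors' implementation.

```python
import numpy as np


def truncate_rank(matrix, rank):
    """Best rank-`rank` approximation in Frobenius norm, via SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def objective(w_est, finetuned, rank):
    """Squared residual left after fitting a rank-`rank` update to each model."""
    return sum(
        np.linalg.norm(w_i - w_est - truncate_rank(w_i - w_est, rank)) ** 2
        for w_i in finetuned
    )


def recover_base_weights(finetuned, rank, steps=200):
    """Alternating minimization over the shared base and the low-rank updates."""
    w_est = np.mean(finetuned, axis=0)  # naive starting point
    history = [objective(w_est, finetuned, rank)]
    for _ in range(steps):
        # Step 1: with the base fixed, fit each model's low-rank update.
        updates = [truncate_rank(w_i - w_est, rank) for w_i in finetuned]
        # Step 2: with the updates fixed, the best base is the mean residual.
        w_est = np.mean([w_i - m_i for w_i, m_i in zip(finetuned, updates)], axis=0)
        history.append(objective(w_est, finetuned, rank))
    return w_est, history


# Toy data: one hidden base matrix and several synthetic low-rank fine-tunes.
rng = np.random.default_rng(0)
dim, rank, n_models = 16, 2, 6
w_true = rng.normal(size=(dim, dim))
finetuned = [
    w_true + rng.normal(size=(dim, rank)) @ rng.normal(size=(rank, dim))
    for _ in range(n_models)
]
w_est, history = recover_base_weights(finetuned, rank)
print("objective at start:", history[0])
print("objective at end:  ", history[-1])
print("distance to hidden base:", np.linalg.norm(w_est - w_true))
```

Each step is an exact minimization over one group of variables, so the objective never increases. How close the estimate gets to the hidden base depends on the number of fine-tuned models and their ranks, which this toy example does not explore.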
Citations: 0
Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10192
Philip A. LeMaitre, Marius Krumm, H. Briegel
With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field known as Projective Simulation (PS) models a chain-of-thought as a random walk of a particle on a graph with vertices that have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot be naturally used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition for a dynamic hypergraph is put forward to describe the agent's training history along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.
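The claimed reduction from exponential to polynomial complexity can be made concrete with a simple count. The sketch below is only an illustrative proxy, not the paper's proof or its actual complexity measure: it assumes that the relevant objects are non-empty subsets of `n_vertices` concept vertices, and that the cutoff limits a subset to at most `cutoff` vertices. The function names are hypothetical.

```python
from math import comb


def configurations_without_cutoff(n_vertices):
    """Number of non-empty vertex subsets: 2^n - 1 (exponential in n)."""
    return 2 ** n_vertices - 1


def configurations_with_cutoff(n_vertices, cutoff):
    """Number of non-empty subsets with at most `cutoff` vertices.

    This is sum_{j=1..cutoff} C(n, j), which grows like n^cutoff for a
    fixed cutoff, so the cutoff appears as the exponent of a polynomial.
    """
    return sum(comb(n_vertices, j) for j in range(1, cutoff + 1))


for n in (10, 20, 40):
    print(
        f"n={n}: all subsets = {configurations_without_cutoff(n)}, "
        f"subsets of size <= 2 = {configurations_with_cutoff(n, 2)}"
    )
```

For a fixed cutoff, the second count is bounded by a polynomial in the number of vertices whose degree equals the cutoff, which matches the abstract's statement that the exponent represents the cutoff on how many particles can interact.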
Citations: 0