Latest ArXiv Papers

Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10175
Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier
Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. When it comes to long-form text generation, there has been growing interest in generation from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture discourse coherence. The development of discourse-specific automatic evaluation methods for assessing the output of LLMs warrants greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
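The claim that lexical metrics miss discourse coherence can be seen with a toy check. The sketch below is not the PDD metric; it uses a clipped unigram-overlap F1 (a simplified stand-in for ROUGE-1, an assumption for illustration) and shows that reordering an article's sentences leaves the score unchanged, even though the reordered text no longer reads coherently.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Clipped unigram-overlap F1, a simplified stand-in for lexical metrics such as ROUGE-1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


sentences = [
    "The storm knocked out power across the city.",
    "Crews worked through the night to restore it.",
    "By morning most homes had electricity again.",
]
reference = " ".join(sentences)
coherent = " ".join(sentences)
shuffled = " ".join(reversed(sentences))  # same tokens, broken discourse order

print(unigram_f1(coherent, reference))
print(unigram_f1(shuffled, reference))  # identical score: sentence order is invisible to the metric
```

Both calls return the same value, which is exactly the blind spot a position-aware discourse metric is meant to address.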
Citations: 0
Construction of CCC and ZCCS Through Additive Characters Over Galois Field
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09757
Gobinda Ghosh, S. Majhi, Subhabrata Paul
With the rapid progress of wireless communication technologies, especially multicarrier code-division multiple access (MC-CDMA), there is a need for advanced code construction methods. Traditional approaches, mainly based on generalized Boolean functions, are limited in the range of code lengths they can produce. This paper introduces a novel approach to constructing complete complementary codes (CCC) and Z-complementary code sets (ZCCS) for reducing interference in MC-CDMA systems. The proposed construction, distinct from Boolean function-based approaches, employs additive characters over Galois fields GF($p^{r}$), where $p$ is prime and $r$ is a positive integer. First, we develop CCCs of length $p^{r}$, which are then extended to construct ZCCS whose lengths and set sizes are both of the form $np^{r}$, where $n$ is an arbitrary positive integer; such parameters have not been reported before. The versatility of this method is further highlighted by the fact that it includes the ZCCS lengths reported in prior studies as special cases, underscoring the method's comprehensive nature and superiority.
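Two standard ingredients behind such constructions can be checked numerically: the orthogonality of additive characters and the zero sum of aperiodic autocorrelations that defines a complementary set. The sketch below assumes the prime-field case $r = 1$ (where the trace map is the identity, so $\chi_a(x) = e^{2\pi i a x / p}$) and uses a textbook binary Golay pair ($p = 2$); it illustrates the definitions only and is not the paper's CCC/ZCCS construction. General GF($p^r$) would additionally need extension-field arithmetic and the field trace.

```python
import cmath


def additive_character(a: int, x: int, p: int) -> complex:
    """chi_a(x) = exp(2*pi*i*a*x/p) on GF(p), i.e. the r = 1 case where the trace is the identity."""
    return cmath.exp(2j * cmath.pi * a * x / p)


def character_sum(a: int, p: int) -> complex:
    """Sum of chi_a over all of GF(p): p when a == 0 (mod p), otherwise 0 (orthogonality)."""
    return sum(additive_character(a, x, p) for x in range(p))


def aperiodic_correlation(s, u, tau: int) -> complex:
    """rho(s, u; tau) = sum_t s[t + tau] * conj(u[t]) for 0 <= tau < len(s)."""
    n = len(s)
    return sum(s[t + tau] * complex(u[t]).conjugate() for t in range(n - tau))


def is_complementary_set(seqs, tol: float = 1e-9) -> bool:
    """True if the aperiodic autocorrelations of all sequences sum to zero at every nonzero shift."""
    length = len(seqs[0])
    for tau in range(1, length):
        total = sum(aperiodic_correlation(s, s, tau) for s in seqs)
        if abs(total) > tol:
            return False
    return True


p = 5
print([round(abs(character_sum(a, p)), 6) for a in range(p)])  # [5.0, 0.0, 0.0, 0.0, 0.0]

# A length-4 binary Golay complementary pair (the p = 2 case).
golay_pair = [[1, 1, 1, -1], [1, 1, -1, 1]]
print(is_complementary_set(golay_pair))  # True
```

A CCC additionally requires zero aperiodic cross-correlation sums between different code sets; the same `aperiodic_correlation` helper can be used for that check by passing two different sequences.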
Citations: 0
Parameterized Algorithms for Steiner Forest in Bounded Width Graphs
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09835
A. Feldmann, M. Lampis
In this paper we reassess the parameterized complexity and approximability of the well-studied Steiner Forest problem in several graph classes of bounded width. The problem takes an edge-weighted graph and pairs of vertices as input, and the aim is to find a minimum cost subgraph in which each given vertex pair lies in the same connected component. It is known that this problem is APX-hard in general, and NP-hard on graphs of treewidth 3, treedepth 4, and feedback vertex set size 2. However, Bateni, Hajiaghayi and Marx [JACM, 2011] gave an approximation scheme with a runtime of $n^{O(\frac{k^2}{\varepsilon})}$ on graphs of treewidth $k$. Our main result is a much faster efficient parameterized approximation scheme (EPAS) with a runtime of $2^{O(\frac{k^2}{\varepsilon} \log \frac{k^2}{\varepsilon})} \cdot n^{O(1)}$. If $k$ instead is the vertex cover number of the input graph, we show how to compute the optimum solution in $2^{O(k \log k)} \cdot n^{O(1)}$ time, and we also prove that this runtime dependence on $k$ is asymptotically best possible, under ETH. Furthermore, if $k$ is the size of a feedback edge set, then we obtain a faster $2^{O(k)} \cdot n^{O(1)}$ time algorithm, which again cannot be improved under ETH.
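The problem definition above can be made concrete with an exact but exponential baseline: enumerate edge subsets, keep the cheapest one in which every terminal pair ends up in the same component. This sketch runs in $O(2^m)$ time for $m$ edges and is only a reference solver for tiny instances; it is not the paper's EPAS or its FPT algorithms. The edge-list format `(u, v, weight)` is an assumption for illustration.

```python
from itertools import combinations


class DisjointSet:
    """Union-find used to test whether chosen edges connect each terminal pair."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def connects_all_pairs(n, chosen_edges, pairs):
    ds = DisjointSet(n)
    for u, v, _ in chosen_edges:
        ds.union(u, v)
    return all(ds.find(s) == ds.find(t) for s, t in pairs)


def steiner_forest_brute_force(n, edges, pairs):
    """Exact minimum-cost Steiner forest by trying every edge subset (toy inputs only)."""
    best_cost, best_edges = float("inf"), None
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            cost = sum(w for _, _, w in subset)
            if cost < best_cost and connects_all_pairs(n, subset, pairs):
                best_cost, best_edges = cost, list(subset)
    return best_cost, best_edges


edges = [(0, 1, 1), (1, 2, 1), (0, 2, 3), (3, 4, 2), (2, 3, 5)]
pairs = [(0, 2), (3, 4)]
print(steiner_forest_brute_force(5, edges, pairs))  # (4, [(0, 1, 1), (1, 2, 1), (3, 4, 2)])
```

Note that the optimum is a forest with two components: connecting the pairs separately is cheaper than routing everything through the expensive edge `(2, 3)`.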
Citations: 0
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09812
Jisu Nam, Heesu Kim, DongJae Lee, Siyoon Jin, Seungryong Kim, Seunggyu Chang
The objective of text-to-image (T2I) personalization is to customize a diffusion model to a user-provided reference concept, generating diverse images of the concept aligned with the target prompts. Conventional methods that represent the reference concept with unique text embeddings often fail to accurately mimic the appearance of the reference. To address this, one solution is to explicitly condition the target denoising process on the reference images, an approach known as key-value replacement. However, prior works are constrained to local editing, since they disrupt the structure path of the pre-trained T2I model. To overcome this, we propose a novel plug-in method, called DreamMatcher, which reformulates T2I personalization as semantic matching. Specifically, DreamMatcher replaces the target values with reference values aligned by semantic matching, while leaving the structure path unchanged to preserve the versatile capability of pre-trained T2I models to generate diverse structures. We also introduce a semantically consistent masking strategy to isolate the personalized concept from irrelevant regions introduced by the target prompts. DreamMatcher is compatible with existing T2I models and shows significant improvements in complex scenarios. In-depth analyses demonstrate the effectiveness of our approach.
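The core idea, keeping the target's queries and keys (the structure path) while swapping in reference values aligned by semantic matching, can be sketched in a few lines. This is a simplified single-head illustration, not DreamMatcher's implementation: the nearest-neighbour cosine matching over per-position feature vectors, the absence of masking, and all function names here are assumptions.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def match_reference(target_feats, ref_feats):
    """For each target position, the index of the most similar reference position (cosine)."""
    return [max(range(len(ref_feats)), key=lambda j: cosine(t, ref_feats[j])) for t in target_feats]


def attention_with_matched_values(q_tgt, k_tgt, v_ref, target_feats, ref_feats):
    """Self-attention whose weights come from target queries/keys (structure path),
    applied to reference values warped onto target positions (appearance path)."""
    match = match_reference(target_feats, ref_feats)
    v_aligned = [v_ref[j] for j in match]
    d = len(q_tgt[0])
    out = []
    for q in q_tgt:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in k_tgt])
        out.append([sum(w * v[c] for w, v in zip(weights, v_aligned))
                    for c in range(len(v_aligned[0]))])
    return out


# Toy example: the reference concept appears at swapped positions.
q_tgt = [[100.0, 0.0], [0.0, 100.0]]
k_tgt = [[1.0, 0.0], [0.0, 1.0]]
v_ref = [[10.0], [20.0]]
target_feats = [[1.0, 0.0], [0.0, 1.0]]
ref_feats = [[0.0, 1.0], [1.0, 0.0]]
print(match_reference(target_feats, ref_feats))  # [1, 0]
print(attention_with_matched_values(q_tgt, k_tgt, v_ref, target_feats, ref_feats))
```

Because only the values change, the attention pattern (and hence the layout the target prompt induces) is untouched, while the output at each position is drawn from the semantically corresponding reference position.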
Citations: 0
TSTEM: A Cognitive Platform for Collecting Cyber Threat Intelligence in the Wild
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09973
Prasasthy Balasubramanian, Sadaf Nazari, Danial Khosh Kholgh, A. Mahmoodi, Justin Seby, Panos Kostakos
The extraction of cyber threat intelligence (CTI) from open sources is a rapidly expanding defensive strategy that enhances the resilience of both Information Technology (IT) and Operational Technology (OT) environments against large-scale cyber-attacks. While previous research has focused on improving individual components of the extraction process, the community lacks open-source platforms for deploying streaming CTI data pipelines in the wild. To address this gap, this study describes the implementation of an efficient, well-performing platform, based on the cloud computing paradigm, that can process compute-intensive data pipelines for the real-time detection, collection, and sharing of CTI from different online sources. We developed a prototype platform (TSTEM), a containerized microservice architecture that uses Tweepy, Scrapy, Terraform, ELK, Kafka, and MLOps to autonomously search for, extract, and index indicators of compromise (IOCs) in the wild. Moreover, the provisioning, monitoring, and management of the TSTEM platform are achieved through infrastructure as code (IaC). Custom focused crawlers collect web content, which is then processed by a first-level classifier to identify potential IOCs. If deemed relevant, the content advances to a second level of extraction for further examination. Throughout this process, state-of-the-art NLP models are used for classification and entity extraction, enhancing the overall IOC extraction methodology. Our experimental results indicate that these models achieve high accuracy (exceeding 98%) in the classification and extraction tasks, within a time frame of less than a minute. The effectiveness of our system can be attributed to a finely tuned, multi-stage IOC extraction method that ensures precise identification of relevant information with few false positives.
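The two-level flow described above (a first-level relevance filter, then second-level IOC extraction only for relevant content) can be sketched as follows. TSTEM uses NLP models for both stages; here a keyword set and a few regular expressions are hypothetical stand-ins chosen purely to show the control flow, and they would miss many real IOC formats.

```python
import re

# Hypothetical keyword filter standing in for TSTEM's first-level NLP classifier.
THREAT_KEYWORDS = {"malware", "ransomware", "exploit", "phishing", "c2", "botnet", "vulnerability"}

# Hypothetical regexes standing in for the second-level entity extractor.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+(?:com|net|org|io|ru)\b", re.IGNORECASE),
}


def is_relevant(text: str) -> bool:
    """First level: keep only documents that mention at least one threat keyword."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return bool(tokens & THREAT_KEYWORDS)


def extract_iocs(text: str) -> dict:
    """Second level: pull de-duplicated, sorted candidate IOCs of each type."""
    return {name: sorted(set(p.findall(text))) for name, p in IOC_PATTERNS.items()}


def process(documents):
    """Yield (document, iocs) only for documents that pass the first-level filter."""
    for doc in documents:
        if is_relevant(doc):
            yield doc, extract_iocs(doc)


docs = [
    "New ransomware campaign uses C2 server 203.0.113.7 and domain evil-update.com, exploiting CVE-2023-12345.",
    "The weather is nice today.",
]
for doc, iocs in process(docs):
    print(iocs)
```

Filtering before extraction keeps the more expensive second stage off irrelevant pages, which is the same motivation the abstract gives for the multi-stage design.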
Citations: 0
LLM-based Federated Recommendation
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09959
Jujia Zhao, Wenjie Wang, Chen Xu, Zhaochun Ren, See-kiong Ng, Tat-seng Chua
Large Language Models (LLMs), with their advanced contextual understanding abilities, have demonstrated considerable potential in enhancing recommendation systems via fine-tuning methods. However, fine-tuning requires users' behavior data, which poses considerable privacy risks due to the incorporation of sensitive user information. The unintended disclosure of such data could infringe upon data protection laws and give rise to ethical issues. To mitigate these privacy issues, Federated Learning for Recommendation (Fed4Rec) has emerged as a promising approach. Nevertheless, applying Fed4Rec to LLM-based recommendation presents two main challenges: first, an increase in the imbalance of performance across clients, affecting the system's efficiency over time, and second, a high demand on clients' computational and storage resources for local training and inference of LLMs. To address these challenges, we introduce a Privacy-Preserving LLM-based Recommendation (PPLR) framework. The PPLR framework employs two primary strategies. First, it implements a dynamic balance strategy, which involves the design of dynamic parameter aggregation and adjustment of learning speed for different clients during the training phase, to ensure relatively balanced performance across all clients. Second, PPLR adopts a flexible storage strategy, selectively retaining certain sensitive layers of the language model on the client side while offloading non-sensitive layers to the server. This approach aims to preserve user privacy while efficiently saving computational and storage resources. Experimental results demonstrate that PPLR not only achieves a balanced performance among clients but also enhances overall system performance in a manner that is both computationally and storage-efficient, while effectively protecting user privacy.
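The two PPLR strategies can be illustrated with a toy sketch: loss-aware aggregation with per-client learning-rate scaling (dynamic balance), and a partition of model layers into client-kept and server-offloaded sets (flexible storage). The softmax-over-losses weighting, the proportional learning-rate rule, and which layers count as "sensitive" are all hypothetical choices made for illustration; the abstract does not specify PPLR's exact formulas.

```python
import math


def dynamic_weights(client_losses, temperature=1.0):
    """Hypothetical rule: softmax over client losses, so lagging clients get more weight."""
    m = max(client_losses)
    exps = [math.exp((loss - m) / temperature) for loss in client_losses]
    total = sum(exps)
    return [e / total for e in exps]


def client_learning_rates(base_lr, client_losses):
    """Hypothetical rule: scale each client's learning rate by its loss relative to the mean."""
    mean = sum(client_losses) / len(client_losses)
    return [base_lr * loss / mean for loss in client_losses]


def aggregate(client_params, client_losses, temperature=1.0):
    """Weighted average of shared parameters (dicts mapping layer name -> list of floats)."""
    weights = dynamic_weights(client_losses, temperature)
    return {
        name: [sum(w * params[name][i] for w, params in zip(weights, client_params))
               for i in range(len(client_params[0][name]))]
        for name in client_params[0]
    }


def split_layers(layer_names, sensitive):
    """Partition layers into those kept on the client and those offloaded to the server."""
    on_client = [n for n in layer_names if n in sensitive]
    on_server = [n for n in layer_names if n not in sensitive]
    return on_client, on_server


clients = [{"head": [1.0]}, {"head": [3.0]}]
print(aggregate(clients, [0.5, 0.5]))                 # equal losses -> plain average
print(client_learning_rates(0.1, [1.0, 3.0]))         # lagging client trains faster
print(split_layers(["emb", "block0", "block1", "head"], {"emb", "head"}))
```

The temperature controls how aggressively the aggregation favours lagging clients; as it grows, the weights approach a uniform FedAvg-style average.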
Citations: 0
Recovering the Pre-Fine-Tuning Weights of Generative Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10208
Eliahu Horwitz, Jonathan Kahana, Yedid Hoshen
The dominant paradigm in generative modeling consists of two steps: i) pre-training on a large-scale but unsafe dataset, ii) aligning the pre-trained model with human values via fine-tuning. This practice is considered safe, as no current method can recover the unsafe, pre-fine-tuning model weights. In this paper, we demonstrate that this assumption is often false. Concretely, we present Spectral DeTuning, a method that can recover the weights of the pre-fine-tuning model using a few low-rank (LoRA) fine-tuned models. In contrast to previous attacks that attempt to recover pre-fine-tuning capabilities, our method aims to recover the exact pre-fine-tuning weights. Our approach exploits this new vulnerability against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.
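The attack setting can be sketched as an alternating least-squares problem: each LoRA fine-tuned matrix is modelled as $W_i = W + M_i$ with $M_i$ low rank, so one can alternate between fitting a low-rank $M_i$ to $W_i - W$ and re-estimating $W$ as the mean of $W_i - M_i$. The sketch below assumes rank-1 updates and uses pure-Python power iteration instead of a full SVD; it is written in the spirit of the abstract and may differ from the paper's Spectral DeTuning (for example, in rank scheduling or iteration counts).

```python
import math
import random


def matmul_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def rank1_approx(x, iters=100):
    """X v v^T, with v the top right singular vector estimated by power iteration on X^T X."""
    xt = transpose(x)
    v = [1.0] * len(x[0])
    for _ in range(iters):
        w = matmul_vec(xt, matmul_vec(x, v))
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # zero residual (or v orthogonal to the row space): nothing to fit
            return [[0.0] * len(x[0]) for _ in x]
        v = [c / norm for c in w]
    xv = matmul_vec(x, v)
    return [[a * b for b in v] for a in xv]


def recover_base_weights(finetuned, steps=50):
    """Alternate: rank-1 residuals M_i given W, then W = mean_i (W_i - M_i)."""
    n = len(finetuned)
    rows, cols = len(finetuned[0]), len(finetuned[0][0])
    w = [[sum(f[r][c] for f in finetuned) / n for c in range(cols)] for r in range(rows)]
    for _ in range(steps):
        residuals = [rank1_approx(sub(f, w)) for f in finetuned]
        w = [[sum(f[r][c] - m[r][c] for f, m in zip(finetuned, residuals)) / n
              for c in range(cols)] for r in range(rows)]
    return w


def frobenius_error(a, b):
    return math.sqrt(sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)))


# Synthetic demo: one base matrix, five copies each with its own rank-1 "LoRA" update.
random.seed(0)
base = [[random.gauss(0, 1) for _ in range(4)] for _ in range(4)]
models = []
for _ in range(5):
    b = [random.gauss(0, 1) for _ in range(4)]
    a = [random.gauss(0, 1) for _ in range(4)]
    models.append([[base[r][c] + b[r] * a[c] for c in range(4)] for r in range(4)])

naive = [[sum(m[r][c] for m in models) / len(models) for c in range(4)] for r in range(4)]
print("error of naive mean:", frobenius_error(naive, base))
print("error after alternating recovery:", frobenius_error(recover_base_weights(models), base))
```

The naive mean keeps the average of the LoRA updates as a bias; the alternating loop tries to explain each model's deviation with its own low-rank term instead, which is what makes recovery of the exact pre-fine-tuning weights plausible.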
Citations: 0
Multi-Excitation Projective Simulation with a Many-Body Physics Inspired Inductive Bias
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10192
Philip A. LeMaitre, Marius Krumm, H. Briegel
With the impressive progress of deep learning, applications relying on machine learning are increasingly being integrated into daily life. However, most deep learning models have an opaque, oracle-like nature, making it difficult to interpret and understand their decisions. This problem led to the development of the field known as eXplainable Artificial Intelligence (XAI). One method in this field, known as Projective Simulation (PS), models a chain-of-thought as a random walk of a particle on a graph whose vertices have concepts attached to them. While this description has various benefits, including the possibility of quantization, it cannot naturally be used to model thoughts that combine several concepts simultaneously. To overcome this limitation, we introduce Multi-Excitation Projective Simulation (mePS), a generalization that considers a chain-of-thought to be a random walk of several particles on a hypergraph. A definition of a dynamic hypergraph is put forward to describe the agent's training history, along with applications to AI and hypergraph visualization. An inductive bias inspired by the remarkably successful few-body interaction models used in quantum many-body physics is formalized for our classical mePS framework and employed to tackle the exponential complexity associated with naive implementations of hypergraphs. We prove that our inductive bias reduces the complexity from exponential to polynomial, with the exponent representing the cutoff on how many particles can interact. We numerically apply our method to two toy environments and a more complex scenario modelling the diagnosis of a broken computer. These environments demonstrate the resource savings provided by an appropriate choice of inductive bias, as well as showcasing aspects of interpretability. A quantum model for mePS is also briefly outlined and some future directions for it are discussed.
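The exponential-to-polynomial reduction can be seen with a simple counting argument. Assuming, for illustration, that cost scales with the number of candidate vertex subsets a hyperedge could combine, a naive hypergraph over $n$ concept vertices allows $2^n - 1$ non-empty subsets, while a cutoff of at most $k$ interacting particles allows only $\sum_{j=1}^{k} \binom{n}{j} = O(n^k)$. This illustrates the scaling only; it is not the mePS model itself.

```python
from math import comb


def subsets_unrestricted(n_vertices: int) -> int:
    """All non-empty vertex subsets a naive hyperedge could join: 2^n - 1."""
    return 2 ** n_vertices - 1


def subsets_up_to(n_vertices: int, k: int) -> int:
    """Subsets with at most k vertices (a k-body cutoff): sum_{j=1..k} C(n, j) = O(n^k)."""
    return sum(comb(n_vertices, j) for j in range(1, k + 1))


for n in (10, 20, 40):
    print(n, subsets_unrestricted(n), subsets_up_to(n, 2))
# 10 1023 55
# 20 1048575 210
# 40 1099511627775 820
```

With a two-body cutoff, doubling the number of concepts roughly quadruples the count, whereas the unrestricted count squares.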
Citations: 0
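The abstract says the few-body inductive bias cuts complexity from exponential to polynomial, with the exponent set by the interaction cutoff. A simple way to see why is to count hyperedges. Below is a minimal sketch of that count. It assumes a hypergraph over n concept vertices. In the naive case, any nonempty vertex subset may form a hyperedge. In the restricted case, a hypothetical cutoff k limits hyperedges to at most k vertices. The naive count is 2^n - 1. The truncated count is the sum of C(n, j) for j = 1..k, which grows like n^k. The function names are illustrative and are not taken from the paper.

```python
from math import comb


def naive_hyperedge_count(n: int) -> int:
    """Nonempty vertex subsets of an n-vertex hypergraph: 2^n - 1 (naive case)."""
    return 2 ** n - 1


def truncated_hyperedge_count(n: int, k: int) -> int:
    """Hyperedges with at most k vertices, i.e. a hypothetical few-body cutoff k."""
    return sum(comb(n, j) for j in range(1, min(k, n) + 1))


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, naive_hyperedge_count(n), truncated_hyperedge_count(n, 2))
    # 10 1023 55
    # 20 1048575 210
    # 40 1099511627775 820
```

With k = 2 the truncated count is n + n(n - 1)/2, so it grows quadratically. The naive count doubles with every added concept. When k >= n, both counts coincide.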
LAPDoc: Layout-Aware Prompting for Documents
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09841
Marcel Lamott, Yves-Noel Weweler, A. Ulges, Faisal Shafait, Dirk Krechel, Darko Obradovic
Recent advances in training large language models (LLMs) on massive amounts of purely textual data have led to strong generalization across many domains and tasks, including document-specific tasks. In contrast, there is a trend toward training multi-modal transformer architectures tailored to document understanding, which are designed specifically to fuse textual inputs with the corresponding document layout. This requires a separate fine-tuning step with additional training data. At present, no document transformers with generalization comparable to that of LLMs are available, which raises the question of which type of model is preferable for document understanding tasks. In this paper, we investigate the possibility of using purely text-based LLMs for document-specific tasks by means of layout enrichment. We explore drop-in modifications and rule-based methods to enrich purely textual LLM prompts with layout information. In our experiments, we investigate the effects on the commercial ChatGPT model and the open-source LLM Solar. We demonstrate that with our approach both LLMs show improved performance on various standard document benchmarks. In addition, we study the impact of noisy OCR and layout errors, as well as the limitations of LLMs in utilizing document layout. Our results indicate that layout enrichment can improve the document-understanding performance of purely text-based LLMs by up to 15% compared to using plain document text alone. In conclusion, this approach should be considered when choosing between text-based LLMs and multi-modal document transformers.
Citations: 0
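Below is a minimal sketch of one possible rule-based way to enrich a plain-text prompt with layout information. It is not the specific set of rules evaluated in LAPDoc. The sketch assumes OCR output given as words with the pixel coordinates of their top-left corners. It first groups words into text lines by y coordinate. It then pads each word with spaces in proportion to its x coordinate, so horizontal alignment survives in the text. Examples of such alignment are table columns and key-value pairs. The `Word` type, the `layout_to_text` function and the `line_tol`/`char_width` parameters are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in pixels (assumed OCR output)
    y: float  # top edge in pixels (assumed OCR output)


def layout_to_text(words: list[Word], line_tol: float = 5.0, char_width: float = 10.0) -> str:
    """Render OCR words as text whose spacing roughly mirrors the page layout (hypothetical rules)."""
    # Sort top-to-bottom, then left-to-right.
    ordered = sorted(words, key=lambda w: (w.y, w.x))
    lines: list[list[Word]] = []
    for w in ordered:
        # Join the current line if the word is vertically close to its first word.
        if lines and abs(w.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    rendered = []
    for line in lines:
        line.sort(key=lambda w: w.x)
        out = ""
        for w in line:
            col = int(w.x // char_width)  # target character column for this word
            # Pad up to the target column, keeping at least one space between words.
            gap = max(col - len(out), 1 if out else 0)
            out += " " * gap + w.text
        rendered.append(out)
    return "\n".join(rendered)
```

For an invoice-like page, this places `12345` and `99.00` at the same character column when both sit at x = 200. A text-only LLM can then read them as aligned values. A larger `char_width` gives shorter prompts but less faithful alignment.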
Quantized Embedding Vectors for Controllable Diffusion Language Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10107
Cheng Kang, Xinye Chen, Yong Hu, Daniel Novak
Improving the controllability, portability, and inference speed of diffusion language models (DLMs) is a key challenge in natural language generation. While recent research has shown significant success in complex text generation with language models, their memory and computational requirements remain very demanding and fall short of expectations, which naturally results in low portability and instability. To mitigate these issues, numerous well-established methods for neural network quantization have been proposed. To further enhance portability for standalone deployment, as well as stability as measured by language perplexity, we propose a novel approach called the Quantized Embedding Controllable Diffusion Language Model (QE-CDLM). QE-CDLM builds on recent successful controllable DLMs by remodeling the task-specific embedding space via quantization. This yields a gradient-based controller for the generation tasks and more stable intermediate latent variables, which in turn leads to faster convergence and better controllability. Additionally, an adaptation fine-tuning method is employed to reduce the number of tunable weights. Experimental results on five challenging fine-grained control tasks demonstrate that QE-CDLM compares favorably with existing methods in terms of quality and feasibility, achieving better perplexity and enabling lightweight fine-tuning.
Citations: 0
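Below is a rough illustration of what quantizing an embedding vector involves. It is a minimal sketch assuming symmetric uniform quantization with one scale per vector. The abstract does not state which quantization scheme QE-CDLM uses. The function names and the 4-bit default are hypothetical. Each coordinate is stored as a small signed integer, and all coordinates share one floating-point scale. Storing integers instead of full floats is where the memory savings come from.

```python
def quantize_vector(vec: list[float], bits: int = 4) -> tuple[list[int], float]:
    """Symmetric uniform quantization of one embedding vector (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1  # largest signed level, e.g. 7 for 4 bits
    max_abs = max(abs(v) for v in vec)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero for all-zero vectors
    # Round each coordinate to the nearest level and clip to [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in vec]
    return q, scale


def dequantize_vector(q: list[int], scale: float) -> list[float]:
    """Map integer levels back to approximate real-valued coordinates."""
    return [level * scale for level in q]
```

With `bits=4`, a vector becomes integers in [-7, 7] plus a single float. Dequantizing recovers each coordinate to within half a quantization step (`scale / 2`).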