ArXiv Latest Papers

RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10038
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
Reinforcement learning from human feedback (RLHF) has been extensively employed to align large language models with user intent. However, RLHF based on proximal policy optimization (PPO) is occasionally unstable, requires significant hyperparameter fine-tuning, and is computationally expensive when maximizing the estimated reward during alignment. Direct preference optimization (DPO) was recently proposed to address these challenges. However, DPO relies on contrastive responses generated by human annotators and alternative LLMs rather than by the policy model, which limits the effectiveness of RLHF. In this paper, we address both challenges by systematically combining rejection sampling (RS) and DPO. Our proposed method, RS-DPO, begins by developing a supervised fine-tuned (SFT) policy model. A diverse set of k responses per prompt is sampled directly from the SFT model. RS-DPO then identifies pairs of contrastive samples based on their reward distribution. Finally, we apply DPO to the contrastive samples to align the model with human preferences. Our experiments indicate that the proposed method effectively fine-tunes LLMs in resource-limited environments, leading to improved alignment with user intent. Furthermore, it outperforms existing methods, including RS, PPO, and DPO.
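A minimal sketch of the two RS-DPO stages described above, under stated assumptions: pair selection over the k sampled responses, and the standard per-pair DPO loss. The simple reward-difference threshold is an assumption standing in for the paper's reward-distribution-based selection, and the function names are hypothetical; this is not the authors' implementation.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def select_contrastive_pairs(
    responses: Sequence[str], rewards: Sequence[float], threshold: float
) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs whose reward gap exceeds `threshold`.

    Assumption: a plain reward-difference threshold stands in for the
    paper's reward-distribution-based pair selection.
    """
    pairs = []
    for i, j in combinations(range(len(responses)), 2):
        gap = rewards[i] - rewards[j]
        if abs(gap) > threshold:
            # The higher-reward response becomes the preferred ("chosen") one.
            chosen, rejected = (i, j) if gap > 0 else (j, i)
            pairs.append((responses[chosen], responses[rejected]))
    return pairs


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy being trained and under the frozen reference (SFT) model."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

For example, with k = 3 responses scored 0.1, 0.9 and 0.5 by a reward model and a threshold of 0.3, every pair qualifies and the higher-scored response of each pair is marked as chosen. When the policy equals the reference model, the DPO loss is log 2 for every pair.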
Citations: 0
Modeling the Impact of Timeline Algorithms on Opinion Dynamics Using Low-rank Updates
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10053
Tianyi Zhou, Stefan Neumann, Kiran Garimella, A. Gionis
Timeline algorithms are key parts of online social networks, but in recent years they have been blamed for increasing polarization and disagreement in our society. Opinion-dynamics models have been used to study a variety of phenomena in online social networks, but it remains an open question how these models can be augmented to account for the fine-grained impact of user-level timeline algorithms. We make progress on this question by providing a way to model the impact of timeline algorithms on opinion dynamics. Specifically, we show how the popular Friedkin–Johnsen opinion-formation model can be augmented using aggregate information extracted from timeline data. We use our model to study the problem of minimizing polarization and disagreement, assuming that we may make small changes to the users' timeline compositions by strengthening some topics of discussion and penalizing others. We present a gradient-descent-based algorithm for this problem and show that, under realistic parameter settings, it computes a $(1+\varepsilon)$-approximate solution in time $\tilde{O}(m\sqrt{n}\lg(1/\varepsilon))$, where $m$ is the number of edges in the graph and $n$ is the number of vertices. We also present an algorithm that provably computes an $\varepsilon$-approximation of our model in near-linear time. We evaluate our method on real-world data and show that it effectively reduces polarization and disagreement in the network. Finally, we release an anonymized graph dataset with ground-truth opinions and more than 27,000 nodes (the previously largest publicly available dataset contains fewer than 550 nodes).
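As background for the abstract above, here is a sketch of the base Friedkin–Johnsen model only, assuming an unweighted undirected graph with Laplacian L and innate opinions s. The equilibrium expressed opinions are z* = (I + L)⁻¹s; with mean-centered s̄, polarization is P = z*ᵀz*, disagreement is D = z*ᵀLz*, and P + D = s̄ᵀ(I + L)⁻¹s̄. The paper's contribution (augmenting the model with timeline data via low-rank updates and optimizing timeline compositions) is not reproduced; the helper names are hypothetical.

```python
import numpy as np


def laplacian(n, edges):
    """Unweighted graph Laplacian L = D - A built from an edge list."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1.0
        L[v, v] += 1.0
        L[u, v] -= 1.0
        L[v, u] -= 1.0
    return L


def fj_equilibrium(L, s):
    """Friedkin-Johnsen equilibrium z* = (I + L)^{-1} s for innate opinions s."""
    return np.linalg.solve(np.eye(len(s)) + L, s)


def polarization_disagreement(L, s):
    """Return (polarization, disagreement) at equilibrium for mean-centered opinions."""
    s_bar = s - s.mean()
    # Because (I + L) 1 = 1, the equilibrium of a mean-centered s is mean-centered too.
    z = fj_equilibrium(L, s_bar)
    polarization = float(z @ z)        # squared deviation from the mean opinion
    disagreement = float(z @ L @ z)    # sum over edges of (z_u - z_v)^2
    return polarization, disagreement
```

On a graph with no edges (L = 0), nobody is influenced, so z* = s̄, disagreement is zero, and polarization is just the squared norm of the centered innate opinions.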
Citations: 0
VisIRNet: Deep Image Alignment for UAV-taken Visible and Infrared Image Pairs
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09635
Sedat Ozer, A. P. Ndigande
This paper proposes a deep-learning-based solution for multi-modal alignment of UAV-taken images. Many recently proposed state-of-the-art alignment techniques rely on Lucas-Kanade (LK) based solutions for successful alignment. However, we show that state-of-the-art results can be achieved without LK-based methods. Our approach uses a two-branch convolutional neural network (CNN) built on feature embedding blocks. We propose two variants: the first (ModelA) directly predicts the new coordinates of only the four corners of the image to be aligned, and the second (ModelB) directly predicts the homography matrix. Applying alignment to the image corners forces the algorithm to match only those four corners rather than computing and matching many (key)points, since the latter may produce many outliers and yield less accurate alignment. We test our approach on four aerial datasets and obtain state-of-the-art results compared with existing recent deep LK-based architectures.
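The corner-based variant (ModelA) relies on the fact that four point correspondences in general position determine a homography. The sketch below shows how predicted corner coordinates could be turned into a 3x3 homography using the standard direct linear transform (DLT); it is an illustration, not the paper's code, and Hartley coordinate normalization is omitted for brevity (it is advisable with noisy predictions).

```python
import numpy as np


def homography_from_corners(src, dst):
    """Estimate H (with H[2, 2] = 1) mapping four src corners onto four dst corners via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.asarray(rows, dtype=float)
    # The solution is the null vector of A: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

For instance, the four corners of a 128x128 image and their predicted positions in the other modality are enough to recover the full warp, which can then be used to resample the whole image.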
Citations: 0
Inversion of limited-aperture Fresnel experimental data using orthogonality sampling method with single and multiple sources
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09740
Won-Kwang Park
In this study, we consider the application of the orthogonality sampling method (OSM) with single and multiple sources for fast identification of small objects in the limited-aperture inverse scattering problem. We first apply the OSM with a single source and show that its indicator function can be expressed in terms of the Bessel function of the first kind of order zero, an infinite series of Bessel functions of the first kind of nonzero integer order, the range of the signal receivers, and the location of the emitter. Based on this result, we explain that objects can be identified through the OSM with a single source, but that the identification is significantly influenced by the location of the source and the applied frequency. To improve on this, we then consider the OSM with multiple sources. Based on the identified structure of the single-source OSM, we design an indicator function for the OSM with multiple sources and show that it can be expressed in terms of the square of the Bessel function of the first kind of order zero and an infinite series of squares of Bessel functions of the first kind of nonzero integer order. Based on these theoretical results, we explain that objects can be uniquely identified through the designed OSM. Several numerical experiments with experimental data provided by the Institut Fresnel demonstrate the pros and cons of the OSM with a single source and how the designed OSM with multiple sources behaves.
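To make the idea behind an OSM indicator concrete, here is a simplified 2D sketch under loud assumptions: far-field data instead of the near-field Fresnel measurements, a single small scatterer modeled with the Born approximation, and the multi-source indicator taken as a sum of squared single-source indicators (an assumed combination rule, not the paper's exact formula). The single-source indicator |Σⱼ u∞(x̂ⱼ) e^{ik x̂ⱼ·z}| peaks where the test point z coincides with the scatterer. For a full circle of directions the normalized sum approaches J₀(k|z − y|) by the Jacobi–Anger expansion; a limited aperture adds the higher-order Bessel terms mentioned in the abstract.

```python
import numpy as np


def directions(angles):
    """Unit vectors (cos t, sin t) for an array of angles, shape (J, 2)."""
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)


def born_far_field(k, obs_angles, inc_angle, y):
    """Far-field pattern of one small scatterer at y under plane-wave incidence.

    Assumption: Born approximation, u_inf(xhat) = exp(i k (d - xhat) . y) up to a constant.
    """
    d = np.array([np.cos(inc_angle), np.sin(inc_angle)])
    return np.exp(1j * k * (d @ y - directions(obs_angles) @ y))


def osm_indicator(k, obs_angles, u_inf, grid):
    """Single-source OSM indicator |sum_j u_inf(xhat_j) exp(i k xhat_j . z)| for each grid point z."""
    phases = np.exp(1j * k * grid @ directions(obs_angles).T)  # shape (P, J)
    return np.abs(phases @ u_inf)


def osm_indicator_multi(k, obs_angles, far_fields, grid):
    """Multi-source variant: sum of squared single-source indicators (assumed rule)."""
    return sum(osm_indicator(k, obs_angles, u, grid) ** 2 for u in far_fields)
```

With a limited aperture of, say, 120 degrees and 32 receivers, the single-source indicator reaches its maximum value (the number of receivers) exactly at the scatterer location, while neighboring grid points score lower.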
Citations: 0
Reproducing, Extending, and Analyzing Naming Experiments
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10022
Rachel Alpern, Ido Lazer, Issar Tzachor, Hanit Hakim, Sapir Weissbuch, D. Feitelson
Naming is very important in software development, as names are often the only vehicle of meaning about what the code is intended to do. A recent study on how developers choose names collected the names given by different developers to the same objects. This enabled a study of these names' diversity and structure, and the construction of a model of how names are created. We reproduce different parts of this study in three independent experiments. Importantly, we employ methodological variations rather than striving for an exact replication. When the same results are obtained, this boosts our confidence in their validity, because it demonstrates that they do not depend on the methodology. Our results indeed corroborate those of the original study in terms of the diversity of names, the low probability of two developers choosing the same name, and the finding that experienced developers tend to use slightly longer names than inexperienced students. We explain name diversity through a new analysis of the names, classifying the concepts represented in them as universal (agreed upon), alternative (reflecting divergent views on a topic), or optional (reflecting divergent opinions on whether to include the concept at all). This classification opens new research directions concerning the considerations involved in naming decisions. We also show that explicitly using the model proposed in the original study to guide naming leads to better names, whereas the simpler approach of just asking participants to use longer and more detailed names does not.
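Two of the measures above can be sketched in a few lines, under a hypothetical data model: each participant's name is decomposed into a dict mapping a concept slot (for example "quantity" or "object") to the word used for it. The probability that two developers choose the same name is the fraction of participant pairs with identical names. The concept classification here is an assumed approximation: a slot everyone fills with the same word is universal, a slot everyone fills with different words is alternative, and a slot some participants omit is optional. The study's actual coding was done by the researchers, not by such a rule.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def same_name_probability(names: Sequence[str]) -> float:
    """Fraction of participant pairs who chose exactly the same name."""
    pairs = list(combinations(names, 2))
    if not pairs:
        raise ValueError("need at least two names")
    return sum(a == b for a, b in pairs) / len(pairs)


def classify_concepts(decompositions: List[Dict[str, str]]) -> Dict[str, str]:
    """Label each concept slot as universal, alternative, or optional.

    Each dict maps a (hypothetical) concept slot to the word a participant used.
    """
    n = len(decompositions)
    slots = set().union(*decompositions)
    labels = {}
    for slot in sorted(slots):
        words = [d[slot] for d in decompositions if slot in d]
        if len(words) < n:
            labels[slot] = "optional"      # some participants omitted the concept
        elif len(set(words)) == 1:
            labels[slot] = "universal"     # everyone used the same word
        else:
            labels[slot] = "alternative"   # everyone included it, worded differently
    return labels
```

For example, among the names maxLen, maxLength, maxLen and length, only one of the six participant pairs agrees exactly.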
Citations: 0
Characterizing Role Models in Software Practitioners' Career: An Interview Study
Pub Date : 2024-02-15 DOI: 10.1145/3641822.3641883
Mary Sánchez-Gordón, Ricardo Colomo Palacios, Alex Sanchez Gordon
A role model is a person who serves as an example for others to follow, especially in terms of values, behavior, achievements, and personal characteristics. In this paper, the authors study how role models influence software practitioners' careers, an aspect not previously studied in the literature. Through this study, the authors aim to understand whether there are any salient role-model archetypes and which characteristics participants value in their role models. To do so, the authors use a thematic coding approach to analyze data collected from interviews with ten Latin American software practitioners. Findings reveal that role models were perceived as sources of knowledge, yet the majority of participants, regardless of their career stage, displayed a stronger interest in the human side and the moral values that their role models embodied. This study also shows that any practitioner can be viewed as a role model.
Citations: 0
Orthogonal Time Frequency Space for Integrated Sensing and Communication: A Survey
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09637
Eyad Shtaiwi, Ahmed Abdelhadi, Husheng Li, Zhu Han, H. V. Poor
Sixth-generation (6G) wireless communication systems, as stated in the European 6G flagship project Hexa-X, are anticipated to integrate intelligence, communication, sensing, positioning, and computation. An important aspect of this integration is integrated sensing and communication (ISAC), in which the same waveform is used for both sensing and communication to address the challenge of spectrum scarcity. Recently, the orthogonal time frequency space (OTFS) waveform has been proposed to address the limitations of OFDM caused by the high Doppler spread in some future wireless communication systems. In this paper, we review existing OTFS waveforms for ISAC systems and provide some insights into future research. First, we introduce the basic principles and a system model of OTFS, providing a foundational understanding of this technology's core concepts and architecture. We then present an overview of OTFS-based ISAC system frameworks and a comprehensive review of recent research developments and the current state of the art in OTFS-assisted ISAC systems, to give a thorough picture of the current landscape and advances. Furthermore, we perform a thorough comparison between OTFS-enabled ISAC operations and traditional OFDM, highlighting the distinctive advantages of OTFS, especially in high-Doppler-spread scenarios. Next, we address the primary challenges facing OTFS-based ISAC systems, identifying potential limitations and drawbacks. Finally, we suggest future research directions, aiming to inspire further innovation in 6G wireless communication.
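As a companion to the "basic principles and system model of OTFS" mentioned above, here is a sketch of textbook discrete OTFS over an ideal channel, assuming rectangular pulses, an M x N delay-Doppler grid (M delay bins, N Doppler bins), unitary FFTs, and no cyclic prefix or channel. The transmitter applies the inverse symplectic finite Fourier transform (ISFFT) and then a Heisenberg transform, which for rectangular pulses is an M-point IFFT per OTFS symbol; the receiver inverts both steps. This is not taken from the survey, and the function names are hypothetical.

```python
import numpy as np


def otfs_modulate(x_dd):
    """Delay-Doppler symbols (M x N) -> serialized time-domain samples (M*N,)."""
    # ISFFT: X_tf = F_M X_dd F_N^H (FFT along delay, IFFT along Doppler).
    x_tf = np.fft.ifft(np.fft.fft(x_dd, axis=0, norm="ortho"), axis=1, norm="ortho")
    # Heisenberg transform with a rectangular pulse: M-point IFFT per OTFS symbol.
    s_mat = np.fft.ifft(x_tf, axis=0, norm="ortho")
    # Transmit symbol by symbol (column-major order).
    return s_mat.flatten(order="F")


def otfs_demodulate(r, M, N):
    """Serialized received samples -> delay-Doppler estimates (M x N)."""
    r_mat = r.reshape((M, N), order="F")
    # Wigner transform (matched to the rectangular pulse): M-point FFT per symbol.
    y_tf = np.fft.fft(r_mat, axis=0, norm="ortho")
    # SFFT: Y_dd = F_M^H Y_tf F_N.
    return np.fft.ifft(np.fft.fft(y_tf, axis=1, norm="ortho"), axis=0, norm="ortho")
```

With rectangular pulses the delay-axis FFT and IFFT cancel, so the whole transmitter reduces to an N-point IFFT along the Doppler axis. This is one reason OTFS is often described as a precoding layer on top of an OFDM-like modulator.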
Citations: 0
Reg-NF: Efficient Registration of Implicit Surfaces within Neural Fields
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09722
Stephen Hausler, David Hall, Sutharsan Mahendren, Peyman Moghadam
Neural fields, i.e., coordinate-based neural networks, have recently gained popularity for implicitly representing scenes. In contrast to classical methods based on explicit representations such as point clouds, neural fields provide a continuous scene representation that captures 3D geometry and appearance compactly, which makes them well suited to robotics applications. However, few prior methods have investigated registering multiple neural fields by directly using these continuous implicit representations. In this paper, we present Reg-NF, a neural-field-based registration method that optimises the relative 6-DoF transformation between two arbitrary neural fields, even if the two fields have different scale factors. Key components of Reg-NF include a bidirectional registration loss, multi-view surface sampling, and the use of volumetric signed distance functions (SDFs). We showcase our approach on a new neural field dataset for evaluating registration problems. We provide an exhaustive set of experiments and ablation studies to evaluate the performance of our approach, and we discuss its limitations to give the research community directions on open challenges in using neural fields in unconstrained environments.
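To illustrate what a bidirectional SDF registration loss can look like, here is a toy sketch under several assumptions. An analytic SDF (a union of three spheres) stands in for a trained neural field. Surface points are sampled directly on the spheres rather than by multi-view sampling. The candidate transform is a similarity (scale s, rotation R, translation t) from field A's frame to field B's. The loss is assumed to be the mean squared SDF residual of A's surface evaluated in B plus B's surface evaluated in A; Reg-NF's exact loss and optimizer may differ, and the sketch only evaluates the loss rather than minimizing it.

```python
import numpy as np

# Hypothetical stand-in for a trained SDF network: a union of three spheres.
CENTERS = np.array([[0.0, 0.0, 0.0], [0.6, 0.0, 0.0], [0.0, 0.4, 0.1]])
RADII = np.array([0.3, 0.2, 0.15])


def sdf_a(p):
    """Signed distance bound (negative inside) of points p, shape (P, 3), to the sphere union."""
    dists = np.linalg.norm(p[:, None, :] - CENTERS[None, :, :], axis=2) - RADII[None, :]
    return dists.min(axis=1)


def rotation_matrix(rotvec):
    """Rodrigues' formula: rotation vector (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    kx, ky, kz = rotvec / theta
    K = np.array([[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def apply_sim3(p, scale, R, t):
    """Map points from field A's frame to field B's frame: q = s R p + t."""
    return scale * p @ R.T + t


def invert_sim3(q, scale, R, t):
    """Inverse map: p = R^T (q - t) / s."""
    return (q - t) @ R / scale


def make_sdf_b(scale, R, t):
    """SDF of the transformed object: sdf_b(q) = s * sdf_a(T^{-1} q)."""
    return lambda q: scale * sdf_a(invert_sim3(q, scale, R, t))


def sample_surface(n, rng):
    """Sample points on the zero level set of sdf_a."""
    dirs = rng.normal(size=(n, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    idx = rng.integers(0, len(RADII), size=n)
    pts = CENTERS[idx] + RADII[idx, None] * dirs
    return pts[np.abs(sdf_a(pts)) < 1e-9]  # drop points buried inside another sphere


def bidirectional_loss(surf_a, surf_b, sdf_b, scale, R, t):
    """Mean squared SDF residual of A's surface in B plus B's surface in A."""
    a_to_b = sdf_b(apply_sim3(surf_a, scale, R, t))
    b_to_a = sdf_a(invert_sim3(surf_b, scale, R, t))
    return float(np.mean(a_to_b ** 2) + np.mean(b_to_a ** 2))
```

The loss vanishes at the true transform (including a non-unit scale) and grows as the candidate drifts away, which is what makes it usable as an objective for optimizing the transform.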
Citations: 0
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09989
Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan
Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, their entity types, and their corresponding visual regions. The GMNER task exhibits two challenging properties: 1) the weak correlation between image-text pairs on social media means that a significant portion of named entities cannot be grounded in the image; 2) there is a gap between the coarse-grained referring expressions commonly used in similar tasks (e.g., phrase localization, referring expression comprehension) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as a connecting bridge. This reformulation brings two benefits: 1) it retains optimal MNER performance and eliminates the need for object detection methods to pre-extract regional features, thereby naturally addressing two major limitations of existing GMNER methods; 2) the introduction of entity expansion expressions and a Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG), which enables RiVEG to readily inherit the Visual Entailment and Visual Grounding capabilities of any current or future multimodal pretraining model. Extensive experiments demonstrate that RiVEG outperforms state-of-the-art methods on the existing GMNER dataset, achieving absolute improvements of 10.65%, 6.21%, and 8.83% on the three subtasks, respectively.
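The reformulation can be pictured as a cascade of pluggable models. The sketch below is hypothetical throughout: `ground_entities`, `GroundedEntity` and the `toy_*` stand-ins are invented for illustration, and only the stage order (MNER, LLM-based entity expansion, visual entailment, visual grounding) is taken from the abstract. Localisation is delegated to the VG model, so no object detector is needed to pre-extract region features, and an entity that the VE stage rejects is reported as ungroundable instead of being forced onto a region.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class GroundedEntity:
    span: str            # entity mention in the text
    etype: str           # entity type, e.g. PER / LOC / ORG / OTHER
    box: Optional[Box]   # None when the entity is judged ungroundable


def ground_entities(
    text: str,
    image: object,
    mner: Callable[[str], List[Tuple[str, str]]],
    expand: Callable[[str, str, str], str],
    entails: Callable[[object, str], bool],
    locate: Callable[[object, str], Box],
) -> List[GroundedEntity]:
    """Hypothetical MNER -> VE -> VG cascade; every stage is a pluggable model."""
    results = []
    for span, etype in mner(text):              # 1) multimodal NER: spans + types
        expression = expand(span, etype, text)  # 2) LLM turns the entity into a
                                                #    coarse referring expression
        if entails(image, expression):          # 3) visual entailment: is it shown?
            box = locate(image, expression)     # 4) visual grounding: where?
        else:
            box = None                          # ungroundable entity
        results.append(GroundedEntity(span, etype, box))
    return results


# Toy stand-ins for the real models (all hypothetical).
def toy_mner(text: str) -> List[Tuple[str, str]]:
    return [("Messi", "PER"), ("Paris", "LOC")]


def toy_expand(span: str, etype: str, text: str) -> str:
    return {"PER": f"{span}, the person", "LOC": f"{span}, the place"}.get(etype, span)


def toy_entails(image: object, expression: str) -> bool:
    return "person" in expression  # pretend only people appear in the photo


def toy_locate(image: object, expression: str) -> Box:
    return (10.0, 20.0, 110.0, 220.0)


out = ground_entities("Messi lands in Paris", None,
                      toy_mner, toy_expand, toy_entails, toy_locate)
for e in out:
    print(e)
```

Swapping in a stronger VE or VG model only means passing a different callable, which mirrors the abstract's point that the framework can inherit the capabilities of newer multimodal pretraining models.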
Citations: 0
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09984
Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid
In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies. While this is often simple for humans, it can be challenging for AI agents. For example, if an AI agent learns to drive alongside others (a training set) who only drive on one side of the road, it may struggle to adapt this experience to coordinate with drivers on the opposite side, even though their behaviour is simply mirrored about the left-right axis. To address this, we introduce symmetry-breaking augmentations (SBA), which increase the behavioural diversity of training teammates by applying a symmetry-flipping operation. By learning a best response to the augmented set of teammates, our agent is exposed to a wider range of behavioural conventions, which improves performance when it is deployed with novel teammates. We demonstrate this experimentally in two settings and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi. We also propose a general metric for estimating symmetry dependency among a given set of policies.
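A symmetry-flipping operation on a teammate policy can be sketched as conjugation by the symmetry: map the observation back into the original policy's frame, query the policy, then map its action forward, i.e. π′(o) = φ_A(π(φ_O⁻¹(o))). This is a minimal illustration on the abstract's own driving example, assuming string observations and actions; the names `mirror`, `symmetry_flip`, `augment` and `keep_right` are hypothetical, and the best-response training step and the proposed symmetry-dependency metric are not shown. In Hanabi the natural analogue of the mirror would be a permutation of card colours.

```python
from typing import Callable, List

Policy = Callable[[str], str]  # observation -> action

MIRROR = {"left": "right", "right": "left"}


def mirror(token: str) -> str:
    """Left-right reflection; tokens without a side (e.g. "straight") are unchanged."""
    return MIRROR.get(token, token)


def symmetry_flip(policy: Policy,
                  obs_inv: Callable[[str], str] = mirror,
                  act_map: Callable[[str], str] = mirror) -> Policy:
    """Conjugate a policy by a symmetry: pi'(o) = act_map(pi(obs_inv(o))).
    The mirror is its own inverse, so it serves as obs_inv here."""
    def flipped(obs: str) -> str:
        return act_map(policy(obs_inv(obs)))
    return flipped


def augment(teammates: List[Policy]) -> List[Policy]:
    """Training pool = original teammates plus their mirrored copies."""
    return teammates + [symmetry_flip(p) for p in teammates]


def keep_right(obs: str) -> str:
    """Toy right-hand-traffic teammate; obs says where traffic or an obstacle is."""
    if obs == "oncoming":
        return "right"        # convention: pull right for oncoming traffic
    if obs in MIRROR:
        return mirror(obs)    # swerve away from an obstacle on that side
    return "straight"


pool = augment([keep_right])
left_driver = pool[1]
for obs in ("oncoming", "left", "right", "clear"):
    print(obs, pool[0](obs), left_driver(obs))
```

Only the convention changes under the flip: the mirrored teammate pulls left for oncoming traffic, while its symmetric behaviour (swerving away from an obstacle) is unchanged, so an agent trained against the augmented pool sees both conventions.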
Citations: 0
Journal: ArXiv
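Book学术
Contact: info@booksci.cn
Book学术 provides a free academic search service that helps scholars in China and abroad find literature in Chinese and English, and aims to offer a convenient, high-quality experience.
Copyright © 2023 Book学术 All rights reserved.
Beijing Public Security Network Record No. 11010802042870 · Beijing ICP Filing No. 2023020795-1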