
Latest ArXiv Publications

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09801
Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai
Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.
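The abstract does not spell out the unlearning objective. As a loose illustration of what token-level ("fine-grained") unlearning can look like, the toy function below applies a sign-flipped, down-weighted loss to tokens flagged as hallucinated, so that gradient descent on this objective ascends the likelihood loss on the flagged spans. The mask, the weight, and the exact formula are assumptions for illustration, not the paper's actual method.

```python
def fine_grained_unlearning_loss(token_logprobs, hallucination_mask, weight=0.3):
    """Toy token-level unlearning objective (illustrative, not EFUF's exact loss).

    Standard negative log-likelihood is kept on non-hallucinated tokens;
    the sign is flipped (gradient ascent) on tokens flagged as hallucinated,
    down-weighted by `weight` to avoid destroying generation quality.
    """
    assert len(token_logprobs) == len(hallucination_mask)
    loss = 0.0
    for lp, is_hall in zip(token_logprobs, hallucination_mask):
        nll = -lp  # per-token negative log-likelihood
        loss += -weight * nll if is_hall else nll
    return loss / len(token_logprobs)

logprobs = [-0.1, -0.2, -2.0, -0.1]  # per-token log-likelihoods of a caption
mask     = [0, 0, 1, 0]              # third token flagged as hallucinated
print(fine_grained_unlearning_loss(logprobs, mask))  # ≈ -0.05
```

Minimizing this quantity pushes probability mass away from the flagged token while preserving the rest of the caption, which is the intuition behind unlearning without paired data.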
Citations: 0
How Flawed is ECE? An Analysis via Logit Smoothing
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.10046
Muthu Chidambaram, Holden Lee, Colin McSwiggen, Semon Rezchikov
Informally, a model is calibrated if its predictions are correct with a probability that matches the confidence of the prediction. By far the most common method in the literature for measuring calibration is the expected calibration error (ECE). Recent work, however, has pointed out drawbacks of ECE, such as the fact that it is discontinuous in the space of predictors. In this work, we ask: how fundamental are these issues, and what are their impacts on existing results? Towards this end, we completely characterize the discontinuities of ECE with respect to general probability measures on Polish spaces. We then use the nature of these discontinuities to motivate a novel continuous, easily estimated miscalibration metric, which we term Logit-Smoothed ECE (LS-ECE). By comparing the ECE and LS-ECE of pre-trained image classification models, we show in initial experiments that binned ECE closely tracks LS-ECE, indicating that the theoretical pathologies of ECE may be avoidable in practice.
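For readers unfamiliar with the metric under discussion, here is a minimal sketch of the standard binned ECE estimator: predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy is averaged across bins, weighted by bin size. The bin count and equal-width binning are conventional choices, not taken from this paper.

```python
def binned_ece(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins:
    ECE = sum_b (|B_b| / n) * |accuracy(B_b) - mean_confidence(B_b)|"""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece

# Perfectly calibrated toy sample: 4/5 correct at 0.8 confidence -> ECE ~ 0.
print(binned_ece([0.8] * 5, [1, 1, 1, 1, 0]))
# Overconfident sample: 4/5 correct at 0.9 confidence -> ECE ~ 0.1.
print(binned_ece([0.9] * 5, [1, 1, 1, 1, 0]))
```

The discontinuity the authors analyze is visible here: nudging a confidence value across a bin boundary can change the estimate abruptly, which is one motivation for the smoothed LS-ECE variant.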
Citations: 0
A System-Level Dynamic Binary Translator using Automatically-Learned Translation Rules
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09688
Jinhu Jiang, Chaoyi Liang, Rongchao Dong, Zhaohui Yang, Zhongjun Zhou, Wenwen Wang, P. Yew, Weihua Zhang
System-level emulators have been used extensively for system design, debugging and evaluation. They work by providing a system-level virtual machine to support a guest operating system (OS) running on a platform with the same or different native OS that uses the same or different instruction-set architecture. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based DBT approach has shown a significantly improved performance with a higher quality of translated code using automatically learned translation rules. However, it has only been applied to user-level emulation, and not yet to system-level emulation. In this paper, we explore the feasibility of applying this approach to improve system-level emulation, and use QEMU to build a prototype. ... To achieve better performance, we leverage several optimizations that include coordination overhead reduction to reduce the overhead of each coordination, and coordination elimination and code scheduling to reduce the coordination frequency. Experimental results show that it can achieve an average of 1.36X speedup over QEMU 6.1 with negligible coordination overhead in the system emulation mode using SPEC CINT2006 as application benchmarks and 1.15X on real-world applications.
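To make the core idea concrete: a rule-based DBT matches guest-instruction patterns against a rule table and emits parameterized host-instruction sequences. The toy translator below is purely illustrative; the rule table, mnemonics, and interpreter fallback are invented here, and the paper's automatically learned rules are far richer.

```python
# Hypothetical rule table: (mnemonic, operand count) -> host-instruction
# templates. Operands are substituted positionally at translation time.
RULES = {
    ("li", 2):  ["mov {0}, {1}"],                   # li rd, imm
    ("add", 3): ["mov {0}, {1}", "add {0}, {2}"],   # add rd, rs1, rs2
}

def translate(guest_block):
    """Translate a block of guest instructions via the rule table,
    emitting a fallback marker when no learned rule matches (a real
    DBT would hand such instructions to the interpreter)."""
    host = []
    for insn in guest_block:
        op, *args = insn.split()
        rule = RULES.get((op, len(args)))
        if rule is None:
            host.append(f"; no rule for '{insn}' -> fall back to interpreter")
            continue
        for template in rule:
            host.append(template.format(*args))
    return host

for line in translate(["li r1 42", "add r2 r1 r1"]):
    print(line)
```

System-level emulation adds the coordination the abstract mentions, because translated code must stay consistent with privileged state (MMU, interrupts) that user-level emulation can ignore.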
Citations: 0
Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09966
Junjie Shentu, Matthew Watson, N. A. Moubayed
Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input images, facing challenges in specifying the target concept when dealing with multi-concept input images. To this end, we introduce a textual localized text-to-image model (Textual Localization) to handle multi-concept input images. During fine-tuning, our method incorporates a novel cross-attention guidance to decompose multiple concepts, establishing distinct connections between the visual representation of the target concept and the identifier token in the text prompt. Experimental results reveal that our method outperforms or performs comparably to the baseline models in terms of image fidelity and image-text alignment on multi-concept input images. In comparison to Custom Diffusion, our method with hard guidance achieves CLIP-I scores that are 7.04% and 8.13% higher, and CLIP-T scores that are 2.22% and 5.85% higher, in single-concept and multi-concept generation, respectively. Notably, our method generates cross-attention maps consistent with the target concept in the generated images, a capability absent in existing models.
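The CLIP-I metric used in the comparison above is commonly computed as the mean pairwise cosine similarity between CLIP embeddings of generated and reference images (CLIP-T does the same between image and prompt embeddings). A minimal sketch with placeholder embedding vectors; a real evaluation would obtain these vectors from a CLIP image encoder:

```python
def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def clip_i_score(generated_embs, reference_embs):
    """Mean pairwise cosine similarity between generated- and
    reference-image embeddings (the usual CLIP-I recipe)."""
    sims = [cosine(g, r) for g in generated_embs for r in reference_embs]
    return sum(sims) / len(sims)

# Placeholder 2-D "embeddings": one generation matches the reference
# exactly (similarity 1.0), the other only partially (0.6).
gen = [[1.0, 0.0], [0.6, 0.8]]
ref = [[1.0, 0.0]]
print(clip_i_score(gen, ref))  # ≈ 0.8
```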
Citations: 0
TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09811
Dhruv Kudale, Badri Vishal Kasuba, Venkatapathy Subramanian, P. Chaudhuri, Ganesh Ramakrishnan
Several recent deep learning (DL) based techniques perform considerably well on image-based multilingual text detection. However, their performance relies heavily on the availability and quality of training data. There are numerous types of page-level document images consisting of information in several modalities, languages, fonts, and layouts. This makes text detection a challenging problem in the field of computer vision (CV), especially for low-resource or handwritten languages. Furthermore, there is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts that incorporate both printed and handwritten text. Conventionally, Indian script text detection requires training a DL model on plenty of labeled data, but to the best of our knowledge, no relevant datasets are available. Manual annotation of such data requires a lot of time, effort, and expertise. In order to solve this problem, we propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework. One can view this approach to multilingual text detection as an ensemble of different CV-based techniques and DL approaches. TEXTRON can leverage the predictions of DL models pre-trained on a significant amount of language data in conjunction with CV-based methods to improve text detection in other languages. We demonstrate that TEXTRON can improve the detection performance for documents written in Indian languages, despite the absence of corresponding labeled data. Further, through extensive experimentation, we show the improvement brought about by our approach over the current State-of-the-art (SOTA) models, especially for handwritten Devanagari text. Code and dataset have been made available at https://github.com/IITB-LEAP-OCR/TEXTRON
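In data programming, each plugged-in detector acts as a weak labeling function whose votes are combined into a single training label. The sketch below uses simple majority voting over hypothetical detectors; real data-programming systems typically fit a learned label model over the vote matrix instead, so treat this as the simplest possible combiner, not TEXTRON's actual one.

```python
ABSTAIN = -1  # a labeling function may decline to vote on an item

def majority_vote(label_matrix):
    """Combine weak labeling-function outputs (rows = items, cols = LFs).

    Each LF votes 1 (text), 0 (not text), or ABSTAIN; ties and
    all-abstain rows resolve to ABSTAIN."""
    combined = []
    for votes in label_matrix:
        pos = sum(1 for v in votes if v == 1)
        neg = sum(1 for v in votes if v == 0)
        if pos > neg:
            combined.append(1)
        elif neg > pos:
            combined.append(0)
        else:
            combined.append(ABSTAIN)
    return combined

# Three hypothetical weak detectors per region, e.g. a contour-based CV
# heuristic, a pre-trained DL detector, and an edge-density rule.
votes = [
    [1, 1, 0],         # two of three say "text"
    [0, ABSTAIN, 0],   # two say "not text", one abstains
    [1, 0, ABSTAIN],   # tie -> abstain
]
print(majority_vote(votes))  # → [1, 0, -1]
```

The combined labels then supervise a detector without any manual word-level annotation, which is the point of the weak-supervision framing.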
Citations: 0
Strategic Vote Timing in Online Elections With Public Tallies
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09776
Aviv Yaish, S. Abramova, Rainer Bohme
We study the effect of public tallies on online elections, in a setting where voting is costly and voters are allowed to strategically time their votes. The strategic importance of choosing when to vote arises when votes are public, such as in online event scheduling polls (e.g., Doodle), or in blockchain governance mechanisms. In particular, there is a tension between voting early to influence future votes and waiting to observe interim results and avoid voting costs if the outcome has already been decided. Our study draws on empirical findings showing that "temporal" bandwagon effects occur when interim results are revealed to the electorate: late voters are more likely to vote for leading candidates. To capture this phenomenon, we analyze a novel model where the electorate consists of informed voters who have a preferred candidate, and uninformed swing voters who can be swayed according to the interim outcome at the time of voting. In our main results, we prove the existence of equilibria where both early and late voting occur with a positive probability, and we characterize conditions that lead to the appearance of "last minute" voting behavior, where all informed voters vote late.
Citations: 0
GeoBotsVR: A Robotics Learning Game for Beginners with Hands-on Learning Simulation
Pub Date : 2024-02-15 DOI: 10.1145/3613905.3648111
Syed Tanzim Mubarrat
This article introduces GeoBotsVR, an easily accessible virtual reality game that combines elements of puzzle-solving with robotics learning and aims to cultivate interest and motivation in robotics, programming, and electronics among individuals with limited experience in these domains. The game allows players to build and customize a two-wheeled mobile robot using various robotic components and use their robot to solve various procedurally-generated puzzles in a diverse range of environments. An innovative aspect is the inclusion of a repair feature, requiring players to address randomly generated electronics and programming issues with their robot through hands-on manipulation. GeoBotsVR is designed to be immersive, replayable, and practical application-based, offering an enjoyable and accessible tool for beginners to acquaint themselves with robotics. The game simulates a hands-on learning experience and does not require prior technical knowledge, making it a potentially valuable resource for beginners to get an engaging introduction to the field of robotics.
Citations: 0
Performative Reinforcement Learning in Gradually Shifting Environments
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09838
Ben Rank, Stelios Triantafyllou, Debmalya Mandal, Goran Radanovic
When Reinforcement Learning (RL) agents are deployed in practice, they might impact their environment and change its dynamics. Ongoing research attempts to formally model this phenomenon and to analyze learning algorithms in these models. To this end, we propose a framework where the current environment depends on the deployed policy as well as its previous dynamics. This is a generalization of Performative RL (PRL) [Mandal et al., 2023]. Unlike PRL, our framework allows modeling scenarios where the environment gradually adjusts to a deployed policy. We adapt two algorithms from the performative prediction literature to our setting and propose a novel algorithm called Mixed Delayed Repeated Retraining (MDRR). We provide conditions under which these algorithms converge and compare them using three metrics: number of retrainings, approximation guarantee, and number of samples per deployment. Unlike previous approaches, MDRR combines samples from multiple deployments in its training. This makes MDRR particularly suitable for scenarios where the environment's response strongly depends on its previous dynamics, which are common in practice. We experimentally compare the algorithms using a simulation-based testbed and our results show that MDRR converges significantly faster than previous approaches.
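The deploy-observe-retrain loop can be sketched with a one-dimensional toy: the "environment" is a scalar that drifts toward the deployed policy, and each retraining fits the policy to samples pooled from the last few deployments rather than the latest one alone. The dynamics, the averaging rule, and the scalar setting are all invented for illustration; they only mimic the structure of MDRR, not its actual algorithm or guarantees.

```python
def deploy(policy, env, alpha=0.5):
    """A gradually shifting environment: the new state is a mixture of the
    previous dynamics and the environment's response to the deployed policy."""
    return (1 - alpha) * env + alpha * policy

def mdrr_toy(env0, rounds=20, history=3):
    """Toy Mixed Delayed Repeated Retraining loop: retrain on a mixture of
    observations gathered under the last `history` deployments."""
    env, policy = env0, 0.0
    observations = []
    for _ in range(rounds):
        env = deploy(policy, env)           # deployment shifts the environment
        observations.append(env)            # sample the shifted environment
        window = observations[-history:]    # mix samples across deployments
        policy = sum(window) / len(window)  # "retrain" on the mixture
    return env, policy

env, policy = mdrr_toy(env0=1.0)
print(env, policy)  # → 0.5 0.5
```

The loop settles at a performatively stable point where the policy matches the environment it itself induces, which is the kind of fixed point the convergence analysis is about.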
Citations: 0
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09880
Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, M. Halgamuge
The rapid rise in popularity of Large Language Models (LLMs) with emerging capabilities has spurred public curiosity to evaluate and compare different LLMs, leading many researchers to propose their LLM benchmarks. Noticing preliminary inadequacies in those benchmarks, we embarked on a study to critically assess 23 state-of-the-art LLM benchmarks, using our novel unified evaluation framework through the lenses of people, process, and technology, under the pillars of functionality and security. Our research uncovered significant limitations, including biases, difficulties in measuring genuine reasoning, adaptability, implementation inconsistencies, prompt engineering complexity, evaluator diversity, and the overlooking of cultural and ideological norms in one comprehensive assessment. Our discussions emphasized the urgent need for standardized methodologies, regulatory certainties, and ethical guidelines in light of Artificial Intelligence (AI) advancements, including advocating for an evolution from static benchmarks to dynamic behavioral profiling to accurately capture LLMs' complex behaviors and potential risks. Our study highlighted the necessity for a paradigm shift in LLM evaluation methodologies, underlining the importance of collaborative efforts for the development of universally accepted benchmarks and the enhancement of AI systems' integration into society.
Citations: 0
Why are Sensitive Functions Hard for Transformers?
Pub Date : 2024-02-15 DOI: 10.48550/arXiv.2402.09963
Michael Hahn, Mark Rofin
Empirical studies have identified a range of learnability biases and limitations of transformers, such as a persistent difficulty in learning to compute simple formal languages such as PARITY, and a bias towards low-degree functions. However, theoretical understanding remains limited, with existing expressiveness theory either overpredicting or underpredicting realistic learning abilities. We prove that, under the transformer architecture, the loss landscape is constrained by the input-space sensitivity: Transformers whose output is sensitive to many parts of the input string inhabit isolated points in parameter space, leading to a low-sensitivity bias in generalization. We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and difficulty in length generalization for PARITY. This shows that understanding transformers' inductive biases requires studying not just their in-principle expressivity, but also their loss landscape.
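The input-space sensitivity at the center of this abstract can be illustrated with a short sketch (ours, not the authors' code): average sensitivity counts how many single-bit flips change a Boolean function's output, and PARITY is maximally sensitive, while MAJORITY — standing in here as an example of a low-sensitivity function — is not.

```python
from itertools import product

def sensitivity(f, n):
    """Average sensitivity of a Boolean function f on n-bit inputs:
    the mean number of single-bit flips that change f's output."""
    total = 0
    for x in product([0, 1], repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1  # flip bit i
            total += f(x) != f(tuple(y))
    return total / 2 ** n

parity = lambda x: sum(x) % 2            # every bit flip changes the output
majority = lambda x: sum(x) > len(x) // 2  # only near-threshold inputs are sensitive

print(sensitivity(parity, 5))    # 5.0 — maximal: equal to n
print(sensitivity(majority, 5))  # 1.875 — far below n
```

The gap between the two numbers is the phenomenon the paper formalizes: highly sensitive targets like PARITY sit at isolated points of the transformer's loss landscape, so training is biased toward low-sensitivity functions like MAJORITY.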
Citations: 0