EFUF：用于减轻多模态大型语言模型中的幻觉的高效细粒度非学习框架

ArXiv Pub Date : 2024-02-15 DOI:10.48550/arXiv.2402.09801

Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai

{"title":"EFUF：用于减轻多模态大型语言模型中的幻觉的高效细粒度非学习框架","authors":"Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai","doi":"10.48550/arXiv.2402.09801","DOIUrl":null,"url":null,"abstract":"Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.","PeriodicalId":8425,"journal":{"name":"ArXiv","volume":"16 17","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models\",\"authors\":\"Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, Weihao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai\",\"doi\":\"10.48550/arXiv.2402.09801\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.\",\"PeriodicalId\":8425,\"journal\":{\"name\":\"ArXiv\",\"volume\":\"16 17\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ArXiv\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2402.09801\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ArXiv","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2402.09801","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在过去几年中，多模态大语言模型（MLLMs）吸引了越来越多的关注，但它们生成的描述仍可能包含相应图像中不存在的物体，这种现象被称为物体幻觉。为了消除幻觉，现有的方法是人工标注有幻觉和无幻觉的配对回答，然后采用各种配准算法来提高图像和文本之间的配准能力。然而，这些方法不仅在微调阶段需要大量计算资源，还需要昂贵的人工标注来构建配对算法所需的配对数据。为了解决这些问题，我们借鉴了 "解除学习"（unlearning）的思想，提出了一种高效的细粒度解除学习框架（EFUF），它无需配对数据就能消除幻觉。广泛的实验表明，我们的方法可以持续减少幻觉，同时保持生成质量，计算开销不大。我们的代码和数据集将公开发布。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination. To eliminate hallucinations, existing methods manually annotate paired responses with and without hallucinations, and then employ various alignment algorithms to improve the alignment capability between images and text. However, they not only demand considerable computation resources during the finetuning stage but also require expensive human annotation to construct paired data needed by the alignment algorithms. To address these issues, we borrow the idea of unlearning and propose an efficient fine-grained unlearning framework (EFUF), which can eliminate hallucinations without the need for paired data. Extensive experiments show that our method consistently reduces hallucinations while preserving the generation quality with modest computational overhead. Our code and datasets will be publicly available.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ArXiv

自引率

0.00%

发文量