Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs

Dingjie Song, Wenjun Wang, Shunian Chen, Xidong Wang, Michael Guan, Benyou Wang
{"title":"Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs","authors":"Dingjie Song, Wenjun Wang, Shunian Chen, Xidong Wang, Michael Guan, Benyou Wang","doi":"arxiv-2409.10994","DOIUrl":null,"url":null,"abstract":"The rapid advancement of Multimodal Large Language Models (MLLMs) has led to\nremarkable performances across various domains. However, this progress is\naccompanied by a substantial surge in the resource consumption of these models.\nWe address this pressing issue by introducing a new approach, Token Reduction\nusing CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without\nsacrificing their performance. Inspired by human attention patterns in Visual\nQuestion Answering (VQA) tasks, TRIM presents a fresh perspective on the\nselection and reduction of image tokens. The TRIM method has been extensively\ntested across 12 datasets, and the results demonstrate a significant reduction\nin computational overhead while maintaining a consistent level of performance.\nThis research marks a critical stride in efficient MLLM development, promoting\ngreater accessibility and sustainability of high-performing models.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"54 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.10994","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The rapid advancement of Multimodal Large Language Models (MLLMs) has led to remarkable performances across various domains. However, this progress is accompanied by a substantial surge in the resource consumption of these models. We address this pressing issue by introducing a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance. Inspired by human attention patterns in Visual Question Answering (VQA) tasks, TRIM presents a fresh perspective on the selection and reduction of image tokens. The TRIM method has been extensively tested across 12 datasets, and the results demonstrate a significant reduction in computational overhead while maintaining a consistent level of performance. This research marks a critical stride in efficient MLLM development, promoting greater accessibility and sustainability of high-performing models.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
少即是多:一种简单而有效的标记减少方法,可实现高效的多模态 LLM
多模态大语言模型(MLLMs)的快速发展使其在各个领域都取得了令人瞩目的成绩。为了解决这一紧迫问题,我们引入了一种新方法--使用 CLIP 度量的标记减少法(TRIM),旨在提高 MLLM 的效率,同时不影响其性能。受视觉问题解答(VQA)任务中人类注意力模式的启发,TRIM 为图像标记的选择和减少提供了一个全新的视角。TRIM 方法已在 12 个数据集上进行了广泛测试,结果表明在保持性能水平一致的同时显著降低了计算开销。这项研究标志着在高效 MLLM 开发方面迈出了关键一步,促进了高性能模型的更大可及性和可持续性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Vista3D: Unravel the 3D Darkside of a Single Image MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion Efficient Low-Resolution Face Recognition via Bridge Distillation Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints NVLM: Open Frontier-Class Multimodal LLMs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1