
International Journal of Computer Vision: Latest Articles

Liquid: Language Models are Scalable and Unified Multi-Modal Generators
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | DOI: 10.1007/s11263-025-02639-5
Junfeng Wu, Yi Jiang, Chuofan Ma, Yuliang Liu, Hengshuang Zhao, Zehuan Yuan, Song Bai, Xiang Bai
We present Liquid, a versatile and native auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation by tokenizing images into discrete codes and learning these code embeddings alongside text tokens within a shared feature space for both vision and language. Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration using any existing large language model (LLM), eliminating the need for external pretrained visual modules such as CLIP or diffusion models. For the first time, Liquid reveals that the power-law scaling behavior of unified multimodal models aligns with that observed in language models, and it finds that the trade-off between visual and language tasks diminishes as model size increases. Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, effectively removing the interference typically seen in earlier models. We demonstrate that existing LLMs can serve as strong foundations for Liquid, reducing training costs by 100 times while surpassing Chameleon in multimodal capabilities. Compared to previous unified multimodal models, Liquid maintains language performance on par with mainstream LLMs such as Llama2, preserving its potential as a foundational model. Building on this foundation, Liquid outperforms visual generation models such as SD v2.1 and SD-XL (FID of 5.47 on MJHQ-30K), excelling in both vision-language and text-only tasks. The code and models are available at https://github.com/FoundationVision/Liquid.
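As a concrete illustration of the shared token space described above, the toy sketch below merges a text vocabulary with VQ image codes and trains a single causal language model over the combined sequence. This is only a rough sketch of the general idea under assumed settings, not the Liquid implementation: the vocabulary sizes, the random "VQ codes", and the tiny Transformer are placeholders, and positional encodings are omitted for brevity.

```python
# Toy illustration of a merged text/image token space for auto-regressive
# modelling -- NOT the Liquid model. All sizes are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_VOCAB = 32000       # assumed text vocabulary of the base LLM
NUM_IMAGE_CODES = 8192   # assumed codebook size of a VQ image tokenizer
VOCAB = TEXT_VOCAB + NUM_IMAGE_CODES   # one shared vocabulary for both modalities


class TinyUnifiedLM(nn.Module):
    """A minimal causal LM over the merged vocabulary."""

    def __init__(self, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, VOCAB)

    def forward(self, tokens):
        # Causal mask so every position only attends to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)


# A caption followed by image codes, with the codes shifted into the shared id range.
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))          # placeholder caption tokens
image_codes = torch.randint(0, NUM_IMAGE_CODES, (1, 64))  # placeholder VQ codes
sequence = torch.cat([text_ids, image_codes + TEXT_VOCAB], dim=1)

model = TinyUnifiedLM()
logits = model(sequence[:, :-1])                          # next-token prediction
loss = F.cross_entropy(logits.reshape(-1, VOCAB), sequence[:, 1:].reshape(-1))
print(f"toy next-token loss: {loss.item():.3f}")
```

Because the image codes live in the same id range as text tokens, the same next-token objective covers both text continuation and image-code generation, which is the premise behind a unified generator of this kind.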
Citations: 0
Concept-Based Explanation for Deep Vision Models: A Comprehensive Survey on Techniques, Taxonomy, Applications, and Recent Advances
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | DOI: 10.1007/s11263-025-02647-5
Razan Alharith, Jiashu Zhang, Ashraf Osman Ibrahim, Zhenyu Wu
Concept-based explanation is an important and rapidly evolving approach that aims to enhance the interpretability and transparency of deep learning models by clarifying their behaviors and predictions in terms of understandable concepts. However, the current literature lacks a comprehensive survey and classification of the various strategies and methodologies employed to analyze these models. This paper aims to fill this gap by introducing a new taxonomy of concept-based explanation strategies. Following a thorough review of 101 relevant studies, a preliminary taxonomy was developed that groups strategies by criteria such as data modality, level of supervision, model complexity, explanation scope, and model interpretability. Furthermore, we present a comprehensive evaluation of the advantages and limitations of the various methodologies, as well as of the datasets commonly used in this field, and we identify promising avenues for further exploration. Our study aims to serve as a useful tool for researchers and practitioners interested in advancing concept-based explanation. Finally, we have built a GitHub project page that gathers key materials on concept-based explanation, accessible at https://github.com/razanalharith/Concept-Based-Explanation.
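For a concrete (and purely hypothetical) picture of the taxonomy axes named above, the small sketch below encodes the grouping criteria as fields of a record and filters methods along one axis. The example entries and their attribute values are illustrative assumptions, not items from the survey.

```python
# Illustrative only: one way to encode the survey's grouping criteria as a record
# and query methods along a taxonomy axis. Entries are hypothetical.
from dataclasses import dataclass


@dataclass
class ConceptExplanationMethod:
    name: str
    data_modality: str       # e.g. "image", "video", "multimodal"
    supervision: str         # e.g. "supervised" (concepts given) vs. "unsupervised"
    model_complexity: str    # e.g. "post-hoc" vs. "interpretable-by-design"
    explanation_scope: str   # e.g. "local" (per input) vs. "global" (per class)


methods = [
    ConceptExplanationMethod("TCAV-style", "image", "supervised", "post-hoc", "global"),
    ConceptExplanationMethod("ACE-style", "image", "unsupervised", "post-hoc", "global"),
]

# Group along one taxonomy axis: which methods need no labelled concept examples?
unsupervised = [m.name for m in methods if m.supervision == "unsupervised"]
print(unsupervised)   # ['ACE-style']
```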
Citations: 0
FourierMIL: Fourier Filtering-based Multiple Instance Learning for Whole Slide Image Analysis
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-28 | DOI: 10.1007/s11263-025-02679-x
Yi Zheng, Harsh Sharma, Margrit Betke, Jonathan D. Cherry, Jesse B. Mez, Jennifer E. Beane, Vijaya B. Kolachalama
{"title":"FourierMIL: Fourier Filtering-based Multiple Instance Learning for Whole Slide Image Analysis","authors":"Yi Zheng, Harsh Sharma, Margrit Betke, Jonathan D. Cherry, Jesse B. Mez, Jennifer E. Beane, Vijaya B. Kolachalama","doi":"10.1007/s11263-025-02679-x","DOIUrl":"https://doi.org/10.1007/s11263-025-02679-x","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"29 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145847152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UIL-AQA: Uncertainty-Aware Clip-Level Interpretable Action Quality Assessment
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-28 | DOI: 10.1007/s11263-025-02638-6
Xu Dong, Xinran Liu, Wanqing Li, Anthony Adeyemi-Ejeye, Andrew Gilbert
This work proposes UIL-AQA, a method for long-term Action Quality Assessment (AQA) designed to be clip-level interpretable and uncertainty-aware. AQA evaluates the execution quality of actions in videos, but the complexity and diversity of actions, especially in long videos, make the task difficult. Existing AQA methods cope with this by generally limiting themselves to short-term videos. These approaches lack detailed semantic interpretation for individual clips and fail to account for the impact of human bias and subjectivity in the data during model training. Moreover, although query-based Transformer networks demonstrate strong long-term modelling capabilities, their interpretability in AQA remains insufficient. This is primarily due to a phenomenon we identify, termed Temporal Skipping, in which the model skips self-attention layers to prevent output degradation. We introduce an Attention Loss function and a Query Initialization Module to enhance the modelling capability of query-based Transformer networks. Additionally, we incorporate a Gaussian Noise Injection Module to simulate biases in human scoring, mitigating the influence of uncertainty and improving model reliability. Furthermore, we propose a Difficulty-Quality Regression Module, which decomposes each clip's action score into independent difficulty and quality components, enabling a more fine-grained and interpretable evaluation. Extensive quantitative and qualitative analysis demonstrates that the proposed method achieves state-of-the-art performance on three long-term real-world AQA datasets. Our code is available at https://github.com/dx199771/Interpretability-AQA.
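Two ideas named in the abstract, perturbing human scores with Gaussian noise and decomposing each clip's score into difficulty and quality, can be sketched roughly as follows. The product form of the decomposition, the feature dimension, the noise scale, and all names here are illustrative assumptions, not the authors' modules.

```python
# Rough sketch: Gaussian noise injection on judge scores plus a per-clip
# difficulty/quality decomposition. All shapes and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifficultyQualityHead(nn.Module):
    """Maps each clip feature to separate difficulty and quality estimates."""

    def __init__(self, feat_dim=512):
        super().__init__()
        self.difficulty = nn.Sequential(nn.Linear(feat_dim, 1), nn.Softplus())
        self.quality = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    def forward(self, clip_feats):                    # (batch, clips, feat_dim)
        d = self.difficulty(clip_feats).squeeze(-1)   # per-clip difficulty > 0
        q = self.quality(clip_feats).squeeze(-1)      # per-clip quality in (0, 1)
        clip_scores = d * q                           # assumed score decomposition
        return clip_scores.sum(dim=1), d, q           # video score plus components


def noisy_targets(scores, sigma=0.05):
    """Perturb ground-truth scores during training to mimic judge subjectivity."""
    return scores + sigma * torch.randn_like(scores)


head = DifficultyQualityHead()
clip_feats = torch.randn(2, 8, 512)                   # 2 videos, 8 clips each
pred_score, difficulty, quality = head(clip_feats)
targets = noisy_targets(torch.tensor([34.5, 61.0]))   # made-up judge scores
loss = F.mse_loss(pred_score, targets)
print(f"toy regression loss: {loss.item():.1f}")
```

Exposing the per-clip difficulty and quality terms, rather than only a single video-level score, is what makes this style of decomposition clip-level interpretable.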
Citations: 0
Oulu Remote-photoplethysmography Presentation Attacks Database (OR-PAD)
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-28 | DOI: 10.1007/s11263-025-02588-z
Marko Savic, Guoying Zhao
{"title":"Oulu Remote-photoplethysmography Presentation Attacks Database (OR-PAD)","authors":"Marko Savic, Guoying Zhao","doi":"10.1007/s11263-025-02588-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02588-z","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"49 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145847151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-27 | DOI: 10.1007/s11263-025-02668-0
Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, Jinyuan Liu, Peng Wang, Yanning Zhang
{"title":"Contourlet Refinement Gate Framework for Thermal Spectrum Distribution Regularized Infrared Image Super-Resolution","authors":"Yang Zou, Zhixin Chen, Zhipeng Zhang, Xingyuan Li, Long Ma, Jinyuan Liu, Peng Wang, Yanning Zhang","doi":"10.1007/s11263-025-02668-0","DOIUrl":"https://doi.org/10.1007/s11263-025-02668-0","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"8 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145836237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-26 | DOI: 10.1007/s11263-025-02597-y
Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Jiahao Wang, Zhe Chen, Zhiqi Li, Tong Lu, Limin Wang
{"title":"Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding","authors":"Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Jiahao Wang, Zhe Chen, Zhiqi Li, Tong Lu, Limin Wang","doi":"10.1007/s11263-025-02597-y","DOIUrl":"https://doi.org/10.1007/s11263-025-02597-y","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"33 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145836243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FreerCustom: Training-Free Multi-Concept Customization for Image and Video Generation
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-26 | DOI: 10.1007/s11263-025-02623-z
Canyu Zhao, Ganggui Ding, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen
{"title":"FreerCustom: Training-Free Multi-Concept Customization for Image and Video Generation","authors":"Canyu Zhao, Ganggui Ding, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen","doi":"10.1007/s11263-025-02623-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02623-z","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"184 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145829948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Guest Editorial: Special Issue on Visual Datasets
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-26 | DOI: 10.1007/s11263-025-02722-x
Xin Zhao, Liang Zheng, Qiang Qiu, Yin Li, Limin Wang, José Lezama, Qiuhong Ke, Yongchan Kwon, Ruoxi Jia, Jungong Han
{"title":"Guest Editorial: Special Issue on Visual Datasets","authors":"Xin Zhao, Liang Zheng, Qiang Qiu, Yin Li, Limin Wang, José Lezama, Qiuhong Ke, Yongchan Kwon, Ruoxi Jia, Jungong Han","doi":"10.1007/s11263-025-02722-x","DOIUrl":"https://doi.org/10.1007/s11263-025-02722-x","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"48 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145836240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs
IF 19.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-26 | DOI: 10.1007/s11263-025-02607-z
Chongjun Tu, Peng Ye, Dongzhan Zhou, Lei Bai, Gang Yu, Tao Chen, Wanli Ouyang
{"title":"Attention Reallocation: Towards Zero-cost and Controllable Hallucination Mitigation of MLLMs","authors":"Chongjun Tu, Peng Ye, Dongzhan Zhou, Lei Bai, Gang Yu, Tao Chen, Wanli Ouyang","doi":"10.1007/s11263-025-02607-z","DOIUrl":"https://doi.org/10.1007/s11263-025-02607-z","url":null,"abstract":"","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"30 1","pages":""},"PeriodicalIF":19.5,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145836245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0