
Latest Publications in Pattern Recognition

AdvCloak: Customized adversarial cloak for privacy protection
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111050
Xuannan Liu, Yaoyao Zhong, Xing Cui, Yuhang Zhang, Peipei Li, Weihong Deng
With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that can maintain superior image-level naturalness while providing enhanced feature-level generalization ability. Specifically, AdvCloak sequentially optimizes the generative adversarial networks by employing a two-stage training strategy. This strategy initially focuses on adapting the masks to the unique individual faces and then enhances their feature-level generalization ability to diverse facial variations of individuals. To fully utilize the limited training data, we combine AdvCloak with several general geometric modeling methods, to better describe the feature subspace of source identities. Extensive quantitative and qualitative evaluations on both common and celebrity datasets demonstrate that AdvCloak outperforms existing state-of-the-art methods in terms of efficiency and effectiveness. The code is available at https://github.com/liuxuannan/AdvCloak.
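The two-stage idea reads roughly as: first adapt a generated mask to an individual's images, then encourage a single mask to generalize across that person's facial variations. Below is a minimal PyTorch sketch of that training loop; the generator, face encoder, loss weights, and perturbation budget are illustrative placeholders, not the authors' architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

generator = nn.Sequential(                      # stand-in for the mask generator
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Tanh())
face_encoder = nn.Sequential(                   # stand-in for a frozen face model
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128))
for p in face_encoder.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
eps = 8 / 255                                   # perturbation budget (assumed)

def train_step(faces, stage):
    mask = eps * generator(faces)               # class-wise adversarial mask
    feat_adv = F.normalize(face_encoder((faces + mask).clamp(0, 1)), dim=1)
    feat_src = F.normalize(face_encoder(faces), dim=1)
    # both stages push cloaked features away from the source identity
    loss = F.cosine_similarity(feat_adv, feat_src).mean()
    if stage == 1:                                      # stage 1: per-image adaptation
        loss = loss + 10.0 * mask.abs().mean()          # keep the mask natural
    else:                                               # stage 2: one mask should
        loss = loss + 10.0 * mask.var(dim=0).mean()     # fit all photos of a person
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

print(train_step(torch.rand(4, 3, 112, 112), stage=1))
```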
{"title":"AdvCloak: Customized adversarial cloak for privacy protection","authors":"Xuannan Liu,&nbsp;Yaoyao Zhong,&nbsp;Xing Cui,&nbsp;Yuhang Zhang,&nbsp;Peipei Li,&nbsp;Weihong Deng","doi":"10.1016/j.patcog.2024.111050","DOIUrl":"10.1016/j.patcog.2024.111050","url":null,"abstract":"<div><div>With extensive face images being shared on social media, there has been a notable escalation in privacy concerns. In this paper, we propose AdvCloak, an innovative framework for privacy protection using generative models. AdvCloak is designed to automatically customize class-wise adversarial masks that can maintain superior image-level naturalness while providing enhanced feature-level generalization ability. Specifically, AdvCloak sequentially optimizes the generative adversarial networks by employing a two-stage training strategy. This strategy initially focuses on adapting the masks to the unique individual faces and then enhances their feature-level generalization ability to diverse facial variations of individuals. To fully utilize the limited training data, we combine AdvCloak with several general geometric modeling methods, to better describe the feature subspace of source identities. Extensive quantitative and qualitative evaluations on both common and celebrity datasets demonstrate that AdvCloak outperforms existing state-of-the-art methods in terms of efficiency and effectiveness. The code is available at <span><span>https://github.com/liuxuannan/AdvCloak</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111050"},"PeriodicalIF":7.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CHA: Conditional Hyper-Adapter method for detecting human–object interaction
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111075
Mengyang Sun, Wei Suo, Ji Wang, Peng Wang, Yanning Zhang
Human–object interaction (HOI) detection aims to capture human–object pairs in images and predict their actions. It is an essential step for many visual reasoning tasks, such as VQA, image retrieval, and surveillance event detection. The challenge of this task is to tackle the compositional learning problem, especially in a few-shot setting. A straightforward approach is to design a group of dedicated models for each specific pair. However, maintaining these independent models is unrealistic due to combinatorial explosion. To address the above problems, we propose a new Conditional Hyper-Adapter (CHA) method based on meta-learning. Different from previous works, our approach regards each ⟨verb, object⟩ pair as an independent sub-task. Meanwhile, we design two kinds of Hyper-Adapter structures to guide the model to learn “how to address HOI detection”. By combining different conditions with the hypernetwork, the CHA can adaptively generate partial parameters and improve the representation and generalization ability of the model. Finally, our proposed method can be viewed as a plug-and-play module to boost existing HOI detection models on the widely used HOI benchmarks.
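As a concrete illustration of the hypernetwork-plus-adapter combination, the toy module below maps a ⟨verb, object⟩ condition embedding to the weights of a small residual adapter. All dimensions and the conditioning scheme are assumptions of this sketch, not the paper's design.

```python
import torch
import torch.nn as nn

class ConditionalHyperAdapter(nn.Module):
    def __init__(self, feat_dim=256, cond_dim=64, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(feat_dim, bottleneck)   # shared adapter half
        # hypernetwork: condition embedding -> weights of the up-projection
        self.hyper = nn.Linear(cond_dim, bottleneck * feat_dim)
        self.feat_dim, self.bottleneck = feat_dim, bottleneck

    def forward(self, x, cond):
        # x: (B, feat_dim) HOI features; cond: (B, cond_dim) <verb, object> embedding
        w_up = self.hyper(cond).view(-1, self.feat_dim, self.bottleneck)
        h = torch.relu(self.down(x)).unsqueeze(-1)    # (B, bottleneck, 1)
        return x + torch.bmm(w_up, h).squeeze(-1)     # residual adapter output

x = torch.randn(4, 256)
cond = torch.randn(4, 64)
out = ConditionalHyperAdapter()(x, cond)              # (4, 256)
```

The design point is that only the small up-projection is generated per condition, so each ⟨verb, object⟩ sub-task gets its own partial parameters without duplicating the whole model.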
{"title":"CHA: Conditional Hyper-Adapter method for detecting human–object interaction","authors":"Mengyang Sun ,&nbsp;Wei Suo ,&nbsp;Ji Wang ,&nbsp;Peng Wang ,&nbsp;Yanning Zhang","doi":"10.1016/j.patcog.2024.111075","DOIUrl":"10.1016/j.patcog.2024.111075","url":null,"abstract":"<div><div>Human–object interactions (HOI) detection aims at capturing human–object pairs in images and predicting their actions. It is an essential step for many visual reasoning tasks, such as VQA, image retrieval and surveillance event detection. The challenge of this task is to tackle the compositional learning problem, especially in a few-shot setting. A straightforward approach is designing a group of dedicated models for each specific pair. However, the maintenance of these independent models is unrealistic due to combinatorial explosion. To address the above problems, we propose a new Conditional Hyper-Adapter (CHA) method based on meta-learning. Different from previous works, our approach regards each <span><math><mo>&lt;</mo></math></span>verb, object<span><math><mo>&gt;</mo></math></span> as an independent sub-task. Meanwhile, we design two kinds of Hyper-Adapter structures to guide the model to learn “how to address the HOI detection”. By combining the different conditions and hypernetwork, the CHA can adaptively generate partial parameters and improve the representation and generalization ability of the model. Finally, our proposed method can be viewed as a plug-and-play module to boost existing HOI detection models on the widely used HOI benchmarks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111075"},"PeriodicalIF":7.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142530431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Semantic-aware frame-event fusion based pattern recognition via large vision–language models
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111080
Dong Li, Jiandong Jin, Yuhao Zhang, Yanlin Zhong, Yaoyang Wu, Lan Chen, Xiao Wang, Bin Luo
Pattern recognition through the fusion of RGB frames and Event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition. However, we posit that these methods may suffer from two key issues: (1) they attempt to directly learn a mapping from the input vision modality to the semantic labels, which often leads to sub-optimal results due to the disparity between the input and semantic labels; (2) they utilize small-scale backbone networks for the extraction of RGB and Event input features, so these models fail to harness the recent performance advancements of large-scale vision–language models. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision–language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we initially convert them into language descriptions through prompt engineering, polish them using ChatGPT, and then obtain the semantic features using the pre-trained large-scale language model (CLIP text encoder). Subsequently, we integrate the RGB/Event features and semantic features using multimodal Transformer networks. The resulting frame and event tokens are further amplified using self-attention layers. Concurrently, we propose to enhance the interactions between text tokens and RGB/Event tokens via cross-attention. Finally, we consolidate all three modalities using self-attention and feed-forward layers for recognition. Comprehensive experiments on the HARDVS and PokerEvent datasets fully substantiate the efficacy of our proposed SAFE model. The source code has been released at https://github.com/Event-AHU/SAFE_LargeVLM.
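A condensed sketch of this pipeline is given below, with random tensors and plain linear layers standing in for the pre-trained CLIP vision/text encoders; the token counts, label count, and per-label scoring head are illustrative assumptions of the sketch.

```python
import torch
import torch.nn as nn

d, num_labels = 512, 300                    # dims and label count are assumed
vision_enc = nn.Linear(768, d)              # stand-in for the CLIP vision encoder
text_enc = nn.Linear(768, d)                # stand-in for the CLIP text encoder
self_attn = nn.MultiheadAttention(d, 8, batch_first=True)
cross_attn = nn.MultiheadAttention(d, 8, batch_first=True)
score = nn.Linear(d, 1)                     # per-label recognition score

rgb = torch.randn(2, 16, 768)               # RGB-frame tokens
evt = torch.randn(2, 16, 768)               # event-stream tokens
txt = torch.randn(2, num_labels, 768)       # tokens of the label descriptions

vis = vision_enc(torch.cat([rgb, evt], 1))  # fuse frame + event tokens
vis, _ = self_attn(vis, vis, vis)           # amplify frame/event tokens
lab = text_enc(txt)
lab, _ = cross_attn(lab, vis, vis)          # text tokens attend to vision tokens
logits = score(lab).squeeze(-1)             # (2, num_labels) label scores
```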
{"title":"Semantic-aware frame-event fusion based pattern recognition via large vision–language models","authors":"Dong Li ,&nbsp;Jiandong Jin ,&nbsp;Yuhao Zhang ,&nbsp;Yanlin Zhong ,&nbsp;Yaoyang Wu ,&nbsp;Lan Chen ,&nbsp;Xiao Wang ,&nbsp;Bin Luo","doi":"10.1016/j.patcog.2024.111080","DOIUrl":"10.1016/j.patcog.2024.111080","url":null,"abstract":"<div><div>Pattern recognition through the fusion of RGB frames and Event streams has emerged as a novel research area in recent years. Current methods typically employ backbone networks to individually extract the features of RGB frames and event streams, and subsequently fuse these features for pattern recognition. However, we posit that these methods may suffer from two key issues: (1). They attempt to directly learn a mapping from the input vision modality to the semantic labels. This approach often leads to sub-optimal results due to the disparity between the input and semantic labels; (2). They utilize small-scale backbone networks for the extraction of RGB and Event input features, thus these models fail to harness the recent performance advancements of large-scale visual-language models. In this study, we introduce a novel pattern recognition framework that consolidates the semantic labels, RGB frames, and event streams, leveraging pre-trained large-scale vision–language models. Specifically, given the input RGB frames, event streams, and all the predefined semantic labels, we employ a pre-trained large-scale vision model (CLIP vision encoder) to extract the RGB and event features. To handle the semantic labels, we initially convert them into language descriptions through prompt engineering and polish using ChatGPT, and then obtain the semantic features using the pre-trained large-scale language model (CLIP text encoder). Subsequently, we integrate the RGB/Event features and semantic features using multimodal Transformer networks. The resulting frame and event tokens are further amplified using self-attention layers. Concurrently, we propose to enhance the interactions between text tokens and RGB/Event tokens via cross-attention. Finally, we consolidate all three modalities using self-attention and feed-forward layers for recognition. Comprehensive experiments on the HARDVS and PokerEvent datasets fully substantiate the efficacy of our proposed SAFE model. The source code has been released at <span><span>https://github.com/Event-AHU/SAFE_LargeVLM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111080"},"PeriodicalIF":7.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Perspective-assisted prototype-based learning for semi-supervised crowd counting
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-10 · DOI: 10.1016/j.patcog.2024.111073
Yifei Qian, Liangfei Zhang, Zhongliang Guo, Xiaopeng Hong, Ognjen Arandjelović, Carl R. Donovan
To alleviate the burden of labeling data to train crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embedded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, the prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across different density levels. For unlabeled data, the learnt perspective-embedded prototypes enhance differentiation between samples of the same density levels, allowing for a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since the perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization which requires no additional annotations and assigns image regions to distinct spatial density groups, which mainly reflect the differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.
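One way to read the "multiple prototypes per density level" component is the toy loss below, which pulls a patch feature toward its best-matching prototype within its own density bin; the bin count, prototype count, and feature size are assumed for illustration.

```python
import torch
import torch.nn.functional as F

n_levels, protos_per_level, d = 5, 3, 128   # assumed sizes
prototypes = F.normalize(torch.randn(n_levels, protos_per_level, d), dim=-1)

def proto_loss(feat, level):
    # feat: (B, d) patch features; level: (B,) density-bin index of each patch
    feat = F.normalize(feat, dim=-1)
    sims = torch.einsum('bd,bkd->bk', feat, prototypes[level])  # (B, K)
    # each patch should match at least one prototype of its own density level,
    # leaving the other prototypes free to model different perspective distortions
    return -sims.max(dim=1).values.mean()

loss = proto_loss(torch.randn(8, d), torch.randint(0, n_levels, (8,)))
```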
{"title":"Perspective-assisted prototype-based learning for semi-supervised crowd counting","authors":"Yifei Qian ,&nbsp;Liangfei Zhang ,&nbsp;Zhongliang Guo ,&nbsp;Xiaopeng Hong ,&nbsp;Ognjen Arandjelović ,&nbsp;Carl R. Donovan","doi":"10.1016/j.patcog.2024.111073","DOIUrl":"10.1016/j.patcog.2024.111073","url":null,"abstract":"<div><div>To alleviate the burden of labeling data to train crowd counting models, we propose a prototype-based learning approach for semi-supervised crowd counting with an embeded understanding of perspective. Our key idea is that image patches with the same density of people are likely to exhibit coherent appearance changes under similar perspective distortion, but differ significantly under varying distortions. Motivated by this observation, we construct multiple prototypes for each density level to capture variations in perspective. For labeled data, the prototype-based learning assists the regression task by regularizing the feature space and modeling the relationships within and across different density levels. For unlabeled data, the learnt perspective-embedded prototypes enhance differentiation between samples of the same density levels, allowing for a more nuanced assessment of the predictions. By incorporating regression results, we categorize unlabeled samples as reliable or unreliable, applying tailored consistency learning strategies to enhance model accuracy and generalization. Since the perspective information is often unavailable, we propose a novel pseudo-label assigner based on perspective self-organization which requires no additional annotations and assigns image regions to distinct spatial density groups, which mainly reflect the differences in average density among regions. Extensive experiments on four crowd counting benchmarks demonstrate the effectiveness of our approach.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111073"},"PeriodicalIF":7.5,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Pixel shuffling is all you need: spatially aware convmixer for dense prediction tasks
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111068
Hatem Ibrahem, Ahmed Salem, Hyun-Soo Kang
ConvMixer is an extremely simple model that can perform better than state-of-the-art convolutional and vision-transformer-based methods thanks to mixing the input image patches using a standard convolution. The global mixing process of the patches is only valid for classification tasks; it cannot be used for dense prediction tasks, as the spatial information of the image is lost in the mixing process. We propose a more efficient technique for image patching, known as pixel shuffling, as it can preserve spatial information. We downsample the input image using pixel shuffle downsampling in the same form as image patches so that ConvMixer can be extended to dense prediction tasks. This paper shows that pixel shuffle downsampling is more efficient than standard image patching, as it outperforms the original ConvMixer architecture in the CIFAR10 and ImageNet-1k classification tasks. We also suggest spatially-aware ConvMixer architectures based on efficient pixel shuffle downsampling and upsampling operations for semantic segmentation and monocular depth estimation. We performed extensive experiments to test the proposed architectures on several datasets: Pascal VOC2012, Cityscapes, and ADE20k for semantic segmentation, and NYU-depthV2 and Cityscapes for depth estimation. We show that SA-ConvMixer is efficient enough to get relatively high accuracy at many tasks in a few training epochs (150∼400). The proposed SA-ConvMixer could achieve an ImageNet-1K Top-1 classification accuracy of 87.02%, mean intersection over union (mIOU) of 87.1% in the PASCAL VOC2012 semantic segmentation task, and absolute relative error of 0.096 in the NYU depthv2 depth estimation task. The implementation code of the proposed method is available at: https://github.com/HatemHosam/SA-ConvMixer/.
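The pixel-shuffle patching itself maps directly onto standard PyTorch operators: PixelUnshuffle folds each r×r spatial block into channels without discarding spatial layout, and PixelShuffle inverts it for dense-prediction upsampling. A small shape check, with sizes chosen for illustration and a single convolution standing in for the ConvMixer body:

```python
import torch
import torch.nn as nn

r = 4                                   # downsampling factor (patch size)
x = torch.randn(1, 3, 224, 224)         # input image
down = nn.PixelUnshuffle(r)(x)          # (1, 3*r*r, 56, 56) = (1, 48, 56, 56)
body = nn.Conv2d(48, 48, 3, padding=1)  # stand-in for the ConvMixer-style body
up = nn.PixelShuffle(r)(body(down))     # back to (1, 3, 224, 224) resolution
print(down.shape, up.shape)
```

Because the r×r neighborhood is recoverable from the channel axis, dense outputs can be reconstructed at full resolution, unlike with global patch mixing.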
{"title":"Pixel shuffling is all you need: spatially aware convmixer for dense prediction tasks","authors":"Hatem Ibrahem,&nbsp;Ahmed Salem,&nbsp;Hyun-Soo Kang","doi":"10.1016/j.patcog.2024.111068","DOIUrl":"10.1016/j.patcog.2024.111068","url":null,"abstract":"<div><div>ConvMixer is an extremely simple model that could perform better than the state-of-the-art convolutional-based and vision transformer-based methods thanks to mixing the input image patches using a standard convolution. The global mixing process of the patches is only valid for the classification tasks, but it cannot be used for dense prediction tasks as the spatial information of the image is lost in the mixing process. We propose a more efficient technique for image patching, known as pixel shuffling, as it can preserve spatial information. We downsample the input image using the pixel shuffle downsampling in the same form of image patches so that the ConvMixer can be extended for the dense prediction tasks. This paper proves that pixel shuffle downsampling is more efficient than the standard image patching as it outperforms the original ConvMixer architecture in the CIFAR10 and ImageNet-1k classification tasks. We also suggest spatially-aware ConvMixer architectures based on efficient pixel shuffle downsampling and upsampling operations for semantic segmentation and monocular depth estimation. We performed extensive experiments to test the proposed architectures on several datasets; Pascal VOC2012, Cityscapes, and ADE20k for semantic segmentation, NYU-depthV2, and Cityscapes for depth estimation. We show that SA-ConvMixer is efficient enough to get relatively high accuracy at many tasks in a few training epochs (150<span><math><mo>∼</mo></math></span>400). The proposed SA-ConvMixer could achieve an ImageNet-1K Top-1 classification accuracy of 87.02%, mean intersection over union (mIOU) of 87.1% in the PASCAL VOC2012 semantic segmentation task, and absolute relative error of 0.096 in the NYU depthv2 depth estimation task. The implementation code of the proposed method is available at: <span><span>https://github.com/HatemHosam/SA-ConvMixer/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111068"},"PeriodicalIF":7.5,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Diffusion process with structural changes for subspace clustering
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111066
Yanjiao Zhu, Qilin Li, Wanquan Liu, Chuancun Yin
Spectral clustering-based methods have gained significant popularity in subspace clustering due to their ability to capture the underlying data structure effectively. Standard spectral clustering focuses on only pairwise relationships between data points, neglecting interactions among high-order neighboring points. Integrating the diffusion process can address this limitation by leveraging a Markov random walk. However, ensuring that diffusion methods capture sufficient information while maintaining stability against noise remains challenging. In this paper, we propose the Diffusion Process with Structural Changes (DPSC) method, a novel affinity learning framework that enhances the robustness of the diffusion process. Our approach broadens the scope of nearest neighbors and leverages the dropout idea to generate random transition matrices. Furthermore, inspired by the structural changes model, we use two transition matrices to optimize the iteration rule. The resulting affinity matrix undergoes self-supervised learning and is subsequently integrated back into the diffusion process for refinement. Notably, the convergence of the proposed DPSC is theoretically proven. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms existing subspace clustering methods. The code of our proposed DPSC is available at https://github.com/zhudafa/DPSC.
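A hedged sketch of a diffusion iteration in this spirit follows. The exact update rule and convergence conditions are the paper's; the version below simply alternates two dropout-derived, row-stochastic transition matrices with a restart term, as one plausible reading of "two transition matrices over randomly perturbed neighbors".

```python
import torch
import torch.nn.functional as F

def diffuse(W, steps=20, alpha=0.9, drop=0.3):
    I = torch.eye(W.shape[0])
    # two random transition matrices from dropout over the affinity graph
    P1 = F.dropout(W, p=drop)
    P2 = F.dropout(W, p=drop)
    P1 = P1 / P1.sum(1, keepdim=True).clamp(min=1e-8)   # row-stochastic
    P2 = P2 / P2.sum(1, keepdim=True).clamp(min=1e-8)
    A = W / W.sum(1, keepdim=True)                      # initial affinity
    for _ in range(steps):
        A = alpha * P1 @ A @ P2.T + (1 - alpha) * I     # restart term for stability
    return (A + A.T) / 2                                # symmetrize for clustering

W = torch.rand(50, 50); W = (W + W.T) / 2               # toy similarity matrix
affinity = diffuse(W)                                   # feed to spectral clustering
```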
{"title":"Diffusion process with structural changes for subspace clustering","authors":"Yanjiao Zhu ,&nbsp;Qilin Li ,&nbsp;Wanquan Liu ,&nbsp;Chuancun Yin","doi":"10.1016/j.patcog.2024.111066","DOIUrl":"10.1016/j.patcog.2024.111066","url":null,"abstract":"<div><div>Spectral clustering-based methods have gained significant popularity in subspace clustering due to their ability to capture the underlying data structure effectively. Standard spectral clustering focuses on only pairwise relationships between data points, neglecting interactions among high-order neighboring points. Integrating the diffusion process can address this limitation by leveraging a Markov random walk. However, ensuring that diffusion methods capture sufficient information while maintaining stability against noise remains challenging. In this paper, we propose the Diffusion Process with Structural Changes (DPSC) method, a novel affinity learning framework that enhances the robustness of the diffusion process. Our approach broadens the scope of nearest neighbors and leverages the dropout idea to generate random transition matrices. Furthermore, inspired by the structural changes model, we use two transition matrices to optimize the iteration rule. The resulting affinity matrix undergoes self-supervised learning and is subsequently integrated back into the diffusion process for refinement. Notably, the convergence of the proposed DPSC is theoretically proven. Extensive experiments on benchmark datasets demonstrate that the proposed method outperforms existing subspace clustering methods. The code of our proposed DPSC is available at <span><span>https://github.com/zhudafa/DPSC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111066"},"PeriodicalIF":7.5,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Class agnostic and specific consistency learning for weakly-supervised point cloud semantic segmentation
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111067
Junwei Wu, Mingjie Sun, Haotian Xu, Chenru Jiang, Wuwei Ma, Quan Zhang
This paper focuses on Weakly Supervised 3D Point Cloud Semantic Segmentation (WS3DSS), which involves annotating only a few points while leaving a large number of points unlabeled in the training sample. Existing methods roughly force point-to-point predictions across different augmented versions of an input to be close to each other. In contrast, this paper introduces a carefully designed approach for learning class-agnostic and class-specific consistency, based on the teacher–student framework. The proposed class-agnostic consistency learning brings the features of the student and teacher models closer together, and enhances model robustness by replacing the traditional point-to-point prediction consistency with group-to-group consistency based on the perturbed local neighboring points' features. Furthermore, to facilitate learning under class-wise supervision, we propose a class-specific consistency learning method, pulling the feature of an unlabeled point towards its corresponding class-specific memory bank feature. The class of an unlabeled point is taken to be the one predicted with the highest probability by the classifier. Extensive experimental results demonstrate that our proposed method surpasses the SOTA method SQN (Hu et al., 2022) by 2.5% and 8.3% on the S3DIS dataset, and 4.4% and 13.9% on the ScanNetV2 dataset, on the 0.1% and 0.01% settings, respectively. Code is available at https://github.com/jasonwjw/CASC.
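The class-specific branch can be pictured with the toy loss below, which pulls an unlabeled point's feature toward the memory-bank feature of its most confidently predicted class; the EMA bank update, momentum value, and confidence weighting are assumptions of this sketch rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

num_classes, d = 13, 64                      # e.g. 13 classes as in S3DIS
memory_bank = F.normalize(torch.randn(num_classes, d), dim=1)

def class_specific_loss(feat, logits, m=0.999):
    # feat: (N, d) unlabeled-point features; logits: (N, num_classes)
    prob, pseudo = logits.softmax(dim=1).max(dim=1)        # confidence + class
    target = memory_bank[pseudo]                           # (N, d) bank features
    loss = 1 - F.cosine_similarity(feat, target, dim=1)
    loss = (prob * loss).mean()                            # confidence-weighted pull
    with torch.no_grad():                                  # EMA memory-bank update
        for c in pseudo.unique():
            f = F.normalize(feat[pseudo == c].mean(0), dim=0)
            memory_bank[c] = F.normalize(m * memory_bank[c] + (1 - m) * f, dim=0)
    return loss

loss = class_specific_loss(torch.randn(100, d), torch.randn(100, num_classes))
```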
{"title":"Class agnostic and specific consistency learning for weakly-supervised point cloud semantic segmentation","authors":"Junwei Wu ,&nbsp;Mingjie Sun ,&nbsp;Haotian Xu ,&nbsp;Chenru Jiang ,&nbsp;Wuwei Ma ,&nbsp;Quan Zhang","doi":"10.1016/j.patcog.2024.111067","DOIUrl":"10.1016/j.patcog.2024.111067","url":null,"abstract":"<div><div>This paper focuses on Weakly Supervised 3D Point Cloud Semantic Segmentation (WS3DSS), which involves annotating only a few points while leaving a large number of points unlabeled in the training sample. Existing methods roughly force point-to-point predictions across different augmented versions of inputs close to each other. While this paper introduces a carefully-designed approach for learning class agnostic and specific consistency, based on the teacher–student framework. The proposed class-agnostic consistency learning, to bring the features of student and teacher models closer together, enhances the model robustness by replacing the traditional point-to-point prediction consistency with the group-to-group consistency based on the perturbed local neighboring points’ features. Furthermore, to facilitate learning under class-wise supervisions, we propose a class-specific consistency learning method, pulling the feature of the unlabeled point towards its corresponding class-specific memory bank feature. Such a class of the unlabeled point is determined as the one with the highest probability predicted by the classifier. Extensive experimental results demonstrate that our proposed method surpasses the SOTA method SQN (Huet al., 2022) by 2.5% and 8.3% on S3DIS dataset, and 4.4% and 13.9% on ScanNetV2 dataset, on the 0.1% and 0.01% settings, respectively. Code is available at <span><span>https://github.com/jasonwjw/CASC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111067"},"PeriodicalIF":7.5,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142425153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual-perspective multi-instance embedding learning with adaptive density distribution mining
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-09 · DOI: 10.1016/j.patcog.2024.111063
Mei Yang, Tian-Lin Chen, Wei-Zhi Wu, Wen-Xi Zeng, Jing-Yu Zhang, Fan Min
Multi-instance learning (MIL) is a potent framework for solving weakly supervised problems, with bags containing multiple instances. Various embedding methods convert each bag into a vector in the new feature space based on a representative bag or instance, aiming to extract useful information from the bag. However, since the distribution of instances is related to labels, these methods rely solely on the overall-perspective embedding without considering the different distribution characteristics, which conflates the varied distributions of instances and thus leads to poor classification performance. In this paper, we propose the dual-perspective multi-instance embedding learning with adaptive density distribution mining (DPMIL) algorithm with three new techniques. First, the mutual instance selection technique consists of adaptive density distribution mining and discriminative evaluation. The distribution characteristics of negative instances and heterogeneous instance dissimilarity are effectively exploited to obtain instances with strong representativeness. Second, the embedding technique mines two crucial kinds of bag information simultaneously. Bags are converted into sequence-invariant vectors according to the dual perspective such that distinguishability is maintained. Finally, the ensemble technique trains a batch of classifiers. The final model is obtained by weighted voting with the contribution of the dual-perspective embedding information. The experimental results demonstrate that the DPMIL algorithm has higher average accuracy than other compared algorithms, especially on web datasets.
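The sequence-invariant embedding step can be illustrated as follows: each bag becomes a fixed-length vector of its minimum distances to two sets of representative instances, standing in for the two perspectives. The representatives below are random placeholders rather than instances mined by the paper's selection technique.

```python
import torch

reps_a = torch.randn(10, 32)   # representatives, one perspective (placeholder)
reps_b = torch.randn(10, 32)   # representatives, the other perspective (placeholder)

def embed_bag(bag):
    # bag: (n_instances, 32); min distance to each representative is invariant
    # to instance order and to the number of instances in the bag
    d1 = torch.cdist(bag, reps_a).min(dim=0).values   # (10,)
    d2 = torch.cdist(bag, reps_b).min(dim=0).values   # (10,)
    return torch.cat([d1, d2])                        # (20,) bag-level vector

vec = embed_bag(torch.randn(7, 32))   # bags of any size map to the same length
```

Any standard single-instance classifier can then be trained on these bag vectors, which is what makes the embedding route attractive for MIL.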
{"title":"Dual-perspective multi-instance embedding learning with adaptive density distribution mining","authors":"Mei Yang ,&nbsp;Tian-Lin Chen ,&nbsp;Wei-Zhi Wu ,&nbsp;Wen-Xi Zeng ,&nbsp;Jing-Yu Zhang ,&nbsp;Fan Min","doi":"10.1016/j.patcog.2024.111063","DOIUrl":"10.1016/j.patcog.2024.111063","url":null,"abstract":"<div><div>Multi-instance learning (MIL) is a potent framework for solving weakly supervised problems, with bags containing multiple instances. Various embedding methods convert each bag into a vector in the new feature space based on a representative bag or instance, aiming to extract useful information from the bag. However, since the distribution of instances is related to labels, these methods rely solely on the overall perspective embedding without considering the different distribution characteristics, which will conflate the varied distributions of instances and thus lead to poor classification performance. In this paper, we propose the dual-perspective multi-instance embedding learning with adaptive density distribution mining (DPMIL) algorithm with three new techniques. First, the mutual instance selection technique consists of adaptive density distribution mining and discriminative evaluation. The distribution characteristics of negative instances and heterogeneous instance dissimilarity are effectively exploited to obtain instances with strong representativeness. Second, the embedding technique mines two crucial information of the bag simultaneously. Bags are converted into sequence invariant vectors according to the dual-perspective such that the distinguishability is maintained. Finally, the ensemble technique trains a batch of classifiers. The final model is obtained by weighted voting with the contribution of the dual-perspective embedding information. The experimental results demonstrate that the DPMIL algorithm has higher average accuracy than other compared algorithms, especially on web datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111063"},"PeriodicalIF":7.5,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142432515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SNN using color-opponent and attention mechanisms for object recognition
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-05 · DOI: 10.1016/j.patcog.2024.111070
Zhiwei Yao, Shaobing Gao, Wenjuan Li
Current spiking neural networks (SNNs) rely on spike-timing-dependent plasticity (STDP) primarily for shape learning in object recognition tasks, overlooking the equally critical aspect of color information. To address this gap, our study introduces an unsupervised variant of STDP that incorporates principles from color-opponency mechanisms (COM) and classical receptive fields (CRF) found in the biological visual system, facilitating the integration of color information during parameter updates within the SNN architecture. Our approach initially preprocesses images into two distinct feature maps: one for shape and another for color. Then, signals derived from COM and intensity concurrently drive the STDP process, thereby updating parameters associated with both color and shape feature maps. Furthermore, we propose a channel-wise attention mechanism to enhance differentiation among objects sharing similar shapes or colors. Specifically, this mechanism utilizes convolution to generate an output spike-wave, identifying a winner based on the earliest spike timing and maximal potential. The winning kernel computes attention, which is then applied via convolution to each input image feature map, generating post-feature maps. An STDP-like normalization rule compares firing times between pre- and post-feature maps, dynamically adjusting channel weights to optimize object recognition during the training phase.
We assessed the proposed algorithm using SNNs with both single-layer and multi-layer architectures across three datasets. Experimental findings highlight its efficacy and superiority in complex object recognition tasks compared to state-of-the-art (SOTA) algorithms. Notably, our approach achieved a significant 20% performance improvement over the SOTA on the Caltech-101 dataset. Moreover, the algorithm is well-suited to hardware implementation and energy efficiency, leveraging a winner-selection mechanism based on the earliest spike time.
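For orientation, the kind of unsupervised STDP rule such models build on can be sketched as below. This is a generic soft-bounded, timing-based update; the paper's color-opponent drive and attention mechanism are not reproduced here.

```python
import torch

def stdp_update(w, pre_t, post_t, a_plus=0.01, a_minus=0.01):
    # w: (out, in) weights in [0, 1]; pre_t: (in,), post_t: (out,) spike times
    dt = post_t[:, None] - pre_t[None, :]          # (out, in) timing difference
    ltp = (dt >= 0).float()                        # pre fires before post -> potentiate
    ltd = (dt < 0).float()                         # post fires first -> depress
    # soft bound w*(1-w) keeps weights inside [0, 1] without hard clipping
    w = w + a_plus * ltp * w * (1 - w) - a_minus * ltd * w * (1 - w)
    return w.clamp(0, 1)

w = stdp_update(torch.rand(8, 4), torch.rand(4), torch.rand(8))
```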
{"title":"SNN using color-opponent and attention mechanisms for object recognition","authors":"Zhiwei Yao ,&nbsp;Shaobing Gao ,&nbsp;Wenjuan Li","doi":"10.1016/j.patcog.2024.111070","DOIUrl":"10.1016/j.patcog.2024.111070","url":null,"abstract":"<div><div>The current spiking neural network (SNN) relies on spike-timing-dependent plasticity (STDP) primarily for shape learning in object recognition tasks, overlooking the equally critical aspect of color information. To address this gap, our study introduces an unsupervised variant of STDP that incorporates principles from color-opponency mechanisms (COM) and classical receptive fields (CRF) found in the biological visual system, facilitating the integration of color information during parameter updates within the SNN architecture. Our approach initially preprocesses images into two distinct feature maps: one for shape and another for color. Then, signals derived from COM and intensity concurrently drive the STDP process, thereby updating parameters associated with both color and shape feature maps. Furthermore, we propose a channel-wise attention mechanism to enhance differentiation among objects sharing similar shapes or colors. Specifically, this mechanism utilizes convolution to generate an output spike-wave, identifying a winner based on earliest spike timing and maximal potential. The winning kernel computes attention, which is then applied via convolution to each input image feature map, generating post-feature maps. A STDP-like normalization rule compares firing times between pre- and post-feature maps, dynamically adjusting channel weights to optimize object recognition during the training phase.</div><div>We assessed the proposed algorithm using SNN with both single-layer and multi-layer architectures across three datasets. Experimental findings highlight its efficacy and superiority in complex object recognition tasks compared to state-of-the-art (SOTA) algorithms. Notably, our approach achieved a significant 20% performance improvement over the SOTA on the Caltech-101 dataset. Moreover, the algorithm is well-suited for hardware implementation and energy efficiency, leveraging a winner-selection mechanism based on the earliest spike time.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111070"},"PeriodicalIF":7.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization
IF 7.5 · CAS Tier 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-10-05 · DOI: 10.1016/j.patcog.2024.111061
Yunshan Zhong, Yuyao Zhou, Fei Chao, Rongrong Ji
Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements during runtime. However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activation bit-widths, leading to limited performance. To address this issue, we propose MBQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2-bit width and the activations remain in the input bit-width. To complete the computation at a desired bit-width, MBQuant selects multiple branches for forward propagation, ensuring that the computational costs match those of the desired bit-width. By fixing the weight bit-width, MBQuant substantially reduces quantization errors caused by switching weight bit-widths. Additionally, we observe that the first branch suffers from quantization errors caused by all bit-widths, leading to performance degradation. Thus, we introduce an amortization branch selection strategy that spreads these errors out. Specifically, the first branch is selected only for certain bit-widths, rather than universally, so that the errors are distributed among the branches more evenly. Finally, we adopt an in-place distillation strategy that uses the largest bit-width to guide the other bit-widths to further enhance MBQuant's performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is made publicly available at https://github.com/zysxmu/MBQuant.
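The multi-branch idea can be sketched as below: each branch holds fixed 2-bit weights, and a b-bit forward pass combines b/2 branches so the cost roughly tracks the requested bit-width. The uniform quantizer and the naive branch selection here are simplified stand-ins, not the paper's amortized strategy or distillation scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_2bit(w):
    # uniform 2-bit quantizer: four levels {-2, -1, 0, 1} * scale (assumed scheme)
    s = w.abs().max().clamp(min=1e-8) / 1.5
    return (w / s).round().clamp(-2, 1) * s

class MultiBranchLinear(nn.Module):
    def __init__(self, n_branches=4, din=16, dout=16):
        super().__init__()
        self.branches = nn.ModuleList(nn.Linear(din, dout) for _ in range(n_branches))

    def forward(self, x, bit_width=4):
        k = bit_width // 2                  # k branches of 2-bit weights, so the
        outs = []                           # cost roughly matches a b-bit layer
        for layer in self.branches[:k]:     # naive selection (amortization omitted)
            wq = quantize_2bit(layer.weight)
            outs.append(F.linear(x, wq, layer.bias))
        return torch.stack(outs).mean(0)

y = MultiBranchLinear()(torch.randn(2, 16), bit_width=6)   # 3 of 4 branches used
```

Because every branch always carries 2-bit weights, switching the runtime bit-width only changes which branches run, not how any weight is quantized, which is the source of the error reduction the abstract describes.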
{"title":"MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization","authors":"Yunshan Zhong ,&nbsp;Yuyao Zhou ,&nbsp;Fei Chao ,&nbsp;Rongrong Ji","doi":"10.1016/j.patcog.2024.111061","DOIUrl":"10.1016/j.patcog.2024.111061","url":null,"abstract":"<div><div>Arbitrary bit-width network quantization has received significant attention due to its high adaptability to various bit-width requirements during runtime. However, in this paper, we investigate existing methods and observe a significant accumulation of quantization errors caused by switching weight and activations bit-widths, leading to limited performance. To address this issue, we propose MBQuant, a novel method that utilizes a multi-branch topology for arbitrary bit-width quantization. MBQuant duplicates the network body into multiple independent branches, where the weights of each branch are quantized to a fixed 2-bit and the activations remain in the input bit-width. For completing the computation of a desired bit-width, MBQuant selects multiple branches, ensuring that the computational costs match those of the desired bit-width, to carry out forward propagation. By fixing the weight bit-width, MBQuant substantially reduces quantization errors caused by switching weight bit-widths. Additionally, we observe that the first branch suffers from quantization errors caused by all bit-widths, leading to performance degradation. Thus, we introduce an amortization branch selection strategy that amortizes the errors. Specifically, the first branch is selected only for certain bit-widths, rather than universally, thereby the errors are distributed among the branches more evenly. Finally, we adopt an in-place distillation strategy that uses the largest bit-width to guide the other bit-widths to further enhance MBQuant’s performance. Extensive experiments demonstrate that MBQuant achieves significant performance gains compared to existing arbitrary bit-width quantization methods. Code is made publicly available at <span><span>https://github.com/zysxmu/MBQuant</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"158 ","pages":"Article 111061"},"PeriodicalIF":7.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0