Pub Date: 2024-12-23 | DOI: 10.1109/TPAMI.2024.3520899
Yue Song;Wei Wang;Nicu Sebe
The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature matrix, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose RankFeat, a simple yet effective post hoc approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and its associated singular vectors from the high-level feature. RankFeat achieves state-of-the-art performance and reduces the average false positive rate (FPR95) by 17.90% compared with the previous best method. The success of RankFeat motivates us to investigate whether a similar phenomenon exists in the parameter matrices of neural networks. We thus propose RankWeight, which removes the rank-1 weight from the parameter matrices of a single deep layer. RankWeight is also post hoc and only requires computing the rank-1 matrix once. As a standalone approach, RankWeight is highly competitive against other methods across various backbones. Moreover, RankWeight enjoys flexible compatibility with a wide range of OOD detection methods. Combining RankWeight with RankFeat sets a new state of the art, achieving an FPR95 as low as 16.13% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.
{"title":"RankFeat&RankWeight: Rank-1 Feature/Weight Removal for Out-of-Distribution Detection","authors":"Yue Song;Wei Wang;Nicu Sebe","doi":"10.1109/TPAMI.2024.3520899","DOIUrl":"10.1109/TPAMI.2024.3520899","url":null,"abstract":"The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose <monospace>RankFeat</monospace>, a simple yet effective <italic>post hoc</i> approach for OOD detection by removing the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. <monospace>RankFeat</monospace> achieves <italic>state-of-the-art</i> performance and reduces the average false positive rate (FPR95) by 17.90% compared with the previous best method. The success of <monospace>RankFeat</monospace> motivates us to investigate whether a similar phenomenon would exist in the parameter matrices of neural networks. We thus propose <monospace>RankWeight</monospace> which removes the rank-1 weight from the parameter matrices of a single deep layer. Our <monospace>RankWeight</monospace> is also <italic>post hoc</i> and only requires computing the rank-1 matrix once. As a standalone approach, <monospace>RankWeight</monospace> has very competitive performance against other methods across various backbones. Moreover, <monospace>RankWeight</monospace> enjoys flexible compatibility with a wide range of OOD detection methods. The combination of <monospace>RankWeight</monospace> and <monospace>RankFeat</monospace> refreshes the new <italic>state-of-the-art</i> performance, achieving the FPR95 as low as 16.13% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2505-2519"},"PeriodicalIF":0.0,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10812065","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-23 | DOI: 10.1109/TPAMI.2024.3521022
Quanxue Gao;Fangfang Li;Qianqian Wang;Xinbo Gao;Dacheng Tao
Although numerous clustering algorithms have been developed, many existing methods still rely on the K-means technique to identify clusters of data points. However, the performance of K-means is highly dependent on an accurate estimation of the cluster centers, which is challenging to achieve optimally. Furthermore, it struggles to handle linearly non-separable data. To address these limitations, we reformulate multi-view K-means, from the perspective of manifold learning, into a manifold-based multi-view clustering formulation that eliminates the need to compute a centroid matrix. This reformulation ensures consistency between the manifold structure and the data labels. Building on this, we propose a novel multi-view K-means model incorporating a tensor rank constraint. Our model employs the indicator matrices from different views to construct a third-order tensor, whose rank is minimized via the tensor Schatten p-norm. This approach effectively leverages the complementary information across views. By utilizing different distance functions, the proposed model can effectively handle linearly non-separable data. Extensive experimental results on multiple databases demonstrate the superiority of the proposed model.
{"title":"Manifold Based Multi-View K-Means","authors":"Quanxue Gao;Fangfang Li;Qianqian Wang;Xinbo Gao;Dacheng Tao","doi":"10.1109/TPAMI.2024.3521022","DOIUrl":"10.1109/TPAMI.2024.3521022","url":null,"abstract":"Although numerous clustering algorithms have been developed, many existing methods still rely on the <italic>K</i>-means technique to identify clusters of data points. However, the performance of <italic>K</i>-means is highly dependent on the accurate estimation of cluster centers, which is challenging to achieve optimally. Furthermore, it struggles to handle linearly non-separable data. To address these limitations, from the perspective of manifold learning, we reformulate multi-view <italic>K</i>-means into a manifold-based multi-view clustering formulation that eliminates the need for computing centroid matrix. This reformulation ensures consistency between the manifold structure and the data labels. Building on this, we propose a novel multi-view <italic>K</i>-means model incorporating the tensor rank constraint. Our model employs the indicator matrices from different views to construct a third-order tensor, whose rank is minimized via the tensor Schatten <italic>p</i>-norm. This approach effectively leverages the complementary information across views. By utilizing different distance functions, our proposed model can effectively handle linearly non-separable data. Extensive experimental results on multiple databases demonstrate the superiority of our proposed model.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"3175-3182"},"PeriodicalIF":0.0,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-23 | DOI: 10.1109/TPAMI.2024.3521416
Jinyuan Liu;Guanyao Wu;Zhu Liu;Di Wang;Zhiying Jiang;Long Ma;Wei Zhong;Xin Fan;Risheng Liu
Infrared-visible image fusion (IVIF) is a fundamental and critical task in the field of computer vision. Its aim is to integrate the unique characteristics of both infrared and visible spectra into a holistic representation. Since 2018, a growing number and diversity of IVIF approaches have entered the deep-learning era, introducing a broad spectrum of networks and loss functions for improving visual enhancement. As research deepens and practical demands grow, several intricate issues such as data compatibility, perception accuracy, and efficiency cannot be ignored. Regrettably, there is a lack of recent surveys that comprehensively introduce and organize this expanding body of knowledge. Given the current rapid development, this paper aims to fill that gap by providing a comprehensive survey covering a wide array of aspects. Initially, we introduce a multi-dimensional framework to elucidate the prevalent learning-based IVIF methodologies, spanning topics from basic visual enhancement strategies to data compatibility, task adaptability, and further extensions. Subsequently, we provide an in-depth analysis of these new approaches, offering a detailed lookup table to clarify their core ideas. Last but not least, we summarize quantitative and qualitative performance comparisons covering registration, fusion, and follow-up high-level tasks. Beyond delving into the technical nuances of these learning-based fusion approaches, we also explore potential future directions and open issues that warrant further exploration by the community.
{"title":"Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption","authors":"Jinyuan Liu;Guanyao Wu;Zhu Liu;Di Wang;Zhiying Jiang;Long Ma;Wei Zhong;Xin Fan;Risheng Liu","doi":"10.1109/TPAMI.2024.3521416","DOIUrl":"10.1109/TPAMI.2024.3521416","url":null,"abstract":"Infrared-visible image fusion (IVIF) is a fundamental and critical task in the field of computer vision. Its aim is to integrate the unique characteristics of both infrared and visible spectra into a holistic representation. Since 2018, growing amount and diversity IVIF approaches step into a deep-learning era, encompassing introduced a broad spectrum of networks or loss functions for improving visual enhancement. As research deepens and practical demands grow, several intricate issues like data compatibility, perception accuracy, and efficiency cannot be ignored. Regrettably, there is a lack of recent surveys that comprehensively introduce and organize this expanding domain of knowledge. Given the current rapid development, this paper aims to fill the existing gap by providing a comprehensive survey that covers a wide array of aspects. Initially, we introduce a multi-dimensional framework to elucidate the prevalent learning-based IVIF methodologies, spanning topics from basic visual enhancement strategies to data compatibility, task adaptability, and further extensions. Subsequently, we delve into a profound analysis of these new approaches, offering a detailed lookup table to clarify their core ideas. Last but not the least, We also summarize performance comparisons quantitatively and qualitatively, covering registration, fusion and follow-up high-level tasks. Beyond delving into the technical nuances of these learning-based fusion approaches, we also explore potential future directions and open issues that warrant further exploration by the community.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2349-2369"},"PeriodicalIF":0.0,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-23 | DOI: 10.1109/TPAMI.2024.3521589
Chunxiao Fan;Dan Guo;Ziqi Wang;Meng Wang
Quantization is an efficient model compression method that represents the network with fixed-point or low-bit numbers. Existing quantization methods treat network quantization as a single-objective optimization that pursues high accuracy (performance optimization) while keeping the quantization constraint. However, owing to the non-differentiability of the quantization operation, it is challenging to integrate the quantization operation into network training and achieve optimal parameters. In this paper, a novel multi-objective convex quantization for efficient model compression is proposed. Specifically, network training is modeled as a multi-objective optimization to find a network with both high precision and low quantization error (in fact, these two goals are somewhat contradictory and affect each other). To achieve effective multi-objective optimization, this paper designs a quantization error function that is differentiable and ensures the computation is convex in each period, thereby avoiding non-differentiable back-propagation through the quantization operation. Then, we perform a time-series self-distillation training scheme on the multi-objective optimization framework, which distills the model's past softened labels and combines them with the hard targets to guarantee controllable and stable performance convergence during training. Finally, and most importantly, a new dynamic Lagrangian coefficient adaptation is designed to adjust the gradient magnitudes of the quantization loss and the performance loss and to balance the two losses during training. The proposed method is evaluated on well-known benchmarks: MNIST, CIFAR-10/100, ImageNet, Penn Treebank and Microsoft COCO, and experimental results show that it achieves outstanding performance compared to existing methods.
{"title":"Multi-Objective Convex Quantization for Efficient Model Compression","authors":"Chunxiao Fan;Dan Guo;Ziqi Wang;Meng Wang","doi":"10.1109/TPAMI.2024.3521589","DOIUrl":"10.1109/TPAMI.2024.3521589","url":null,"abstract":"Quantization is one of the efficient model compression methods, which represents the network with fixed-point or low-bit numbers. Existing quantization methods address the network quantization by treating it as a single-objective optimization that pursues high accuracy (performance optimization) while keeping the quantization constraint. However, owing to the non-differentiability of the quantization operation, it is challenging to integrate the quantization operation into the network training and achieve optimal parameters. In this paper, a novel multi-objective convex quantization for efficient model compression is proposed. Specifically, the network training is modeled as a multi-objective optimization to find the network with both high precision and low quantization error (actually, these two goals are somewhat contradictory and affect each other). To achieve effective multi-objective optimization, this paper designs a quantization error function that is differentiable and ensures the computation convexity in each period, so as to avoid the non-differentiable back-propagation of the quantization operation. Then, we perform a time-series self-distillation training scheme on the multi-objective optimization framework, which distills its past softened labels and combines the hard targets to guarantee controllable and stable performance convergence during training. At last and more importantly, a new dynamic Lagrangian coefficient adaption is designed to adjust the gradient magnitude of quantization loss and performance loss and balance the two losses during training processing. The proposed method is evaluated on well-known benchmarks: MNIST, CIFAR-10/100, ImageNet, Penn Treebank and Microsoft COCO, and experimental results show that the proposed method achieves outstanding performance compared to existing methods.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2313-2329"},"PeriodicalIF":0.0,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142879595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-20 | DOI: 10.1109/TPAMI.2024.3520708
Miin-Shen Yang;Kristina P. Sinaga
The rise of the Internet of Things (IoT) has unlocked massive volumes of Big Data in many fields. Generally, these Big Data may be distributed in a non-independently and identically distributed (non-IID) fashion. In this paper, we extend multi-view k-means (MVKM) clustering so that it maintains the privacy of each database by operating only on clients' local multi-view data. This work integrates an exponential distance that transforms the weighted Euclidean distance in MVKM, allowing the algorithm to make full use of developments in federated learning. The proposed algorithm, called federated MVKM (Fed-MVKM), adds a new level of capability and is highly suitable for clustering large data sets. To demonstrate its efficiency and applicability, we evaluate it on one synthetic and six real multi-view data sets, using the Federated Peter-Clark algorithm of Huang et al. (2023) in a causal-inference setting to split the data instances across multiple clients efficiently. The results show that, in the federated environment, data-driven local cluster centers based on shared models can generate a satisfying final pattern for multi-view data while simultaneously improving the clustering performance of (non-federated) MVKM clustering algorithms.
{"title":"Federated Multi-View K-Means Clustering","authors":"Miin-Shen Yang;Kristina P. Sinaga","doi":"10.1109/TPAMI.2024.3520708","DOIUrl":"10.1109/TPAMI.2024.3520708","url":null,"abstract":"The increasing effect of Internet of Things (IoT) unlocks the massive volume of the availability of Big Data in many fields. Generally, these Big Data may be in a non-independently and identically distributed fashion (non-IID). In this paper, we have contributions in such a way enable multi-view k-means (MVKM) clustering to maintain the privacy of each database by allowing MVKM to be operated on the local principle of clients’ multi-view data. This work integrates the exponential distance to transform the weighted Euclidean distance on MVKM so that it can make full use of development in federated learning via the MVKM clustering algorithm. The proposed algorithm, called a federated MVKM (Fed-MVKM), can provide a whole new level adding a lot of new ideas to produce a much better output. The proposed Fed-MVKM is highly suitable for clustering large data sets. To demonstrate its efficient and applicable, we implement a synthetic and six real multi-view data sets and then perform Federated Peter-Clark in Huang et al. 2023 for causal inference setting to split the data instances over multiple clients, efficiently. The results show that shared-models based local cluster centers with data-driven in the federated environment can generate a satisfying final pattern of one multi-view data that simultaneously improve the clustering performance of (non-federated) MVKM clustering algorithms.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2446-2459"},"PeriodicalIF":0.0,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142867130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-20 | DOI: 10.1109/TPAMI.2024.3520508
Xiaowei Hu;Min Shi;Weiyun Wang;Sitong Wu;Linjie Xing;Wenhai Wang;Xizhou Zhou;Lewei Lu;Jie Zhou;Xiaogang Wang;Yu Qiao;Jifeng Dai
Vision transformers have gained popularity recently, leading to the development of new vision backbones with improved features and consistent performance gains. However, these advancements are not solely attributable to novel feature transformation designs; certain benefits also arise from advanced network-level and block-level architectures. This paper aims to identify the real gains of popular convolution and attention operators through a detailed study. We find that the key difference among these feature transformation modules, such as attention or convolution, lies in their spatial feature aggregation approach, known as the "spatial token mixer" (STM). To facilitate an impartial comparison, we introduce a unified architecture to neutralize the impact of divergent network-level and block-level designs. Subsequently, various STMs are integrated into this unified framework for comprehensive comparative analysis. Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs. Our detailed analysis also reveals various findings about different STMs, including effective receptive fields, invariance, and adversarial robustness tests.
Title: Demystify Transformers & Convolutions in Modern Image Deep Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 2416-2428
Pub Date: 2024-12-19 | DOI: 10.1109/TPAMI.2024.3519711
Tianwei Cao;Qianqian Xu;Zhiyong Yang;Zhanyu Ma;Qingming Huang
Recommender systems have been widely employed on various online platforms to improve user experience. In these systems, recommendation models are often learned from users' historical behaviors, which are collected automatically. Notably, recommender systems differ slightly from ordinary supervised learning tasks: there is an exposure mechanism that decides which items can be presented to each specific user, which breaks the i.i.d. assumption of supervised learning and introduces biases into the recommendation models. In this paper, we focus on unbiased ranking losses weighted by inverse propensity scores (IPS), which are widely used in recommendation with implicit feedback labels. More specifically, we first highlight that there is a gap between theory and practice in IPS-weighted unbiased losses. The existing pairwise loss can be made theoretically unbiased by adopting an IPS weighting scheme. Unfortunately, the propensity scores are hard to estimate because each user-item pair's true exposure status is inaccessible; in practical scenarios, we can only approximate them. Consequently, the theoretically unbiased loss remains practically biased. To solve this problem, we first construct a theoretical framework to obtain a generalization upper bound for the current theoretically unbiased loss. The bound shows that we can ensure the generalization ability of the theoretically unbiased loss if we lower its implementation loss and its practical bias at the same time. To that end, we suggest treating the feedback label $Y_{ui}$ as a noisy proxy for the exposure result $O_{ui}$ for each user-item pair $(u, i)$. Here we assume the noise rate satisfies $\hat{P}(O_{ui}=1, Y_{ui}\ne O_{ui}) < 1/2$. According to our analysis, this is a mild assumption that can be satisfied by many real-world applications. Based on this, we can train an accurate propensity model directly by leveraging a noise-resistant loss function, and then construct a practically unbiased recommendation model weighted by precise propensity scores. Lastly, experimental findings on public datasets demonstrate the effectiveness of the suggested method.
Title: Practically Unbiased Pairwise Loss for Recommendation With Implicit Feedback
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 2460-2474
Pub Date: 2024-12-18 | DOI: 10.1109/TPAMI.2024.3519605
Yanan Li;Zhimin Wang;Ruipeng Xing;Changheng Shao;Shangshang Shi;Jiaxin Li;Guoqiang Zhong;Yongjian Gu
The exploration of quantum advantages with Quantum Neural Networks (QNNs) is an exciting endeavor. Recurrent neural networks, the widely used framework in deep learning, suffer from the gradient vanishing and exploding problem, which limits their ability to learn long-term dependencies. To address this challenge, in this work, we develop the sequential model of Quantum Gated Recurrent Neural Networks (QGRNNs). This model naturally integrates the gating mechanism into the framework of the variational ansatz circuit of QNNs, enabling efficient execution on near-term quantum devices. We present rigorous proof that QGRNNs can preserve the gradient norm of long-term interactions throughout the recurrent network, enabling efficient learning of long-term dependencies. Meanwhile, the architectural features of QGRNNs can effectively mitigate the barren plateau phenomenon. The effectiveness of QGRNNs in sequential learning is convincingly demonstrated through various typical tasks, including solving the adding problem, learning gene regulatory networks, and predicting stock prices. The hardware-efficient architecture and superior performance of our QGRNNs indicate their promising potential for finding quantum advantageous applications in the near term.
Title: Quantum Gated Recurrent Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 4, pp. 2493-2504
Pub Date: 2024-12-18 | DOI: 10.1109/TPAMI.2024.3519803
Xuxiang Sun;Gong Cheng;Hongda Li;Chunbo Lang;Junwei Han
Owing to its extreme setting, stealing a black-box model without access to its training data is difficult in practice. On this topic, along the lines of data diversity, this paper makes the following substantial improvements over our conference version (dubbed STDatav1, short for Surrogate Training Data). First, to mitigate the undesirable impact of potential mode collapse while training the generator, we propose a joint-data optimization scheme, which utilizes both the synthesized data and the proxy data to optimize the surrogate model. Second, we propose the self-conditional data synthesis framework, which builds a pseudo-class mapping via grouped class-information extraction to enforce class-specific constraints while preserving diversity. Within this new framework, we inherit and integrate the class-specific constraints of STDatav1 and design a dual cross-entropy loss to fit the framework. Finally, to facilitate comprehensive evaluations, we perform experiments on four commonly adopted datasets with a total of eight kinds of models. These assessments show considerable performance gains over our early work and demonstrate the competitive ability and promising potential of our approach.
{"title":"STDatav2: Accessing Efficient Black-Box Stealing for Adversarial Attacks","authors":"Xuxiang Sun;Gong Cheng;Hongda Li;Chunbo Lang;Junwei Han","doi":"10.1109/TPAMI.2024.3519803","DOIUrl":"10.1109/TPAMI.2024.3519803","url":null,"abstract":"On account of the extreme settings, stealing the black-box model without its training data is difficult in practice. On this topic, along the lines of data diversity, this paper substantially makes the following improvements based on our conference version (dubbed STDatav1, short for Surrogate Training Data). First, to mitigate the undesirable impacts of the potential mode collapse while training the generator, we propose the joint-data optimization scheme, which utilizes both the synthesized data and the proxy data to optimize the surrogate model. Second, we propose the self-conditional data synthesis framework, an interesting effort that builds the pseudo-class mapping framework via grouping class information extraction to hold the class-specific constraints while holding the diversity. Within this new framework, we inherit and integrate the class-specific constraints of STDatav1 and design a dual cross-entropy loss to fit this new framework. Finally, to facilitate comprehensive evaluations, we perform experiments on four commonly adopted datasets, and a total of eight kinds of models are employed. These assessments witness the considerable performance gains compared to our early work and demonstrate the competitive ability and promising potential of our approach.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 4","pages":"2429-2445"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142849396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-18 | DOI: 10.1109/TPAMI.2024.3519543
Chen Qiu;Marius Kloft;Stephan Mandt;Maja Rudolph
Data augmentation plays a critical role in self-supervised learning, including anomaly detection. While hand-crafted transformations such as image rotations can achieve impressive performance on image data, effective transformations of non-image data are lacking. In this work, we study learning such transformations for end-to-end anomaly detection on arbitrary data. We find that a contrastive loss–which encourages learning diverse data transformations while preserving the relevant semantic content of the data–is more suitable than previously proposed losses for transformation learning, a fact that we prove theoretically and empirically. We demonstrate that anomaly detection using neural transformation learning can achieve state-of-the-art results for time series data, tabular data, text data and graph data. Furthermore, our approach can make image anomaly detection more interpretable by learning transformations at different levels of abstraction.
{"title":"Self-Supervised Anomaly Detection With Neural Transformations","authors":"Chen Qiu;Marius Kloft;Stephan Mandt;Maja Rudolph","doi":"10.1109/TPAMI.2024.3519543","DOIUrl":"10.1109/TPAMI.2024.3519543","url":null,"abstract":"Data augmentation plays a critical role in self-supervised learning, including anomaly detection. While hand-crafted transformations such as image rotations can achieve impressive performance on image data, effective transformations of non-image data are lacking. In this work, we study <italic>learning</i> such transformations for end-to-end anomaly detection on arbitrary data. We find that a contrastive loss–which encourages learning diverse data transformations while preserving the relevant semantic content of the data–is more suitable than previously proposed losses for transformation learning, a fact that we prove theoretically and empirically. We demonstrate that anomaly detection using neural transformation learning can achieve state-of-the-art results for time series data, tabular data, text data and graph data. Furthermore, our approach can make image anomaly detection more interpretable by learning transformations at different levels of abstraction.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 3","pages":"2170-2185"},"PeriodicalIF":0.0,"publicationDate":"2024-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10806806","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142848870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}