Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112862
Yuanbo Wen , Jing Qin , Ting Chen , Tao Gao
The removal of rain streaks and raindrops is essential for improving image visibility. However, most existing methods rely on paired rainy and clean images, which are difficult to acquire in real-world scenarios. To this end, we propose the prompt-oriented and frequency-regularized Schrödinger bridge (PFSB) for rain streak and raindrop removal with unpaired training. Specifically, we first formulate unpaired image deraining as a Schrödinger bridge problem. Furthermore, we demonstrate the local quasi-convexity of structural similarity and employ a multi-scale structural similarity constraint (MSSC) to minimize the duality gap between the primal and dual problems, ensuring linear convergence of the gradient flow while preserving textural details. Meanwhile, we develop a context-preserving consistency modulator (CCM) to guide the derained output toward clean content, thereby retaining rain-irrelevant features. Moreover, we propose a domain-representative prompt protocol (DPP), which enforces the elimination of rain-relevant information in the generated sample and maintains its alignment with the clean domain. Additionally, we utilize Bayesian frequency-domain regularization (BFR) to balance spectral consistency with clean references against repulsion from rainy patterns. Extensive experiments demonstrate that our method surpasses existing well-performing unpaired learning approaches in both fidelity and photo-realism.
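The MSSC is only named in the abstract; as a rough, non-authoritative sketch of a multi-scale structural similarity loss (using a global rather than windowed SSIM, with hypothetical stability constants `c1` and `c2`), one might write:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Single-window SSIM between two images with values in [0, 1].
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx**2 + my**2 + c1) * (vx + vy + c2)))

def ms_ssim_loss(x, y, scales=3):
    # 1 - mean SSIM over a dyadic pyramid built by 2x2 average pooling.
    vals = []
    for _ in range(scales):
        vals.append(ssim_global(x, y))
        h, w = (x.shape[0] // 2) * 2, (x.shape[1] // 2) * 2
        x = x[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        y = y[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return 1.0 - float(np.mean(vals))
```

The loss is zero for identical images and grows as structure diverges, which is the behavior a derained output would be trained against.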
Title: Prompt-oriented and frequency-regularized Schrödinger bridge for unpaired rain streaks and raindrops removal
Journal: Pattern Recognition, Volume 173, Article 112862
Pub Date: 2025-12-05 | DOI: 10.1016/j.patcog.2025.112843
Zhaoxin Fan , Gen Li , Zhongkai Zhou
Self-supervised monocular depth estimation has seen remarkable progress with the advent of coarse-to-fine architectures and recurrent refinement frameworks. Despite their success, coarse-to-fine pipelines often depend on deep encoders and hierarchical upsampling, which introduce high computational overhead and propagate spatial inconsistencies. On the other hand, recurrent refinement models, such as R-MSFM and RAFM, suffer from suboptimal feature representations and limited capacity to capture fine-grained structures. In this work, we introduce R-FGDepth, a novel recurrent refinement framework enhanced with frequency-guided mechanisms to address these limitations. Our approach features three key innovations: (1) a Spatial-Semantic Hybrid Convolution Encoder that achieves an optimal trade-off between spatial detail preservation and semantic abstraction, (2) a Frequency-guided Global Depth Initialization Module that enforces global geometric consistency, and (3) a Frequency-guided Adaptive Depth Refinement Module that effectively enhances high-frequency structures, such as thin poles, traffic signs, and pedestrians. Extensive experiments on the KITTI and Cityscapes datasets demonstrate that R-FGDepth surpasses both coarse-to-fine and prior recurrent refinement methods, achieving state-of-the-art accuracy with competitive computational efficiency. Furthermore, qualitative evaluations underscore its ability to preserve object boundaries and generalize effectively across diverse domains. With its lightweight design and robust performance, R-FGDepth sets a new benchmark for real-world self-supervised depth estimation, advancing one step closer to the realization of foundation models in depth perception.
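The frequency-guided modules are not specified in the abstract; a minimal sketch of the general idea, assuming a hypothetical ideal low-pass split of a depth map in the 2-D Fourier domain (not the paper's actual modules), could be:

```python
import numpy as np

def frequency_split(depth, cutoff=0.25):
    # Split a depth map into low-/high-frequency parts using an ideal
    # radial mask in the 2-D Fourier domain (cutoff is an assumption).
    f = np.fft.fftshift(np.fft.fft2(depth))
    h, w = depth.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.sqrt(((yy - h / 2) / (h / 2)) ** 2 + ((xx - w / 2) / (w / 2)) ** 2)
    low = np.fft.ifft2(np.fft.ifftshift(f * (r <= cutoff))).real
    return low, depth - low

def refine(depth, gain=1.5, cutoff=0.25):
    # Re-emphasise high-frequency structure (thin poles, sign edges)
    # by amplifying the high-frequency residual.
    low, high = frequency_split(depth, cutoff)
    return low + gain * high
```

With `gain=1.0` the split is lossless (the two parts sum back to the input), which is why amplifying only the high band preserves the global geometry.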
Title: R-FGDepth: Towards foundation models for recurrent depth learning with frequency-guided initialization and refinement
Journal: Pattern Recognition, Volume 173, Article 112843
Pub Date: 2025-12-04 | DOI: 10.1016/j.patcog.2025.112816
Xing Yi , Liu Liu , Qiupu Chen , Li Zhang , Dan Guo
3D part segmentation is a crucial task for various applications, including robotics and shape analysis. Despite advancements in data-driven approaches, supervised methods rely heavily on annotated data, limiting their effectiveness in open-world scenarios and on out-of-distribution test shapes. To address these challenges, we propose a novel interactive Click Segmentation (iClickSeg) method that achieves zero-shot cross-category 3D part segmentation via iterative user interaction. Specifically, our approach simulates user interactions through positive and negative clicks to guide the segmentation process, focusing on regions of interest and allowing for iterative refinement. To achieve this goal, we design a click sampling strategy to learn shape-based prior information from point cloud data, enabling better feature encoding between points. Under the learned shape prior, the segmentation model maintains topological consistency and boosts performance by incorporating a simple PointNet++ network. For further refinement, we also present a post-processing strategy that uses outlier removal and heuristic clicks to obtain smooth segments. Extensive experiments on the PartNet, PartNetE and S3DIS datasets demonstrate the superiority of iClickSeg over category-level segmentation methods and zero-shot methods. Inference tests on the AKB-48 data further validate the method’s effectiveness and practicality in real-world scenarios.
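The click sampling strategy itself is not detailed in the abstract. The following is a hypothetical simulation of positive/negative clicks on a labeled point cloud, not the paper's actual strategy: the positive click is taken deep inside the target part, the negative click just outside its boundary.

```python
import numpy as np

def sample_clicks(points, part_mask):
    # Hypothetical click simulation on a point cloud (N, 3):
    # positive click = the part point farthest from any non-part point
    # (an interior point of the region of interest);
    # negative click = the non-part point closest to the part.
    pos_pts = points[part_mask]
    neg_pts = points[~part_mask]
    d = np.linalg.norm(pos_pts[:, None, :] - neg_pts[None, :, :], axis=-1)
    pos_click = pos_pts[d.min(axis=1).argmax()]
    neg_click = neg_pts[d.min(axis=0).argmin()]
    return pos_click, neg_click
```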
Title: iClickSeg: Interactive click segmentation for zero-shot cross-category 3D part segmentation
Journal: Pattern Recognition, Volume 173, Article 112816
Pub Date: 2025-12-04 | DOI: 10.1016/j.patcog.2025.112810
Ao Hu , Liangjian Wen , Jiang Duan , Yong Dai , Dongkai Wang , Shudong Huang , Jun Wang , Zenglin Xu
Multivariate time series forecasting (MTSF) is crucial for decision-making in various domains but faces challenges due to the low signal-to-noise ratio (SNR) in real-world data. While frequency-domain methods have been employed to address this challenge, they often discard high-frequency components, assuming they are predominantly noise, thereby overlooking valuable short-term and event-driven information. To address this limitation, we propose a novel disentangled representation learning framework that separates high-frequency components into informative signals and noise using mutual information maximization and minimization strategies. We introduce the Frequency Disentanglement Network (FDNet), which integrates disentanglement with low- and high-frequency decomposition, gated neural networks, and variable relationship fusion to effectively preserve and utilize high-frequency signals. Extensive experiments on 12 real-world MTSF datasets demonstrate that FDNet significantly outperforms leading frequency-domain and time-domain baselines, highlighting the importance of leveraging rather than eliminating high-frequency information. The source code is publicly available at: https://github.com/aohu1105/FDNet.
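The abstract does not specify the decomposition itself. A common low-/high-frequency split that the described disentanglement could start from (an assumption for illustration, not the paper's exact method) is an FFT bin cutoff:

```python
import numpy as np

def decompose(series, k):
    # Keep the DC term and the k lowest frequency bins as the
    # low-frequency component; the residual is the high-frequency part,
    # which is to be disentangled into signal and noise, not discarded.
    spec = np.fft.rfft(series)
    low_spec = np.zeros_like(spec)
    low_spec[:k + 1] = spec[:k + 1]
    low = np.fft.irfft(low_spec, n=len(series))
    return low, series - low
```

For a signal whose components sit on exact FFT bins, the split recovers them perfectly, and the two parts always sum back to the input.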
Title: FDNet: High-frequency disentanglement network with information-theoretic guidance for multivariate time series forecasting
Journal: Pattern Recognition, Volume 173, Article 112810
Pub Date: 2025-12-04 | DOI: 10.1016/j.patcog.2025.112800
Yiding Sun , Haozhe Cheng , Chaoyi Lu , Zhengqiao Li , Minghong Wu , Huimin Lu , Jihua Zhu
Self-supervised learning has made significant progress in Natural Language Processing and Computer Vision. Nevertheless, it encounters significant obstacles in the 3D domain, primarily due to the scarcity of available data and the considerable challenge of effectively capturing hierarchical structures. Current methods in Euclidean space suffer from feature distortion and fail to model the semantic hierarchies inherent in cross-modal data. These challenges motivate us to adopt hyperbolic space, which excels at capturing multi-scale relationships and preserving the geometric structure of complex data. In this paper, we propose HyperPoint, the first multi-modal 3D foundational model in hyperbolic space. By projecting cross-modal features such as 3D point clouds, 2D images, and text into hyperbolic space, we leverage its tree-like properties to encode semantic hierarchies with minimal distortion. Our method integrates generative and contrastive learning while leveraging multi-loss optimization to enhance feature diversity and consistency. HyperPoint achieves a new state of the art in 3D representation learning, e.g., 96.1% accuracy on ScanObjectNN and 94.1% accuracy in the 10w10s setting on ModelNet40. Our code is available at: https://github.com/Issac-Sun/HyperPoint.
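As a hedged illustration of projecting Euclidean features into hyperbolic space, the standard Poincaré-ball exponential map at the origin and the unit-ball geodesic distance (not necessarily the paper's exact formulation) look like:

```python
import numpy as np

def exp_map_zero(v, c=1.0):
    # Exponential map at the origin of the Poincare ball with curvature -c:
    # maps a Euclidean tangent vector into the open ball of radius 1/sqrt(c).
    sqrt_c = np.sqrt(c)
    norm = max(np.linalg.norm(v), 1e-12)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y):
    # Geodesic distance on the unit ball (c = 1); it grows rapidly near
    # the boundary, which gives the space its tree-like geometry.
    num = 2.0 * np.sum((x - y) ** 2)
    den = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + num / den)
```

Because `tanh` saturates, arbitrarily large Euclidean features land strictly inside the ball, so distances stay well defined.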
Title: HyperPoint: Multimodal 3D foundation model in hyperbolic space
Journal: Pattern Recognition, Volume 173, Article 112800
Pub Date: 2025-12-04 | DOI: 10.1016/j.patcog.2025.112824
Haotian Chi , Zhaogeng Liu , Xing Chen , Bohao Qu , Jifeng Hu , Yuan Jiang , Hechang Chen , Yi Chang
Deep reinforcement learning (DRL) has achieved remarkable success in sequential decision-making tasks such as video games, robotic control, and autonomous driving. State representation learning (SRL) offers a promising avenue to enhance reinforcement learning (RL) by extracting meaningful information from raw data, thereby boosting sample efficiency. However, most existing SRL methods focus on predicting future states, which limits their ability to fully leverage the information in the differences between consecutive states in RL sequences. These differences reflect the environment’s transition dynamics, which are crucial for effective decision-making; yet their categorical diversity makes them difficult to capture with a single mechanism. To overcome this limitation, we introduce a novel state representation learning approach for RL, state transition difference prediction (STDP). Specifically, we establish the STDP framework to forecast state differences, enabling a forward difference model to train two encoders: one that extracts state structures and another that captures the intrinsic relationship between state and action. Furthermore, we design two optional prediction targets within the STDP framework, thoroughly addressing the diversity of state transition differences to develop representations that embody the environment’s dynamics. Finally, we selectively integrate these representations into the value function and policy networks, providing the agent with comprehensive and relevant information for decision-making. Empirical results indicate that STDP improves sample efficiency in both online and offline settings compared to state-of-the-art methods. Additionally, we perform extensive analyses to validate the effectiveness and robustness of STDP.
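The core target construction can be sketched as follows; the linear least-squares probe at the end is a hypothetical stand-in for the paper's learned forward difference model, used only to show that the targets are learnable:

```python
import numpy as np

def make_stdp_targets(states, actions):
    # Inputs are (state, action) pairs; prediction targets are the
    # consecutive-state differences s_{t+1} - s_t, i.e. the transition
    # dynamics rather than absolute next states.
    x = np.concatenate([states[:-1], actions], axis=1)
    return x, np.diff(states, axis=0)

def fit_linear_forward_model(x, y):
    # Least-squares linear probe predicting state differences
    # (a toy stand-in for a learned forward difference model).
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    return w
```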
Title: State transition difference prediction for deep reinforcement learning
Journal: Pattern Recognition, Volume 173, Article 112824
Pub Date: 2025-12-03 | DOI: 10.1016/j.patcog.2025.112829
Wentao Qu , Lingchen Kong , Linglong Kong , Bei Jiang
Driven by rapidly growing data volumes and increasing demands for real-time analysis, online subspace clustering has emerged as a valuable tool for processing dynamic data streams. However, existing online subspace clustering methods struggle to capture the complex and evolving distribution of such data due to rigid dictionary learning frameworks. In this paper, we propose a novel ℓ0 elastic net subspace clustering model that integrates the ℓ0 norm and the Frobenius norm to achieve the desirable block diagonal property. To enable dynamic adaptation, we further design a fast online alternating direction method of multipliers featuring an innovative dictionary update strategy based on support points, a compact set that captures the underlying data distribution. By selectively updating dictionary atoms guided by the support points, the proposed method dynamically adapts to shifting data characteristics, thereby enhancing adaptability and computational efficiency. Moreover, we provide rigorous convergence guarantees for the algorithm. Extensive numerical experiments demonstrate the superior clustering accuracy and computational efficiency of our method, confirming its suitability for real-time and large-scale data processing tasks.
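A minimal sketch of a single coding step that combines an ℓ0-style constraint with a Frobenius (ridge) term, assuming hard thresholding as the ℓ0 surrogate (the paper's ADMM and support-point dictionary update are not reproduced here):

```python
import numpy as np

def l0_elastic_code(x, D, s, lam=0.1):
    # Hypothetical coding step for one sample x under dictionary D:
    # a ridge (Frobenius-regularized) solve, then hard thresholding
    # to the s largest-magnitude coefficients as the l0 surrogate.
    g = D.T @ D + lam * np.eye(D.shape[1])
    z = np.linalg.solve(g, D.T @ x)
    keep = np.argsort(np.abs(z))[-s:]
    z_sparse = np.zeros_like(z)
    z_sparse[keep] = z[keep]
    return z_sparse
```

Hard thresholding enforces exactly `s` nonzeros, while the ridge term keeps the solve well conditioned, mirroring the ℓ0-plus-Frobenius combination described above.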
Title: Fast online ℓ0 elastic net subspace clustering via a novel dictionary update strategy
Journal: Pattern Recognition, Volume 173, Article 112829
Pub Date: 2025-12-03 | DOI: 10.1016/j.patcog.2025.112797
Zhan Heng, Maurice Pagnucco, Erik Meijering, Yang Song
Curvilinear structures are ubiquitous in various domains, such as blood vessels in medical images or roads in satellite images. Automating curvilinear structure segmentation is highly beneficial because manual annotation is laborious and error-prone. Existing methods produce segmentation results with decent pixel-level performance, but these results still contain incorrect connectivity. To overcome this challenge, this paper proposes Curvi-Tracker, a novel refinement framework that improves initial coarse segmentation results by deploying tracker agents on detected foreground pixels. The proposed framework has two main components, a Direction-Net and a Forward-Net, which jointly guide the movement of the trackers in order to track the curvilinear object. A Direction-Aware Multi-Label loss and a Stepwise Masked loss are proposed for accurate tracking of curvilinear structures. Experiments on public datasets of various curvilinear objects, including retinal vessels, roads and pavement cracks, demonstrate that the proposed method consistently improves the topological correctness of coarse segmentation results, with an overall average improvement of 10% across all three topological metrics.
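The tracker mechanics can be illustrated with a toy agent that follows a precomputed grid of unit directions; `direction_field` here is a hypothetical stand-in for the Direction-Net/Forward-Net outputs, not the paper's implementation:

```python
import numpy as np

def track(start, direction_field, steps, step_len=1.0):
    # Toy tracker agent: at each step, read the unit direction stored at
    # the nearest grid cell and advance by a fixed step length, tracing
    # a path along the curvilinear structure.
    pos = np.asarray(start, dtype=float)
    path = [pos.copy()]
    h, w = direction_field.shape[:2]
    for _ in range(steps):
        r = int(np.clip(np.round(pos[0]), 0, h - 1))
        c = int(np.clip(np.round(pos[1]), 0, w - 1))
        pos = pos + step_len * direction_field[r, c]
        path.append(pos.copy())
    return np.array(path)
```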
Title: Curvi-Tracker: Curvilinear structure segmentation refinement by iterative tracking
Journal: Pattern Recognition, Volume 173, Article 112797
Pub Date: 2025-12-03 | DOI: 10.1016/j.patcog.2025.112758
Zhiguo Long , Yinghao He , Hua Meng , Tianrui Li
Data reconstruction, a key focus in data mining, aims to represent high-dimensional data in low-dimensional spaces while preserving structural integrity for downstream tasks. Spectral embedding methods are widely used for data reconstruction across diverse data structures. However, traditional density-based spectral embedding approaches face two limitations: (1) relying heavily on local structural information (e.g., distances between local neighbors) to characterize similarity, and (2) failing to distinguish different levels of distant relationships in the data (known as hierarchical order relationships), potentially distorting original data structures. To address these issues, we propose Hierarchical Order Preserving Spectral Embedding (HORSE). HORSE combines local density estimation and similarities between subclusters to jointly capture local and global structures, improving the similarity measure. To better preserve hierarchical order relationships, HORSE introduces a quadruplet loss function based on hierarchical groups of subclusters, guiding the reconstructed data toward hierarchical order relationships similar to those of the original data. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in both data reconstruction and clustering.
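The quadruplet loss is described only abstractly; under assumed Euclidean embeddings and hypothetical margins `m1`, `m2`, a hinge-style version preserving hierarchical order might read:

```python
import numpy as np

def quadruplet_loss(a, p_near, p_mid, p_far, m1=0.5, m2=0.5):
    # Hinge-style quadruplet loss sketch: embedded distances should
    # respect the hierarchical order d(a, near) < d(a, mid) < d(a, far),
    # each inequality enforced with a margin (m1, m2 are assumptions).
    d = lambda u, v: float(np.linalg.norm(u - v))
    return (max(0.0, d(a, p_near) - d(a, p_mid) + m1)
            + max(0.0, d(a, p_mid) - d(a, p_far) + m2))
```

The loss vanishes when the embedding already respects the hierarchy and penalizes any pair of levels whose order is violated or too tight.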
{"title":"Hierarchical order preserving spectral embedding","authors":"Zhiguo Long , Yinghao He , Hua Meng , Tianrui Li","doi":"10.1016/j.patcog.2025.112758","DOIUrl":"10.1016/j.patcog.2025.112758","url":null,"abstract":"<div><div>Data reconstruction, a key focus in data mining, aims to represent high-dimensional data in low-dimensional spaces while preserving structural integrity for downstream tasks. Spectral embedding methods are widely used for data reconstruction of diverse data structures. However, traditional density-based spectral embedding approaches face two limitations: (1) relying heavily on local structural information (e.g., distances between local neighbors) to characterize similarity, and (2) failing to distinguish different levels of distant relationships of data (known as <em>hierarchical order relationships</em>), potentially distorting original data structures. To address these issues, we propose Hierarchical Order Preserving Spectral Embedding (HORSE). HORSE combines local density estimation and similarities between subclusters to jointly capture local and global structures to improve the similarity measure. To better preserve hierarchical order relationships, HORSE introduces a quadruplet loss function based on hierarchical groups of subclusters, guiding the reconstructed data to preserve hierarchical order relationships similar to those of the original data. 
Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our approach in both data reconstruction and clustering.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112758"},"PeriodicalIF":7.6,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
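The abstract above describes a quadruplet loss over hierarchical groups of subclusters but does not spell out the exact formulation, so the sketch below shows only a generic quadruplet hinge loss of the kind such methods typically build on; the function name, the margins `m1`/`m2`, and the choice of Euclidean distance are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def quadruplet_loss(anchor, pos, neg1, neg2, m1=1.0, m2=0.5):
    """Generic quadruplet hinge loss (illustrative, not HORSE's exact loss).

    Encourages the anchor-positive distance to be smaller than both the
    anchor-negative distance (margin m1) and the distance between the two
    negatives (margin m2), which is what lets a quadruplet loss encode an
    ordering over more than one level of dissimilarity.
    """
    d = lambda a, b: float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))
    term1 = max(0.0, d(anchor, pos) - d(anchor, neg1) + m1)
    term2 = max(0.0, d(anchor, pos) - d(neg1, neg2) + m2)
    return term1 + term2
```

When the embedded points already respect the intended ordering (positive much closer than the negatives), both hinge terms are zero and the loss vanishes; violations produce a positive penalty proportional to how badly the ordering is broken.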
Pub Date : 2025-12-03DOI: 10.1016/j.patcog.2025.112804
Shaojun Shi , Yibing Liu , Canyu Zhang , Sisi Wang , Feiping Nie
Multi-view clustering techniques utilize the complementarity and consistency among different view features to divide samples into different classes. Subspace learning garners considerable attention since it can explore the local structure in different dimensions. Although multi-view subspace clustering algorithms have achieved remarkable performance, several issues remain: 1) nonlinearly separable data sets cannot be exactly partitioned, which restricts flexibility; 2) noise and outliers reduce model robustness; 3) clustering effectiveness is not outstanding. To solve these problems, this paper proposes a Robust and Flexible Multi-view Subspace Clustering with Nuclear Norm (RFMSC_NN), which integrates Multiple Kernel Learning (MKL) and Low-Rank Representation (LRR) within a cohesive framework. Specifically, RFMSC_NN first projects the linearly non-separable data into a Reproducing Kernel Hilbert Space (RKHS); it then learns a self-representation matrix to measure the similarity among samples, imposes a low-rank constraint to reduce noise interference, adopts a self-weighted strategy to learn the weights of diverse views, and finally applies the k-means algorithm to obtain the clustering results. An alternating iterative optimization technique is employed to solve the model. Comprehensive experiments are conducted. The experimental results demonstrate enhanced clustering performance compared with contemporary advanced multi-view clustering approaches.
{"title":"Robust and flexible multi-view subspace clustering with nuclear norm","authors":"Shaojun Shi , Yibing Liu , Canyu Zhang , Sisi Wang , Feiping Nie","doi":"10.1016/j.patcog.2025.112804","DOIUrl":"10.1016/j.patcog.2025.112804","url":null,"abstract":"<div><div>Multi-view clustering techniques utilize the complementarity and consistency among different view features to divide samples into different classes. Subspace learning garners considerable attention since it can explore the local structure in different dimensions. Although multi-view subspace clustering algorithms have achieved remarkable performance, several issues remain: 1) nonlinearly separable data sets cannot be exactly partitioned, which restricts flexibility; 2) noise and outliers reduce model robustness; 3) clustering effectiveness is not outstanding. To solve these problems, this paper proposes a Robust and Flexible Multi-view Subspace Clustering with Nuclear Norm (RFMSC_NN), which integrates Multiple Kernel Learning (MKL) and Low-Rank Representation (LRR) within a cohesive framework. Specifically, RFMSC_NN first projects the linearly non-separable data into a Reproducing Kernel Hilbert Space (RKHS); it then learns a self-representation matrix to measure the similarity among samples, imposes a low-rank constraint to reduce noise interference, adopts a self-weighted strategy to learn the weights of diverse views, and finally applies the k-means algorithm to obtain the clustering results. An alternating iterative optimization technique is employed to solve the model. Comprehensive experiments are conducted. 
The experimental results demonstrate enhanced clustering performance compared with contemporary advanced multi-view clustering approaches.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"173 ","pages":"Article 112804"},"PeriodicalIF":7.6,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145790677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
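The abstract above mentions a low-rank (nuclear norm) constraint on the self-representation matrix but gives no formulation; the standard building block in LRR-style methods is singular value thresholding, the proximal operator of the nuclear norm. The sketch below shows only that generic step, not RFMSC_NN's full objective or its alternating solver; the function name and threshold `tau` are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm.

    Each singular value is shrunk toward zero by tau; values below tau are
    zeroed out, which is how the nuclear-norm penalty drives the
    self-representation matrix toward low rank and suppresses noise.
    """
    U, s, Vt = np.linalg.svd(np.asarray(M, float), full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt
```

Applied to a matrix with singular values (3, 1, 0.2) and `tau = 0.5`, the result has singular values (2.5, 0.5, 0): the smallest direction is discarded entirely, reducing the rank by one while keeping the dominant structure.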