Incorporating prior knowledge into style embedding for unsupervised text style transfer
Yahao Hu, Wei Tao, Yifei Xie, Tianfeng Wang, Zhisong Pan
Computer Speech and Language, Volume 100, Article 101968
Pub Date: 2026-10-01 | Epub Date: 2026-03-02 | DOI: 10.1016/j.csl.2026.101968
Text style transfer alters the style of a sentence to a specified target style while preserving the style-independent content. A prevalent approach assigns an embedding to each style, facilitating control over the style of the generated sentence. However, in unsupervised learning, a vanilla style embedding tends to imitate characteristics of the training corpus beyond the style attributes, compromising generalization. Moreover, this approach may fail to capture the relationships between different styles, further constraining transfer performance. In this paper, we introduce a novel approach that leverages the prior knowledge of Distinctiveness and Commonness to refine style embedding. Specifically, we employ contrastive learning to achieve distinctiveness by clustering positive samples together and distancing negative samples. Additionally, we explore conventional pooling strategies to extract the stylistic commonality across multiple samples of the same style, ultimately deriving a representative style embedding. Experiments on three benchmark datasets show that our proposed method outperforms several embedding-based baselines, confirming its efficacy.
Bumper-guided representation interpolation for black-box unsupervised domain adaptation
Jin-Seong Choi, Jae-Hong Lee, Joon-Hyuk Chang
Computer Speech and Language, Volume 100, Article 101947
Pub Date: 2026-10-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.csl.2026.101947
Black-box unsupervised domain adaptation (BUDA) presents a challenging scenario in which only unlabeled target data are available and access to the source model's parameters is limited. Recent BUDA methods that rely on consistency training struggle with error accumulation caused by fixed source representations. In this paper, we propose a novel framework called bumper-guided representation interpolation (BGRI), which introduces a bumper model that interpolates between the source and target domain representation spaces. Using interpolated representations, the bumper model delivers generalized source information and enables stable and effective knowledge transfer to the target model. Through extensive experiments conducted in real-world scenarios across diverse acoustic and linguistic domains, BGRI consistently outperforms existing BUDA approaches in terms of adaptation performance and robustness.
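The core idea of interpolating between two representation spaces can be sketched in a few lines: a convex combination of source and target features, with a mixing coefficient that drifts from source-like toward target-like as adaptation proceeds. The linear form and the schedule below are hypothetical simplifications; the paper's bumper model is a learned network, not a fixed formula.

```python
import numpy as np

def interpolate_representations(z_source, z_target, alpha=0.5):
    """Convex combination of source and target representations.
    alpha=1.0 keeps pure source information; alpha=0.0 is pure target."""
    assert 0.0 <= alpha <= 1.0
    return alpha * np.asarray(z_source, dtype=float) + \
           (1.0 - alpha) * np.asarray(z_target, dtype=float)

def alpha_schedule(step, total_steps):
    """Linearly hand off from source-like to target-like over adaptation
    (an assumed schedule, for illustration only)."""
    return max(0.0, 1.0 - step / total_steps)
```

An intermediate representation like this can act as a buffer that softens the mismatch between a frozen source model and a drifting target model.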
Online dominating set and coloring for geometric intersection graphs
Minati De, Sambhav Khurana, Satyam Singh
Computational Geometry: Theory and Applications, Volume 134, Article 102256
Pub Date: 2026-09-01 | Epub Date: 2026-02-09 | DOI: 10.1016/j.comgeo.2026.102256
We study the online minimum dominating set and minimum coloring problems in the context of geometric intersection graphs. We consider a graph parameter, the independent kissing number ζ, defined as the size of the largest induced star in the graph minus one. For a graph with independent kissing number ζ, we show that the well-known greedy algorithm achieves an optimal competitive ratio of ζ for the minimum dominating set and the minimum independent dominating set problems. However, for the minimum connected dominating set problem, we obtain a competitive ratio of at most 2ζ. To complement this, we prove that for the minimum connected dominating set problem, any deterministic online algorithm has a competitive ratio of at least 2(ζ − 1) for the geometric intersection graph of translates of a convex object in ℝ^2. Next, for the minimum coloring problem, we obtain an algorithm with a competitive ratio of O(ζ′ log m) for geometric intersection graphs of α-fat objects in ℝ^d having widths in [1, m], where ζ′ is the independent kissing number of the geometric intersection graph of α-fat objects having widths in [1, 2]. Finally, we investigate the value of ζ for geometric intersection graphs of various families of geometric objects.
Optimal right angles crossing graphs
Franz J. Brandenburg
Computational Geometry: Theory and Applications, Volume 134, Article 102255
Pub Date: 2026-09-01 | Epub Date: 2026-02-09 | DOI: 10.1016/j.comgeo.2026.102255
A graph is an optimal right angle crossing graph (optimal RAC graph for short) if it has n vertices and 4n − 10 edges and admits a straight-line drawing in the plane such that each edge is crossed at most once and edges cross only at a right angle. This implies that the drawing is 3T- or TTX-framed, that is, the outer face is a triangle that is adjacent to three triangles or to two triangles and a crossing. An optimal pseudo-RAC graph is the topological version of an optimal RAC graph, where the restrictions to straight-line edges and right angle crossings are dropped.
We show that every 3T-framed optimal pseudo-RAC graph is an optimal RAC graph; that is, 3T-framed optimal pseudo-RAC embeddings can be stretched and orthogonalized. This is not true for TTX-framed embeddings. There are n-vertex 3T- and TTX-framed optimal RAC graphs for every n ≥ 9, and eleven optimal RAC and fourteen optimal pseudo-RAC graphs with at most eight vertices. Optimal pseudo-RAC graphs can be recognized in O(n^3) time, and the recognition algorithm shows that every optimal pseudo-RAC graph has at most three 1-planar embeddings, in which edges are crossed at most once.
Semantic change detection of roads and bridges: A fine-grained dataset and multimodal frequency-driven detector
Qing-Ling Shu, Si-Bao Chen, Xiao Wang, Zhi-Hui You, Wei Lu, Jin Tang, Bin Luo
Pattern Recognition, Volume 176, Article 113191
Pub Date: 2026-08-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113191
Accurate detection of road and bridge changes is crucial for urban planning and transportation management, yet presents unique challenges for general change detection (CD). Key difficulties arise from maintaining the continuity of roads and bridges as linear structures and disambiguating visually similar land covers (e.g., road construction vs. bare land). Existing spatial-domain models struggle with these issues, further hindered by the lack of specialized, semantically rich datasets. To fill these gaps, we introduce the Road and Bridge Semantic Change Detection (RB-SCD) dataset. Unlike existing benchmarks that primarily focus on general land cover changes, RB-SCD is the first to systematically target 11 specific semantic change transition types (e.g., water → bridge) anchored to traffic infrastructure. This enables a detailed analysis of traffic infrastructure evolution. Building on this, we propose a novel framework, the Multimodal Frequency-Driven Change Detector (MFDCD). MFDCD integrates multimodal features in the frequency domain through two key components: (1) the Dynamic Frequency Coupler (DFC), which leverages wavelet transform to decompose visual features, enabling it to robustly model the continuity of linear transitions; and (2) the Textual Frequency Filter (TFF), which encodes semantic priors into frequency-domain graphs and applies filter banks to align them with visual features, resolving semantic ambiguities. Experiments demonstrate the state-of-the-art performance of MFDCD on RB-SCD and three public CD datasets. The code will be available at https://github.com/DaGuangDaGuang/RB-SCD.
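The wavelet decomposition that the DFC module builds on can be demonstrated with a single-level 1-D Haar transform: it splits a signal into a low-frequency (smooth structure, e.g. the continuity of a road) band and a high-frequency (edges and detail) band, and is exactly invertible. This minimal NumPy version is a stand-in for illustration; the paper's module operates on 2-D visual feature maps with learned couplings.

```python
import numpy as np

def haar_decompose(x):
    """Single-level 1-D Haar wavelet transform.
    Returns (low, high): averaged low-frequency band and detail band."""
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0, "signal length must be even"
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)   # smooth, large-scale structure
    high = (even - odd) / np.sqrt(2)  # fine detail / local changes
    return low, high

def haar_reconstruct(low, high):
    """Exactly invert haar_decompose."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    out = np.empty(len(low) * 2)
    out[0::2], out[1::2] = even, odd
    return out
```

A perfectly smooth signal has a zero detail band, which is why frequency-domain processing can isolate the continuous, linear structures the abstract highlights.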
DAFS: A distribution-aware hierarchical feature selection method for long-tailed classification
Yang Zhang, Jie Shi, Yanfang Liu, Hong Zhao
Pattern Recognition, Volume 176, Article 113218
Pub Date: 2026-08-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.patcog.2026.113218
Feature selection for long-tailed data has become a research hotspot due to the high-dimensional features and imbalanced distributions of real-world data. Although some existing methods effectively balance the data, correctly classifying tail classes and distinguishing easily confused classes in long-tailed data remain two significant challenges. To address these issues, we propose a distribution-aware hierarchical feature selection method for long-tailed classification (DAFS). First, we embed sample-distribution-based punishment coefficients into the loss and regularization terms to balance feature weights between head and tail classes, which improves the accuracy of classifying tail classes. Then, we use multi-granularity knowledge and similarities among classes to design feature differentiation regularization terms that improve the distinguishability of easily confused classes. Finally, extensive experimental results demonstrate that DAFS outperforms ten other traditional and advanced feature selection methods on different datasets.
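One common way to realize a distribution-based punishment coefficient is inverse-frequency class weighting inside the loss: rare (tail) classes get larger weights, so their errors are punished more. The power-law form below and the function names are assumptions for illustration; the paper's coefficients and loss are its own.

```python
import numpy as np

def punishment_coefficients(class_counts, gamma=0.5):
    """Per-class weights that grow as a class gets rarer:
    w_c = (N / n_c) ** gamma, a hypothetical inverse-frequency form."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts.sum() / counts) ** gamma

def weighted_cross_entropy(probs, labels, weights):
    """Cross-entropy where each sample's loss is scaled by its class weight,
    shifting the optimization toward tail classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p = probs[np.arange(len(labels)), labels]  # probability of the true class
    return float(np.mean(weights[labels] * -np.log(p)))
```

With counts [90, 10] and gamma=1, the tail class is weighted 9x more heavily than the head class, which is the balancing effect the abstract describes.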
CD-DPC: Centrifugal degree based density peaks clustering algorithm
Linlin Ma, Hui Li, Xincheng Liu, Huihui Chu, Yue Guan, Yuzhen Zhao, Yawen Chen, Da Wang, Wenke Zang
Pattern Recognition, Volume 176, Article 113223
Pub Date: 2026-08-01 | Epub Date: 2026-02-06 | DOI: 10.1016/j.patcog.2026.113223
The Density Peak Clustering (DPC) algorithm is simple and efficient. However, DPC and its variants identify clusters only by locating the centers of one or more sparse clusters, without considering the coherence of the clustering structure, so clusters often cannot be captured accurately. In addition, relative distance and density are used only to identify cluster centers and do not describe the relative positions of the remaining sample points. To address these issues, this paper proposes an adaptive density peak clustering algorithm based on centrifugal degree (CD-DPC). The centrifugal degree reflects the relative position of a sample point within its cluster. CD-DPC categorizes sample points into support, structural, coherent, and decoration points based on their centrifugal degree. On this basis, the number of clusters is obtained automatically by applying different association methods to sample points with different centrifugal degrees, which greatly reduces the influence of manual intervention. Finally, the clustering results are further improved by introducing shared nearest neighbors for the final association of decoration points. Extensive experiments on synthetic and UCI datasets show that the algorithm outperforms the comparison algorithms.
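The two classic DPC quantities the abstract refers to, local density ρ and the relative distance δ to the nearest higher-density point, can be computed directly; the centrifugal degree is the paper's own extension on top of them, so only the standard part is shown here. The cutoff-based density and this O(n²) formulation are the textbook DPC definitions.

```python
import numpy as np

def dpc_density_and_distance(X, dc):
    """Classic DPC quantities for points X (n x d) and cutoff distance dc:
    rho[i] = number of other points within dc of point i,
    delta[i] = distance to the nearest point of strictly higher density
               (or the maximum distance, for the densest point)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < dc).sum(axis=1) - 1  # subtract 1 to exclude the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return rho, delta
```

Cluster centers are the points with both large ρ and large δ; CD-DPC's contribution is to also characterize the remaining points (support, structural, coherent, decoration) rather than stopping at center selection.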
MonoTDF: Temporal deep feature learning for generalizable monocular 3D object detection
Xiu-Zhi Chen, Yi-Kai Chiu, Chih-Sheng Huang, Yen-Lin Chen
Pattern Recognition, Volume 176, Article 113184
Pub Date: 2026-08-01 | DOI: 10.1016/j.patcog.2026.113184
Monocular 3D object detection has gained significant attention due to its cost-effectiveness and practicality in real-world applications. However, existing monocular methods often struggle with depth estimation and spatial consistency, limiting their accuracy in complex environments. In this work, we introduce a Temporal Deep Feature Learning framework, which enhances monocular 3D object detection by integrating temporal features across sequential frames. Our approach leverages a novel deep feature auxiliary module based on convolutional recurrent structures, effectively capturing spatiotemporal information to improve depth perception and detection robustness. The proposed module is model-agnostic and can be seamlessly integrated into various existing monocular detection frameworks. Extensive experiments across multiple state-of-the-art monocular 3D object detection models demonstrate consistent performance improvements, particularly in detecting small or partially occluded objects. Our results highlight the effectiveness and generalizability of the proposed approach, making it a promising solution for real-world autonomous perception systems. The source code of this work is at: https://github.com/Shuray36/MonoTDF-Temporal-Deep-Feature-Learning-for-Generalizable-Monocular-3D-Object-Detection.
Pub Date : 2026-08-01Epub Date: 2026-02-07DOI: 10.1016/j.patcog.2026.113155
Yunxiu Zhao , Shigang Wang , Feiyong Jia , Honghua Li , Jinyang Wu , Jian Wei , Yan Zhao
Scale-dependent approaches have shown great potential in diagnosing autism spectrum disorder (ASD). However, such methods often involve lengthy evaluation procedures and require substantial resources, including trained professionals and specialized equipment, which significantly limit their scalability and feasibility for large-scale or routine clinical assessments. In this paper, we propose a novel multimodal behavioral signal analysis (MBSA) approach for the intelligent assessment of ASD. Specifically, we first leverage speech and visual cues to identify the Target Movement Area (TMA), thereby enhancing recognition efficiency. Then, an adaptive fine-tuning strategy is employed to improve the generalization and efficiency of pre-trained models in small-sample action recognition tasks. An attention-based detection method is further incorporated to strengthen the semantic understanding of observed behavioral patterns. To enable effective ASD classification, we develop a behavioral quantification scoring method that structurally models the relationship between behavioral features and disease indicators. We collected a multimodal behavioral database of 160 participants in a real clinical setting and assessed ASD using this data. Extensive experiments demonstrate that the proposed MBSA approach significantly outperforms many state-of-the-art methods. With competitive performance and a solid theoretical foundation, MBSA provides a practical and scalable solution for ASD screening and holds promise for broader applications in the intelligent diagnosis of other neurodevelopmental disorders.
{"title":"Multimodal behavioral analysis for autism spectrum disorder assessment","authors":"Yunxiu Zhao , Shigang Wang , Feiyong Jia , Honghua Li , Jinyang Wu , Jian Wei , Yan Zhao","doi":"10.1016/j.patcog.2026.113155","DOIUrl":"10.1016/j.patcog.2026.113155","url":null,"abstract":"<div><div>Scale-dependent approaches have shown great potential in diagnosing autism spectrum disorder (ASD). However, such methods often involve lengthy evaluation procedures and require substantial resources, including trained professionals and specialized equipment, which significantly limit their scalability and feasibility for large-scale or routine clinical assessments. In this paper, we propose a novel multimodal behavioral signal analysis (MBSA) approach for the intelligent assessment of ASD. Specifically, we first leverage speech and visual cues to identify the Target Movement Area (TMA), thereby enhancing recognition efficiency. Then, an adaptive fine-tuning strategy is employed to improve the generalization and efficiency of pre-trained models in small-sample action recognition tasks. An attention-based detection method is further incorporated to strengthen the semantic understanding of observed behavioral patterns. To enable effective ASD classification, we develop a behavioral quantification scoring method that structurally models the relationship between behavioral features and disease indicators. We collected a multimodal behavioral database of 160 participants in a real clinical setting and assessed ASD using this data. Extensive experiments demonstrate that the proposed MBSA approach significantly outperforms many state-of-the-art methods. 
With competitive performance and a solid theoretical foundation, MBSA provides a practical and scalable solution for ASD screening and holds promise for broader applications in the intelligent diagnosis of other neurodevelopmental disorders.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113155"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
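The behavioral quantification scoring step described above maps extracted behavioral features to a disease-indicator score. The abstract does not specify the actual model, so the following is a minimal illustrative sketch in which the feature names, weights, and decision threshold are all invented for the example rather than taken from the paper:

```python
# Hypothetical sketch of a behavioral quantification score: a weighted linear
# aggregation of per-modality behavioral features into a single risk score.
# All feature names, weights, and the threshold below are assumptions.
from dataclasses import dataclass

@dataclass
class BehaviorFeatures:
    repetitive_motion_rate: float   # normalized rate of repetitive actions in the TMA
    gaze_aversion_ratio: float      # fraction of frames without mutual gaze
    speech_response_delay: float    # response latency, normalized to [0, 1]

def quantification_score(f: BehaviorFeatures,
                         weights=(0.4, 0.35, 0.25)) -> float:
    """Combine normalized behavioral features into a risk score in [0, 1]."""
    parts = (f.repetitive_motion_rate, f.gaze_aversion_ratio,
             f.speech_response_delay)
    return sum(w * p for w, p in zip(weights, parts))

def classify(score: float, threshold: float = 0.5) -> str:
    """Threshold the aggregated score into a binary screening decision."""
    return "ASD-positive" if score >= threshold else "ASD-negative"
```

In a real pipeline these features would come from the speech and vision modules, and the weights would be learned rather than fixed; the sketch only shows the structural mapping from features to an indicator.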
Pub Date: 2026-08-01 | Epub Date: 2026-01-29 | DOI: 10.1016/j.patcog.2026.113192
Tingting Hang , Ya Guo , Jun Huang , Yirui Wu , Umapada Pal , Shivakumara Palaiahnakote
Continual Relation Extraction (CRE) has achieved significant success due to its ability to adapt to new relations without frequent retraining. However, existing methods still face challenges such as overfitting and representation bias. Inspired by the wake-sleep memory consolidation process of the human brain, this paper proposes a Wake-Sleep Memory Consolidation (WSMC) framework to address these issues systematically. During the wake phase, the model simulates the brain’s information processing mechanism, quickly encoding new relations and storing them in short-term memory. We also introduce the Experience Iterative Learning (EIL) approach, which dynamically adjusts the distribution of relation samples. This approach corrects the model’s representation bias and enhances memory stability through experience replay. During the sleep phase, the model consolidates existing knowledge by replaying long-term memory. Moreover, the framework generates diverse dream data from existing memory sets, thereby increasing the diversity of the training data and improving the model’s generalization capability. Experimental results show that WSMC significantly outperforms other CRE baseline methods on the FewRel and TACRED datasets. Our source code is available at https://github.com/Gyanis9/WSMC.git.
{"title":"Continual relation extraction with wake-sleep memory consolidation","authors":"Tingting Hang , Ya Guo , Jun Huang , Yirui Wu , Umapada Pal , Shivakumara Palaiahnakote","doi":"10.1016/j.patcog.2026.113192","DOIUrl":"10.1016/j.patcog.2026.113192","url":null,"abstract":"<div><div>Continual Relation Extraction (CRE) has achieved significant success due to its ability to adapt to new relations without frequent retraining. However, existing methods still face challenges such as overfitting and representation bias. Inspired by the wake-sleep memory consolidation process of the human brain, this paper proposes a <strong>W</strong>ake-<strong>S</strong>leep <strong>M</strong>emory <strong>C</strong>onsolidation (WSMC) framework to address these issues systematically. During the wake phase, the model simulates the brain’s information processing mechanism, quickly encoding new relations and storing them in short-term memory. We also introduce the Experience Iterative Learning (EIL) approach, which dynamically adjusts the distribution of relation samples. This approach corrects the model’s representation bias and enhances memory stability through experience replay. During the sleep phase, the model consolidates existing knowledge by replaying long-term memory. Moreover, the framework generates diverse dream data from existing memory sets, thereby increasing the diversity of the training data and improving the model’s generalization capability. Experimental results show that WSMC significantly outperforms other CRE baseline methods on FewRel and TACRED datasets, demonstrating its superior performance compared to baseline methods. 
Our source code is available at <span><span>https://github.com/Gyanis9/WSMC.git</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"176 ","pages":"Article 113192"},"PeriodicalIF":7.6,"publicationDate":"2026-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
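The wake-sleep cycle described in the abstract — encode new relation samples into short-term memory during the wake phase, then consolidate a bounded subset into long-term memory and replay it during later training — can be sketched as a simple replay buffer. Class and method names below are illustrative assumptions, not taken from the WSMC codebase:

```python
# Minimal sketch of wake-sleep style memory consolidation with experience replay.
# Short-term memory collects samples from the current task ("wake"); "sleep"
# consolidates them into per-relation long-term buckets with random eviction,
# and replay_batch draws stored examples for rehearsal on later tasks.
import random

class MemoryConsolidator:
    def __init__(self, capacity_per_relation: int = 10, seed: int = 0):
        self.short_term = []       # (relation, sample) pairs from the current task
        self.long_term = {}        # relation -> bounded list of retained samples
        self.capacity = capacity_per_relation
        self.rng = random.Random(seed)

    def wake(self, relation: str, sample: str) -> None:
        """Quickly encode a new sample into short-term memory."""
        self.short_term.append((relation, sample))

    def sleep(self) -> None:
        """Consolidate short-term memory into long-term, keeping a bounded subset."""
        for relation, sample in self.short_term:
            bucket = self.long_term.setdefault(relation, [])
            bucket.append(sample)
            if len(bucket) > self.capacity:
                bucket.pop(self.rng.randrange(len(bucket)))  # random eviction
        self.short_term.clear()

    def replay_batch(self, k: int):
        """Sample up to k stored examples for rehearsal alongside new-task data."""
        pool = [(r, s) for r, samples in self.long_term.items() for s in samples]
        return self.rng.sample(pool, min(k, len(pool)))
```

The actual framework additionally adjusts the sample distribution (EIL) and generates synthetic "dream" data; this sketch covers only the store-consolidate-replay skeleton.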