Enhancing traffic flow prediction through multi-view attention mechanism and dilated convolutional networks
Pub Date: 2025-12-15 | DOI: 10.1007/s40747-025-02146-7
Wei Li, Hao Wei, Xin Liu, Jialin Liu, Dazhi Zhan, Xiao Han, Wei Tao
Accurate traffic flow forecasting serves as a cornerstone for intelligent transportation systems, enabling proactive accident prevention and metropolitan mobility optimization. However, existing approaches face fundamental limitations in modeling the spatiotemporal heterogeneity of traffic dynamics, particularly in simultaneously addressing (1) the decaying significance of temporal dependencies across input sequences and prediction horizons, (2) multi-scale spatial interactions spanning local congestion patterns and global functional correlations, and (3) inter-sample temporal variance in evolving traffic states. To address these limitations, this paper proposes MVA-DCNet (Multi-View Attention Dilated Convolutional Network), a deep learning architecture built around a multidimensional temporal analysis framework that examines temporal influence from three complementary perspectives: inter-sample variance, intra-sequence temporal importance, and output-sequence temporal propagation. The model addresses temporal data heterogeneity through three mechanisms: variance-aware data augmentation, adaptive temporal attention, and decaying loss weighting. For spatial correlation modeling, we develop a dilated convolutional architecture with enlarged receptive field coverage and multi-scale spatial pattern recognition capabilities. Empirical validation on two urban traffic datasets demonstrates superior efficacy in capturing complex spatiotemporal evolution patterns, with relative reductions in Root Mean Square Error (RMSE) of 12.7% and 9.3%, respectively, compared with state-of-the-art benchmarks.
ReqNet: an LLM-driven computational framework for automated requirements extraction from unstructured documents
Pub Date: 2025-12-15 | DOI: 10.1007/s40747-025-02143-w
Summra Saleem, Muhammad Nabeel Asim, Andreas Dengel
Within the software development life cycle, requirements guide the entire development process from inception to completion by ensuring alignment between stakeholder expectations and the final product. Extracting requirements from miscellaneous information is a challenging and complex task. Manual extraction is not only prone to human error but also increases project costs and delays project timelines. To automate the requirements extraction process, researchers have investigated the potential of deep learning (DL) architectures, large language models (LLMs), and generative language models such as ChatGPT and Gemini. However, requirements extraction performance can be further improved by predictive pipelines that combine the strengths of language models and deep learning architectures. To harness this combined potential, this study presents the ReqNet framework, which encompasses 7 of the most widely used LLMs, in small, large, XLarge, and XXLarge variants, and 2 DL architectures (LSTM, GRU). The framework supports three distinct types of predictive pipelines: standalone LLMs, LLMs + external classifiers, and an ensemble of multiple LLM representations + external classifiers. Extensive experimentation with 48 predictive pipelines across 2 public core datasets and 1 independent test set demonstrates that pipelines combining LLMs and DL architectures generally outperform pipelines relying solely on LLMs. In addition, an ensemble of three distinct LLMs (ALBERT, BERT, and XLNet) with an LSTM classifier achieved a 3% improvement in F1-score over state-of-the-art predictors on the PURE dataset, a 10% improvement on the Dronology dataset, and a 3% improvement on the RFI independent test set.
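As a concrete picture of the "LLMs + external classifiers" pipeline type, the following minimal sketch has a frozen transformer supply token representations that a BiLSTM then classifies. The model name, layer sizes, and first-token pooling are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class LLMPlusLSTM(nn.Module):
    """Frozen LLM encoder + BiLSTM classifier (requirement vs. non-requirement)."""

    def __init__(self, encoder_name="bert-base-uncased", hidden=256, num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():  # keep the LLM frozen
            p.requires_grad = False
        self.lstm = nn.LSTM(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            reps = self.encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state
        out, _ = self.lstm(reps)     # (batch, seq_len, 2 * hidden)
        return self.head(out[:, 0])  # classify from the first-token state

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["The system shall log every failed login attempt."],
            return_tensors="pt", padding=True)
logits = LLMPlusLSTM()(batch["input_ids"], batch["attention_mask"])
```

The ensemble variant described in the abstract would presumably concatenate representations from several such encoders (e.g., ALBERT, BERT, XLNet) before the external classifier.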
{"title":"ReqNet: an LLM-driven computational framework for automated requirements extraction from unstructured documents","authors":"Summra Saleem, Muhammad Nabeel Asim, Andreas Dengel","doi":"10.1007/s40747-025-02143-w","DOIUrl":"https://doi.org/10.1007/s40747-025-02143-w","url":null,"abstract":"Within software development life-cycle, requirements guide the entire development process from inception to completion by ensuring alignment between stakeholder expectations and the final product. Requirements extraction from miscellaneous information is a challenging and complex task. Manual extraction of requirements is not only prone to human error but also contributes to increased project costs and delayed project timelines. To automate the requirement extraction process, researchers have investigated the potential of deep learning architectures, large language models (LLM) and generative language models such as ChatGPT and Gemini. However, the performance of requirements extraction could be further enhanced through the development of predictive pipelines by utilizing the combined potential of language models and deep learning architectures. To develop a powerful AI application for requirements extraction by utilizing the combined potential of LLMs and DL architectures, this study presents ReqNet framework. The framework encompasses 7 most widely used LLMs variants (small, large, Xlarge, XXlarge) and 2 DL architectures (LSTM, GRU). The framework facilitates the development of three distinct types predictive pipelines, namely standalone LLMs, LLMs + external classifiers and an ensemble of multiple LLMs representation + external classifiers. Extensive experimentation of 48 predictive pipelines across 2 public core datasets and 1 independent test set, demonstrates that predictive pipelines made up from LLMs and DL architectures generally exhibited superior performance compared to pipelines solely reliant on LLMs. In addition, a ensemble of three distinct LLMs (ALBERT, BERT and XLNet) and LSTM classifier achieved a 3% improvement in F1-score over state-of-the-art predictors on the PURE dataset, a 10% improvement on the Dronology dataset and a 3% improvement on the RFI independent test set.","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145753180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seed perception learning for weakly supervised semantic segmentation
Pub Date: 2025-12-15 | DOI: 10.1007/s40747-025-02152-9
Wanchun Sun, Shujia Li, Xinyu Duan
The core challenge in image-level weakly supervised semantic segmentation lies in generating high-quality object localization maps from simple image labels. Class activation maps (CAMs) produced by existing methods commonly suffer from two major flaws: incomplete coverage of target regions and severe background interference. To address these issues, we present a CAM-native perception-optimization framework for weakly supervised semantic segmentation. First, we design a CAM generation mechanism guided by image-level weak supervision, which refines activated regions via discriminative region enhancement and spatial noise suppression; this promotes fine-grained pixel clustering and improves the completeness of object localization. Second, we introduce a spatial cue generator to enhance the adaptability of class representations, coupled with an inter-class relation propagation module that explicitly models inter-class relationships to suppress erroneous activations and significantly reduce spatial noise. Additionally, we incorporate a dynamic contrastive matching strategy to eliminate background activations closely associated with the target object, ultimately producing class activation maps that are both complete and compact. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 show that our method substantially outperforms existing weakly supervised approaches, validating the effectiveness of class-aware guidance and inter-class relational modeling in improving segmentation accuracy.
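For context on the seeds such frameworks refine, here is the standard class activation map computation (Zhou et al., 2016) that image-level weakly supervised methods typically start from. The paper's enhancement and noise-suppression modules operate on maps of this kind; this sketch covers only the baseline step, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def class_activation_map(features, fc_weight, class_idx):
    """Baseline CAM: weight the last conv feature maps by one class's
    classifier weights, then rectify and normalize to [0, 1].

    features:  (batch, C, H, W), output of the final conv layer
    fc_weight: (num_classes, C), weights of the global-average-pooling classifier
    """
    cam = torch.einsum("c,bchw->bhw", fc_weight[class_idx], features)
    cam = F.relu(cam)  # keep only positive class evidence
    peak = cam.flatten(1).amax(dim=1).clamp(min=1e-6)
    return cam / peak[:, None, None]
```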
{"title":"Seed perception learning for weakly supervised semantic segmentation","authors":"Wanchun Sun, Shujia Li, Xinyu Duan","doi":"10.1007/s40747-025-02152-9","DOIUrl":"https://doi.org/10.1007/s40747-025-02152-9","url":null,"abstract":"The core challenge in image-level weakly supervised semantic segmentation lies in generating high-quality object localization maps from simple image labels. Class Activation Map (CAM) produced by existing methods commonly suffer from two major flaws: incomplete coverage of target regions and severe background interference. To address these issues, we present a CAM-native perception-optimization framework for weakly supervised semantic segmentation. First, design a CAM generation mechanism guided by image-level weak supervision, which refines activated regions via discriminative region enhancement and spatial noise suppression. This process promotes fine-grained pixel clustering and improves the completeness of object localization. Second, introduce a spatial cue generator to enhance the adaptability of class representations, coupled with an inter-class relation propagation module that explicitly models inter-class relationships to suppress erroneous activations and significantly reduce spatial noise. Additionally, incorporate a dynamic contrastive matching strategy to eliminate background activations closely associated with the target object, ultimately producing class activation maps that are both complete and compact. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 show that our method substantially outperforms existing weakly supervised approaches, validating the effectiveness of class-aware guidance and inter-class relational modeling in improving segmentation accuracy.","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"148 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145752823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MAAN: multi-scale atrous attention network for skin lesion segmentation
Pub Date: 2025-12-10 | DOI: 10.1007/s40747-025-02186-z
Yang Lian, Ruizhi Han, Shiyuan Han, Defu Qiu, Jin Zhou
Skin cancer research is essential to finding new treatments and improving survival rates in computer-aided medicine. Within this research, accurate segmentation of skin lesion images is an important step for both early diagnosis and personalized treatment strategies. However, while current popular Transformer-based models achieve competitive segmentation results, they often ignore computational complexity and the high costs associated with training. In this paper, we propose a lightweight network, the multi-scale atrous attention network for skin lesion segmentation (MAAN). First, we optimize the residual basic block by constructing a dual-path framework with high- and low-resolution paths, which reduces the number of parameters while maintaining effective feature extraction capability. Second, to better capture the information in skin lesion images and further improve model performance, we design an adaptive multi-scale atrous attention (AMAA) module at the final stage of the low-resolution path. Experiments conducted on the ISIC 2017 and ISIC 2018 datasets show that the proposed MAAN achieves mIoU of 85.20% and 85.67%, respectively, outperforming the recent MHorNet while requiring only 0.37M parameters and 0.23G FLOPs. Additionally, ablation studies demonstrate that the AMAA module works as a plug-and-play component for improving the performance of CNN-based methods.
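To make the multi-scale atrous idea concrete, below is an illustrative PyTorch block that runs parallel dilated convolutions at several rates and reweights the fused result with a channel gate. The dilation rates, gating design, and class name are assumptions; the paper's AMAA module may differ in detail.

```python
import torch
import torch.nn as nn

class MultiScaleAtrousAttention(nn.Module):
    """Parallel dilated (atrous) 3x3 convolutions fused by a 1x1 conv,
    followed by a squeeze-and-excitation-style channel gate and a
    residual connection."""

    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False)
            for r in rates
        ])
        self.fuse = nn.Conv2d(len(rates) * channels, channels, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global context per channel
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        multi = torch.cat([branch(x) for branch in self.branches], dim=1)
        fused = self.fuse(multi)
        return x + fused * self.gate(fused)  # attention-weighted residual

out = MultiScaleAtrousAttention(32)(torch.randn(2, 32, 64, 64))
```

Larger dilation rates enlarge the receptive field without adding parameters, which is consistent with the paper's emphasis on a lightweight design.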
{"title":"MAAN: multi-scale atrous attention network for skin lesion segmentation","authors":"Yang Lian, Ruizhi Han, Shiyuan Han, Defu Qiu, Jin Zhou","doi":"10.1007/s40747-025-02186-z","DOIUrl":"https://doi.org/10.1007/s40747-025-02186-z","url":null,"abstract":"Skin cancer research is essential to finding new treatments and improving survival rates in computer-aided medicine. Within this research, the accurate segmentation of skin lesion images is an important step for both early diagnosis and personalized treatment strategies. However, while current popular Transformer-based models have achieved competitive segmentation results, they often ignore the computational complexity and the high costs associated with their training. In this paper, we propose a lightweight network, a multi-scale atrous attention network for skin lesion segmentation (MAAN). Firstly, we optimize the residual basic block by constructing a dual-path framework with both high and low-resolution paths, which reduces the number of parameters while maintaining effective feature extraction capability. Secondly, to better capture the information in the skin lesion images and further improve the model performance, we design an adaptive multi-scale atrous attention module at the final stage of the low-resolution path. The experiments conducted on the ISIC 2017 and ISIC2018 datasets show that the proposed model MAAN achieves mIoU of 85.20 and 85.67% respectively, outperforming recent MHorNet while maintaining only 0.37M parameters and 0.23G FLOPs computational complexity. Additionally, through ablation studies, we demonstrate that the AMAA module can work as a plug-and-play module for performance improvement on CNN-based methods.","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"22 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145711460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Motion-temporal calibration network for continuous sign language recognition
Pub Date: 2025-12-09 | DOI: 10.1007/s40747-025-02156-5
Hongguan Hu, Jianjun Peng, Zhidong Xiao, Li Guo, Yi Hu, Di Wu
Continuous Sign Language Recognition (CSLR) is fundamental to bridging the communication gap between hearing-impaired individuals and the broader society. The primary challenge lies in effectively modeling the complex spatial-temporal dynamic features in sign language videos. Current approaches typically employ independent processing strategies for motion feature extraction and temporal modeling, which impedes the unified modeling of action continuity and semantic integrity in sign language sequences. To address these limitations, we propose the Motion-Temporal Calibration Network (MTCNet), a novel framework for continuous sign language recognition that integrates dynamic feature enhancement and temporal calibration. The framework consists of two key innovative modules. First, the Cross-Frame Motion Refinement (CFMR) module implements an inter-frame differential attention mechanism combined with residual learning strategies, enabling precise motion feature modeling and effective enhancement of dynamic information between adjacent frames. Second, the Temporal-Channel Adaptive Recalibration (TCAR) module utilizes adaptive convolution kernel design and a dual-branch feature extraction architecture, facilitating joint optimization in both temporal and channel dimensions. In experimental evaluations, our method demonstrates competitive performance on the widely used PHOENIX-2014 and PHOENIX-2014-T datasets, achieving results comparable to leading unimodal approaches. Moreover, it achieves state-of-the-art performance on the Chinese Sign Language (CSL) dataset. Through comprehensive ablation studies and quantitative analysis, we validate the effectiveness of our proposed method in fine-grained dynamic feature modeling and long-term dependency capture while maintaining computational efficiency.
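As a rough sketch of inter-frame differential attention with residual learning, in the spirit of the CFMR module described above, the block below turns frame-to-frame feature differences into gates that amplify motion-related channels. The exact architecture is an assumption based only on the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFrameMotionRefinement(nn.Module):
    """Inter-frame differential attention with a residual connection
    (hypothetical reconstruction, not the paper's exact module)."""

    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, time) frame-level features
        diff = x[:, :, 1:] - x[:, :, :-1]      # adjacent-frame differences
        diff = F.pad(diff, (1, 0))             # left-pad to restore length
        attn = torch.sigmoid(self.proj(diff))  # motion-driven gates in (0, 1)
        return x + x * attn                    # residual motion enhancement

out = CrossFrameMotionRefinement(512)(torch.randn(4, 512, 60))
```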
{"title":"Motion-temporal calibration network for continuous sign language recognition","authors":"Hongguan Hu, Jianjun Peng, Zhidong Xiao, Li Guo, Yi Hu, Di Wu","doi":"10.1007/s40747-025-02156-5","DOIUrl":"https://doi.org/10.1007/s40747-025-02156-5","url":null,"abstract":"Continuous Sign Language Recognition (CSLR) is fundamental to bridging the communication gap between hearing-impaired individuals and the broader society. The primary challenge lies in effectively modeling the complex spatial-temporal dynamic features in sign language videos. Current approaches typically employ independent processing strategies for motion feature extraction and temporal modeling, which impedes the unified modeling of action continuity and semantic integrity in sign language sequences. To address these limitations, we propose the Motion-Temporal Calibration Network (MTCNet), a novel framework for continuous sign language recognition that integrates dynamic feature enhancement and temporal calibration. The framework consists of two key innovative modules. First, the Cross-Frame Motion Refinement (CFMR) module implements an inter-frame differential attention mechanism combined with residual learning strategies, enabling precise motion feature modeling and effective enhancement of dynamic information between adjacent frames. Second, the Temporal-Channel Adaptive Recalibration (TCAR) module utilizes adaptive convolution kernel design and a dual-branch feature extraction architecture, facilitating joint optimization in both temporal and channel dimensions. In experimental evaluations, our method demonstrates competitive performance on the widely-used PHOENIX-2014 and PHOENIX-2014-T datasets, achieving results comparable to leading unimodal approaches. Moreover, it achieves state-of-the-art performance on the Chinese Sign Language (CSL) dataset. Through comprehensive ablation studies and quantitative analysis, we validate the effectiveness of our proposed method in fine-grained dynamic feature modeling and long-term dependency capture while maintaining computational efficiency.","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"134 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145704005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}