Controlling dissolved oxygen (DO) in an uncertain aeration process is a critical challenge, owing to the inherent nonlinearity, dynamics, and unknown disturbances of the wastewater treatment process (WWTP). To address this issue, an incremental multi-subreservoirs echo state network (IMSESN) controller is proposed. First, an echo state network (ESN) is employed as the approximator for the unknown system state, and a disturbance observer is constructed to handle the unmeasurable disturbances. Second, to further improve controller adaptability, an error-driven subreservoir increment mechanism is incorporated, in which new subreservoirs are inserted into the network to enhance uncertainty approximation. Moreover, the minimum learning parameter (MLP) algorithm is introduced to update only the norm of the output weights, significantly reducing computational complexity while maintaining control accuracy. Third, Lyapunov stability theory is applied to demonstrate the semiglobal ultimate boundedness of the closed-loop signals. Simulations on the benchmark simulation model no. 1 (BSM1) under diverse weather conditions show that the proposed controller outperforms existing methods in tracking accuracy and computational efficiency.
{"title":"Incremental multi-subreservoirs echo state network control for uncertain aeration process.","authors":"Cuili Yang, Qingrun Zhang, Jiahang Zhang, Jian Tang","doi":"10.1016/j.neunet.2025.108454","DOIUrl":"10.1016/j.neunet.2025.108454","url":null,"abstract":"<p><p>It is a critical challenge to realize the control of dissolved oxygen (DO) in uncertain aeration process, due to the inherent nonlinearity, dynamic and unknown disturbances in wastewater treatment process (WWTP). To address this issue, the incremental multi-subreservoirs echo state network (IMSESN) controller is proposed. First, the echo state network (ESN) is employed as the approximator for the unknown system state, and the disturbance observer is constructed to handle the unmeasurable disturbances.Second, to further improve controller adaptability, the error-driven subreservoir increment mechanism is incorporated, in which the new subreservoirs are inserted into the network to enhance uncertainty approximation.Moreover, the minimum learning parameter (MLP) algorithm is introduced to update only the norm of output weights, significantly reducing computational complexity while maintaining control accuracy.Third, the Lyapunov stability theory is applied to demonstrate the semiglobal ultimate boundedness of the closed-loop signals. Under diverse weather conditions, the simulations on the benchmark simulation model no. 1 (BSM1) show that the proposed controller has outperformed existing methods in tracking accuracy and computational efficiency.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"196 ","pages":"108454"},"PeriodicalIF":6.3,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145776279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-04-01. Epub Date: 2025-12-08. DOI: 10.1016/j.neunet.2025.108450
Di Yuan, Huayi Zhu, Rui Chen, Sida Zhou, Jianing Tang, Xiu Shu, Qiao Liu
The rapid development of deep learning provides an excellent solution for end-to-end multi-modal image fusion. However, existing methods mainly focus on the spatial domain and fail to fully utilize valuable information in the frequency domain. Moreover, even when spatial-domain learning methods converge to a nominally ideal solution, significant differences in high-frequency details remain between the fused image and the source images. Therefore, we propose a Cross-Modal Multi-Domain Learning (CMMDL) method for image fusion. First, CMMDL employs the Restormer structure equipped with the proposed Spatial-Frequency domain Cascaded Attention (SFCA) mechanism to provide comprehensive and detailed pixel-level features for subsequent multi-domain learning. Then, we propose a dual-domain parallel learning strategy: the proposed Spatial Domain Learning Block (SDLB) extracts modality-specific features in the spatial domain through a dual-branch invertible neural network, while the proposed Frequency Domain Learning Block (FDLB) captures continuous and precise global contextual information using cross-modal deep perceptual Fourier transforms. Finally, the proposed Heterogeneous Domain Feature Fusion Block (HDFFB) promotes feature interaction and fusion between the different domains through various pixel-level attention structures to obtain the final output image. Extensive experiments demonstrate that the proposed CMMDL achieves state-of-the-art performance on multiple datasets. The code is available at: https://github.com/Ist-Zhy/CMMDL.
{"title":"CMMDL: Cross-modal multi-domain learning method for image fusion.","authors":"Di Yuan, Huayi Zhu, Rui Chen, Sida Zhou, Jianing Tang, Xiu Shu, Qiao Liu","doi":"10.1016/j.neunet.2025.108450","DOIUrl":"10.1016/j.neunet.2025.108450","url":null,"abstract":"<p><p>The rapid development of deep learning provides an excellent solution for end-to-end multi-modal image fusion. However, existing methods mainly focus on the spatial domain and fail to fully utilize valuable information in the frequency domain. Moreover, even if spatial domain learning methods can optimize convergence to an ideal solution, there are still significant differences in high-frequency details between the fused image and the source images. Therefore, we propose a Cross-Modal Multi-Domain Learning (CMMDL) method for image fusion. Firstly, CMMDL employs the Restormer structure equipped with the proposed Spatial-Frequency domain Cascaded Attention (SFCA) mechanism to provide comprehensive and detailed pixel-level features for subsequent multi-domain learning. Then, we propose a dual-domain parallel learning strategy. The proposed Spatial Domain Learning Block (SDLB) focuses on extracting modality-specific features in the spatial domain through a dual-branch invertible neural network, while the proposed Frequency Domain Learning Block (FDLB) captures continuous and precise global contextual information using cross-modal deep perceptual Fourier transforms. Finally, the proposed Heterogeneous Domain Feature Fusion Block (HDFFB) promotes feature interaction and fusion between different domains through various pixel-level attention structures to obtain the final output image. Extensive experiments demonstrate that the proposed CMMDL achieves state-of-the-art performance on multiple datasets. The code is available at: https://github.com/Ist-Zhy/CMMDL.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"196 ","pages":"108450"},"PeriodicalIF":6.3,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145776346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The technique of structural re-parameterization has been widely adopted in Convolutional Neural Networks (CNNs) and Multi-Layer Perceptrons (MLPs) for image-related tasks. However, its integration with attention mechanisms in the video domain remains relatively unexplored. Moreover, video analysis tasks continue to face challenges due to high computational costs, particularly during inference. In this paper, we investigate the re-parameterization of the widely used 3D attention mechanism for video understanding by incorporating a spatiotemporal coherence prior. This approach allows the learning of more robust video features while introducing negligible computational overhead at inference time. Specifically, we propose a SpatioTemporally Augmented 3D Attention (STA-3DA) module as a building block for Transformer architectures. The STA-3DA integrates 3D, spatial, and temporal attention branches during training, serving as an effective replacement for standard 3D attention in existing Transformer models and leading to improved performance. During testing, the different branches are merged into a single 3D attention operation via learned fusion weights, resulting in minimal additional computational cost. Experimental results demonstrate that the proposed method achieves competitive video understanding performance on benchmark datasets such as Kinetics-400 and Something-Something V2.
{"title":"RepAttn3D: Re-parameterizing 3D attention with spatiotemporal augmentation for video understanding.","authors":"Xiusheng Lu, Lechao Cheng, Sicheng Zhao, Ying Zheng, Yongheng Wang, Guiguang Ding, Mingli Song","doi":"10.1016/j.neunet.2025.108313","DOIUrl":"10.1016/j.neunet.2025.108313","url":null,"abstract":"<p><p>The technique of structural re-parameterization has been widely adopted in Convolutional Neural Networks (CNNs) and Multi-Layer Perceptrons (MLPs) for image-related tasks. However, its integration with attention mechanisms in the video domain remains relatively unexplored. Moreover, video analysis tasks continue to face challenges due to high computational costs, particularly during inference. In this paper, we investigate the re-parameterization of widely-used 3D attention mechanism for video understanding by incorporating a spatiotemporal coherence prior. This approach allows the learning of more robust video features while introducing negligible computational overhead at inference time. Specifically, we propose a SpatioTemporally Augmented 3D Attention (STA-3DA) module as a building block for Transformer architectures. The STA-3DA integrates 3D, spatial, and temporal attention branches during training, serving as an effective replacement for standard 3D attention in existing Transformer models and leading to improved performance. During testing, the different branches are merged into a single 3D attention operation via learned fusion weights, resulting in minimal additional computational cost. Experimental results demonstrate that the proposed method achieves competitive video understanding performance on benchmark datasets such as Kinetics-400 and Something-Something V2.</p>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"195 ","pages":"108313"},"PeriodicalIF":6.3,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145589948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09. DOI: 10.1177/2167647X261423109
Xianfeng Gong, Mingyang Mao
This study intends to identify the critical factors that shape college students' adoption of AI-generated news, with a specific focus on integrating Big Data methodologies into the Technology Acceptance Model (TAM) framework. Building on TAM, the research incorporates "trust" as a core variable to develop a dual-path theoretical model that combines technological cognition (e.g., perceived usefulness, perceived ease of use) and psychological emotions. Unlike traditional TAM-based studies that rely solely on questionnaire data, this research enriches its data sources with Big Data techniques, including the collection and analysis of college students' real-time behavioral data (e.g., AI news reading duration, sharing frequency, source verification clicks) and unstructured text data (e.g., sentiment orientation in comment sections), to complement the survey data from 300 college students. Through the questionnaire survey of 300 college students and data analysis using a structural equation model, the study found that trust has the strongest direct positive impact on willingness to use (β = 0.49, p < 0.001), and its influence is significantly greater than that of perceived usefulness (β = 0.35, p < 0.001). Meanwhile, although perceived ease of use does not directly affect willingness to use, it has significant indirect effects by enhancing trust and perceived usefulness. The results show that in the AI news context, where risk perception is high, trust is a more crucial psychological mechanism than traditional technological cognitive factors. These findings expand the explanatory boundaries of the TAM model in new technology fields and provide empirical evidence and practical guidance for AI developers to optimize system credibility and for educators to conduct algorithmic literacy training.
{"title":"Perceived Usefulness, Trust, and Behavioral Intention: A Study on College Student User Adoption Behaviors of Artificial Intelligence Generated News Based on Technology Acceptance Model.","authors":"Xianfeng Gong, Mingyang Mao","doi":"10.1177/2167647X261423109","DOIUrl":"https://doi.org/10.1177/2167647X261423109","url":null,"abstract":"<p><p>This study intends to identify the critical factors that shape college students' adoption of AI-generated news, with a specific focus on integrating Big Data methodologies into the Technology Acceptance Model (TAM) framework. Building on TAM, the research incorporates \"trust\" as a core variable to develop a dual-path theoretical model that combines technological cognition (e.g., perceived usefulness, perceived ease of use) and psychological emotions. Unlike traditional TAM-based studies relying solely on questionnaire data, this research enriches its data sources by leveraging Big Data techniques-including the collection and analysis of college students' real-time behavioral data (e.g., AI news reading duration, sharing frequency, source verification clicks) and unstructured text data (e.g., sentiment orientation in comment sections)-to complement the survey data from 300 college students. Through a questionnaire survey of 300 college students and data analysis using the structural equation model, the study found that trust has the strongest direct positive impact on the willingness to use (β = 0.49, <i>p</i> < 0.001), and its influence is significantly greater than perceived usefulness (β = 0.35, <i>p</i> < 0.001). Meanwhile, although perceived ease of use does not directly affect the willingness to use, it has significant indirect effects by enhancing trust and perceived usefulness. The results show that in the AI news context with high-risk perception, trust is a more crucial psychological mechanism than traditional technological cognitive factors. These findings have expanded the explanatory boundaries of the TAM model in new technology fields and provided empirical evidence and practical inspiration for AI developers to optimize system credibility and for educators to conduct algorithmic literacy training.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X261423109"},"PeriodicalIF":2.6,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146144007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09. DOI: 10.1109/TIE.2026.3654285
{"title":"IEEE Transactions on Industrial Electronics Information for Authors","authors":"","doi":"10.1109/TIE.2026.3654285","DOIUrl":"https://doi.org/10.1109/TIE.2026.3654285","url":null,"abstract":"","PeriodicalId":13402,"journal":{"name":"IEEE Transactions on Industrial Electronics","volume":"73 2","pages":"C4-C4"},"PeriodicalIF":7.2,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11383834","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146139103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09. DOI: 10.1109/MAP.2025.3638524
Vikass Monebhurrun
Provides society information that may include news, reviews or technical notes that should be of interest to practitioners and researchers.
{"title":"Ninth IEEE RADIO International Conference, 27–30 October 2025, Mauritius [AP-S Committees & Activities]","authors":"Vikass Monebhurrun","doi":"10.1109/MAP.2025.3638524","DOIUrl":"https://doi.org/10.1109/MAP.2025.3638524","url":null,"abstract":"Provides society information that may include news, reviews or technical notes that should be of interest to practitioners and researchers.","PeriodicalId":13090,"journal":{"name":"IEEE Antennas and Propagation Magazine","volume":"68 1","pages":"114-115"},"PeriodicalIF":5.7,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11385831","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146139116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-09. DOI: 10.1177/2167647X251411174
Qurat Ul Ain, Hammad Afzal, Fazli Subhan, Mazliham Mohd Suud, Younhyun Jung
Dysarthria, a motor speech disorder characterized by slurred and often unintelligible speech, presents substantial challenges for effective communication. Conventional automatic speech recognition systems frequently underperform on dysarthric speech, particularly in severe cases. To address this gap, we introduce low-latency acoustic transcription and textual encoding (LATTE), an advanced framework designed for real-time dysarthric speech recognition. LATTE integrates preprocessing, acoustic processing, and transcription mapping into a unified pipeline, with its core powered by a hybrid architecture that combines convolutional layers for acoustic feature extraction with bidirectional temporal layers for modeling temporal dependencies. Evaluated on the UA-Speech dataset, LATTE achieves a word error rate of 12.5%, phoneme error rate of 8.3%, and a character error rate of 1%. By enabling accurate, low-latency transcription of impaired speech, LATTE provides a robust foundation for enhancing communication and accessibility in both digital applications and real-time interactive environments.
{"title":"Advancing Dysarthric Speech-to-Text Recognition with LATTE: A Low-Latency Acoustic Modeling Approach for Real-Time Communication.","authors":"Qurat Ul Ain, Hammad Afzal, Fazli Subhan, Mazliham Mohd Suud, Younhyun Jung","doi":"10.1177/2167647X251411174","DOIUrl":"https://doi.org/10.1177/2167647X251411174","url":null,"abstract":"<p><p>Dysarthria, a motor speech disorder characterized by slurred and often unintelligible speech, presents substantial challenges for effective communication. Conventional automatic speech recognition systems frequently underperform on dysarthric speech, particularly in severe cases. To address this gap, we introduce low-latency acoustic transcription and textual encoding (LATTE), an advanced framework designed for real-time dysarthric speech recognition. LATTE integrates preprocessing, acoustic processing, and transcription mapping into a unified pipeline, with its core powered by a hybrid architecture that combines convolutional layers for acoustic feature extraction with bidirectional temporal layers for modeling temporal dependencies. Evaluated on the UA-Speech dataset, LATTE achieves a word error rate of 12.5%, phoneme error rate of 8.3%, and a character error rate of 1%. By enabling accurate, low-latency transcription of impaired speech, LATTE provides a robust foundation for enhancing communication and accessibility in both digital applications and real-time interactive environments.</p>","PeriodicalId":51314,"journal":{"name":"Big Data","volume":" ","pages":"2167647X251411174"},"PeriodicalIF":2.6,"publicationDate":"2026-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146143844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}