Pub Date: 2024-11-01 | DOI: 10.1016/j.engappai.2024.109512
Darshana Subhash , Jyothish Lal G. , Premjith B. , Vinayakumar Ravi
State-of-the-art automatic speech recognition models often struggle to capture nuanced features inherent in accented speech, leading to sub-optimal performance in speaker recognition based on regional accents. Despite substantial progress in the field of automatic speech recognition, ensuring robustness to accents and generalization across dialects remains a persistent challenge, particularly in real-time settings. In response, this study introduces a novel approach leveraging Variational Mode Decomposition (VMD) to enhance accented speech signals, aiming to mitigate noise interference and improve generalization on unseen accented speech datasets. Our method employs decomposed modes of the VMD algorithm for signal reconstruction, followed by feature extraction using Mel-Frequency Cepstral Coefficients (MFCC). These features are subsequently classified using machine learning models such as 1D Convolutional Neural Network (1D-CNN), Support Vector Machine (SVM), Random Forest, and Decision Trees, as well as a deep learning model based on a 2D Convolutional Neural Network (2D-CNN). Experimental results demonstrate superior performance, with the SVM classifier achieving an accuracy of approximately 87.5% on a standard dataset and 99.3% on the AccentBase dataset. The 2D-CNN model further improves the results in multi-class accent classification tasks. This research contributes to advancing automatic speech recognition robustness and accent-inclusive speaker recognition, addressing critical challenges in real-world applications.
Title: A robust accent classification system based on variational mode decomposition. Engineering Applications of Artificial Intelligence, vol. 139, Article 109512.
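The pipeline above (VMD-based enhancement, MFCC extraction, then classification) can be sketched at its final stage. The sketch below is a stand-in, not the paper's implementation: it substitutes random vectors for real MFCC features (which would typically come from a library such as librosa, computed on the VMD-reconstructed signal) and trains scikit-learn's SVC on three hypothetical accent classes; the 87.5%/99.3% figures in the abstract are not reproduced here.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for per-utterance MFCC feature vectors (13 coefficients averaged
# over frames); in the paper these come from the VMD-reconstructed signal.
n_per_class, n_mfcc = 100, 13
X = np.vstack([rng.normal(loc=c, size=(n_per_class, n_mfcc)) for c in range(3)])
y = np.repeat(np.arange(3), n_per_class)   # 3 hypothetical accent classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)    # the SVM classifier of the study
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"toy accent-classification accuracy: {acc:.2f}")
```

On well-separated synthetic classes like these, the RBF-kernel SVM separates nearly perfectly; the point is only the shape of the feature-vector-to-classifier stage.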
Pub Date: 2024-11-01 | DOI: 10.1016/j.engappai.2024.109565
Chao-Ming Yu, Yu-Hsien Lin
This study develops a visual-based docking system (VDS) for an autonomous underwater vehicle (AUV), significantly enhancing docking performance by integrating intelligent object recognition and deep reinforcement learning (DRL). The system overcomes traditional navigation limitations in complex and unpredictable environments by using a variable information dock (VID) for precise multi-sensor docking recognition in the AUV. Employing image-based visual servoing (IBVS) technology, the VDS efficiently converts 2D visual data into accurate 3D motion control commands. It integrates the YOLO (short for You Only Look Once) algorithm for object recognition and the deep deterministic policy gradient (DDPG) algorithm, improving continuous motion control, docking accuracy, and adaptability. Experimental validation at the National Cheng Kung University towing tank demonstrates that the VDS enhances control stability and operational reliability, reducing the mean absolute error (MAE) in depth control by 42.03% and pitch control by 98.02% compared to the previous method. These results confirm the VDS's reliability and its potential for transforming AUV docking.
Title: The docking control system of an autonomous underwater vehicle combining intelligent object recognition and deep reinforcement learning. Engineering Applications of Artificial Intelligence, vol. 139, Article 109565.
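The IBVS step that converts 2-D visual data into motion commands rests on the classical point-feature interaction matrix. The sketch below shows only that textbook control law, not the paper's YOLO + DDPG stack; the feature coordinates and depths are invented for illustration.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Image Jacobian of a normalized image point (x, y) at depth Z
    (the standard IBVS result relating feature motion to camera velocity)."""
    return np.array([
        [-1 / Z,      0, x / Z,      x * y, -(1 + x**2),  y],
        [     0, -1 / Z, y / Z, 1 + y**2,        -x * y, -x],
    ])

def ibvs_command(features, goals, depths, lam=0.5):
    """Map stacked 2-D feature errors to a 6-D camera velocity screw
    via the pseudo-inverse of the stacked interaction matrix."""
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    e = (np.asarray(features) - np.asarray(goals)).ravel()
    return -lam * np.linalg.pinv(L) @ e

# Three observed image points vs. their desired positions, all at ~2 m depth.
feats = [(0.1, 0.0), (-0.05, 0.1), (0.0, -0.1)]
goals = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
v = ibvs_command(feats, goals, depths=[2.0, 2.0, 2.0])
print(v.shape)  # 6-D translational + rotational velocity command
```

In the paper, the DDPG policy replaces the fixed gain `lam` with learned continuous control on top of this kind of visual-error signal.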
Pub Date: 2024-11-01 | DOI: 10.1016/j.engappai.2024.109476
Yuan Xu , Zhen-Zhen Zhao , Tong-Wei Lu , Wei Ke , Yi Luo , Yan-Lin He , Qun-Xiong Zhu , Yang Zhang , Ming-Qing Zhang
This paper presents an innovative latent temporal smoothness-induced Schatten-p norm factorization (SpFLTS) method aimed at addressing challenges in sequential subspace clustering tasks. Globally, SpFLTS employs a low-rank subspace clustering framework based on Schatten-2/3 norm factorization to enhance the comprehensive capture of the original data features. Locally, a total variation smoothing term is imposed on the temporal gradients of latent subspace matrices obtained from sub-orthogonal projections, thereby preserving smoothness in the sequential latent space. To efficiently solve the closed-form optimization problem, a fast Fourier transform is combined with the non-convex alternating direction method of multipliers to optimize the latent subspace matrix, which greatly speeds up computation. Experimental results demonstrate that the proposed SpFLTS method surpasses existing techniques on multiple benchmark databases, highlighting its superior clustering performance and extensive application potential.
Title: Latent temporal smoothness-induced Schatten-p norm factorization for sequential subspace clustering. Engineering Applications of Artificial Intelligence, vol. 139, Article 109476.
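For reference, the Schatten-p norm that gives SpFLTS its name is simply the l_p norm of a matrix's singular values; for p < 1 (the paper uses p = 2/3) it is a non-convex low-rank surrogate that penalizes rank more aggressively than the nuclear norm. The paper avoids repeated SVDs by working with a factorization of the Schatten-2/3 norm, but the definition itself is easy to compute directly:

```python
import numpy as np

def schatten_p_norm(X, p):
    """Schatten-p norm of X: the l_p norm of its singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

X = np.diag([4.0, 1.0, 0.25])      # singular values are just the diagonal
print(schatten_p_norm(X, 1.0))     # p=1 is the nuclear norm: 4 + 1 + 0.25 = 5.25
print(schatten_p_norm(X, 2.0))     # p=2 is the Frobenius norm
print(schatten_p_norm(X, 2.0 / 3.0))  # the non-convex surrogate used by SpFLTS
```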
Pub Date: 2024-10-31 | DOI: 10.1016/j.engappai.2024.109264
Junhui Cao, Jing Hu, Rongguo Zhang
Existing incomplete multi-view clustering algorithms rely on data initialization and ignore the underlying data structure; to address these problems, an adaptive graph learning incomplete multi-view clustering image segmentation algorithm is proposed. Firstly, the similarity matrix of each non-missing view is adaptively learned, and the index matrix of the missing views is used to complete the similarity matrix and unify its dimensions, which preserves the authenticity of the data and reveals the data structure. Secondly, the low-dimensional representation of the completed similarity matrix under spectral constraints is computed, and a discrete clustering index matrix is obtained directly through adaptive weighted spectral rotation, avoiding post-processing. The clustering index matrix is used to cluster the multi-view features, thereby producing the image segmentation result. Finally, an iterative optimization algorithm for the model is presented and compared with six existing algorithms using seven evaluation metrics on six datasets. The results show significant improvements in both clustering and segmentation performance.
Title: Adaptive graph learning algorithm for incomplete multi-view clustered image segmentation. Engineering Applications of Artificial Intelligence, vol. 139, Article 109264.
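A minimal illustration of the completion step described above: each view's similarity matrix, defined only over the samples that view observed, is lifted back to the full sample set via an index set so all views share one dimension. The Gaussian-kernel similarity and zero-filling below are simplifications for illustration, not the paper's adaptive learning scheme.

```python
import numpy as np

def view_similarity(X, sigma=1.0):
    """Gaussian-kernel similarity between the rows of X (one view's features)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def complete_similarity(S_view, present, n_total):
    """Embed a view's similarity matrix into the full n_total x n_total grid
    using the view's index set; rows/columns of samples missing from this
    view are zero-filled (a crude stand-in for index-matrix completion)."""
    S = np.zeros((n_total, n_total))
    S[np.ix_(present, present)] = S_view
    return S

rng = np.random.default_rng(0)
X_view = rng.normal(size=(3, 4))          # this view observes samples 0, 2, 4
S_full = complete_similarity(view_similarity(X_view), [0, 2, 4], n_total=5)
print(S_full.shape)
```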
Pub Date: 2024-10-31 | DOI: 10.1016/j.engappai.2024.109549
Mahshad Jamdar , Kiarash M. Dolatshahi , Omid Yazdanpanah
This study presents a nonmodel-based machine learning framework for estimating engineering demand parameters (EDPs) of eccentrically braced frames with soil-structure interaction effects. The objective is to estimate residual and peak story drift ratio and peak floor acceleration, and to develop fragility curves using traditional regression equations and advanced machine-learning techniques. Correction coefficients are developed to improve prediction accuracy by accounting for soil-structure interaction. A comprehensive database, including incremental dynamic analysis results of 4- and 8-story frames, is developed, consisting of 109,841 data points. The database includes fixed-base models and models with various soil-structure interaction values, subjected to 44 far-field ground motions. Four scenarios with various input variables are introduced to compare the impact of soil-structure interaction. Findings show that including soil-structure interaction features improves the performance of the machine learning algorithms, increasing the coefficient of determination by up to 17.61%. Utilizing the predicted story drift ratio, two types of fragility curves yield more precise predictions, emphasizing the impact of soil-structure interaction effects at lower damage levels. A graphical user interface has been developed to predict fragility curves from various inputs, promoting the practical use of machine learning in engineering. Two new 4-story frames, subjected to unseen ground motions, serve as case studies to assess the application of the trained machine learning algorithms. Prediction errors in input-output scenarios considering soil-structure interaction range from 3% to 18% for new frames. The proposed approach for predicting EDPs is further validated by evaluating a real instrumented five-story steel frame office building.
Title: Data-driven nonmodel seismic assessment of eccentrically braced frames with soil-structure interaction. Engineering Applications of Artificial Intelligence, vol. 139, Article 109549.
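The fragility curves mentioned above conventionally take the lognormal form below, giving the probability of exceeding a damage state as a function of the demand (here, the ML-predicted drift). The median capacity and dispersion values in the sketch are hypothetical, not taken from the paper.

```python
import math

def fragility(im, theta, beta):
    """Lognormal fragility curve: probability of exceeding a damage state
    at intensity/demand `im`, with median capacity `theta` and logarithmic
    standard deviation `beta` (the standard form in seismic assessment)."""
    z = math.log(im / theta) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical damage state: median drift capacity 1.5% with beta = 0.4.
for drift in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"drift {drift:.1f}% -> P(exceed) = {fragility(drift, 1.5, 0.4):.3f}")
```

At the median capacity the exceedance probability is exactly 0.5, and the curve rises monotonically with demand, which is why sharper drift predictions translate directly into sharper fragility estimates.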
Pub Date: 2024-10-31 | DOI: 10.1016/j.engappai.2024.109569
Mohammad Hassan Ranjbar , Ali Abdi , Ju Hong Park
One-shot action recognition, which refers to recognizing human-performed actions using only a single training example, holds significant promise in advancing video analysis, particularly in domains requiring rapid adaptation to new actions. However, existing algorithms for one-shot action recognition face multiple challenges, including high computational complexity, limited accuracy, and difficulties in generalization to unseen actions. To address these issues, we propose a novel kinematic-based skeleton representation that effectively reduces computational demands while enhancing recognition performance. This representation leverages skeleton locations, velocities, and accelerations to formulate the one-shot action recognition task as a metric learning problem, where a model projects kinematic data into an embedding space. In this space, actions are distinguished based on Euclidean distances, facilitating efficient nearest-neighbour searches among activity reference samples. Our approach not only reduces computational complexity but also achieves higher accuracy and better generalization compared to existing methods. Specifically, our model achieved a validation accuracy of 78.5%, outperforming state-of-the-art methods by 8.66% under comparable training conditions. These findings underscore the potential of our method for practical applications in real-time action recognition systems.
Title: Kinematic matrix: One-shot human action recognition using kinematic data structure. Engineering Applications of Artificial Intelligence, vol. 139, Article 109569.
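The recognition scheme above can be sketched end to end: build a kinematic representation from positions, velocities, and accelerations, then label a query by its Euclidean-nearest reference sample. The paper learns the embedding with metric learning; the raw finite-difference embedding and the "wave"/"squat" trajectories below are illustrative stand-ins.

```python
import numpy as np

def kinematic_embedding(positions, dt=1.0):
    """Stack joint positions with finite-difference velocities and
    accelerations, then flatten: a crude stand-in for the paper's
    learned embedding of the kinematic matrix."""
    vel = np.gradient(positions, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    return np.concatenate([positions, vel, acc], axis=-1).ravel()

def one_shot_classify(query, references):
    """Nearest reference in Euclidean distance -> predicted action label."""
    dists = {label: np.linalg.norm(query - emb)
             for label, emb in references.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 20)[:, None]
wave  = kinematic_embedding(np.sin(2 * np.pi * t))   # one "wave" example
squat = kinematic_embedding(t * (1 - t))             # one "squat" example
query = kinematic_embedding(np.sin(2 * np.pi * t) + 0.01 * rng.normal(size=t.shape))
print(one_shot_classify(query, {"wave": wave, "squat": squat}))
```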
Pub Date: 2024-10-31 | DOI: 10.1016/j.engappai.2024.109559
Jingbin Hao , Xiaokai Sun , Xinhua Liu , Dezheng Hua , Jianhua Hu
With the advancement of intelligent transportation systems, accurate identification of driver abnormal behavior is crucial for enhancing road safety. However, the limited computing power of vehicular systems poses a challenge for running efficient and explainable behavior recognition models. This paper proposes a lightweight and explainable driver abnormal behavior recognition model based on an improved You Only Look Once version 8 (YOLOv8). Firstly, a Spatial and Channel Reconstruction Convolution (SCConv) module is introduced to optimize the Convolution to Feature (C2f) structure, enhancing the model's feature extraction capabilities while reducing parameter redundancy. Secondly, a Spatial Pyramid Pooling with Fast Large Separable Kernel Attention (SPPF-LSKA) module is designed to better capture image context and integrate global information. Additionally, a Dynamic upsample (Dysample) module is introduced to improve the model's ability to capture subtle driver movements. Lastly, a Lightweight Shared Group Normalization Convolution Detection Head (LSGCDH) is designed to enhance the model's generalization ability, significantly reducing the model's computational load, parameter count, and size. Experimental results demonstrate that our approach has significant advantages for edge device deployment compared to mainstream algorithms. The visualization results effectively corroborate the role of each improved structure, enhancing the explainability of the abnormal behavior recognition model, which is beneficial for deployment in vehicular systems and contributes to improving road traffic safety.
Title: A lightweight and explainable model for driver abnormal behavior recognition. Engineering Applications of Artificial Intelligence, vol. 139, Article 109559.
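One concrete piece of the LSGCDH is group normalization, which lets convolution weights be shared across detection scales without relying on batch statistics. The sketch below shows the normalization computation itself (affine scale/shift parameters omitted); it is not the full detection head.

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    """Group Normalization for input of shape (N, C, H, W): split the C
    channels into `num_groups` groups and normalize each group over its
    (C/G, H, W) elements, independently per sample."""
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
y = group_norm(x, num_groups=4)   # 4 groups of 2 channels each
print(y.shape)
```

Because the statistics are per-sample and per-group, the same head behaves identically at train and inference time regardless of batch size, which suits lightweight edge deployment.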
Pub Date: 2024-10-31 | DOI: 10.1016/j.engappai.2024.109535
Qinghui Chen , Lunqian Wang , Zekai Zhang , Xinghua Wang , Weilin Liu , Bo Xia , Hao Ding , Jinglin Zhang , Sen Xu , Xin Wang
While Transformer-based approaches have recently achieved notable success in super-resolution, their extensive computational requirements impede widespread practical adoption. High-resolution meteorological satellite cloud imagery is essential for weather analysis and forecasting. Enhancing image resolution through super-resolution techniques facilitates the accurate identification and localization of geographic features by meteorological systems. However, current super-resolution methods fail to restore the intricacies of cloud formations and complex regions fully. This research introduces a novel dual-path aggregation Transformer network (DPAT) tailored to enhance the super-resolution of meteorological satellite cloud images. The DPAT network adeptly captures cloud imagery's subtle details and textures, effectively addressing occlusions and the variability inherent in satellite imagery. It bolsters the model's ability to manage the complex attributes of cloud images through the introduction of the Dual-path Aggregation Self-Attention (DASA) mechanism and the Multi-scale Feature Aggregation Block (MFAB), thereby enhancing performance in processing intricate cloud features. The DASA mechanism synthesizes features across spatial, depth, and channel dimensions via a dual-path approach, thoroughly exploiting feature correlations. The MFAB, designed to supplant the multilayer perceptron, incorporates shift convolution and a multi-scale interaction block to augment feature information, compensating for the deficiency in local information absorption due to fixed receptive fields. Experimental outcomes indicate that DPAT delivers superior super-resolution outcomes. With a parameter count of only 32% of the Enhanced Deep Residual Network (EDSR) or 77% of the Image Restoration using Shift Window Transformer (SwinIR), DPAT matches SwinIR's performance on the satellite cloud dataset. Moreover, DPAT balances accuracy and parameter economy across various datasets. 
This technology is expected to improve image super-resolution capabilities in multiple fields such as human action recognition and industrial recognition, and indirectly improve the accuracy of image perception tasks.
{"title":"Dual-path aggregation transformer network for super-resolution with images occlusions and variability","authors":"Qinghui Chen , Lunqian Wang , Zekai Zhang , Xinghua Wang , Weilin Liu , Bo Xia , Hao Ding , Jinglin Zhang , Sen Xu , Xin Wang","doi":"10.1016/j.engappai.2024.109535","DOIUrl":"10.1016/j.engappai.2024.109535","url":null,"abstract":"<div><div>While Transformer-based approaches have recently achieved notable success in super-resolution, their extensive computational requirements impede widespread practical adoption. High-resolution meteorological satellite cloud imagery is essential for weather analysis and forecasting. Enhancing image resolution through super-resolution techniques facilitates the accurate identification and localization of geographic features by meteorological systems. However, current super-resolution methods fail to restore the intricacies of cloud formations and complex regions fully. This research introduces a novel dual-path aggregation Transformer network (DPAT) tailored to enhance the super-resolution of meteorological satellite cloud images. The DPAT network adeptly captures cloud imagery's subtle details and textures, effectively addressing occlusions and the variability inherent in satellite imagery. It bolsters the model's ability to manage the complex attributes of cloud images through the introduction of the Dual-path Aggregation Self-Attention (DASA) mechanism and the Multi-scale Feature Aggregation Block (MFAB), thereby enhancing performance in processing intricate cloud features. The DASA mechanism synthesizes features across spatial, depth, and channel dimensions via a dual-path approach, thoroughly exploiting feature correlations. The MFAB, designed to supplant the multilayer perceptron, incorporates shift convolution and a multi-scale interaction block to augment feature information, compensating for the deficiency in local information absorption due to fixed receptive fields. 
Experimental results indicate that DPAT delivers superior super-resolution performance. With a parameter count of only 32% of that of the Enhanced Deep Residual Network (EDSR), or 77% of that of Image Restoration using Swin Transformer (SwinIR), DPAT matches SwinIR's performance on the satellite cloud dataset. Moreover, DPAT balances accuracy and parameter economy across various datasets. This technology is expected to improve image super-resolution capabilities in multiple fields such as human action recognition and industrial recognition, and indirectly improve the accuracy of image perception tasks.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109535"},"PeriodicalIF":7.5,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
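The dual-path aggregation self-attention described above builds on the standard scaled dot-product attention primitive applied to spatial tokens of a feature map. The paper's exact formulation is not reproduced here; the following is a minimal pure-Python sketch of that underlying primitive, with identity query/key/value projections and toy token values that are purely illustrative:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of token vectors (lists of floats), e.g. spatial positions
    of a feature map. Returns one attended output vector per input token.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Similarity of this query against every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Output is a convex combination of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)]
        outputs.append(out)
    return outputs

# Four 2-dimensional "spatial tokens" from a toy feature map.
toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
attended = self_attention(toks)
```

DPAT's contribution is to run this kind of attention along dual paths (spatial, depth, and channel dimensions) and aggregate the results; the sketch only shows the shared primitive.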
Pub Date : 2024-10-30 DOI: 10.1016/j.engappai.2024.109523
Wenhe Shen , Xinjue Hu , Jialun Liu , Shijie Li , Hongdong Wang
The advancement of maritime autonomous surface ships has increased the need for accurate and rapid multi-step prediction of ship motion for decision-making, motion planning, and real-time control tasks. This paper proposes a multi-step prediction method based on Informer with a pre-training strategy to achieve accurate and fast ship motion prediction; it substitutes generative inference for rolling prediction to avoid the cumulative error that grows with the prediction horizon. Because long-term control actions and short-term state sequences exhibit different temporal features, heterogeneous encoder and decoder inputs are designed to capture their respective information without redundancy. To address the gap between the high cost of acquiring real data and the large data requirements of deep learning methods, we propose a mechanism-data dual-driven framework. This framework uses a prior mechanism model to generate virtual data incorporating a range of excitation signals designed according to the results of free-running model tests. To reduce the need for real data and increase interpretability, the improved Informer is pre-trained on virtual data from the mechanism model before being trained on real data. Our experiments on multi-step ship motion prediction demonstrate that, on average, the proposed method reduces prediction error and inference time to 41.36% and 13.20%, respectively, of those of state-of-the-art and classical methods.
{"title":"A pre-trained multi-step prediction informer for ship motion prediction with a mechanism-data dual-driven framework","authors":"Wenhe Shen , Xinjue Hu , Jialun Liu , Shijie Li , Hongdong Wang","doi":"10.1016/j.engappai.2024.109523","DOIUrl":"10.1016/j.engappai.2024.109523","url":null,"abstract":"<div><div>The advancement of maritime autonomous surface ships has increased the need for accurate and rapid multi-step prediction of ship motion for decision-making, motion planning, and real-time control tasks. This paper proposes a multi-step prediction method based on Informer with a pre-training strategy to achieve accurate and fast ship motion prediction; it substitutes generative inference for rolling prediction to avoid the cumulative error that grows with the prediction horizon. Because long-term control actions and short-term state sequences exhibit different temporal features, heterogeneous encoder and decoder inputs are designed to capture their respective information without redundancy. To address the gap between the high cost of acquiring real data and the large data requirements of deep learning methods, we propose a mechanism-data dual-driven framework. This framework uses a prior mechanism model to generate virtual data incorporating a range of excitation signals designed according to the results of free-running model tests. To reduce the need for real data and increase interpretability, the improved Informer is pre-trained on virtual data from the mechanism model before being trained on real data. 
Our experiments on multi-step ship motion prediction demonstrate that, on average, the proposed method reduces prediction error and inference time to 41.36% and 13.20%, respectively, of those of state-of-the-art and classical methods.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109523"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
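The key design choice in the abstract above is replacing rolling prediction, which feeds each prediction back as input so errors compound, with generative inference, which produces all horizon steps in one pass. A minimal pure-Python sketch of that contrast on toy linear dynamics (the models, bias value, and data are illustrative, not from the paper):

```python
def rolling_predict(step_model, history, horizon):
    """Rolling prediction: each prediction is fed back as input,
    so per-step errors compound over the horizon."""
    seq = list(history)
    preds = []
    for _ in range(horizon):
        nxt = step_model(seq)
        preds.append(nxt)
        seq.append(nxt)  # feedback loop: the source of cumulative error
    return preds

def generative_predict(multi_model, history, horizon):
    """Informer-style generative inference: all horizon steps are produced
    in one pass from the observed history, with no feedback loop."""
    return multi_model(history, horizon)

# Toy dynamics: x[t+1] = x[t] + 1. Both models carry the same small bias.
BIAS = 0.1

def biased_step(seq):
    return seq[-1] + 1.0 + BIAS  # one-step model

def biased_multi(history, horizon):
    return [history[-1] + k + BIAS for k in range(1, horizon + 1)]

history = [0.0, 1.0, 2.0, 3.0]
truth = [4.0, 5.0, 6.0, 7.0, 8.0]
roll = rolling_predict(biased_step, history, 5)
gen = generative_predict(biased_multi, history, 5)
roll_err = [abs(p - t) for p, t in zip(roll, truth)]
gen_err = [abs(p - t) for p, t in zip(gen, truth)]
```

With the same per-call bias, the rolling error grows linearly with the horizon while the generative error stays flat, which is the behavior the paper exploits.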
Pub Date : 2024-10-30 DOI: 10.1016/j.engappai.2024.109519
Wensheng Liu, Song Han, Na Rong
Data-driven methods have been extensively applied in power system transient stability assessment (TSA) owing to their strong capability to extract valuable features. However, TSA methods still face significant challenges in predictive accuracy and generalization under variable operating conditions with fluctuating loads or power generation. To address this, a data-driven ensemble TSA method that integrates the convolutional block attention module (CBAM) with a residual network (ResNet) is proposed to enhance prediction accuracy. Meanwhile, the traditional cross-entropy loss function is replaced by the focal loss function to reduce the misclassification of unstable samples. Moreover, a rapid updating strategy integrating active learning and fine-tuning techniques is introduced. It can quickly update the classifier with limited labeled samples and little time when the network topology changes substantially and renders the pre-trained TSA model unusable, thus ensuring optimal performance on the new topology. Finally, case studies conducted on the New England 10-machine 39-bus system and the Western Electricity Coordinating Council (WECC) 29-machine 179-bus system validate the effectiveness and robustness of the proposed TSA method. The proposed method achieves an accuracy of 99.56% on the 10-machine system and 99.47% on the 29-machine system, demonstrating its superiority.
{"title":"A novel ensemble method based on residual convolutional neural network with attention module for transient stability assessment considering operational variability","authors":"Wensheng Liu, Song Han, Na Rong","doi":"10.1016/j.engappai.2024.109519","DOIUrl":"10.1016/j.engappai.2024.109519","url":null,"abstract":"<div><div>Data-driven methods have been extensively applied in power system transient stability assessment (TSA) owing to their strong capability to extract valuable features. However, TSA methods still face significant challenges in predictive accuracy and generalization under variable operating conditions with fluctuating loads or power generation. To address this, a data-driven ensemble TSA method that integrates the convolutional block attention module (CBAM) with a residual network (ResNet) is proposed to enhance prediction accuracy. Meanwhile, the traditional cross-entropy loss function is replaced by the focal loss function to reduce the misclassification of unstable samples. Moreover, a rapid updating strategy integrating active learning and fine-tuning techniques is introduced. It can quickly update the classifier with limited labeled samples and little time when the network topology changes substantially and renders the pre-trained TSA model unusable, thus ensuring optimal performance on the new topology. Finally, case studies conducted on the New England 10-machine 39-bus system and the Western Electricity Coordinating Council (WECC) 29-machine 179-bus system validate the effectiveness and robustness of the proposed TSA method. 
The proposed method achieves an accuracy of 99.56% on the 10-machine system and 99.47% on the 29-machine system, demonstrating its superiority.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"139 ","pages":"Article 109519"},"PeriodicalIF":7.5,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
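The focal loss adopted above down-weights well-classified samples so that training concentrates on hard (often unstable) samples. A minimal sketch of the standard binary formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), using the common default alpha and gamma rather than values taken from the paper:

```python
import math

def cross_entropy(p_t):
    """Standard cross-entropy for true-class probability p_t."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0, alpha=0.25):
    """Focal loss: the (1 - p_t)**gamma factor shrinks the loss of
    well-classified samples (p_t near 1), keeping hard samples dominant."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

# Ratio of focal loss to cross-entropy collapses for easy samples.
easy = focal_loss(0.9) / cross_entropy(0.9)   # well-classified sample
hard = focal_loss(0.1) / cross_entropy(0.1)   # badly misclassified sample
```

Because the ratio equals alpha * (1 - p_t)**gamma, an easy sample (p_t = 0.9) contributes roughly 80x less relative loss than a hard one (p_t = 0.1), which is what pushes the classifier to attend to unstable cases.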