Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114109
Kuangchi Sun , Aijun Yin , Yihua Hu
Spatiotemporal graphs have become a research hotspot because they can mine spatiotemporal information in multi-sensor fault diagnosis. However, existing methods do not fully consider the physical attenuation characteristics of an edge when fault features are transmitted to the next sensor under cross-sensor spatiotemporal correlation. Moreover, existing spatiotemporal convolutional networks focus on integrating all nodes for information updates and on network structure design, without realizing the aggregation of edge information with different attributes. To address these issues, we propose Separable Physical Spatiotemporal Graph Message Aggregation (SPSGMA) for fault diagnosis. First, a spatiotemporal graph with physical connection properties across sensors is proposed to assign different properties to different edges. Then, a novel wavelet frequency selection method is proposed for node feature extraction on different physical edges. Finally, a separable message aggregation network is designed to aggregate frequency messages on different physical edges and perform classification, rather than performing unified feature extraction. Three different datasets are used to verify the effectiveness of SPSGMA. Compared with other methods, SPSGMA achieves the best diagnostic performance on long-chain sensor data, with average diagnosis accuracies of 99.99%, 98.59%, and 99.93% on the three datasets, respectively.
Article: "Separable physical spatiotemporal graph message aggregation for fault diagnosis". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114109.
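The edge-separable aggregation idea in the SPSGMA abstract (messages attenuated and pooled per physical edge type, rather than averaged over all neighbours at once) can be sketched as follows. The two edge types and their attenuation coefficients are hypothetical illustrations, not values or names from the paper:

```python
import numpy as np

# Hypothetical per-type attenuation: fault energy decays differently
# depending on the physical connection the message travels over.
ATTENUATION = {"shaft": 0.9, "casing": 0.5}

def aggregate(node_feats, edges):
    """Separable aggregation: node_feats is (N, D); edges is a list of
    (src, dst, etype). Messages are attenuated, mean-pooled within each
    edge type, and the per-type results are concatenated, so edge types
    with different physical attributes are never mixed during pooling."""
    n, d = node_feats.shape
    out = {etype: np.zeros((n, d)) for etype in ATTENUATION}
    count = {etype: np.zeros(n) for etype in ATTENUATION}
    for src, dst, etype in edges:
        out[etype][dst] += ATTENUATION[etype] * node_feats[src]
        count[etype][dst] += 1
    pooled = [out[t] / np.maximum(count[t], 1)[:, None] for t in sorted(ATTENUATION)]
    return np.concatenate(pooled, axis=1)  # (N, D * num_edge_types)

feats = np.eye(3)  # three sensors with one-hot features
edges = [(0, 1, "shaft"), (1, 2, "casing")]
agg = aggregate(feats, edges)
```

A unified scheme would collapse both incoming messages into one average; here each edge type keeps its own slice of the output feature vector.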
Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114128
Shahid Mohammad Ganie , Rama Chaithanya Tanguturi , Manahil Mohammed Alfuraydan
Brain tumors represent a significant health issue and are a leading cause of cancer-related fatalities globally. Early detection and accurate classification approaches are essential for addressing this critical health issue. This study proposes a novel hybrid deep multiscale integration network (DMI-Net) model for brain tumor diagnosis using a magnetic resonance imaging (MRI) dataset. Image preprocessing included resizing, contrast enhancement using contrast-limited adaptive histogram equalization (CLAHE), normalization, and Gaussian filtering to enhance image quality. A lightweight parallel depthwise separable convolutional neural network (PD-CNN) is designed to extract multiscale relevant features with minimal computational resources. Principal component analysis (PCA), linear discriminant analysis (LDA), uniform manifold approximation and projection (UMAP), and t-distributed stochastic neighbor embedding (t-SNE) were used to visualize and validate the class-separable structure of the feature space in the interpretability assessment. The hybrid framework was developed by stacking and concatenating three top-performing transfer learning (TL) models and integrating them with the PD-CNN architecture. Evaluation was conducted using standard performance metrics. For interpretability in clinical decision support, model outputs were analyzed using Shapley additive explanations (SHAP) and gradient-weighted class activation mapping (Grad-CAM) and its variants. The DMI-Net model demonstrated superior results compared with eight TL models, achieving an accuracy of 99.24%, precision of 99.00%, recall of 98.42%, F1-score of 98.54%, and area under the receiver operating characteristic curve of 98.85%. It outperformed existing state-of-the-art studies in the literature. The results indicate the potential utility of the proposed model for increasing confidence in diagnosing brain tumors, supporting clinical decision-making.
Article: "Explainable artificial intelligence-infused hybrid transfer learning framework with multiscale feature fusion for brain tumor detection and classification". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114128.
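The CLAHE step in the preprocessing pipeline above can be illustrated with a minimal numpy sketch of clip-limited histogram equalization. Real CLAHE (e.g. OpenCV's `cv2.createCLAHE`) works per tile with bilinear interpolation between tiles; this global version and the clip limit chosen here are simplifying assumptions:

```python
import numpy as np

def clip_limited_equalize(img, clip_limit=0.01, n_bins=256):
    """Histogram equalization with a clipped histogram, the core idea
    behind CLAHE, applied globally for brevity. img: uint8 array."""
    hist, _ = np.histogram(img, bins=n_bins, range=(0, 256))
    hist = hist.astype(float) / img.size
    # Clip the histogram and redistribute the excess mass uniformly,
    # which limits contrast amplification in near-uniform regions.
    excess = np.clip(hist - clip_limit, 0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / n_bins
    cdf = np.cumsum(hist)
    lut = np.round(255 * cdf / cdf[-1]).astype(np.uint8)
    return lut[img]

img = np.tile(np.arange(0, 128, dtype=np.uint8), (64, 1))  # low-contrast ramp
out = clip_limited_equalize(img)
```

The low-contrast input (values 0-127) is stretched across the full 0-255 range, which is the effect the preprocessing stage relies on before feature extraction.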
Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114142
Chuanting Liu , Genshen Fang , Zuopeng Wen , Ke Li , Yaojun Ge
Flutter derivatives are crucial parameters for aerodynamic performance analysis of long-span bridges, and they are typically identified through time-consuming and costly methods such as wind tunnel tests or computational fluid dynamics (CFD). This study proposes a deep learning approach for the rapid identification of the flutter derivatives of closed-box girders, utilizing a feature-fusion residual network architecture (FF-ResNet). We construct a dataset comprising the flutter derivatives of 113 cross-sections at eight reduced wind speeds, with the flutter derivatives identified via multi-frequency forced vibration CFD simulations. Then, the reduced wind speed and a pre-processed image of the cross-section are used as inputs, and the model is trained to learn multi-modal features. Bayesian optimization is employed to enhance predictive accuracy for flutter derivatives, with the model achieving r-squared (R²) values exceeding 0.97 on the training set and 0.92 on the validation set; in 10-fold cross-validation, the average validation-set R² across the ten folds also exceeds 0.92, demonstrating high accuracy. Next, the model is used to analyze the variation of flutter derivatives across the aerodynamic shape range, and the SHapley Additive exPlanations (SHAP) algorithm is applied to investigate the importance of the geometric parameters. The predicted flutter derivatives are then employed to compute the critical wind speed distribution over the range of considered cross-section variations.
Article: "Prediction of flutter derivatives for closed-box bridge girder: A feature-fusion residual neural network algorithm". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114142.
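The feature-fusion step described above (an image-derived embedding concatenated with the scalar reduced wind speed before a regression head) can be sketched as below. The embedding size, the single linear head, and the number of outputs are illustrative stand-ins, not the FF-ResNet architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_predict(img_embedding, reduced_wind_speed, W, b):
    """Concatenate the cross-section image embedding (stand-in for the
    ResNet branch output) with the scalar reduced wind speed, then apply
    one linear head that regresses the flutter-derivative values."""
    fused = np.concatenate([img_embedding, [reduced_wind_speed]])
    return W @ fused + b

embed_dim, n_outputs = 16, 8          # illustrative sizes
W = rng.normal(size=(n_outputs, embed_dim + 1))
b = np.zeros(n_outputs)
pred = fuse_and_predict(rng.normal(size=embed_dim), 4.0, W, b)
```

Because the wind speed enters as an extra fused feature, the same cross-section image yields different predictions at different reduced wind speeds, which is what lets one network cover all eight speeds.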
Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114139
Feihong Tan , Ping Lu , Fulin Zhang , Xin Ye , Bo Hu , Xing Shu
Developing robust decision-making and control systems for autonomous driving in complex, dynamic environments involving multi-vehicle interactions at intersections, roundabouts, and merging ramps remains a significant hurdle. In this context, Reinforcement Learning (RL) emerges as a highly promising approach. The primary methods for applying RL, however, present a core dilemma. On one hand, offline RL cannot adapt well to real-world conditions because it learns from a fixed dataset. On the other hand, online RL requires learning through real-world interaction, which is inherently unsafe for driving. To address these issues, this paper proposes a Transformer-based Offline-to-online Reinforcement Learning (TORL) framework. Firstly, the framework's offline learning paradigm integrates a Transformer architecture with a maximum entropy mechanism. This synergistic approach allows the model to capture long-term temporal dependencies for high-performance decision-making and control while ensuring the initial policy is robust and generalizable. Building on this foundation, the framework employs a trifecta of synergistic mechanisms during online fine-tuning, including Human-in-the-Loop (HITL) safe exploration, a hybrid replay buffer, and a mixed data-source learning approach, to simultaneously mitigate performance degradation from distributional shifts and neutralize the critical safety risks of online exploration. Comprehensive experiments conducted in the MetaDrive simulation environment demonstrate that TORL surpasses baseline methods, achieving an absolute increase of approximately 29.4% in normalized return and 46.1% in task success rate, while maintaining a zero-collision record. Furthermore, the framework's real-time feasibility was validated on an experimental autonomous vehicle platform, demonstrating low computational latency suitable for practical deployment. 
This study demonstrates that the proposed offline-to-online RL paradigm offers a robust and effective solution for developing high-performance decision-making and control systems for autonomous vehicles.
Article: "Transformer-based offline-to-online reinforcement learning for decision-making and control in autonomous driving". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114139.
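The hybrid replay buffer used during online fine-tuning can be sketched as a buffer that mixes a fixed fraction of pre-collected offline transitions into each minibatch of fresh online experience, tempering distributional shift. The 50/50 mixing ratio is an illustrative assumption, not the paper's setting:

```python
import random

class HybridReplayBuffer:
    """Minibatches draw from both the fixed offline dataset and the
    growing online buffer, so fine-tuning never trains purely on
    out-of-distribution online data early on."""

    def __init__(self, offline_data, online_fraction=0.5):
        self.offline = list(offline_data)
        self.online = []
        self.online_fraction = online_fraction  # illustrative choice

    def add(self, transition):
        self.online.append(transition)

    def sample(self, batch_size):
        # Cap the online share by what has actually been collected.
        n_online = min(int(batch_size * self.online_fraction), len(self.online))
        n_offline = batch_size - n_online
        return (random.sample(self.offline, n_offline)
                + random.sample(self.online, n_online))

random.seed(0)
buf = HybridReplayBuffer(offline_data=[("off", i) for i in range(100)])
for i in range(10):
    buf.add(("on", i))
batch = buf.sample(8)
```

Early in fine-tuning, when the online buffer is nearly empty, batches are dominated by offline data and the mix shifts automatically as interaction data accumulates.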
Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114163
Yunjin Wang , Leyi Zheng , Gong Chen , Jianlong Zhang , Hao Bai , Hanxuan Song , Tingxue Jiang , Fujian Zhou
Accurate and robust prediction of fracturing performance is essential for optimizing fracturing strategies. Here, a fracturing learning curve is proposed based on the fracturing characteristics of Gimsar shale oil, and it is used as a theoretical guide to build a theory-guided data-driven (TgDD) model for predicting fracturing performance. The fracturing learning curve is decomposed into a dimensionless trend and local fluctuations. A convolutional neural network (CNN) and a gated recurrent unit (GRU) are combined into a CNN-GRU to predict the dimensionless trend, and adaptive boosting (AdaBoost) is integrated with a random forest (RF) into an AdaBoost-RF to predict the local fluctuations. The results show that the dimensionless trend has time-series characteristics. The CNN-GRU can extract and select features, and its prediction ability is 28.1% and 12.9% higher than that of the CNN and GRU, respectively. The AdaBoost-RF can dynamically adjust the weights, and its prediction ability is about 37% higher than that of the RF. TgDD is more sensitive to engineering parameters. Relative to direct prediction, the prediction accuracy of TgDD is improved by 47.6%. There are two main reasons for the higher prediction accuracy of TgDD. First, the dimensionless trend is time-series data, for which the established CNN-GRU model has very strong predictive ability. Second, the amplitude of the local fluctuations is reduced, which improves data quality. The engineering parameters of newly fractured wells were optimized using TgDD, and the estimated ultimate recovery improved from 0.4847 to 0.4917.
Article: "Theory-guided data-driven based on the learning curve for fracturing performance prediction". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114163.
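The trend-plus-fluctuation decomposition underlying TgDD (a smooth component for one learner, the residual for another, recombined at prediction time) can be sketched with a centred moving average as a stand-in for the paper's trend extraction; the window length and synthetic curve are arbitrary choices:

```python
import numpy as np

def decompose(series, window=5):
    """Split a learning-curve-like series into a smooth trend and local
    fluctuations. In the paper's framework the trend would go to the
    CNN-GRU and the (smaller-amplitude) fluctuations to AdaBoost-RF;
    the final prediction is the sum of the two parts."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(series, pad, mode="edge")   # edge-pad to keep length
    trend = np.convolve(padded, kernel, mode="valid")
    fluctuation = series - trend
    return trend, fluctuation

t = np.linspace(0, 1, 50)
series = t ** 0.5 + 0.05 * np.sin(40 * t)  # rising curve plus local wiggles
trend, fluct = decompose(series)
recombined = trend + fluct                  # exact by construction
```

The point of the split is visible in the amplitudes: the fluctuation component has far smaller variance than the raw series, which is the data-quality improvement the abstract credits for part of the accuracy gain.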
Pub Date: 2026-02-13 | DOI: 10.1016/j.engappai.2026.114127
Huihui Li , Huiqi Han , Chunlin Xu , Tongbao Chen , Xiaoyong Liu , Guihua Wen
Multimodal Emotion Recognition (MER), as a key component of affective computing, significantly improves the accuracy and robustness of emotion recognition by integrating multiple modalities such as text, audio, and visual information. However, most existing studies assume data integrity, while missing modality data is inevitable in practical applications, posing new challenges for MER. This paper, for the first time, conducts a comprehensive and systematic review of MER methods from complete modality to missing modality, covering common datasets, feature extraction techniques, information fusion mechanisms, and the latest methods. In particular, we elaborate on the construction methods of missing-modality data and comprehensively compare MER methods under both complete and missing modalities. Furthermore, we summarize the common evaluation metrics in the field of MER, discuss the core challenges in depth, and outline future research directions. This review aims to provide researchers with a comprehensive understanding of the state of MER technology, thereby offering directional suggestions for subsequent research.
Article: "Multimodal emotion recognition from complete modality to missing modality based on text, audio, and visual: A review". Engineering Applications of Artificial Intelligence, Vol. 170, Article 114127.
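One common way to construct missing-modality data for MER experiments, of the kind this review surveys, is to randomly drop whole modalities per sample and record a presence mask. The missing rate and zero-filling below are illustrative choices, not a method prescribed by the review:

```python
import numpy as np

rng = np.random.default_rng(42)

def mask_modalities(batch, missing_rate=0.3):
    """Simulate missing modalities: for each sample, each modality is
    independently kept or zeroed, and a boolean presence mask records
    which modalities survive so a model can condition on availability."""
    masked, presence = {}, {}
    for name, feats in batch.items():
        keep = rng.random(feats.shape[0]) >= missing_rate  # per-sample
        presence[name] = keep
        masked[name] = feats * keep[:, None]
    return masked, presence

batch = {"text": np.ones((8, 4)), "audio": np.ones((8, 6)), "visual": np.ones((8, 5))}
masked, presence = mask_modalities(batch)
```

Varying `missing_rate` produces the graded missing-modality test conditions (e.g. 10% to 70% missing) under which complete-modality and missing-modality methods are typically compared.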
Pub Date: 2026-02-12 | DOI: 10.1016/j.engappai.2026.114055
Ting Ke, Mingzhu Meng, Feifei Yin
To address the low efficiency issues of support vector machine (SVM)-based multi-classification methods, the hyper-sphere support vector machine has been widely adopted. However, it still suffers from challenges such as feature correlation and inconsistent feature scales. To overcome these limitations, this paper proposes a maximal margin hyper-ellipsoid support vector machine (M3HE-SVM) approach. Unlike conventional methods that use Euclidean distance, this approach employs Mahalanobis distance for optimal margin measurement, aimed at not only decorrelating features, eliminating dimensional discrepancies, and achieving implicit feature selection, but also further capturing the geometric information of data and the probability distribution of the population. Extensive experiments are conducted on three categories of datasets: (1) a variety of representative synthetic datasets covering scenarios with linear separability, nonlinear distributions, class imbalance, non-spherical structures, and high-dimensional multi-class data; (2) multiple real-world datasets from the University of California, Irvine (UCI) Machine Learning Repository; and (3) large-scale real-world datasets and NDC datasets. Experimental results demonstrate that M3HE-SVM consistently outperforms the maximal margin hypersphere support vector machine (M3HS-SVM) and other traditional methods in both classification accuracy and testing efficiency, exhibiting strong robustness and generalization ability.
Article: "Maximal margin hyper-ellipsoid support vector machine for multi-class classification". Engineering Applications of Artificial Intelligence, Vol. 169, Article 114055.
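The Mahalanobis margin measure motivating M3HE-SVM can be illustrated in isolation: once the covariance encodes feature scale and correlation, a point that is far in Euclidean terms can be near in Mahalanobis terms, and vice versa. This sketch shows only the distance itself, not the hyper-ellipsoid optimisation:

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance (x - m)^T C^{-1} (x - m): it
    decorrelates features and removes scale discrepancies, which is why
    it replaces Euclidean distance as the margin measure."""
    diff = x - mean
    return float(diff @ np.linalg.inv(cov) @ diff)

mean = np.array([0.0, 0.0])
cov = np.array([[100.0, 0.0],     # first feature has much larger spread
                [0.0,   1.0]])
a = np.array([10.0, 0.0])  # Euclidean-far, but along the wide axis
b = np.array([0.0, 3.0])   # Euclidean-near, but along the narrow axis
d_a = mahalanobis_sq(a, mean, cov)   # = 10^2 / 100 = 1
d_b = mahalanobis_sq(b, mean, cov)   # = 3^2 / 1   = 9
```

Euclidean distance ranks `a` (norm 10) as farther than `b` (norm 3), while the Mahalanobis distance reverses the ranking; a hypersphere boundary would misjudge such anisotropic classes where a hyper-ellipsoid fits them naturally.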
Global annual shipments of lithium-ion batteries reached 1545.1 gigawatt-hours (GWh) in 2024, representing a substantial increase; notably, the energy-storage segment alone grew 64.9% year-on-year. Prior to dispatch, lithium-ion batteries must undergo self-discharge testing to ensure safety and reliability. In practice, identifying the approximately 2% of batteries exhibiting excessive self-discharge requires a prolonged resting period (10-30 days) to track the self-discharge voltage drop (SDV-drop), which accounts for nearly two-thirds of the overall production cycle and severely limits manufacturing efficiency. Rapid and accurate prediction of self-discharge behavior has thus become a pressing engineering challenge. This study presents an artificial-intelligence-enabled framework that predicts the 28-day voltage drop using formation-stage data, thereby obviating the prolonged rest period. The approach integrates latent feature extraction from charge-discharge curves, unsupervised clustering, and transfer learning. Specifically, comprehensive temporal and static features are automatically extracted from current, voltage, and capacity trajectories, along with scalar performance indicators. A hybrid algorithm combining K-means with t-distributed stochastic neighbor embedding (t-SNE) partitions the dataset into internally homogeneous clusters, enhancing intra-cluster consistency and inter-cluster separability. During transfer learning, maximum mean discrepancy aligns feature distributions between the source and target domains, while a feature-label consistency constraint further mitigates domain shift and improves generalization. Comparative experiments demonstrate that the proposed model markedly outperforms state-of-the-art baselines in predicting SDV-drop. This framework thus provides a theoretical foundation and a practical pathway for rapid self-discharge assessment, enabling significant reductions in production cycle time and improving manufacturing efficiency.
{"title":"Self-discharge estimation for lithium-ion batteries based on formation data in production","authors":"Haoyuan Zheng , Shaobin Yang , Weihua Xue , Shouzhen Xiao , Ding Shen , Wei Dong , Xu Zhang","doi":"10.1016/j.engappai.2026.114180","DOIUrl":"10.1016/j.engappai.2026.114180","url":null,"abstract":"<div><div>Global annual shipments of lithium-ion batteries reached 1545.1 GW-hours (GWh) in 2024, representing a substantial increase. Notably, the energy-storage segment alone experienced a year-on-year growth of 64.9 %. Prior to dispatch, lithium-ion batteries must undergo self-discharge testing to ensure safety and reliability. In practice, identifying the approximately 2% of batteries exhibiting excessive self-discharge requires a prolonged resting period (10-30 days) to track self-discharge voltage drop (SDV-drop), which accounts for nearly two-thirds of the overall production cycle and severely limits manufacturing efficiency. Rapid and accurate prediction of self-discharge behavior has thus become a pressing engineering challenge. This study presents an artificial intelligence enabled framework that predicts a 28-day voltage drop using formation-stage data, thereby obviating the prolonged rest period. The approach integrates latent feature extraction from charge-discharge curves, unsupervised clustering, and transfer learning. Specifically, both comprehensive temporal and static features are automatically extracted from current, voltage, and capacity trajectories, along with scalar performance indicators. A hybrid K-means-t-distributed stochastic neighbor embedding (t-SNE) algorithm partitions the dataset into internally homogeneous clusters, enhancing intra-cluster consistency and inter-cluster separability. During transfer learning, maximum mean discrepancy aligns feature distributions between source and target domains, while a feature-label consistency constraint further mitigates domain shift and improves generalization. 
Comparative experiments demonstrate that the proposed model markedly outperforms state-of-the-art baselines in predicting SDV-drop. This framework thus provides a theoretical foundation and practical pathway for rapid self-discharge assessment, which enables significant reductions in production cycle time and improves manufacturing efficiency.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 114180"},"PeriodicalIF":8.0,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
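The transfer-learning step above aligns source and target feature distributions with maximum mean discrepancy (MMD). As a hedged illustration (not the authors' implementation; the Gaussian-kernel bandwidth and the biased estimator are assumptions), a minimal NumPy sketch of the squared-MMD estimator with an RBF kernel:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared Euclidean distances, then the Gaussian kernel.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimator of squared maximum mean discrepancy:
    # mean within-X similarity + mean within-Y similarity - 2 * cross similarity.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, (200, 4))       # source-domain features
tgt_near = rng.normal(0.0, 1.0, (200, 4))  # target drawn from same distribution
tgt_far = rng.normal(3.0, 1.0, (200, 4))   # target with a strong domain shift

# A larger shift between domains should yield a larger MMD.
assert mmd2(src, tgt_near) < mmd2(src, tgt_far)
```

Minimizing such a term alongside the prediction loss pulls the two feature distributions together, which is the standard rationale for MMD-based domain adaptation.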
Pub Date : 2026-02-12DOI: 10.1016/j.engappai.2026.113947
Guyang Zhang, Waleed Abdulla
Transformers have become the architecture of choice for learning long-range dependencies, yet their adoption in hyperspectral imaging (HSI) is still emerging. We reviewed more than 300 papers published up to 2025 and present the first end-to-end survey dedicated to Transformer-based HSI classification. The study categorizes every stage of a typical pipeline—pre-processing, patch or pixel embedding, positional encoding, spatial–spectral feature extraction, multi-head self-attention variants, skip connections, and loss design—and contrasts alternative design choices with the unique spatial–spectral properties of HSI. We map the field’s progress against persistent obstacles: scarce labeled data, extreme spectral dimensionality, computational overhead, and limited model explainability. Finally, we outline a research agenda prioritizing valuable public datasets, lightweight on-edge models, robustness to illumination and sensor shifts, and intrinsically interpretable attention mechanisms. Our goal is to guide researchers in selecting, combining, or extending Transformer components that are truly fit for purpose for next-generation HSI applications.
{"title":"Transformers meet hyperspectral imaging: A comprehensive study of models, challenges and open problems","authors":"Guyang Zhang, Waleed Abdulla","doi":"10.1016/j.engappai.2026.113947","DOIUrl":"10.1016/j.engappai.2026.113947","url":null,"abstract":"<div><div>Transformers have become the architecture of choice for learning long-range dependencies, yet their adoption in hyperspectral imaging (HSI) is still emerging. We reviewed more than 300 papers published up to 2025 and present the first end-to-end survey dedicated to Transformer-based HSI classification. The study categorizes every stage of a typical pipeline—pre-processing, patch or pixel embedding, positional encoding, spatial–spectral feature extraction, multi-head self-attention variants, skip connections, and loss design—and contrasts alternative design choices with the unique spatial–spectral properties of HSI. We map the field’s progress against persistent obstacles: scarce labeled data, extreme spectral dimensionality, computational overhead, and limited model explainability. Finally, we outline a research agenda prioritizing valuable public data sets, lightweight on-edge models, illumination and sensor shifts robustness, and intrinsically interpretable attention mechanisms. 
Our goal is to guide researchers in selecting, combining, or extending Transformer components that are truly fit for purpose for next-generation HSI applications.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 113947"},"PeriodicalIF":8.0,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
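The pipeline stages the survey enumerates center on multi-head self-attention applied to embedded spectral tokens. As a hedged sketch only (the token layout, dimensions, and random projections are illustrative assumptions, not any surveyed model), single-head scaled dot-product attention over the spectral bands of one HSI pixel patch:

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_dot_product_attention(Q, K, V):
    # scores[i, j]: similarity of token i's query to token j's key.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V  # each output token is a weighted mix of all value tokens

# Toy setup: 8 spectral-band tokens for one pixel, each embedded in 16 dims.
n_bands, d = 8, 16
X = rng.normal(size=(n_bands, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
assert out.shape == (n_bands, d)
```

In a full Transformer this head would be replicated, concatenated, and stacked with positional encodings and skip connections—the design axes the survey compares across HSI models.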
To reduce the computational cost of multi-criteria optimization based on multi-axial vibration fatigue life and improve the fairness of multi-criteria decision-making, this paper proposes a multi-axial vibration fatigue optimization strategy based on artificial intelligence. The strategy combines a hybrid surrogate model using Random Forest (RF) and a Radial Basis Function Neural Network (RBFNN), the Non-dominated Sorting Genetic Algorithm II (NSGA-II), and a Modified Preference Selection Index (MPSI). A multi-axial vibration fatigue formula is derived for calculating the fatigue life of structures. The RF-RBFNN hybrid surrogate model is used to fit the relationship between variables and responses. NSGA-II is employed to extract the approximate Pareto set from the hybrid surrogate model, and MPSI is used to determine the best compromise solution. The results indicate that, while satisfying all constraint indicators, the weight reduction rate of the vehicle frame is 4.9%, and the prediction accuracy of the fatigue life surrogate model is 97.015%.
{"title":"Multi-axial vibration fatigue optimization strategy based on the artificial intelligence algorithm","authors":"Xiaopeng Zhang , Boqiang Zhang , Dengfeng Wang , Zihao Meng , Fengmin Lian , Jialin Dong , Shang Zhang , Hongli Chen , Haijun Ruan","doi":"10.1016/j.engappai.2026.114084","DOIUrl":"10.1016/j.engappai.2026.114084","url":null,"abstract":"<div><div>To reduce the computational cost in the multi-criteria optimization process based on multi-axial vibration fatigue life and improve the fairness of multi-criteria decision-making, this paper proposes a multi-axial vibration fatigue optimization strategy based on artificial intelligence. This strategy combines a hybrid surrogate model using Random Forest (RF) and Radial Basis Function Neural Network (RBFNN), the Non-dominated Sorting Genetic Algorithm II (NSGA-II), and a Modified Preference Selection Index (MPSI). The multiaxial vibration fatigue formula is derived for calculating the fatigue life of structures. The RF-RBFNN hybrid surrogate model is used to fit the relationship between variables and responses. The NSGA-II is employed to mine the Approximation set from the hybrid surrogate model, and the Modified Preference Selection Index method is used to determine the best compromise solution. The research results indicate that, while satisfying all constraint indicators, the weight reduction rate of the vehicle frame is 4.9%, and the prediction accuracy of the fatigue life surrogate model is 97.015%. 
The main contribution of this paper is to extend the artificial intelligence algorithm to the field of multi-axis vibration fatigue optimization and apply it to the solution process of the fatigue life of the tractor frame.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"169 ","pages":"Article 114084"},"PeriodicalIF":8.0,"publicationDate":"2026-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
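NSGA-II's core operation, as invoked above, is non-dominated sorting of candidate designs against the competing objectives. As a hedged sketch under illustrative assumptions (toy two-objective minimization points, not the paper's frame variables), the Pareto-dominance test and the first non-dominated front:

```python
def dominates(a, b):
    # For minimization: a dominates b if it is no worse in every
    # objective and strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # First non-dominated front: points no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Toy objective vectors, e.g. (mass, fatigue-damage index), both minimized.
pts = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (5.0, 5.0)]
front = pareto_front(pts)
# (3.0, 4.0) is dominated by (2.0, 3.0); (5.0, 5.0) is dominated by all three.
assert front == [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

NSGA-II repeats this sorting over successive fronts and adds crowding-distance selection; a decision rule such as MPSI then picks one compromise solution from the resulting approximate Pareto set.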