Achieving complete coverage in complex areas is a critical objective for tasks such as cleaning, painting, maintenance, and inspection. However, existing robots in the market, with their fixed morphologies, face limitations when accessing confined spaces. Reconfigurable tiling robots provide a feasible solution to this challenge: by shapeshifting among their available morphologies to adapt to the varying conditions of complex environments, these robots can improve the efficiency of complete coverage. However, the ability to change shape is constrained by energy usage. Hence, an optimal strategy is needed to generate a trajectory that covers confined areas with minimal reconfiguration actions while accounting for the finite set of possible shapes. This paper proposes a complete coverage planning (CCP) framework for a reconfigurable tiling robot called hTetrakis, which consists of three polyiamond blocks. The CCP framework leverages Deep Reinforcement Learning (DRL) to derive an optimal action policy within a polyiamond shape-based workspace. By maximizing cumulative rewards to optimize the overall kinetic energy-based cost weight, the proposed DRL model plans the hTetrakis shapes and trajectories simultaneously. To this end, the DRL model combines Convolutional Neural Networks (CNNs) with a Long Short-Term Memory (LSTM) network and adopts the Actor–Critic with Experience Replay (ACER) approach for off-policy decision-making. By producing trajectories with reduced cost and time, the proposed CCP framework surpasses conventional heuristic optimization methods that rely on tiling strategies, such as Particle Swarm Optimization (PSO), Differential Evolution (DE), the Genetic Algorithm (GA), and Ant Colony Optimization (ACO).
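To make the reward structure concrete, the following is a minimal illustrative sketch, not the paper's actual model: a per-step reward that trades newly covered cells against an energy-based cost, with a higher cost for reconfiguration actions than for plain moves. All names and cost constants here are assumptions chosen for illustration.

```python
# Hypothetical energy-weighted coverage reward; constants and names are
# illustrative assumptions, not values from the hTetrakis paper.

MOVE_COST = 1.0       # assumed energy cost of translating the robot
RECONF_COST = 5.0     # assumed (higher) energy cost of a shape change
COVER_REWARD = 2.0    # reward per newly covered workspace cell

def step_reward(newly_covered: int, action: str) -> float:
    """Coverage gain minus the kinetic-energy-based cost of the action
    ('move' or 'reconfigure')."""
    cost = RECONF_COST if action == "reconfigure" else MOVE_COST
    return COVER_REWARD * newly_covered - cost

# A trajectory covering the same cells with fewer reconfigurations
# accumulates a higher return, which is what the DRL agent maximizes.
traj_a = [("move", 1), ("reconfigure", 2), ("move", 1)]        # one shape change
traj_b = [("reconfigure", 1), ("reconfigure", 2), ("reconfigure", 1)]
ret_a = sum(step_reward(n, a) for a, n in traj_a)   # 1.0
ret_b = sum(step_reward(n, a) for a, n in traj_b)   # -7.0
```

Under these assumed weights, maximizing cumulative reward naturally favors trajectories with fewer reconfiguration actions, mirroring the minimal-reconfiguration objective described above.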