Pub Date: 2024-11-16. DOI: 10.1007/s10489-024-06062-0
Jikui Wang, Huiyu Duan, Cuihong Zhang, Feiping Nie
Self-training is a well-known framework for semi-supervised learning, and selecting high-confidence samples is its key step. If high-confidence examples with incorrect labels are used to train the classifier, the error compounds over iterations. To improve the quality of high-confidence samples, a novel data editing technique termed Relative Node Graph Editing (RNGE) is put forward. Concretely, mass estimation is used to calculate the density and peak of each sample and build a prototype tree that reveals the underlying spatial structure of the data. A Relative Node Graph (RNG) is then defined for each sample, and mislabeled samples in the candidate high-confidence set are identified by a hypothesis test based on the RNG. Combining these components, we propose a Robust Self-training algorithm based on the Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. Experimental results show that the proposed algorithm improves the performance of self-training.
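For orientation, a minimal sketch of a generic self-training loop with a pluggable data-editing hook is given below, assuming scikit-learn and NumPy. It is not the authors' STRNG/RNGE implementation: `edit_fn` is a hypothetical stand-in for the RNG-based hypothesis test, and candidates it rejects are simply discarded rather than relabelled.

```python
import numpy as np
from sklearn.base import clone

def self_training(base_clf, X_lab, y_lab, X_unlab, edit_fn=None,
                  conf_threshold=0.9, max_iter=10):
    """Generic self-training loop with an optional data-editing hook.

    edit_fn(X_cand, y_cand, X_lab, y_lab) -> boolean keep-mask; a placeholder
    for an editing rule such as RNGE (not implemented here).
    """
    X_lab, y_lab, X_pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    clf = clone(base_clf).fit(X_lab, y_lab)
    for _ in range(max_iter):
        if len(X_pool) == 0:
            break
        proba = clf.predict_proba(X_pool)
        high = proba.max(axis=1) >= conf_threshold          # candidate high-confidence set
        if not high.any():
            break
        X_cand = X_pool[high]
        y_cand = clf.classes_[proba[high].argmax(axis=1)]   # pseudo-labels
        keep = (edit_fn(X_cand, y_cand, X_lab, y_lab) if edit_fn is not None
                else np.ones(len(X_cand), dtype=bool))
        X_lab = np.vstack([X_lab, X_cand[keep]])
        y_lab = np.concatenate([y_lab, y_cand[keep]])
        X_pool = X_pool[~high]                              # examined candidates leave the pool
        clf = clone(base_clf).fit(X_lab, y_lab)             # retrain on the enlarged labelled set
    return clf
```

In such a loop, the quality of the editing step directly determines how much label noise enters the labelled pool, which is the failure mode described above.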
{"title":"A robust self-training algorithm based on relative node graph","authors":"Jikui Wang, Huiyu Duan, Cuihong Zhang, Feiping Nie","doi":"10.1007/s10489-024-06062-0","DOIUrl":"10.1007/s10489-024-06062-0","url":null,"abstract":"<div><p>Self-training algorithm is a well-known framework of semi-supervised learning. How to select high-confidence samples is the key step for self-training algorithm. If high-confidence examples with incorrect labels are employed to train the classifier, the error will get worse during iterations. To improve the quality of high-confidence samples, a novel data editing technique termed Relative Node Graph Editing (RNGE) is put forward. Say concretely, mass estimation is used to calculate the density and peak of each sample to build a prototype tree to reveal the underlying spatial structure of the data. Then, we define the Relative Node Graph (RNG) for each sample. Finally, the mislabeled samples in the candidate high-confidence sample set are identified by hypothesis test based on RNG. Combined above, we propose a Robust Self-training Algorithm based on Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. The experimental results show that the proposed algorithm can improve the performance of the self-training algorithm.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-04. DOI: 10.1007/s10489-024-05844-w
Haiguang Zhang, Yuanyuan Sun, Bo Xu, Hongfei Lin
{"title":"Correction to: LegalATLE: an active transfer learning framework for legal triple extraction","authors":"Haiguang Zhang, Yuanyuan Sun, Bo Xu, Hongfei Lin","doi":"10.1007/s10489-024-05844-w","DOIUrl":"10.1007/s10489-024-05844-w","url":null,"abstract":"","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13179 - 13179"},"PeriodicalIF":3.4,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-31. DOI: 10.1007/s10489-024-05647-z
Domenico Iuso, Soumick Chatterjee, Sven Cornelissen, Dries Verhees, Jan De Beenhouwer, Jan Sijbers
Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all samples of a batch, X-ray computed tomography (X-CT) is often used in combination with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly used, as they can be trained to be robust to the material being analysed and resilient to poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. Additionally, there is a notable absence of comparisons between supervised and unsupervised models for voxel-wise pore segmentation tasks. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet, ACC-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE, RV-VAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch approach for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address the class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models was post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include a performance benchmark of the DL models, an evaluation of the post-processing algorithm, and an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best-performing supervised model was UNet++ with an average precision of 0.751 ± 0.030, while the best unsupervised model was the post-processed ceVAE with 0.830 ± 0.003. Notably, the ceVAE model, with its post-processing technique, exhibited superior capabilities, endorsing unsupervised learning as the preferred approach for the voxel-wise pore segmentation task.
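For reference, the class-imbalance handling mentioned above can be illustrated with one common formulation of the Focal Tversky loss for a binary voxel mask; the NumPy sketch below uses illustrative hyperparameter values that are not taken from the paper.

```python
import numpy as np

def focal_tversky_loss(y_true, y_pred, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for a binary segmentation mask.

    y_true: {0,1} voxels, y_pred: predicted probabilities, same shape.
    alpha weights false negatives, beta false positives; the focal exponent
    gamma emphasises hard, poorly segmented cases. Values are illustrative.
    """
    t = y_true.astype(np.float64).ravel()
    p = y_pred.astype(np.float64).ravel()
    tp = np.sum(t * p)
    fn = np.sum(t * (1.0 - p))
    fp = np.sum((1.0 - t) * p)
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1.0 - tversky) ** gamma

# toy check: a small mask with one poorly detected pore voxel
print(focal_tversky_loss(np.array([1, 1, 0, 0]), np.array([0.9, 0.4, 0.1, 0.0])))
```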
{"title":"Voxel-wise segmentation for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks","authors":"Domenico Iuso, Soumick Chatterjee, Sven Cornelissen, Dries Verhees, Jan De Beenhouwer, Jan Sijbers","doi":"10.1007/s10489-024-05647-z","DOIUrl":"10.1007/s10489-024-05647-z","url":null,"abstract":"<div><p>Additive Manufacturing (AM) has emerged as a manufacturing process that allows the direct production of samples from digital models. To ensure that quality standards are met in all samples of a batch, X-ray computed tomography (X-CT) is often used in combination with automated anomaly detection. For the latter, deep learning (DL) anomaly detection techniques are increasingly used, as they can be trained to be robust to the material being analysed and resilient to poor image quality. Unfortunately, most recent and popular DL models have been developed for 2D image processing, thereby disregarding valuable volumetric information. Additionally, there is a notable absence of comparisons between supervised and unsupervised models for voxel-wise pore segmentation tasks. This study revisits recent supervised (UNet, UNet++, UNet 3+, MSS-UNet, ACC-UNet) and unsupervised (VAE, ceVAE, gmVAE, vqVAE, RV-VAE) DL models for porosity analysis of AM samples from X-CT images and extends them to accept 3D input data with a 3D-patch approach for lower computational requirements, improved efficiency and generalisability. The supervised models were trained using the Focal Tversky loss to address class imbalance that arises from the low porosity in the training datasets. The output of the unsupervised models was post-processed to reduce misclassifications caused by their inability to adequately represent the object surface. The findings were cross-validated in a 5-fold fashion and include: a performance benchmark of the DL models, an evaluation of the post-processing algorithm, an evaluation of the effect of training supervised models with the output of unsupervised models. In a final performance benchmark on a test set with poor image quality, the best performing supervised model was UNet++ with an average precision of 0.751 ± 0.030, while the best unsupervised model was the post-processed ceVAE with 0.830 ± 0.003. Notably, the ceVAE model, with its post-processing technique, exhibited superior capabilities, endorsing unsupervised learning as the preferred approach for the voxel-wise pore segmentation task.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13160 - 13177"},"PeriodicalIF":3.4,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05647-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-28. DOI: 10.1007/s10489-024-05720-7
Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang
Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial-stage safety under single-life conditions, and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order-of-magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module improves the final average reward over a single life by nearly two orders of magnitude. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem-wing influence.
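The time-interleaving idea can be read schematically as a loop that alternates a trusted classical controller with an RL policy and falls back to the classical action whenever the RL proposal strays too far, a crude stand-in for the Lipschitz-based check. The sketch below assumes a gym-style environment interface and is not the ConcertoRL implementation.

```python
import numpy as np

def interleaved_control(env, classical_ctrl, rl_policy, steps=500,
                        rl_every=2, max_dev=0.5):
    """Schematic time-interleaved control loop (illustrative only).

    Assumes a gym-style env (reset/step) and that classical_ctrl(obs) and
    rl_policy(obs) each return an action vector. Every `rl_every` steps the
    RL action is tried, but it is replaced by the classical action when it
    deviates by more than `max_dev`, standing in for a Lipschitz-style bound.
    """
    obs = env.reset()
    total_reward = 0.0
    for t in range(steps):
        a_safe = np.asarray(classical_ctrl(obs))
        if t % rl_every == 0:
            a_rl = np.asarray(rl_policy(obs))
            action = a_rl if np.linalg.norm(a_rl - a_safe) <= max_dev else a_safe
        else:
            action = a_safe
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```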
{"title":"Concertorl: A reinforcement learning approach for finite-time single-life enhanced control and its application to direct-drive tandem-wing experiment platforms","authors":"Minghao Zhang, Bifeng Song, Changhao Chen, Xinyu Lang, Liang Wang","doi":"10.1007/s10489-024-05720-7","DOIUrl":"10.1007/s10489-024-05720-7","url":null,"abstract":"<div><p>Achieving control of mechanical systems using finite-time single-life methods presents significant challenges in safety and efficiency for existing control algorithms. To address these issues, the ConcertoRL algorithm is introduced, featuring two main innovations: a time-interleaved mechanism based on Lipschitz conditions that integrates classical controllers with reinforcement learning-based controllers to enhance initial stage safety under single-life conditions and a policy composer based on finite-time Lyapunov convergence conditions that organizes past learning experiences to ensure efficiency within finite time constraints. Experiments are conducted on Direct-Drive Tandem-Wing Experiment Platforms, a typical mechanical system operating under nonlinear unsteady load conditions. First, compared with established algorithms such as the Soft Actor-Critic (SAC) algorithm, Proximal Policy Optimization (PPO) algorithm, and Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, ConcertoRL demonstrates nearly an order of magnitude performance advantage within the first 500 steps under finite-time single-life conditions. Second, ablation experiments on the time-interleaved mechanism show that introducing this module results in a performance improvement of nearly two orders of magnitude in single-life last average reward. Furthermore, the integration of this module yields a substantial performance boost of approximately 60% over scenarios without reinforcement learning enhancements and a 30% increase in efficiency compared to reference controllers operating at doubled control frequencies. These results highlight the algorithm's ability to create a synergistic effect that exceeds the sum of its parts. Third, ablation studies on the rule-based policy composer further verify its significant impact on enhancing ConcertoRL's convergence speed. Finally, experiments on the universality of the ConcertoRL framework demonstrate its compatibility with various classical controllers, consistently achieving excellent control outcomes. ConcertoRL offers a promising approach for mechanical systems under nonlinear, unsteady load conditions. It enables plug-and-play use with high control efficiency under finite-time, single-life constraints. This work sets a new benchmark in control effectiveness for challenges posed by direct-drive platforms under tandem wing influence.</p><h3>Graphical abstract</h3><div><figure><div><div><picture><source><img></source></picture></div></div></figure></div></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13121 - 13159"},"PeriodicalIF":3.4,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-25. DOI: 10.1007/s10489-024-05827-x
Feng Lin, Jian Wang, Witold Pedrycz, Kai Zhang, Sergey Ablameyko
Underwater image processing presents a greater challenge than its land-based counterpart due to inherent issues such as pervasive color distortion, diminished saturation, contrast degradation, and blurred content. Existing methods rooted in general image theory and models of image formation often fall short of delivering satisfactory results, as they typically consider only common factors and make assumptions that do not hold in complex underwater environments. Furthermore, the scarcity of extensive real-world datasets for underwater image enhancement (UIE) covering diverse scenes hinders progress in this field. To address these limitations, we propose an end-to-end unsupervised underwater image enhancement network, TOLPnet. It adopts a bi-level structure, utilizing the Typhoon Optimization (TO) algorithm at the upper level to optimize the hyperparameters of the convolutional neural network (CNN) model. The lower level involves a Difference of CNN that employs trainable parameters for image input-output mapping. A novel energy-limited method is proposed for dehazing, and a Laplacian pyramid mechanism decomposes the image into high-frequency and low-frequency components for enhancement. The TO algorithm is leveraged to select the enhancement strength and the weight coefficients of the loss functions. The cascaded CNN acts as a refining network. Experimental results on typical underwater image datasets demonstrate that our proposed method surpasses many state-of-the-art approaches.
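The Laplacian-pyramid step can be illustrated with a single-level split into low- and high-frequency bands that are recombined with separate gains; the OpenCV/NumPy sketch below is a generic decomposition with illustrative gain values, not the TOLPnet pipeline.

```python
import cv2
import numpy as np

def laplacian_split(img):
    """One-level Laplacian decomposition: low-frequency base + high-frequency detail."""
    h, w = img.shape[:2]
    img_f = img.astype(np.float32)
    low = cv2.pyrUp(cv2.pyrDown(img_f), dstsize=(w, h))   # blur via down/up-sampling
    high = img_f - low                                    # residual detail band
    return low, high

def enhance(img, detail_gain=1.5):
    """Recombine the bands, boosting only the detail band (gain is illustrative)."""
    low, high = laplacian_split(img)
    return np.clip(low + detail_gain * high, 0, 255).astype(np.uint8)

# usage (assumes an image file on disk):
# out = enhance(cv2.imread("underwater.jpg")); cv2.imwrite("enhanced.jpg", out)
```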
{"title":"A typhoon optimization algorithm and difference of CNN integrated bi-level network for unsupervised underwater image enhancement","authors":"Feng Lin, Jian Wang, Witold Pedrycz, Kai Zhang, Sergey Ablameyko","doi":"10.1007/s10489-024-05827-x","DOIUrl":"10.1007/s10489-024-05827-x","url":null,"abstract":"<div><p>Underwater image processing presents a greater challenge compared to its land-based counterpart due to inherent issues such as pervasive color distortion, diminished saturation, contrast degradation, and blurred content. Existing methods rooted in general image theory and models of image formation often fall short in delivering satisfactory results, as they typically consider only common factors and make assumptions that do not hold in complex underwater environments. Furthermore, the scarcity of extensive real-world datasets for underwater image enhancement (UIE) covering diverse scenes hinders progress in this field. To address these limitations, we propose an end-to-end unsupervised underwater image enhancement network, TOLPnet. It adopts a bi-level structure, utilizing the Typhoon Optimization (TO) algorithm at the upper level to optimize the super-parameters of the convolutional neural network (CNN) model. The lower level involves a Difference of CNN that employs trainable parameters for image input-output mapping. A novel energy-limited method is proposed for dehazing, and the Laplacian pyramid mechanism decomposes the image into high-frequency and low-frequency components for enhancement. The TO algorithm is leveraged to select enhancement strength and weight coefficients for loss functions. The cascaded CNN acts as a refining network. Experimental results on typical underwater image datasets demonstrate that our proposed method surpasses many state-of-the-art approaches.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13101 - 13120"},"PeriodicalIF":3.4,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-24. DOI: 10.1007/s10489-024-05756-9
Xinyu Meng, Meng Zhao, Chenxi Zhang, Yimai Zhang
Optimizing hotel recommendation systems based on consumer preferences is crucial for online hotel booking platforms. The purpose of this study is to reveal differences in hotel recommendation results for different types of consumers by considering consumer expectations. Specifically, this study introduces an online hotel recommendation method that considers three preferences for five types of consumers (business, couples, families, friends, and solo): attribute importance, consumer expectations, and actual hotel attribute performance. Here, consumer expectations are expressed as 2-tuples: a customer states not only a specific demand but also the probability with which that demand should be met. Further, using the three consumer preferences, a similarity measurement model is constructed to recommend hotels for different types of consumers. This study tests the method on a dataset covering 40 hotels in the Beijing area and analyzes the impact of the three preferences on the hotel recommendation results for different types of consumers. The method introduced in this study has two management implications. On the one hand, the recommendation method based on consumer preferences can optimize hotel recommendation systems and help online hotel booking platforms improve the accuracy of recommendation results. On the other hand, the proposed method can offer valuable insights to hotel managers, helping them measure their competitiveness and providing guidance for developing service improvement strategies.
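To show how a 2-tuple expectation (a demanded attribute level together with the probability that it should be met) might enter a matching score, a small hypothetical sketch follows; the attribute names, weights, and scoring rule are invented for illustration and are not the paper's similarity measurement model.

```python
def expectation_score(hotel_perf, expectations, importance):
    """Score a hotel against 2-tuple expectations (level, probability), all in [0, 1].

    hotel_perf:   attribute -> observed performance
    expectations: attribute -> (expected_level, probability_of_meeting)
    importance:   attribute -> weight (weights sum to 1)
    Hypothetical scoring rule: penalise the shortfall against each expected
    level, weighted by how strongly the expectation is held.
    """
    score = 0.0
    for attr, w in importance.items():
        level, prob = expectations[attr]
        gap = max(0.0, level - hotel_perf[attr])   # shortfall only; exceeding is fine
        score += w * (1.0 - prob * gap)
    return score

# hypothetical example for a "families" consumer type
perf = {"cleanliness": 0.9, "location": 0.7, "price": 0.6}
expect = {"cleanliness": (0.8, 0.9), "location": (0.6, 0.5), "price": (0.7, 0.8)}
weights = {"cleanliness": 0.5, "location": 0.2, "price": 0.3}
print(round(expectation_score(perf, expect, weights), 3))   # 0.976
```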
{"title":"Making platform recommendations more responsive to the expectations of different types of consumers: a recommendation method based on online reviews","authors":"Xinyu Meng, Meng Zhao, Chenxi Zhang, Yimai Zhang","doi":"10.1007/s10489-024-05756-9","DOIUrl":"10.1007/s10489-024-05756-9","url":null,"abstract":"<div><p>Optimizing hotel recommendation systems based on consumer preferences is crucial for online hotel booking platforms. The purpose of this study is to reveal differences in hotel recommendation results for different types of consumers by considering consumer expectations. Specifically, this study introduces an online hotel recommendation method that considers three preferences for five types of consumers (business, couples, families, friends, and solo): attribute importance, consumer expectations, and actual hotel attribute performance. Here, consumer expectations are expressed in the form of the 2-tuple. 2-tuple expectations mean that customers can not only express specific demands but also express the probability of meeting the demands. Further, using three different consumer preferences, a similarity measurement model is constructed to recommend hotels for different types of consumers. This study puts this innovative method to the test using a dataset covering 40 hotels in the Beijing area and analyzes the impact of three preferences for different types of consumers on their hotel recommendation results. The method introduced in this study has two management implications. On the one hand, the recommendation method based on consumer preferences can optimize hotel recommendation systems and help online hotel booking platforms improve the accuracy of recommendation results. On the other hand, the proposed method can offer valuable insights to hotel managers, helping them measure their competitiveness and providing guidance for developing service improvement strategies.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13075 - 13100"},"PeriodicalIF":3.4,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-22. DOI: 10.1007/s10489-024-05819-x
Qiaokang Liang, Jintao Li, Hai Qin, Mingfeng Liu, Xiao Xiao, Dongbo Zhang, Yaonan Wang, Dan Zhang
Kitchen waste images encompass a wide range of garbage categories, posing a typical multi-label classification challenge. However, due to the complex backgrounds and significant variations in garbage morphology, there is currently limited research on kitchen waste classification. In this paper, we propose a multi-head attention-driven dynamic graph convolution lightweight network for multi-label classification of kitchen waste images. Firstly, we address the issue of large model parameterization in traditional GCN methods by optimizing the backbone network for a lightweight model design. Secondly, to overcome the performance loss resulting from the reduced model parameters, we introduce a multi-head attention mechanism to mitigate feature information loss, enhancing the feature extraction capability of the backbone network in complex scenarios and improving the correlation between graph nodes. Finally, a dynamic graph convolution module is employed to adaptively capture semantic-aware regions, further boosting recognition capabilities. Experiments conducted on our self-constructed multi-label kitchen waste classification dataset MLKW demonstrate that our proposed algorithm achieves 8.6% and 4.8% improvements in mAP over the benchmark GCN-based methods ML-GCN and ADD-GCN, respectively, establishing state-of-the-art performance. Additionally, extensive experiments on two public datasets, MS-COCO and VOC2007, showcase excellent classification results, highlighting the strong generalization ability of our algorithm.
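The general idea of a dynamic graph convolution, in which the adjacency is recomputed from the current label features rather than taken from a fixed co-occurrence matrix, can be sketched as the NumPy forward pass below; it is a generic illustration in the spirit of dynamic-GCN multi-label classifiers, not the MHA-DGCLN architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_gcn_layer(node_feats, w_adj, w_out):
    """One dynamic graph-convolution step over label-specific features.

    node_feats: (num_labels, d); w_adj: (d, d) projection used to build the
    adjacency; w_out: (d, d_out) output projection. The adjacency is derived
    from the features themselves, hence "dynamic".
    """
    z = node_feats @ w_adj                              # project features
    adj = softmax(z @ z.T / np.sqrt(z.shape[1]))        # feature-driven adjacency (L x L)
    return np.maximum(adj @ node_feats @ w_out, 0.0)    # aggregate neighbours, project, ReLU

rng = np.random.default_rng(0)
num_labels, d = 6, 16                                   # e.g. 6 waste categories, 16-dim features
feats = rng.normal(size=(num_labels, d))
out = dynamic_gcn_layer(feats, rng.normal(size=(d, d)), rng.normal(size=(d, 8)))
print(out.shape)                                        # (6, 8)
```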
{"title":"MHA-DGCLN: multi-head attention-driven dynamic graph convolutional lightweight network for multi-label image classification of kitchen waste","authors":"Qiaokang Liang, Jintao Li, Hai Qin, Mingfeng Liu, Xiao Xiao, Dongbo Zhang, Yaonan Wang, Dan Zhang","doi":"10.1007/s10489-024-05819-x","DOIUrl":"10.1007/s10489-024-05819-x","url":null,"abstract":"<div><p>Kitchen waste images encompass a wide range of garbage categories, posing a typical multi-label classification challenge. However, due to the complex background and significant variations in garbage morphology, there is currently limited research on kitchen waste classification. In this paper, we propose a multi-head attention-driven dynamic graph convolution lightweight network for multi-label classification of kitchen waste images. Firstly, we address the issue of large model parameterization in traditional GCN methods by optimizing the backbone network for lightweight model design. Secondly, to overcome performance losses resulting from reduced model parameters, we introduce a multi-head attention mechanism to mitigate feature information loss, enhancing the feature extraction capability of the backbone network in complex scenarios and improving the correlation between graph nodes. Finally, the dynamic graph convolution module is employed to adaptively capture semantic-aware regions, further boosting recognition capabilities. Experiments conducted on our self-constructed multi-label kitchen waste classification dataset MLKW demonstrate that our proposed algorithm achieves a 8.6% and 4.8% improvement in mAP compared to the benchmark GCN-based methods ML-GCN and ADD-GCN, respectively, establishing state-of-the-art performance. Additionally, extensive experiments on two public datasets, MS-COCO and VOC2007, showcase excellent classification results, highlighting the strong generalization ability of our algorithm.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13057 - 13074"},"PeriodicalIF":3.4,"publicationDate":"2024-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05819-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-10-18. DOI: 10.1007/s10489-024-05693-7
Zixin Chen, Jiong Yu, Qiyin Tan, Shu Li, XuSheng Du
The advancement of the computer and information industry has led to the emergence of new demands for multivariate time series anomaly detection (MTSAD) models, namely, the necessity for unsupervised anomaly detection that is both efficient and accurate. However, long-term time series data typically encompass a multitude of intricate temporal pattern variations and noise. Consequently, accurately capturing anomalous patterns within such data and establishing precise and rapid anomaly detection models pose challenging problems. In this paper, we propose a decomposition GAN-based transformer for anomaly detection (DGTAD) in multivariate time series data. Specifically, DGTAD integrates a time series decomposition structure into the original transformer model, further decomposing the extracted global features into deep trend information and seasonal information. On this basis, we improve the attention mechanism, which uses decomposed time-dependent features to change the traditional focus of the transformer, enabling the model to reconstruct anomalies of different types in a targeted manner. This makes it difficult for anomalous data to adapt to these changes, thereby amplifying the anomalous features. Finally, by combining the GAN structure and using multiple generators from different perspectives, we alleviate the mode collapse issue, thereby enhancing the model’s generalizability. DGTAD has been validated on nine benchmark datasets, demonstrating significant performance improvements and thus proving its effectiveness in unsupervised anomaly detection.
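The trend/seasonal split that decomposition transformers build on is typically a moving-average decomposition; the NumPy sketch below shows that generic step and is not the DGTAD module itself.

```python
import numpy as np

def series_decompose(x, kernel=25):
    """Split a (time, features) series into trend and seasonal/residual parts.

    The trend is a centred moving average with edge padding by replication;
    the seasonal part is the remainder. `kernel` should be odd.
    """
    T, _ = x.shape
    pad = kernel // 2
    padded = np.concatenate([np.repeat(x[:1], pad, axis=0), x,
                             np.repeat(x[-1:], pad, axis=0)], axis=0)
    trend = np.stack([padded[t:t + kernel].mean(axis=0) for t in range(T)])
    return trend, x - trend

# toy multivariate series: periodic signal plus slow drift and noise
t = np.arange(300)
x = np.stack([np.sin(t / 6) + 0.01 * t, np.cos(t / 9) - 0.005 * t], axis=1)
x += 0.1 * np.random.default_rng(1).normal(size=x.shape)
trend, seasonal = series_decompose(x)
print(trend.shape, seasonal.shape)                      # (300, 2) (300, 2)
```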
{"title":"DGTAD: decomposition GAN-based transformer for anomaly detection in multivariate time series data","authors":"Zixin Chen, Jiong Yu, Qiyin Tan, Shu Li, XuSheng Du","doi":"10.1007/s10489-024-05693-7","DOIUrl":"10.1007/s10489-024-05693-7","url":null,"abstract":"<div><p>The advancement of the computer and information industry has led to the emergence of new demands for multivariate time series anomaly detection (MTSAD) models, namely, the necessity for unsupervised anomaly detection that is both efficient and accurate. However, long-term time series data typically encompass a multitude of intricate temporal pattern variations and noise. Consequently, accurately capturing anomalous patterns within such data and establishing precise and rapid anomaly detection models pose challenging problems. In this paper, we propose a decomposition GAN-based transformer for anomaly detection (DGTAD) in multivariate time series data. Specifically, DGTAD integrates a time series decomposition structure into the original transformer model, further decomposing the extracted global features into deep trend information and seasonal information. On this basis, we improve the attention mechanism, which uses decomposed time-dependent features to change the traditional focus of the transformer, enabling the model to reconstruct anomalies of different types in a targeted manner. This makes it difficult for anomalous data to adapt to these changes, thereby amplifying the anomalous features. Finally, by combining the GAN structure and using multiple generators from different perspectives, we alleviate the mode collapse issue, thereby enhancing the model’s generalizability. DGTAD has been validated on nine benchmark datasets, demonstrating significant performance improvements and thus proving its effectiveness in unsupervised anomaly detection.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13038 - 13056"},"PeriodicalIF":3.4,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automatic classification of remote sensing images using machine learning techniques is challenging due to the complex features of the images. The images are characterized by features such as multi-resolution, heterogeneous appearance and multi-spectral channels. Deep learning methods have achieved promising results in the analysis of remote sensing satellite images in the recent past. However, deep learning methods based on convolutional neural networks (CNN) experience difficulties in the analysis of intrinsic objects from satellite images. These techniques have not achieved optimum performance in the analysis of remote sensing satellite images due to their complex features, such as coarse resolution, cloud masking, varied sizes of embedded objects and appearance. The receptive fields in convolutional operations are not able to establish long-range dependencies and lack global contextual connectivity for effective feature extraction. To address this problem, we propose an improved deep learning-based vision transformer model for the efficient analysis of remote sensing images. The proposed model incorporates a multi-head local self-attention mechanism with a patch-shifting procedure to provide both local and global context for effective extraction of multi-scale and multi-resolution spatial features of remote sensing images. The proposed model is further enhanced by tuning the hyper-parameters, introducing dropout modules and a linear-decay learning-rate scheduler. This approach leverages local self-attention for learning and extraction of the complex features in satellite images. Experiments and analysis were carried out on four distinct remote sensing image datasets, namely RSSCN, EuroSat, UC Merced (UCM) and SIRI-WHU. The results show that the proposed vision transformer improves on the CNN-based methods.
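As a reference point for the attention mechanism described above, a plain NumPy forward pass of standard multi-head self-attention over patch embeddings follows; this is the textbook formulation, and the paper's shifted-patch local self-attention variant is not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, wq, wk, wv, wo, num_heads=4):
    """Standard multi-head self-attention over patch tokens.

    tokens: (num_patches, d); wq, wk, wv, wo: (d, d) projections; d divisible
    by num_heads. Returns the attended token representations.
    """
    n, d = tokens.shape
    dh = d // num_heads
    q = (tokens @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)   # (H, n, dh)
    k = (tokens @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (tokens @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))           # (H, n, n)
    heads = (attn @ v).transpose(1, 0, 2).reshape(n, d)              # concatenate heads
    return heads @ wo

rng = np.random.default_rng(0)
n, d = 64, 32                                # e.g. 64 patch tokens of dimension 32
x = rng.normal(size=(n, d))
w = [0.1 * rng.normal(size=(d, d)) for _ in range(4)]
print(multi_head_self_attention(x, *w).shape)            # (64, 32)
```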
{"title":"Automated classification of remote sensing satellite images using deep learning based vision transformer","authors":"Adekanmi Adegun, Serestina Viriri, Jules-Raymond Tapamo","doi":"10.1007/s10489-024-05818-y","DOIUrl":"10.1007/s10489-024-05818-y","url":null,"abstract":"<div><p>Automatic classification of remote sensing images using machine learning techniques is challenging due to the complex features of the images. The images are characterized by features such as multi-resolution, heterogeneous appearance and multi-spectral channels. Deep learning methods have achieved promising results in the analysis of remote sensing satellite images in the recent past. However, deep learning methods based on convolutional neural networks (CNN) experience difficulties in the analysis of intrinsic objects from satellite images. These techniques have not achieved optimum performance in the analysis of remote sensing satellite images due to their complex features, such as coarse resolution, cloud masking, varied sizes of embedded objects and appearance. The receptive fields in convolutional operations are not able to establish long-range dependencies and lack global contextual connectivity for effective feature extraction. To address this problem, we propose an improved deep learning-based vision transformer model for the efficient analysis of remote sensing images. The proposed model incorporates a multi-head local self-attention mechanism with patch shifting procedure to provide both local and global context for effective extraction of multi-scale and multi-resolution spatial features of remote sensing images. The proposed model is also enhanced by fine-tuning the hyper-parameters by introducing dropout modules and a decay linear learning rate scheduler. This approach leverages local self-attention for learning and extraction of the complex features in satellite images. Four distinct remote sensing image datasets, namely RSSCN, EuroSat, UC Merced (UCM) and SIRI-WHU, were subjected to experiments and analysis. The results show some improvement in the proposed vision transformer on the CNN-based methods.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"54 24","pages":"13018 - 13037"},"PeriodicalIF":3.4,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10489-024-05818-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}