Class-agnostic adaptive feature adaptation method for anomaly detection of aero-engine blade
Pub Date: 2025-02-18, DOI: 10.1016/j.eswa.2025.126843
Chang Niu, Zilong Zhang
Regular borescope inspection of aero-engine blades is crucial to ensuring the safe operation of aero-engines. Because defective blade images are typically unavailable, this paper focuses on an intelligent borescope inspection method based on anomaly detection. Previous anomaly detection methods rely on features pre-trained on natural images. Since there is a large domain gap between natural images and blade images, the discriminativeness of the pre-trained features is suboptimal. To alleviate this problem, current methods adapt the pre-trained features based on a prior assumption about the number of classes in the normal data. In real scenarios, where the class number of the normal data is commonly unknown, such adaptation methods fail in some cases. In this paper, we propose a class-agnostic feature adaptation method (CA²) to solve this problem. The key insight is to utilize the neighbor relationship of each pre-trained feature to adaptively cluster towards the center of its k nearest neighbor samples. We conduct experiments under multiple known class numbers. The results show that CA² achieves a consistent improvement under different class numbers of normal data. An engineering experiment on anomaly detection of aero-engine blades shows the decent anomaly detection performance of CA². Code and dataset are available at https://github.com/changniu54/CA2.
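The neighbor-centroid idea can be sketched in a few lines: push each frozen pre-trained feature, through a small trainable adapter, toward the centroid of its k nearest neighbors, with no class labels involved. The snippet below is a minimal PyTorch illustration under assumed details (a residual MLP adapter, neighborhoods fixed in the pre-trained space, a plain MSE pull); it is not the authors' released implementation, which lives at the repository linked above.

```python
import torch
import torch.nn as nn

def knn_indices(feats: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of each row (self excluded)."""
    d = torch.cdist(feats, feats)
    d.fill_diagonal_(float("inf"))
    return d.topk(k, largest=False).indices             # (N, k)

class Adapter(nn.Module):
    """Small residual adapter applied on top of frozen pre-trained features."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.proj(x)

feats = torch.randn(256, 128)                            # stand-in for pre-trained blade features
nbrs = knn_indices(feats, k=5)                           # neighborhoods from the pre-trained space
adapter = Adapter(128)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)
for _ in range(100):
    adapted = adapter(feats)
    centers = adapted[nbrs].mean(dim=1).detach()         # centroid of each sample's k neighbors
    loss = ((adapted - centers) ** 2).mean()             # pull each sample toward its centroid
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the target is each sample's own neighborhood centroid rather than a class prototype, the objective needs no assumption about how many normal classes the data contains.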
Reliable multi-modal prototypical contrastive learning for difficult airway assessment
Pub Date: 2025-02-18, DOI: 10.1016/j.eswa.2025.126870
Xiaofan Li, Bo Peng, Yuan Yao, Guangchao Zhang, Zhuyang Xie, Muhammad Usman Saleem
Recent advancements in facial image-based prediction for difficult airway assessment show significant clinical promise. However, existing methods often struggle to accurately distinguish subtle facial features, contend with limited label information, and address the uncertainty in correlating facial features with airway difficulty. In this study, we propose a Reliable Multimodal Prototypical Contrastive Learning Network (RMP-Net) for difficult airway assessment that aims to overcome these challenges. RMP-Net integrates multiple modalities, including facial images processed by a Convolutional Neural Network (CNN) and keypoint graphs processed by a Graph Convolutional Network (GCN). In addition to the commonly used image modality, we build a graph from the keypoint modality for prediction. This not only captures comprehensive facial information but also targets critical anatomical features, enhancing feature representation and model interpretability. During training, features extracted from laryngoscopic images serve as a priori prototypes, which are further aligned with the facial image and keypoint features for a clearer feature representation. Importantly, the laryngoscopic modality is used exclusively during training, since it is obtained intraoperatively. This ensures that RMP-Net remains a preoperative prediction method while leveraging detailed anatomical insights during learning. Furthermore, we introduce an uncertainty learning process to validate the correlation between facial features and airway difficulty, improving the model's robustness by focusing on reliable data. We construct a comprehensive multi-modal dataset, including facial images, laryngoscopic images, and facial keypoints. Five-fold cross-validation experiments demonstrate that RMP-Net achieves significant improvements in diagnostic AUC, sensitivity, and specificity compared to traditional and state-of-the-art (SoTA) methods. The code for this study is available at https://github.com/a6177738/RMP-Net.
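One way to read the training-time prototype alignment is as a contrastive objective between paired facial embeddings and laryngoscopic prototype embeddings, with the prototype branch simply dropped at inference so prediction stays preoperative. The InfoNCE-style sketch below is an assumption about the loss form (including the temperature), not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(face_emb, proto_emb, temperature=0.1):
    """InfoNCE over a batch of paired embeddings; positives sit on the diagonal."""
    face = F.normalize(face_emb, dim=1)
    proto = F.normalize(proto_emb, dim=1)
    logits = face @ proto.t() / temperature            # (B, B) cosine-similarity logits
    targets = torch.arange(face.size(0))               # i-th face pairs with i-th prototype
    return F.cross_entropy(logits, targets)

# toy usage: random tensors stand in for CNN/GCN outputs and laryngoscopic prototypes
loss = prototype_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```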
FGRMNet: Fully graph relational matching network for few-shot remote sensing scene classification
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126823
Jacob Regan, Mahdi Khodayar
Few-shot remote sensing scene classification (FS-RSSC) is an essential task within remote sensing (RS) and aims to develop models that can quickly and accurately adapt to new aerial scene categories given only a few labeled examples of the novel scenes. Convolutional neural network (CNN)-based methods have demonstrated decent performance for remote sensing scene classification (RSSC) and FS-RSSC, but they cannot handle irregular patterns well. The Vision Transformer (ViT) does not suffer from this drawback, but its dependence on large amounts of data makes it less viable for few-shot learning. To alleviate these weaknesses, we propose a novel end-to-end, fully graph-based framework for FS-RSSC called the fully graph relational matching network (FGRMNet). This framework consists of three principal components: (1) a deep graph neural network (GNN) embedding network composed of dynamic GCN layers that extract long-range and irregular patterns from aerial scene samples; unlike a CNN, our GNN has a dynamic receptive field, allowing it to extract richer relational connections from object features. (2) A graph contrastive matching (GCM) module consisting of local–global and global–global contrastive learning objectives that improve the robustness and generalization of the embedding network for graph similarity learning by improving how the GNN encoder adapts its receptive field between latent layers. (3) A graph relational attention (GRAT) module, a graph attention network that learns to measure the similarity between the global graph representations of a query and the support samples by incorporating high-level node information with global graph context in the relational learning step. More precisely, the GRAT module improves the quality of the relational scores by assigning higher value to the parts of a query's node embeddings most relevant to the comparison between the global representation of the query and the global representation of the support class. Extensive experiments with FGRMNet on three popular RS datasets demonstrate that our framework achieves state-of-the-art performance.
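The dynamic receptive field of component (1) can be illustrated with an EdgeConv-style layer that rebuilds its k-nearest-neighbor graph from the current node features on every forward pass. The layer below is a hypothetical sketch; the dimensions, the choice of k, and max aggregation are assumptions rather than FGRMNet's actual architecture.

```python
import torch
import torch.nn as nn

class DynamicGCNLayer(nn.Module):
    """Graph convolution whose neighborhood is recomputed from the current features."""
    def __init__(self, in_dim, out_dim, k=8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x):                                # x: (N, in_dim) node features
        d = torch.cdist(x, x)
        d.fill_diagonal_(float("inf"))
        idx = d.topk(self.k, largest=False).indices      # (N, k) current nearest neighbors
        neigh = x[idx]                                   # (N, k, in_dim)
        center = x.unsqueeze(1).expand_as(neigh)
        msg = self.mlp(torch.cat([center, neigh - center], dim=-1))
        return msg.max(dim=1).values                     # aggregate over the dynamic neighborhood

out = DynamicGCNLayer(64, 128)(torch.randn(100, 64))     # toy run on 100 nodes
```

Because the graph is rebuilt from the evolving features, the effective receptive field adapts between layers, which is the behavior the GCM module is described as regularizing.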
Semi-supervised multi-label feature selection combining nonlinear manifold structure and minimizing group sparse redundant correlation
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126844
Runxin Li, Xiong Yang, Xiaowu Li, Guofeng Shu, Lianyin Jia, Zhenhong Shang
The goal of multi-label feature selection is to identify the most representative features from the original feature set so as to effectively mitigate the curse of dimensionality. Currently, most multi-label feature selection methods based on sparse regression models train the feature weight matrices by directly projecting the original feature space into the label space; however, these direct projection methods may fail to capture the key nonlinear relationships between features and labels. To address this challenge, we propose a novel multi-label feature selection technique named semi-supervised multi-label feature selection combining nonlinear manifold structure and minimizing group sparse redundant correlation (NMS-MGSRC). First, we reconstruct the Hilbert–Schmidt Independence Criterion (HSIC)-based MDDM model (Y. Zhang and Z.-H. Zhou, 2010) into a least-squares problem, allowing the weight matrices to effectively learn the nonlinear correlations between features and labels by generating a nonlinear feature-label manifold structure. Second, we group the feature-label manifold using an adaptive k-means clustering method based on the evaluation of silhouette coefficients, and impose l1- and l2,1-norm constraints on the weight matrices to discriminate group-specific and common features in each feature-label manifold group. Finally, the nonlinear feature-label manifold learning term, the group sparse feature learning constraints, the semi-supervised learning mechanism, the feature-label manifold correlation, and the instance correlation learning constraint are integrated into a unified framework that includes two binary l1-norm regularity terms, and we introduce a novel optimization algorithm, S-FISTA, to optimize it. Comparative results on 13 multi-label datasets with six evaluation metrics demonstrate that NMS-MGSRC significantly outperforms 13 representative feature selection algorithms.
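Two ingredients of the second step are easy to make concrete: choosing the number of k-means clusters by the silhouette coefficient, and the l2,1 norm that encourages entire rows of a weight matrix to vanish. The scikit-learn sketch below is illustrative only; the search range for k and how the norm enters the actual S-FISTA solver are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def adaptive_kmeans(X, k_range=range(2, 10)):
    """Pick the number of clusters that maximizes the silhouette coefficient."""
    best = (None, -1.0, None)                            # (k, score, labels)
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, labels)
        if score > best[1]:
            best = (k, score, labels)
    return best[0], best[2]

def l21_norm(W):
    """Sum of the l2 norms of the rows of W; penalizing it drives whole feature rows to zero."""
    return np.sqrt((W ** 2).sum(axis=1)).sum()

X = np.random.rand(200, 16)                              # toy stand-in for feature-label manifold points
k, labels = adaptive_kmeans(X)
print(k, l21_norm(np.random.rand(16, 4)))
```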
CoTEL-D3X: A chain-of-thought enhanced large language model for drug–drug interaction triplet extraction
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126953
Haotian Hu, Alex Jie Yang, Sanhong Deng, Dongbo Wang, Min Song
Current state-of-the-art drug–drug interaction (DDI) triplet extraction methods not only fail to exhaustively capture potentially overlapping entity relations but also struggle to extract discontinuous drug entities, leading to suboptimal performance in DDI triplet extraction. To address these challenges, we propose a Chain-of-Thought Enhanced Large Language Model for DDI Triplet Extraction (CoTEL-D3X). Based on the transformer architecture, we design joint and pipeline methods that perform end-to-end DDI triplet extraction in a generative manner. Our approach builds upon the LLaMA series of models as the foundation model and incorporates instruction tuning and Chain-of-Thought techniques to enhance the model's understanding of task requirements and its reasoning capabilities. We validate the effectiveness of our methods on the widely used DDI dataset, which comprises 1,025 documents containing 17,805 entity mentions and 4,999 DDIs. Our joint and pipeline methods not only outperform mainstream generative models, such as ChatGPT, GPT-3, and OPT, on the DDI Extraction 2013 dataset but also improve the corresponding current best F1-scores by 9.75% and 5.86%, respectively. In particular, compared to the most advanced few-shot learning methods, our approach achieves more than a two-fold improvement in F1-score. We further validate the method's transferability and generalization on the TAC 2018 DDI Extraction and ADR Extraction datasets, and assess its applicability on real-world data from DrugBank. Performance analysis reveals that the CoT component significantly enhances the extraction results. The introduction of generative LLMs allows us to freely define the content and format of inputs and outputs, offering superior usability and flexibility compared to traditional extraction methods based on sequence labeling. Furthermore, because our approach does not rely on external knowledge or manually defined rules, it may lack domain-specific knowledge to some extent; however, it can easily be adapted to other domains.
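The instruction-plus-chain-of-thought setup can be pictured as a prompt template that asks the model to list drug mentions first, reason about candidate pairs, and only then emit triplets. The template below is a hypothetical example in that style; the wording, the section headers, and the interaction-type labels (taken from the DDI Extraction 2013 corpus) are assumptions, not the paper's actual prompts.

```python
DDI_TYPES = ["mechanism", "effect", "advise", "int"]     # relation labels in DDI Extraction 2013

def build_cot_prompt(sentence: str) -> str:
    """Compose an instruction-style prompt with an explicit step-by-step reasoning request."""
    return (
        "### Instruction:\n"
        "Extract all drug-drug interaction triplets (drug1, interaction_type, drug2) "
        f"from the sentence. Valid interaction types: {', '.join(DDI_TYPES)}.\n"
        "Think step by step: first list the drug mentions (including discontinuous ones), "
        "then decide which pairs interact and assign a type, then output the triplets.\n"
        f"### Input:\n{sentence}\n"
        "### Response:\n"
    )

print(build_cot_prompt(
    "Concomitant use of warfarin and aspirin increases the risk of bleeding."
))
```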
Multiscale convolutional transformer for robust detection of aquaculture defects
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126820
Wilayat Khan, Taimur Hassan, Mobeen Ur Rehman, Mohammad Alsaffar, Irfan Hussain
Accurate identification of aquatic defects is paramount for ensuring the safety of marine life within aquaculture environments. However, due to the large disparity between photographic and underwater imagery, conventional deep learning models employed to monitor aquatic defects produce inadequate recognition performance. Furthermore, they require an extensive amount of ground-truth supervision on large-scale datasets, which limits their scalability in the real world. To overcome these issues, this paper proposes a novel convolutional transformer architecture that combines multi-scale convolutional feature representations with attentional projections to robustly recognize aquatic defects from underwater imagery, irrespective of background clutter, color distortion, and scanner specifications. Moreover, unlike conventional fully supervised methods, the proposed model leverages self-supervision through its prior learned experience to perform aquatic defect extraction across different datasets without incurring additional ground-truth labeling and re-training costs. The proposed model consistently outperforms state-of-the-art methods, achieving superior mean average precision scores of 0.72, 0.74, 0.80, and 0.82 on the NDv1, NDv2, LABUST, and KU datasets, respectively. These results reflect the effectiveness of the proposed approach in accurately identifying and delineating aquaculture defects across diverse underwater environments.
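The overall pattern of multi-scale convolutional features feeding attentional projections can be sketched as a tokenizer with parallel kernel sizes whose concatenated maps become tokens for a standard transformer encoder. Every number below (branch count, kernel sizes, stride, d_model) is a placeholder, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultiScaleConvTokenizer(nn.Module):
    """Extracts features at several kernel sizes and concatenates them channel-wise."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, ch, k, stride=4, padding=k // 2) for k in (3, 5, 7)]
        )

    def forward(self, x):                                      # x: (B, 3, H, W)
        feats = torch.cat([b(x) for b in self.branches], dim=1)    # (B, 3*ch, H/4, W/4)
        return feats.flatten(2).transpose(1, 2)                    # (B, tokens, 3*ch)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=96, nhead=4, batch_first=True), num_layers=2
)
tokens = MultiScaleConvTokenizer()(torch.randn(2, 3, 128, 128))
out = encoder(tokens)                                           # attention over multi-scale tokens
```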
Mild evaluation policy via dataset constraint for offline reinforcement learning
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126842
Xue Li, Xinghong Ling
Offline reinforcement learning (RL) agents seek optimal policies from fixed datasets. Policy constraints are typically employed to keep the learned policy close to the behavior policy, thereby stabilizing value learning and mitigating the selection of out-of-distribution (OOD) actions. Conventional approaches apply identical constraints for both value learning and test-time inference. However, the constraints suitable for value estimation may in fact be excessively restrictive for action selection at test time. To address this issue, we propose a mild evaluation policy via dataset constraint (MEDC) for test-time inference, together with a more constrained target policy for value estimation. MEDC introduces a dual-policy constraint comprising a target policy and an evaluation policy. The evaluation policy regularizes the policy towards the nearest state–action pair, while behavior cloning is performed on the target policy. The distributional shift is effectively addressed through the combination of the dataset constraint and behavior cloning. TD3 is employed to direct the policy in selecting actions that maximize the return. MEDC achieves state-of-the-art performance compared with existing methods on the D4RL datasets.
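The two constraints map onto familiar pieces: a TD3+BC-style actor loss with behavior cloning for the target policy, and a nearest state–action lookup over the dataset for the milder evaluation-time policy. The sketch below is an interpretation under those assumptions; the loss form, the scaling coefficient, and the joint-space nearest-neighbor search are not taken from the paper.

```python
import torch

def actor_loss_with_bc(q_value, policy_action, dataset_action, alpha=2.5):
    """TD3+BC-style target-policy objective: maximize Q while cloning the dataset action."""
    lam = alpha / q_value.abs().mean().detach()          # keeps the two terms on a similar scale
    return -(lam * q_value).mean() + ((policy_action - dataset_action) ** 2).mean()

def nearest_pair_action(state, policy_action, ds_states, ds_actions):
    """Mild evaluation constraint: return the dataset action of the closest (s, a) pair."""
    query = torch.cat([state, policy_action], dim=1)
    keys = torch.cat([ds_states, ds_actions], dim=1)
    idx = torch.cdist(query, keys).argmin(dim=1)         # nearest stored state-action pair
    return ds_actions[idx]

# toy shapes: batch of 4 states (dim 17) and actions (dim 6), dataset of 100 transitions
loss = actor_loss_with_bc(torch.randn(4), torch.randn(4, 6), torch.randn(4, 6))
a_eval = nearest_pair_action(torch.randn(4, 17), torch.randn(4, 6),
                             torch.randn(100, 17), torch.randn(100, 6))
```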
Pareto Front Improvements Phase using linkage learning and mating restrictions for solving multi-objective industrial process planning problems with low-sized Pareto fronts
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126834
Szymon Niemczyk, Michal Witold Przewozniczek, Piotr Dziurzanski
A thorough analysis of the features of real-world optimization problems is advantageous for many reasons. It helps in choosing an appropriate optimizer for the problem at hand. It makes it possible to propose problem-dedicated mechanisms that improve the optimizer's effectiveness and efficiency. Finally, it allows us to check whether, and which, problems share similar features. In this work, we consider the multi-objective NP-hard production planning problem. We identify that its instances are characterized by low-sized Pareto fronts, i.e., the best-known Pareto fronts for instances of this problem are small compared to the problem size. To handle such problems, we propose two mechanisms. The first is the Pareto Front Improvement (PFI) phase, which joins objective-space-based population clusterization, variable dependency utilization, mating restrictions, and elitism. The second is the solution-comparing (CS) procedure, which joins the dominance relation, crowding distance, and scalarization using weight vectors. These mechanisms are introduced into the Multi-Objective Parameter-less Population Pyramid (MO-P3), a state-of-the-art optimizer dedicated to multi-objective optimization in binary domains, yielding MO-P3-SC-PFI. The experiments show that, for the considered real-world problem, MO-P3-SC-PFI is highly competitive with other state-of-the-art optimizers. Additionally, we show that it is effective in solving typical benchmarks. Its advantage increases as the ratio between the Pareto front size and the problem size decreases.
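A comparator that joins dominance, scalarization, and crowding distance, as the CS procedure does, can be sketched as a tiered rule: dominance decides first, a weighted-sum scalarization decides next, and crowding distance breaks remaining ties. The tier ordering and the weighted-sum form are assumptions made for illustration; the actual CS procedure may combine these ingredients differently.

```python
import numpy as np

def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (minimization)."""
    return np.all(f1 <= f2) and np.any(f1 < f2)

def compare(f1, f2, weights, crowd1=0.0, crowd2=0.0):
    """Return -1 if the first solution is preferred, 1 if the second is, 0 if tied."""
    if dominates(f1, f2):
        return -1
    if dominates(f2, f1):
        return 1
    s1, s2 = np.dot(weights, f1), np.dot(weights, f2)    # weighted-sum scalarization
    if not np.isclose(s1, s2):
        return -1 if s1 < s2 else 1
    return -1 if crowd1 > crowd2 else (1 if crowd2 > crowd1 else 0)

print(compare(np.array([1.0, 2.0]), np.array([2.0, 1.5]), np.array([0.5, 0.5])))
```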
Robust Multi-Object Tracking with pseudo-information guided motion and enhanced semantic vision
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126846
Yukuan Zhang, Shengsheng Wang, Zihao Fu, Limin Zhao, Jiarui Zhao
The key to Multi-Object Tracking is to differentiate multiple instances in a video sequence and maintain their identity continuity. To achieve this goal, most methods model the motion or appearance cues of instances. However, when faced with complex scenarios such as camera motion, occlusion, and crowding, trackers often lack discriminative capability. In this paper, we propose a robust tracker, named RccTrack, that combines motion cues guided by pseudo-information with enhanced visual cues to overcome these issues. Specifically, pseudo-observation information is constructed to guide trajectory localization and generate interference-resistant trajectories, while pseudo-state information is constructed to guide the calculation of inter-frame target motion directions. This pseudo-information is used to enhance the discriminative power of the motion cues. For visual cues, a semantic fusion network is designed to extract strongly discriminative appearance information and store it in our hierarchical fusion embedding clusters, thus enhancing the discriminative power of the visual cues. In addition, we design a cascade matching method, which performs the association task based on trajectory length information to distinguish confusing targets. In the matching stage, the two cues mentioned above are combined to enhance the discriminative power of the tracker. Experimental results demonstrate that RccTrack achieves state-of-the-art performance on the MOT16, MOT17, MOT20, and DanceTrack benchmarks.
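A length-aware cascade association can be sketched directly: sort tracks by trajectory length, split them into levels, and solve a small assignment problem per level so that long, reliable trajectories claim detections first. The cost function, gating threshold, and number of levels below are placeholders; RccTrack's actual matching fuses its pseudo-information-guided motion cues with the appearance embeddings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cascade_match(tracks, detections, cost_fn, max_cost=0.7, n_levels=3):
    """tracks: list of dicts with a 'length' field; returns (track_idx, det_idx) pairs."""
    matches, free_dets = [], list(range(len(detections)))
    order = np.argsort([-t["length"] for t in tracks])        # longest trajectories matched first
    for level in np.array_split(order, n_levels):
        if len(level) == 0 or not free_dets:
            continue
        cost = np.array([[cost_fn(tracks[t], detections[d]) for d in free_dets] for t in level])
        rows, cols = linear_sum_assignment(cost)              # optimal assignment within the level
        taken = set()
        for r, c in zip(rows, cols):
            if cost[r, c] <= max_cost:                        # gate out implausible pairs
                matches.append((int(level[r]), free_dets[c]))
                taken.add(c)
        free_dets = [d for j, d in enumerate(free_dets) if j not in taken]
    return matches

# toy run with a random stand-in cost; in practice the cost could be 1 - IoU or a fused motion/appearance distance
tracks = [{"length": 30}, {"length": 5}, {"length": 12}]
print(cascade_match(tracks, detections=[0, 1], cost_fn=lambda t, d: float(np.random.rand())))
```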
SDELP-DDPG: Stochastic differential equations with Lévy processes-driven deep deterministic policy gradient for portfolio management
Pub Date: 2025-02-17, DOI: 10.1016/j.eswa.2025.126822
Zhen Huang, Junwei Duan, Chuanlin Zhang, Wenyong Gong
Portfolio management (PM) involves the ongoing redistribution of funds among various financial products and aims to strike a balance between returns and risks. In this paper, we propose SDELP-DDPG, a novel approach to portfolio management that combines stochastic differential equations (SDEs) driven by Lévy processes with the deep deterministic policy gradient (DDPG) technique. To alleviate the challenges posed by exploration limitations and to enhance the stability of DDPG, we employ SDEs driven by Lévy processes, with drift and diffusion coefficients represented by convolutional neural networks, to generate action policies. Additionally, we devise a reward function that considers relative entropy to guide RL agents in learning imitation policies with DDPG. Moreover, we incorporate an attention mechanism and the Ornstein–Uhlenbeck process to choose optimal actions. The proposed algorithm is evaluated on three real-world datasets: the Dow Jones Industrial Average markets, the energy markets, and the cryptocurrency markets; the experimental results validate the effectiveness of SDELP-DDPG compared to existing PM approaches.
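Generating an action proposal from a Lévy-driven SDE can be illustrated with a simple Euler step that adds a compound-Poisson jump term to the usual drift-plus-Brownian update. In SDELP-DDPG the drift and diffusion are convolutional networks; the plain callables, step size, and jump parameters below are stand-ins for illustration only.

```python
import numpy as np

def sde_levy_step(x, drift, diffusion, dt=0.01, jump_rate=1.0, jump_scale=0.05, rng=None):
    """One Euler step of dX = b(X)dt + s(X)dW + dJ, with J a compound-Poisson (Lévy) jump term."""
    rng = rng or np.random.default_rng()
    dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)           # Brownian increment
    n_jumps = rng.poisson(jump_rate * dt)                     # number of jumps in [t, t + dt]
    dJ = rng.normal(0.0, jump_scale, size=(n_jumps,) + x.shape).sum(axis=0) if n_jumps else 0.0
    return x + drift(x) * dt + diffusion(x) * dW + dJ

# toy: a mean-reverting drift and constant diffusion produce a noisy 5-dimensional action proposal
action = np.zeros(5)
for _ in range(100):
    action = sde_levy_step(action, drift=lambda a: -a, diffusion=lambda a: 0.2 * np.ones_like(a))
```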