KALT: generating adversarial explainable Chinese legal texts
Pub Date : 2024-06-21 DOI: 10.1007/s10994-024-06572-5
Yunting Zhang, Shang Li, Lin Ye, Hongli Zhang, Zhe Chen, Binxing Fang
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, AE attack performance degrades significantly in some verticals, such as law, because these methods overlook essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), within the framework of adversarial text generation based on word importance. First, we devise a legal knowledge extraction method based on KeyBERT. The extracted knowledge comprises features unique to each category and features shared among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components that can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method, which combines seven representative perturbation methods for Chinese, to introduce perturbations into the original texts. Finally, we design a formula for computing interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that are misclassified with high confidence, and it achieves excellent attack performance against the powerful Chinese BERT.
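The abstract describes KALT only at a high level; as a hedged illustration of the word-importance framework it builds on, the sketch below scores each token by the confidence drop caused by its removal and then greedily substitutes the highest-scoring tokens until the prediction flips. The toy classifier `toy_proba` and the substitution function passed as `perturb_word` are hypothetical stand-ins, not KALT's legal-knowledge-aware components or its hybrid perturbation method.

```python
from typing import Callable, List

def word_importance(tokens: List[str], label: int,
                    predict_proba: Callable[[List[str]], List[float]]) -> List[float]:
    """Importance of a token = confidence drop on the original label when that token is removed."""
    base = predict_proba(tokens)[label]
    return [base - predict_proba(tokens[:i] + tokens[i + 1:])[label] for i in range(len(tokens))]

def attack(tokens, label, predict_proba, perturb_word, budget=5):
    """Greedily perturb the most important tokens until the prediction flips or the budget runs out."""
    scores = word_importance(tokens, label, predict_proba)
    order = sorted(range(len(tokens)), key=scores.__getitem__, reverse=True)
    adv = list(tokens)
    for i in order[:budget]:
        adv[i] = perturb_word(adv[i])                       # e.g. synonym or homophone substitution
        probs = predict_proba(adv)
        if max(range(len(probs)), key=probs.__getitem__) != label:
            break                                           # prediction flipped: adversarial text found
    return adv

# Toy stand-in classifier (keyword counts over two classes), for illustration only.
def toy_proba(toks):
    a = sum(t == "theft" for t in toks) + 1
    b = sum(t == "fraud" for t in toks) + 1
    return [a / (a + b), b / (a + b)]

print(attack(["the", "theft", "case"], 0, toy_proba, lambda w: "fraud" if w == "theft" else w))
```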
{"title":"Kalt: generating adversarial explainable chinese legal texts","authors":"Yunting Zhang, Shang Li, Lin Ye, Hongli Zhang, Zhe Chen, Binxing Fang","doi":"10.1007/s10994-024-06572-5","DOIUrl":"https://doi.org/10.1007/s10994-024-06572-5","url":null,"abstract":"<p>Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs), which are well-designed input samples with imperceptible perturbations. Existing methods generate AEs to evaluate the robustness of DNN-based natural language processing models. However, the AE attack performance significantly degrades in some verticals, such as law, due to overlooking essential domain knowledge. To generate explainable Chinese legal adversarial texts, we introduce legal knowledge and propose a novel black-box approach, knowledge-aware law tricker (KALT), in the framework of adversarial text generation based on word importance. Firstly, we invent a legal knowledge extraction method based on KeyBERT. The knowledge contains unique features from each category and shared features among different categories. Additionally, we design two perturbation strategies, Strengthen Similar Label and Weaken Original Label, to selectively perturb the two types of features, which can significantly reduce the classification accuracy of the target model. These two perturbation strategies can be regarded as components, which can be conveniently integrated into any perturbation method to enhance attack performance. Furthermore, we propose a strong hybrid perturbation method to introduce perturbation into the original texts. The perturbation method combines seven representative perturbation methods for Chinese. Finally, we design a formula to calculate interpretability scores, quantifying the interpretability of adversarial text generation methods. Experimental results demonstrate that KALT can effectively generate explainable Chinese legal adversarial texts that can be misclassified with high confidence and achieve excellent attack performance against the powerful Chinese BERT.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"53 32 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving interpretability via regularization of neural activation sensitivity
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06549-4
Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai
State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited by two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter, opaqueness, directly impacts interpretability. The lack of interpretability diminishes user trust, as it is challenging to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method with that of standard models and of models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are superior to standard models, and that models trained using our proposed method are even better than adversarially robust models in terms of interpretability. (Code is provided in the supplementary material.)
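As a rough, hedged sketch of what regularizing neural activation sensitivity can look like (the exact formulation is in the paper and not reproduced here), the snippet below adds a penalty on the gradient norm of a hidden layer's activations with respect to the input on top of the usual cross-entropy loss. The architecture, the penalized layer, and the weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

def sensitivity_regularized_loss(x, y, lam=0.1):
    x = x.requires_grad_(True)
    hidden = model[1](model[0](x))                   # activations of the first hidden layer
    logits = model[2](hidden)
    task_loss = nn.functional.cross_entropy(logits, y)
    # Sensitivity term: how strongly the hidden activations react to input changes.
    grads = torch.autograd.grad(hidden.sum(), x, create_graph=True)[0]
    return task_loss + lam * grads.pow(2).mean()

x, y = torch.randn(8, 20), torch.randint(0, 3, (8,))
loss = sensitivity_regularized_loss(x, y)
loss.backward()                                      # gradients flow through both terms
```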
{"title":"Improving interpretability via regularization of neural activation sensitivity","authors":"Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai","doi":"10.1007/s10994-024-06549-4","DOIUrl":"https://doi.org/10.1007/s10994-024-06549-4","url":null,"abstract":"<p>State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited due to two major weaknesses - their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter, opaqueness, directly impacts interpretability. The lack of interpretability diminishes user trust as it is challenging to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We evaluate the interpretability of models trained using our method to that of standard models and models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are superior to standard models, and that models trained using our proposed method are even better than adversarially robust models in terms of interpretability.(Code provided in supplementary material.)</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"32 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
REFUEL: rule extraction for imbalanced neural node classification
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06569-0
Marco Markwald, Elena Demidova
Imbalanced graph node classification is a highly relevant and challenging problem in many real-world applications. The inherent data scarcity, a central characteristic of this task, substantially limits the performance of neural classification models driven solely by data. Given the limited instances of relevant nodes and complex graph structures, current methods fail to capture the distinct characteristics of node attributes and graph patterns within the underrepresented classes. In this article, we propose REFUEL, a novel approach for highly imbalanced node classification problems in graphs. Whereas symbolic and neural methods have complementary strengths and weaknesses when applied to such problems, REFUEL combines the power of symbolic and neural learning in a novel neural rule-extraction architecture. REFUEL captures the class semantics in automatically extracted rule vectors. Then, REFUEL augments the graph nodes with the extracted rule vectors and adopts a Graph Attention Network-based neural node embedding, enhancing the downstream neural node representation. Our evaluation confirms the effectiveness of the proposed REFUEL approach on three real-world datasets with different minority class sizes. Compared to the baselines, REFUEL achieves at least a 4 percentage-point improvement in precision on minority classes that account for only 1.5–2% of the data.
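The rule-augmentation step described above can be pictured with a small sketch: each node's attribute vector is concatenated with a vector of rule activations before being passed to a Graph Attention Network. The simple attribute-threshold rule format and all names below are assumptions for illustration, not REFUEL's actual rule-extraction architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 16))                      # node attribute matrix (100 nodes, 16 features)

# Hypothetical extracted rules: (feature index, threshold) pairs per class.
rules = {0: [(2, 0.8)], 1: [(5, 0.3), (7, 0.6)]}

def rule_vector(x):
    """Binary activation of every rule on a single node's attributes."""
    return np.array([float(x[f] > t) for cls in sorted(rules) for (f, t) in rules[cls]])

X_aug = np.hstack([X, np.vstack([rule_vector(x) for x in X])])
print(X_aug.shape)                             # (100, 16 + number of rules) -> input to a GAT
```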
{"title":"REFUEL: rule extraction for imbalanced neural node classification","authors":"Marco Markwald, Elena Demidova","doi":"10.1007/s10994-024-06569-0","DOIUrl":"https://doi.org/10.1007/s10994-024-06569-0","url":null,"abstract":"<p>Imbalanced graph node classification is a highly relevant and challenging problem in many real-world applications. The inherent data scarcity, a central characteristic of this task, substantially limits the performance of neural classification models driven solely by data. Given the limited instances of relevant nodes and complex graph structures, current methods fail to capture the distinct characteristics of node attributes and graph patterns within the underrepresented classes. In this article, we propose REFUEL—a novel approach for highly imbalanced node classification problems in graphs. Whereas symbolic and neural methods have complementary strengths and weaknesses when applied to such problems, REFUEL combines the power of symbolic and neural learning in a novel neural rule-extraction architecture. REFUEL captures the class semantics in the automatically extracted rule vectors. Then, REFUEL augments the graph nodes with the extracted rules vectors and adopts a Graph Attention Network-based neural node embedding, enhancing the downstream neural node representation. Our evaluation confirms the effectiveness of the proposed REFUEL approach for three real-world datasets with different minority class sizes. REFUEL achieves at least a 4% point improvement in precision on the minority classes of 1.5–2% compared to the baselines.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"85 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-order proximity and relation analysis for cross-network heterogeneous node classification
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06566-3
Hanrui Wu, Yanxin Wu, Nuosi Li, Min Yang, Jia Zhang, Michael K. Ng, Jinyi Long
Cross-network node classification aims to leverage the labeled nodes from a source network to assist the learning in a target network. Existing approaches work mainly in homogeneous settings, i.e., the nodes of the source and target networks are characterized by the same features. However, in many practical applications, nodes from different networks usually have heterogeneous features. To handle this issue, in this paper, we study cross-network node classification under heterogeneous settings, i.e., cross-network heterogeneous node classification. Specifically, we propose a new model called High-order Proximity and Relation Analysis, which studies the high-order proximity within each network and the high-order relation between nodes across the networks to obtain two kinds of features. Subsequently, these features are exploited to learn the final effective representations by introducing a feature matching mechanism and adversarial domain adaptation. We perform extensive experiments on several real-world datasets and make comparisons with existing baseline methods. Experimental results demonstrate the effectiveness of the proposed model.
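The abstract does not define high-order proximity; a common construction, used here purely as a hedged sketch, is a decayed sum of powers of the row-normalized adjacency matrix, so that nodes connected by short multi-step paths receive high proximity scores. The paper's exact definition and its relation-analysis component may differ.

```python
import numpy as np

def high_order_proximity(A: np.ndarray, order: int = 3, decay: float = 0.5) -> np.ndarray:
    """Weighted sum of k-step transition matrices, k = 1..order."""
    deg = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(deg, 1)                 # row-normalized one-step transitions
    S, Pk = np.zeros_like(P), np.eye(len(A))
    for k in range(1, order + 1):
        Pk = Pk @ P
        S += (decay ** k) * Pk
    return S

A = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
print(np.round(high_order_proximity(A), 3))
```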
{"title":"High-order proximity and relation analysis for cross-network heterogeneous node classification","authors":"Hanrui Wu, Yanxin Wu, Nuosi Li, Min Yang, Jia Zhang, Michael K. Ng, Jinyi Long","doi":"10.1007/s10994-024-06566-3","DOIUrl":"https://doi.org/10.1007/s10994-024-06566-3","url":null,"abstract":"<p>Cross-network node classification aims to leverage the labeled nodes from a source network to assist the learning in a target network. Existing approaches work mainly in homogeneous settings, i.e., the nodes of the source and target networks are characterized by the same features. However, in many practical applications, nodes from different networks usually have heterogeneous features. To handle this issue, in this paper, we study the cross-network node classification under heterogeneous settings, i.e., cross-network heterogeneous node classification. Specifically, we propose a new model called High-order Proximity and Relation Analysis, which studies the high-order proximity in each network and the high-order relation between nodes across the networks to obtain two kinds of features. Subsequently, these features are exploited to learn the final effective representations by introducing a feature matching mechanism and an adversarial domain adaptation. We perform extensive experiments on several real-world datasets and make comparisons with existing baseline methods. Experimental results demonstrate the effectiveness of the proposed model.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"7 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
X-Detect: explainable adversarial patch detection for object detectors in retail
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06548-5
Omer Hofman, Amit Giloni, Yarin Hayun, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici, Asaf Shabtai
Object detection models, which are widely used in various domains (such as retail), have been shown to be vulnerable to adversarial attacks. Existing methods for detecting adversarial attacks on object detectors have had difficulty detecting new real-life attacks. We present X-Detect, a novel adversarial patch detector that can: (1) detect adversarial samples in real time, allowing the defender to take preventive action; (2) provide explanations for the alerts raised to support the defender’s decision-making process, and (3) handle unfamiliar threats in the form of new attacks. Given a new scene, X-Detect uses an ensemble of explainable-by-design detectors that utilize object extraction, scene manipulation, and feature transformation techniques to determine whether an alert needs to be raised. X-Detect was evaluated in both the physical and digital space using five different attack scenarios (including adaptive attacks) and the benchmark COCO dataset and our new Superstore dataset. The physical evaluation was performed using a smart shopping cart setup in real-world settings and included 17 adversarial patch attacks recorded in 1700 adversarial videos. The results showed that X-Detect outperforms the state-of-the-art methods in distinguishing between benign and adversarial scenes for all attack scenarios while maintaining a 0% FPR (no false alarms) and providing actionable explanations for the alerts raised. A demo is available.
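A minimal sketch of the ensemble decision logic described above, under the assumption that each explainable-by-design detector casts a binary vote and an alert is raised once enough votes agree; the three placeholder detectors below are not X-Detect's object-extraction, scene-manipulation, or feature-transformation detectors.

```python
from typing import Callable, Dict

Detector = Callable[[dict], bool]          # a detector maps a scene description to "suspicious or not"

def x_detect_vote(scene: dict, detectors: Dict[str, Detector], min_votes: int = 2):
    votes = {name: det(scene) for name, det in detectors.items()}
    alert = sum(votes.values()) >= min_votes
    # The per-detector votes double as an explanation of why the alert was raised.
    return alert, votes

detectors = {
    "object_extraction": lambda s: s.get("missing_label", False),
    "scene_manipulation": lambda s: s.get("confidence_drop", 0.0) > 0.3,
    "feature_transform": lambda s: s.get("embedding_shift", 0.0) > 1.0,
}
print(x_detect_vote({"missing_label": True, "confidence_drop": 0.5}, detectors))
```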
{"title":"X-Detect: explainable adversarial patch detection for object detectors in retail","authors":"Omer Hofman, Amit Giloni, Yarin Hayun, Ikuya Morikawa, Toshiya Shimizu, Yuval Elovici, Asaf Shabtai","doi":"10.1007/s10994-024-06548-5","DOIUrl":"https://doi.org/10.1007/s10994-024-06548-5","url":null,"abstract":"<p>Object detection models, which are widely used in various domains (such as retail), have been shown to be vulnerable to adversarial attacks. Existing methods for detecting adversarial attacks on object detectors have had difficulty detecting new real-life attacks. We present X-Detect, a novel adversarial patch detector that can: (1) detect adversarial samples in real time, allowing the defender to take preventive action; (2) provide explanations for the alerts raised to support the defender’s decision-making process, and (3) handle unfamiliar threats in the form of new attacks. Given a new scene, X-Detect uses an ensemble of explainable-by-design detectors that utilize object extraction, scene manipulation, and feature transformation techniques to determine whether an alert needs to be raised. X-Detect was evaluated in both the physical and digital space using five different attack scenarios (including adaptive attacks) and the benchmark COCO dataset and our new Superstore dataset. The physical evaluation was performed using a smart shopping cart setup in real-world settings and included 17 adversarial patch attacks recorded in 1700 adversarial videos. The results showed that X-Detect outperforms the state-of-the-art methods in distinguishing between benign and adversarial scenes for all attack scenarios while maintaining a 0% FPR (no false alarms) and providing actionable explanations for the alerts raised. A demo is available.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"22 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141506122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neighborhood relation-based incremental label propagation algorithm for partially labeled hybrid data
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06560-9
Wenhao Shu, Dongtao Cao, Wenbin Qian, Shipeng Li
Label propagation can rapidly predict the labels of unlabeled objects from a small amount of given label information, which can enhance the performance of subsequent machine learning tasks. Most existing label propagation methods are proposed for static data. However, in many applications, real datasets containing multiple feature value types and massive numbers of unlabeled objects vary dynamically over time, and applying these label propagation methods to dynamic partially labeled hybrid data is highly wasteful because they must recalculate from scratch every time the data changes. To improve efficiency, a novel incremental label propagation algorithm based on neighborhood relation (ILPN) is developed in this paper. Specifically, we first construct graph structures by utilizing neighborhood relations to eliminate unnecessary label information. Then, a new label propagation strategy is designed that considers the weights assigned to each class, so that it does not rely on a probabilistic transition matrix to fix the propagation structure. On this basis, a new label propagation algorithm called neighborhood relation-based label propagation (LPN) is developed. For dynamic partially labeled hybrid data, we integrate incremental learning into LPN and develop an updating mechanism that allows incremental label propagation over previous label propagation results and graph structures, rather than recalculating from scratch. Finally, extensive experiments on UCI datasets validate that our proposed algorithm LPN outperforms other label propagation algorithms in speed while maintaining accuracy. In particular, on simulated dynamic data, the incremental algorithm ILPN is more efficient than non-incremental methods as the partially labeled hybrid data varies.
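A minimal sketch of neighborhood-based label propagation, assuming unlabeled objects adopt the class with the largest similarity-weighted vote among already-labeled neighbors and iterating until labels stabilize. The similarity matrix and stopping rule are simplified assumptions, and the incremental (ILPN) update mechanism is not shown.

```python
import numpy as np

def propagate(W: np.ndarray, labels: dict, n_classes: int, max_iter: int = 20):
    """W: symmetric similarity matrix; labels: {node index: class} for the labeled part."""
    y = dict(labels)
    for _ in range(max_iter):
        changed = False
        for i in range(len(W)):
            if i in labels:                      # given labels stay fixed
                continue
            votes = np.zeros(n_classes)
            for j, c in y.items():
                if j != i:
                    votes[c] += W[i, j]          # similarity-weighted class vote
            if votes.sum() > 0:
                new = int(votes.argmax())
                if y.get(i) != new:
                    y[i], changed = new, True
        if not changed:
            break                                # labels have stabilized
    return y

W = np.array([[0, .9, .1, 0], [.9, 0, .2, 0], [.1, .2, 0, .8], [0, 0, .8, 0]])
print(propagate(W, {0: 0, 3: 1}, n_classes=2))
```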
{"title":"Neighborhood relation-based incremental label propagation algorithm for partially labeled hybrid data","authors":"Wenhao Shu, Dongtao Cao, Wenbin Qian, Shipeng Li","doi":"10.1007/s10994-024-06560-9","DOIUrl":"https://doi.org/10.1007/s10994-024-06560-9","url":null,"abstract":"<p>Label propagation can rapidly predict the labels of unlabeled objects as the correct answers from a small amount of given label information, which can enhance the performance of subsequent machine learning tasks. Most existing label propagation methods are proposed for static data. However, in many applications, real datasets including multiple feature value types and massive unlabeled objects vary dynamically over time, whereas applying these label propagation methods for dynamic partially labeled hybrid data will be a huge drain due to recalculating from scratch when the data changes every time. To improve efficiency, a novel incremental label propagation algorithm based on neighborhood relation (ILPN) is developed in this paper. Specifically, we first construct graph structures by utilizing neighborhood relations to eliminate unnecessary label information. Then, a new label propagation strategy is designed in consideration of the weights assigned to each class so that it does not rely on a probabilistic transition matrix to fix the structure for propagation. On this basis, a new label propagation algorithm called neighborhood relation-based label propagation (LPN) is developed. For the dynamic partially labeled hybrid data, we integrate incremental learning into LPN and develop an updating mechanism that allows incremental label propagation over previous label propagation results and graph structures, rather than recalculating from scratch. Finally, extensive experiments on UCI datasets validate that our proposed algorithm LPN can outperform other label propagation algorithms in speed on the premise of ensuring accuracy. Especially for simulated dynamic data, the incremental algorithm ILPN is more efficient than other non-incremental methods with the variation of the partially labeled hybrid data.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"29 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Supervised maximum variance unfolding
Pub Date : 2024-06-19 DOI: 10.1007/s10994-024-06553-8
Deliang Yang, Hou-Duo Qi
Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure while making the variance among the data as large as possible. However, MVU in general remains a computationally challenging problem, and this may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on the key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace this term with the stress function from MDS, resulting in a usable model. This usability property guarantees that the “crowding phenomenon” will not occur in the dimension-reduced results. The new model also allows us to incorporate label information, and hence we call it supervised MVU (SMVU). We then develop a fast algorithm that is based on Euclidean distance matrix optimization. By making use of the majorization-minimization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms, including Isomap and t-SNE.
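To make the replaced objective concrete, the sketch below evaluates the raw MDS stress over a set of preserved neighbor pairs for a candidate low-dimensional embedding. The uniform weighting and the k-nearest-neighbor pair selection are assumptions for illustration; the paper's SMVU model, label terms, and majorization-minimization solver are not reproduced.

```python
import numpy as np

def stress(Z: np.ndarray, D: np.ndarray, pairs) -> float:
    """Raw MDS stress over the preserved pairs: sum of (||z_i - z_j|| - D_ij)^2."""
    s = 0.0
    for i, j in pairs:
        s += (np.linalg.norm(Z[i] - Z[j]) - D[i, j]) ** 2
    return s

rng = np.random.default_rng(1)
X = rng.random((30, 5))                                      # original high-dimensional data
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
# Preserve each point's 3 nearest neighbors (local structure, as in MVU).
pairs = [(i, j) for i in range(30) for j in np.argsort(D[i])[1:4]]
Z = rng.random((30, 2))                                      # candidate 2-D embedding
print(stress(Z, D, pairs))
```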
{"title":"Supervised maximum variance unfolding","authors":"Deliang Yang, Hou-Duo Qi","doi":"10.1007/s10994-024-06553-8","DOIUrl":"https://doi.org/10.1007/s10994-024-06553-8","url":null,"abstract":"<p>Maximum Variance Unfolding (MVU) is among the first methods in nonlinear dimensionality reduction for data visualization and classification. It aims to preserve local data structure and in the meantime push the variance among data as big as possible. However, MVU in general remains a computationally challenging problem and this may explain why it is less popular than other leading methods such as Isomap and t-SNE. In this paper, based on a key observation that the structure-preserving term in MVU is actually the squared stress in Multi-Dimensional Scaling (MDS), we replace the term with the stress function from MDS, resulting in a model that is usable. The property of the usability guarantees the “crowding phenomenon” will not happen in the dimension reduced results. The new model also allows us to combine label information and hence we call it the supervised MVU (SMVU). We then develop a fast algorithm that is based on Euclidean distance matrix optimization. By making use of the majorization-mininmization technique, the algorithm at each iteration solves a number of one-dimensional optimization problems, each having a closed-form solution. This strategy significantly speeds up the computation. We demonstrate the advantage of SMVU on some standard data sets against a few leading algorithms including Isomap and t-SNE.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"209 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141525440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The impact of data distribution on Q-learning with function approximation
Pub Date : 2024-06-07 DOI: 10.1007/s10994-024-06564-5
Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, Francisco S. Melo
We study the interplay between the data distribution and Q-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis of how different properties of the data distribution influence the performance of Q-learning-based algorithms. We connect different lines of research and validate and extend previous results, focusing primarily on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution on the performance of Q-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline Q-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution, such as entropy, coverage, and data quality (closeness to the optimal policy).
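Two of the data-distribution properties highlighted above, entropy and coverage, can be computed directly from an offline dataset of state-action pairs, as in the hedged sketch below; the dataset and the small finite MDP are synthetic and purely illustrative.

```python
import numpy as np
from collections import Counter

n_states, n_actions = 4, 2
rng = np.random.default_rng(0)
dataset = [(int(rng.integers(n_states)), int(rng.integers(n_actions))) for _ in range(200)]

counts = Counter(dataset)
mu = np.zeros((n_states, n_actions))
for (s, a), c in counts.items():
    mu[s, a] = c / len(dataset)                  # empirical state-action distribution

entropy = -np.sum(mu[mu > 0] * np.log(mu[mu > 0]))
coverage = np.mean(mu > 0)                       # fraction of state-action pairs seen at least once
print(f"entropy={entropy:.3f}, coverage={coverage:.2f}")
```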
{"title":"The impact of data distribution on Q-learning with function approximation","authors":"Pedro P. Santos, Diogo S. Carvalho, Alberto Sardinha, Francisco S. Melo","doi":"10.1007/s10994-024-06564-5","DOIUrl":"https://doi.org/10.1007/s10994-024-06564-5","url":null,"abstract":"<p>We study the interplay between the data distribution and <i>Q</i>-learning-based algorithms with function approximation. We provide a unified theoretical and empirical analysis as to how different properties of the data distribution influence the performance of <i>Q</i>-learning-based algorithms. We connect different lines of research, as well as validate and extend previous results, being primarily focused on offline settings. First, we analyze the impact of the data distribution by using optimization as a tool to better understand which data distributions yield low concentrability coefficients. We motivate high-entropy distributions from a game-theoretical point of view and propose an algorithm to find the optimal data distribution from the point of view of concentrability. Second, from an empirical perspective, we introduce a novel four-state MDP specifically tailored to highlight the impact of the data distribution in the performance of <i>Q</i>-learning-based algorithms with function approximation. Finally, we experimentally assess the impact of the data distribution properties on the performance of two offline <i>Q</i>-learning-based algorithms under different environments. Our results attest to the importance of different properties of the data distribution such as entropy, coverage, and data quality (closeness to optimal policy).</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"19 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141551892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
POMDP inference and robust solution via deep reinforcement learning: an application to railway optimal maintenance
Pub Date : 2024-05-31 DOI: 10.1007/s10994-024-06559-2
Giacomo Arcieri, Cyprien Hoelzl, Oliver Schwery, Daniel Straub, Konstantinos G. Papakonstantinou, Eleni Chatzi
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method is able to outperform the real-life policy by yielding significantly reduced life-cycle costs.
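A hedged sketch of the domain-randomization idea described above: at the start of every training episode, the environment parameters are drawn from the inferred posterior, so the learned policy must perform well across the whole posterior rather than for a single point estimate. The posterior samples, the placeholder rollout, and the omitted deep-RL update are all illustrative assumptions.

```python
import random

# Hypothetical posterior samples of transition/observation parameters (stand-ins for MCMC output).
posterior_samples = [{"p_degrade": 0.1 + 0.02 * k, "obs_noise": 0.05 * k} for k in range(5)]

def run_episode(env_params, policy):
    """Placeholder rollout: returns a fake episodic return for illustration."""
    return -10 * env_params["p_degrade"] - env_params["obs_noise"] + random.random()

def train(policy, n_episodes=100):
    returns = []
    for _ in range(n_episodes):
        params = random.choice(posterior_samples)    # domain randomization over the posterior
        returns.append(run_episode(params, policy))  # in the paper this would drive a deep-RL update
    return sum(returns) / len(returns)

print(train(policy=None))
```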
{"title":"POMDP inference and robust solution via deep reinforcement learning: an application to railway optimal maintenance","authors":"Giacomo Arcieri, Cyprien Hoelzl, Oliver Schwery, Daniel Straub, Konstantinos G. Papakonstantinou, Eleni Chatzi","doi":"10.1007/s10994-024-06559-2","DOIUrl":"https://doi.org/10.1007/s10994-024-06559-2","url":null,"abstract":"<p>Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method is able to outperform the real-life policy by yielding significantly reduced life-cycle costs.</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"64 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploiting residual errors in nonlinear online prediction
Pub Date : 2024-05-29 DOI: 10.1007/s10994-024-06554-7
Emirhan Ilhan, Ahmet B. Koc, Suleyman S. Kozat
We introduce a novel online (or sequential) nonlinear prediction approach that incorporates the residuals, i.e., the prediction errors on past observations, as additional features for the current data. Including the past error terms in an online prediction algorithm naturally improves prediction performance significantly, since this information is essential for an algorithm to adjust itself based on its past errors. These terms are well exploited in many linear statistical models, such as the ARMA, SES, and Holt-Winters models. However, the past error terms are rarely, or at least not optimally, exploited in nonlinear prediction models, since training them requires complex nonlinear state-space modeling. To this end, for the first time in the literature, we introduce a nonlinear prediction framework that utilizes not only the current features but also the past error terms as additional features, thereby exploiting the residual state information in the error terms, i.e., the model’s performance on the past samples. Since the new feature vectors contain error terms that change with every update, our algorithm jointly optimizes the model parameters and the feature vectors. We achieve this by introducing new update equations that handle the effects resulting from the changes in the feature vectors in an online manner. We use soft decision trees and neural networks as the nonlinear prediction algorithms, since these are the most widely used methods in highly publicized competitions. However, as we show, our methods are generic, and any algorithm supporting gradient calculations can be straightforwardly used. We show through experiments on well-known real-life competition datasets that our method significantly outperforms the state-of-the-art. We also provide the implementation of our approach, including the source code, to facilitate reproducibility (https://github.com/ahmetberkerkoc/SDT-ARMA).
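The core idea, using past residuals as extra features, can be shown with a small sketch in which a linear model updated online stands in for the paper's soft decision trees and neural networks; the synthetic data, learning rate, and number of residual lags are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, q = 500, 3, 2                         # time steps, base features, residual lags
w = np.zeros(d + q)                         # weights over base features plus residual features
residuals = [0.0] * q
lr, errs = 0.05, []

for t in range(T):
    x = rng.random(d)
    y = 2 * x[0] - x[1] + 0.1 * rng.standard_normal()   # synthetic target
    z = np.concatenate([x, residuals[-q:]])             # augment features with past residuals
    y_hat = w @ z
    e = y - y_hat
    w += lr * e * z                                      # online (LMS-style) update
    residuals.append(e)                                  # newest residual becomes a future feature
    errs.append(e ** 2)

print("mean squared error, last 100 steps:", np.mean(errs[-100:]))
```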
{"title":"Exploiting residual errors in nonlinear online prediction","authors":"Emirhan Ilhan, Ahmet B. Koc, Suleyman S. Kozat","doi":"10.1007/s10994-024-06554-7","DOIUrl":"https://doi.org/10.1007/s10994-024-06554-7","url":null,"abstract":"<p>We introduce a novel online (or sequential) nonlinear prediction approach that incorporates the residuals, i.e., prediction errors in the past observations, as additional features for the current data. Including the past error terms in an online prediction algorithm naturally improves prediction performance significantly since this information is essential for an algorithm to adjust itself based on its past errors. These terms are well exploited in many linear statistical models such as ARMA, SES, and Holts-Winters models. However, the past error terms are rarely or in a certain sense not optimally exploited in nonlinear prediction models since training them requires complex nonlinear state-space modeling. To this end, for the first time in the literature, we introduce a nonlinear prediction framework that utilizes not only the current features but also the past error terms as additional features, thereby exploiting the residual state information in the error terms, i.e., the model’s performance on the past samples. Since the new feature vectors contain error terms that change with every update, our algorithm jointly optimizes the model parameters and the feature vectors simultaneously. We achieve this by introducing new update equations that handle the effects resulting from the changes in the feature vectors in an online manner. We use soft decision trees and neural networks as the nonlinear prediction algorithms since these are the most widely used methods in highly publicized competitions. However, as we show, our methods are generic and any algorithm supporting gradient calculations can be straightforwardly used. We show through our experiments on the well-known real-life competition datasets that our method significantly outperforms the state-of-the-art. We also provide the implementation of our approach including the source code to facilitate reproducibility (https://github.com/ahmetberkerkoc/SDT-ARMA).</p>","PeriodicalId":49900,"journal":{"name":"Machine Learning","volume":"34 1","pages":""},"PeriodicalIF":7.5,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141197672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}