Pub Date: 2025-01-01 | Epub Date: 2022-11-11 | DOI: 10.1007/s00521-022-07960-5
Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
{"title":"Difference rewards policy gradients.","authors":"Jacopo Castellini, Sam Devlin, Frans A Oliehoek, Rahul Savani","doi":"10.1007/s00521-022-07960-5","DOIUrl":"10.1007/s00521-022-07960-5","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 19","pages":"13163-13186"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204931/pdf/","platform":"Semanticscholar","paperid":"144530735","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
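The difference-rewards idea in the Dr.Reinforce abstract above can be illustrated with a minimal sketch. It assumes a known reward function and forms each agent's counterfactual by swapping that agent's action for a fixed default action. The abstract does not say which counterfactual the authors use, so the default action, the function names, and the toy reward below are all hypothetical.

```python
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

Action = Optional[Hashable]
RewardFn = Callable[[object, Tuple[Action, ...]], float]


def difference_rewards(
    reward_fn: RewardFn,
    state: object,
    joint_action: Sequence[Action],
    default_actions: Sequence[Action],
) -> List[float]:
    """Return D_i = R(s, a) - R(s, (a_-i, c_i)) for every agent i.

    c_i is a fixed default action for agent i (an assumption of this sketch).
    """
    base = reward_fn(state, tuple(joint_action))
    diffs = []
    for i, default in enumerate(default_actions):
        counterfactual = list(joint_action)
        counterfactual[i] = default  # replace only agent i's action
        diffs.append(base - reward_fn(state, tuple(counterfactual)))
    return diffs


def toy_reward(state: object, joint_action: Tuple[Action, ...]) -> float:
    """Hypothetical shared reward: number of distinct targets covered.

    An action of None means the agent covers nothing.
    """
    return float(len({a for a in joint_action if a is not None}))


# Agents 0 and 1 cover the same target; agent 2 covers a different one.
print(difference_rewards(toy_reward, None, (0, 0, 1), (None, None, None)))
# -> [0.0, 0.0, 1.0]
```

In this toy case agents 0 and 1 are redundant with each other, so removing either one leaves the shared reward unchanged, while agent 2 receives full credit for the target only it covers.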
Pub Date: 2025-01-01 | Epub Date: 2024-12-11 | DOI: 10.1007/s00521-024-10596-2
Anna Penzkofer, Simon Schaefer, Florian Strohm, Mihai Bâce, Stefan Leutenegger, Andreas Bulling
While deep reinforcement learning (RL) agents outperform humans on an increasing number of tasks, training them requires data equivalent to decades of human gameplay. Recent hierarchical RL methods have increased sample efficiency by incorporating information inherent to the structure of the decision problem, but at the cost of having to discover or use human-annotated sub-goals that guide the learning process. We show that the intentions of human players, i.e. the precursors of goal-oriented decisions, can be robustly predicted from eye gaze even for the long-horizon, sparse-reward task of Montezuma's Revenge, one of the most challenging RL tasks in the Atari 2600 game suite. We propose Int-HRL: Hierarchical RL with intention-based sub-goals that are inferred from human eye gaze. Our novel sub-goal extraction pipeline is fully automatic and replaces the need for manual sub-goal annotation by human experts. Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions leads to an HRL agent that is significantly more sample efficient than previous methods.
{"title":"Int-HRL: towards intention-based hierarchical reinforcement learning.","authors":"Anna Penzkofer, Simon Schaefer, Florian Strohm, Mihai Bâce, Stefan Leutenegger, Andreas Bulling","doi":"10.1007/s00521-024-10596-2","DOIUrl":"10.1007/s00521-024-10596-2","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 23","pages":"18823-18834"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12313806/pdf/","platform":"Semanticscholar","paperid":"144776756","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-01-10 | DOI: 10.1007/s00521-024-10954-0
Jesse van Remmerden, Maurice Kenter, Diederik M Roijers, Charalampos Andriotis, Yingqian Zhang, Zaharah Bukhsh
In this paper, we introduce multi-objective deep centralized multi-agent actor-critic (MO-DCMAC), a multi-objective reinforcement learning method for infrastructural maintenance optimization, an area traditionally dominated by single-objective reinforcement learning (RL) approaches. Previous single-objective RL methods combine multiple objectives, such as probability of collapse and cost, into a singular reward signal through reward-shaping. In contrast, MO-DCMAC can optimize a policy for multiple objectives directly, even when the utility function is nonlinear. We evaluated MO-DCMAC using two utility functions, which use probability of collapse and cost as input. The first utility function is the threshold utility, in which MO-DCMAC should minimize cost so that the probability of collapse is never above the threshold. The second is based on the failure mode, effects, and criticality analysis methodology used by asset managers to assess maintenance plans. We evaluated MO-DCMAC, with both utility functions, in multiple maintenance environments, including ones based on a case study of the historical quay walls of Amsterdam. The performance of MO-DCMAC was compared against multiple rule-based policies based on heuristics currently used for constructing maintenance plans. Our results demonstrate that MO-DCMAC outperforms traditional rule-based policies across various environments and utility functions.
{"title":"Deep multi-objective reinforcement learning for utility-based infrastructural maintenance optimization.","authors":"Jesse van Remmerden, Maurice Kenter, Diederik M Roijers, Charalampos Andriotis, Yingqian Zhang, Zaharah Bukhsh","doi":"10.1007/s00521-024-10954-0","DOIUrl":"10.1007/s00521-024-10954-0","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 30","pages":"24719-24742"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12511271/pdf/","platform":"Semanticscholar","paperid":"145281527","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
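The threshold utility described in the MO-DCMAC abstract above (minimize cost while keeping the probability of collapse below a threshold) can be pictured with a small nonlinear utility over the two objectives. This is only a sketch: the threshold value, the penalty size, and the function name are assumptions, and the abstract does not give the authors' exact formula.

```python
def threshold_utility(
    cost: float,
    p_collapse: float,
    threshold: float = 0.2,   # hypothetical maximum acceptable collapse probability
    penalty: float = 1000.0,  # hypothetical penalty for exceeding the threshold
) -> float:
    """Return a utility that rewards low cost but penalizes unsafe plans.

    The utility is nonlinear in (cost, p_collapse): it is simply -cost while
    the collapse probability stays at or below the threshold, and drops by a
    large constant as soon as the threshold is exceeded.
    """
    utility = -cost
    if p_collapse > threshold:
        utility -= penalty
    return utility


print(threshold_utility(cost=10.0, p_collapse=0.1))  # -> -10.0
print(threshold_utility(cost=10.0, p_collapse=0.3))  # -> -1010.0
```

Because the penalty applies as a step rather than as a weighted sum, no fixed linear weighting of cost and collapse probability reproduces this utility, which is the situation in which a multi-objective method is useful.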
Pub Date: 2025-01-01 | Epub Date: 2025-08-01 | DOI: 10.1007/s00521-025-11478-x
Jacob Verrey, Peter Neyroud, Lawrence Sherman, Barak Ariel
This investigation explores whether machine learning can predict recidivism while addressing societal biases. To investigate this, we obtained conviction data from the UK's Police National Computer (PNC) on 346,685 records between January 1, 2000, and February 3, 2006 (His Majesty's Inspectorate of Constabulary, Use of the Police National Computer: An inspection of the ACRO Criminal Records Office, Birmingham, 2017, https://assets-hmicfrs.justiceinspectorates.gov.uk/uploads/police-national-computer-use-acro-criminal-records-office.pdf). We generate twelve machine learning models (six to forecast general recidivism and six to forecast violent recidivism) over a 3-year period, evaluated via fivefold cross-validation. Our best-performing models outperform the existing state of the art, achieving area under the curve (AUC) scores of 0.8660 and 0.8375 for general and violent recidivism, respectively. Next, we construct a fairness scale that communicates the semantic and technical trade-offs associated with debiasing a criminal justice forecasting model. We use this scale to debias our best-performing models. Results indicate that both models can satisfy all five fairness definitions: the metrics measuring these definitions (the statistical range of recall, precision, positive rate, and error balance between demographics) show that the scores are within one percentage point of each other. Deployment recommendations and implications are discussed. These include recommended safeguards against false positives, an explication of how these models addressed societal biases, and a case study illustrating how these models can improve existing criminal justice practices. That is, these models may help police identify fewer people in a way less impacted by structural bias while still reducing crime. A randomized control trial is proposed to test this illustrated case study, and further directions are explored.
Supplementary information: The online version contains supplementary material available at 10.1007/s00521-025-11478-x.
{"title":"A fairness scale for real-time recidivism forecasts using a national database of convicted offenders.","authors":"Jacob Verrey, Peter Neyroud, Lawrence Sherman, Barak Ariel","doi":"10.1007/s00521-025-11478-x","DOIUrl":"10.1007/s00521-025-11478-x","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 26","pages":"21607-21657"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401775/pdf/","platform":"Semanticscholar","paperid":"144994036","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
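The fairness check described in the recidivism abstract above compares per-group metrics by their statistical range (maximum minus minimum across demographic groups). A minimal sketch of that computation follows. The function names and the toy data are hypothetical, the "error balance" metric is omitted because the abstract does not define it, and the one-percentage-point tolerance is taken from the abstract.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def group_metric_ranges(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[str, float]:
    """Return max - min across groups for recall, precision and positive rate."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1:
            counts[group]["tp" if truth == 1 else "fp"] += 1
        else:
            counts[group]["fn" if truth == 1 else "tn"] += 1

    per_metric = {"recall": [], "precision": [], "positive_rate": []}
    for c in counts.values():
        total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
        per_metric["recall"].append(_ratio(c["tp"], c["tp"] + c["fn"]))
        per_metric["precision"].append(_ratio(c["tp"], c["tp"] + c["fp"]))
        per_metric["positive_rate"].append(_ratio(c["tp"] + c["fp"], total))
    return {name: max(vals) - min(vals) for name, vals in per_metric.items()}


# Toy data: group A has one false negative and one false positive,
# group B is predicted perfectly.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
ranges = group_metric_ranges(y_true, y_pred, groups)
print(ranges)  # -> {'recall': 0.5, 'precision': 0.5, 'positive_rate': 0.0}

# A model would pass the abstract's criterion only if every range is <= 0.01.
print(all(r <= 0.01 for r in ranges.values()))  # -> False
```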
Pub Date: 2025-01-01 | Epub Date: 2024-12-14 | DOI: 10.1007/s00521-024-10674-5
Ahmad Zainul Ihsan, Said Fathalla, Stefan Sandfeld
Research in Materials Science and Engineering focuses on the design, synthesis, properties, and performance of materials. An important and widely investigated class is crystalline materials, including metals and semiconductors. Crystalline materials typically contain a specific type of defect called a "dislocation". This defect significantly affects various material properties, including bending strength, fracture toughness, and ductility. In recent years, researchers have devoted significant effort to understanding dislocation behaviour through experimental characterization techniques and simulations, e.g., dislocation dynamics simulations. This paper presents how data from dislocation dynamics simulations can be modelled using semantic web technologies by annotating the data with ontologies. We extend the dislocation ontology by adding missing concepts and aligning it with two other domain-related ontologies (i.e., the Elementary Multi-perspective Material Ontology and the Materials Design Ontology), allowing the dislocation simulation data to be represented efficiently. Moreover, we present a real-world use case for representing the discrete dislocation dynamics data as a knowledge graph (DisLocKG) that depicts the relationships among the data. We also developed a SPARQL endpoint that brings extensive flexibility for querying DisLocKG.
{"title":"Modeling dislocation dynamics data using semantic web technologies.","authors":"Ahmad Zainul Ihsan, Said Fathalla, Stefan Sandfeld","doi":"10.1007/s00521-024-10674-5","DOIUrl":"10.1007/s00521-024-10674-5","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 18","pages":"11737-11753"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12174205/pdf/","platform":"Semanticscholar","paperid":"144334315","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
Pub Date: 2025-01-01 | Epub Date: 2025-09-13 | DOI: 10.1007/s00521-025-11613-8
Xin Du, Rajesh Jena, Katayoun Farrahi, Mahesan Niranjan
Pattern recognition models, particularly neural networks, often focus on maximising classification accuracy. However, in practice, the types of errors made (misclassification between different classes) can have varying associated costs. Current methods overlook varying misclassification error types. Misclassification labels can either be available from expert knowledge or derived from semantics of textual descriptions of class labels. Exploiting such misclassification costs can have significant implications when deploying machine learning systems. Here, using five examples from image and tabular domains, we show how a deep neural architecture trained in a nested layer-wise fashion (cascade learning) in which early layers solve easier problems than later ones could exploit such hierarchical aspects of class labels. We employ a measure of performance called "severity" of errors and show how emphasis could be placed on classes that are deeper in the hierarchy, ignoring errors that arise between semantic neighbours.
Supplementary information: The online version contains supplementary material available at 10.1007/s00521-025-11613-8.
{"title":"Balancing misclassification errors in image-based inference using problem domain semantics and a nested cascade architecture.","authors":"Xin Du, Rajesh Jena, Katayoun Farrahi, Mahesan Niranjan","doi":"10.1007/s00521-025-11613-8","DOIUrl":"10.1007/s00521-025-11613-8","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 31","pages":"26021-26036"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535507/pdf/","platform":"Semanticscholar","paperid":"145338090","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
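The "severity" of errors mentioned in the cascade-learning abstract above can be pictured as a distance in a class hierarchy: confusing two semantic neighbours costs less than confusing two classes that share no ancestor. The sketch below uses one plausible definition (levels between the label and the deepest shared ancestor). The abstract does not give the authors' formula, so this definition, the function names, and the toy hierarchy are assumptions.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical hierarchy: each class maps to its path from the root.
PATHS: Dict[str, Tuple[str, ...]] = {
    "cat": ("animal", "pet", "cat"),
    "dog": ("animal", "pet", "dog"),
    "car": ("vehicle", "road", "car"),
}


def severity(true_label: str, pred_label: str, paths: Dict[str, Tuple[str, ...]]) -> int:
    """Return how many hierarchy levels separate the two labels.

    0 means a correct prediction; larger values mean the two classes only
    share an ancestor higher up in the hierarchy (or none at all).
    """
    true_path, pred_path = paths[true_label], paths[pred_label]
    common = 0
    for a, b in zip(true_path, pred_path):
        if a != b:
            break
        common += 1
    return max(len(true_path), len(pred_path)) - common


def mean_severity(
    y_true: Sequence[str], y_pred: Sequence[str], paths: Dict[str, Tuple[str, ...]]
) -> float:
    """Average severity over a set of predictions (0.0 for an empty set)."""
    scores = [severity(t, p, paths) for t, p in zip(y_true, y_pred)]
    return sum(scores) / len(scores) if scores else 0.0


print(severity("cat", "cat", PATHS))  # -> 0
print(severity("cat", "dog", PATHS))  # -> 1
print(severity("cat", "car", PATHS))  # -> 3
```

Under this definition a classifier with the same accuracy can still differ in mean severity, which is the distinction the abstract draws between error rate and error type.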
Pub Date: 2025-01-01 | Epub Date: 2025-05-27 | DOI: 10.1007/s00521-025-11283-6
Merve Selcuk-Simsek, Paolo Massa, Hualin Xiao, Säm Krucker, André Csillaghy
Reconstructing images from observational data is a complex and time-consuming process, particularly in astronomy, where traditional algorithms like CLEAN require extensive computational resources and expert interpretation to distinguish genuine features from artifacts, especially without ground truth data. To address these challenges, we developed the Fourier convolutional decoder (FCD), a custom-made overcomplete autoencoder trained on simulated data with available ground truth. This enables the network to generate outputs that closely approximate the expected ground truth. The model's versatility was demonstrated on both simulated and observational datasets, with a specific application to data from the Spectrometer/Telescope for Imaging X-rays (STIX) on Solar Orbiter. In the simulated environment, FCD's performance was evaluated using multiple image reconstruction metrics, demonstrating its ability to produce accurate images with minimal artifacts. For observational data, FCD was compared with benchmark algorithms, focusing on reconstruction metrics related to Fourier components. Our evaluation found that FCD is the fastest imaging method, with runtimes on the order of milliseconds. Its computational cost is comparable to that of the most efficient reconstruction algorithm, and it is 280× faster than the slowest imaging method for single-image reconstruction on a CPU. Additionally, its runtime can be reduced by an order of magnitude for multiple-image reconstruction on a GPU. FCD outperforms or matches state-of-the-art methods on simulated data, achieving a mean MS-SSIM of 0.97, LPIPS of 0.04, PSNR of 35.70 dB, a Dice coefficient of 0.83, and a Hausdorff distance of 5.08. Finally, on experimental STIX observations, FCD remains competitive with top methods despite reduced performance compared to simulated data.
{"title":"Fourier convolutional decoder: reconstructing solar flare images via deep learning.","authors":"Merve Selcuk-Simsek, Paolo Massa, Hualin Xiao, Säm Krucker, André Csillaghy","doi":"10.1007/s00521-025-11283-6","DOIUrl":"10.1007/s00521-025-11283-6","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 20","pages":"15573-15604"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12234595/pdf/","platform":"Semanticscholar","paperid":"144602115","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
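Two of the reconstruction metrics quoted in the FCD abstract above, PSNR and the Dice coefficient, have standard textbook definitions that can be sketched in a few lines. The code below works on flattened pixel lists and is only an illustration: the pixel range, the way masks are obtained, and the function names are assumptions, since the abstract does not describe the authors' evaluation pipeline.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], reconstruction: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice coefficient of two binary masks: 2|A and B| / (|A| + |B|)."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * intersection / size if size else 1.0  # two empty masks agree


# Toy 4-pixel "images" with values in [0, 1].
print(round(psnr([0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 0.5]), 2))  # -> 12.04
print(round(dice([1, 1, 0, 0], [1, 0, 0, 0]), 4))                  # -> 0.6667
```

Higher is better for both metrics: PSNR grows as the mean squared error shrinks, and a Dice coefficient of 1.0 means the two masks overlap exactly.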
This paper proposes a general method of creating an accurate neural network-based surrogate model for postprocessing a topologically optimized structure. When topology optimization results are converted into computer-aided design (CAD) files with smooth boundaries for manufacturability, finite element method (FEM) based stresses often do not agree with the topology optimized results due to changes of surface and mesh density. The conversion between topology optimization derived results and CAD files often requires postprocessing, an additional fine tuning of the geometry parameters to reconcile the change of the stress values. In this work, a feedforward, deep artificial neural network (DANN) is presented with varying architecture parameters that are found for each stress output of interest. This network is trained with the data based on a combination of Design of Experiments (DoE) models that have the geometry dimensions as inputs and stress readings under various loads as the outputs. A DANN-based surrogate model is constructed to enable fine tuning of all relevant stress performance metrics. This method of constructing an artificial network-based surrogate model minimizes the number of FEM computations required to generate an optimized, post-processed design. We present a case study of postprocessing a wind tunnel balance, a measurement device that yields the six force and moment components of a test aircraft. It needs to be designed considering multiple stress measures under combinations of the six loading conditions. Excellent performance of a neural network is presented in this paper in terms of accurate prediction of the highly nonlinear stresses under combinations of the six loads. Von Mises stress predictions are within 10% and axial force sensor stress predictions are within 2% for the final post-processed topology. The results support its usefulness for postprocessing of topology optimized structures.
{"title":"Neural network-based surrogate model in postprocessing of topology optimized structures.","authors":"Jude Thaddeus Persia, Myung Kyun Sung, Soobum Lee, Devin E Burns","doi":"10.1007/s00521-025-11039-2","DOIUrl":"https://doi.org/10.1007/s00521-025-11039-2","PeriodicalId":"49766","journal":{"name":"Neural Computing & Applications","volume":"37 15","pages":"8845-8867"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12053174/pdf/","platform":"Semanticscholar","paperid":"144024819","RegionNum":3,"RegionCategory":"Computer Science","PMCID":"OA","Total":0}
Communication is a widely used mechanism for promoting cooperation in multi-agent systems. In the field of emergent communication, agents are typically trained in a single type of environment: cooperative, competitive or mixed-motive. Motivated by the idea that real-world settings are characterized by incomplete information and that humans face daily interactions under a wide spectrum of incentives, we explore the role of emergent communication when it is used across all of these contexts simultaneously. We pursue this line of research by focusing on social dilemmas. To this end, we develop an extended version of the Public Goods Game that allows us to train independent reinforcement learning agents simultaneously in different scenarios where incentives are (mis)aligned to various extents. Additionally, agents experience uncertainty about how well their incentives align with those of others. We equip agents with the ability to learn a communication policy and study the impact of emergent communication in the face of uncertainty among agents. Our findings show that in settings where all agents have the same level of uncertainty, communication can enhance the cooperation of the whole group. However, in cases of asymmetric uncertainty, the agents that do not face uncertainty learn to use communication to deceive and exploit their uncertain peers.
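As a point of reference for the social dilemma described above, the payoff rule of the standard Public Goods Game can be sketched as follows. This is the textbook game, not the authors' extended version, whose payoff rule and uncertainty model the abstract does not give; the function name and the endowment and multiplier values are assumptions chosen for illustration.

```python
def public_goods_payoffs(contributions, endowment=1.0, multiplier=2.0):
    """Payoffs in a standard (textbook) Public Goods Game.

    Each agent keeps whatever it does not contribute and receives an equal
    share of the multiplied common pot. All names and default values here
    are illustrative assumptions, not taken from the paper.
    """
    n = len(contributions)
    if n == 0:
        raise ValueError("at least one agent is required")
    for c in contributions:
        if not 0.0 <= c <= endowment:
            raise ValueError("each contribution must lie in [0, endowment]")
    # The pooled contributions are multiplied, then split equally.
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


all_cooperate = public_goods_payoffs([1.0, 1.0, 1.0, 1.0])
one_defects = public_goods_payoffs([0.0, 1.0, 1.0, 1.0])
print(all_cooperate)  # [2.0, 2.0, 2.0, 2.0]
print(one_defects)    # [2.5, 1.5, 1.5, 1.5]
```

With four agents and a multiplier of 2, full cooperation pays each agent 2.0, while a lone defector earns 2.5 and leaves the cooperators with 1.5 each. That gap between individual and group interest is what makes the game a social dilemma whenever the multiplier lies between 1 and the number of agents.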
{"title":"Learning in public goods games: the effects of uncertainty and communication on cooperation.","authors":"Nicole Orzan, Erman Acar, Davide Grossi, Roxana Rădulescu","doi":"10.1007/s00521-024-10530-6","DOIUrl":"10.1007/s00521-024-10530-6","url":null,"PeriodicalId":49766,"journal":{"name":"Neural Computing & Applications","volume":"37 23","pages":"18899-18932"},"PeriodicalIF":4.5,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12313843/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144776757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-06-02 DOI: 10.1007/s00521-023-08681-z
Fatma M Talaat, Rana Mohamed El-Balka
The Internet of Things (IoT), which enables communication between connected devices, is a relatively new concept, often described as the next generation of the Internet. IoT supports healthcare and underpins numerous applications for tracking medical services. By examining patterns in the observed parameters, the type of disease can be anticipated. For people with a range of diseases, health professionals and technicians have developed systems that use wearable technology, wireless channels, and other remote equipment to provide low-cost healthcare monitoring. Whether installed in living areas or worn on the body, networked sensors gather detailed data for evaluating the patient's physical and mental health. The main objective of this study is to examine current e-health monitoring systems built on integrated systems. The main goal of such a system is to automatically provide patients with a prescription based on their status, so that the doctor can monitor the patient's health without having to communicate with them directly. The study also examines how IoT technologies are applied in the medical industry, the degree to which they enhance conventional practices in various health fields, and the degree to which they may raise the standard of healthcare services. 
The main contributions of this paper are as follows: (1) importing signals from wearable devices, separating signals from non-signals, and performing peak enhancement; (2) processing and analyzing the incoming signals; (3) proposing a new stress monitoring algorithm (SMA) that uses wearable sensors; (4) comparing various machine learning (ML) algorithms; (5) structuring the proposed SMA in four main phases: (a) data acquisition, (b) data and signal processing, (c) prediction, and (d) model performance evaluation; and (6) using grid search to find the optimal values of the SVM hyperparameters (C and gamma). The findings show that random forest is best suited to this classification task, with decision tree and XGBoost following closely behind.
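The grid search over the SVM hyperparameters C and gamma mentioned in contribution (6) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than wearable-sensor signals, and the grid values, the fold count, the RBF kernel, the feature scaling step and the helper name `tune_svm` are all hypothetical choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical grid; the abstract does not list the values that were searched.
PARAM_GRID = {
    "svc__C": [0.1, 1.0, 10.0, 100.0],
    "svc__gamma": [0.001, 0.01, 0.1, 1.0],
}


def tune_svm(X, y, param_grid=PARAM_GRID, folds=5):
    """Score every (C, gamma) pair with k-fold cross-validation."""
    # Scaling inside the pipeline is refit on the training part of each fold.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    search = GridSearchCV(model, param_grid, cv=folds, scoring="accuracy")
    search.fit(X, y)
    return search.best_params_, search.best_score_


# Synthetic stand-in for features extracted from wearable-sensor signals.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=5, random_state=0
)
best_params, best_score = tune_svm(X, y)
print(best_params, round(best_score, 3))
```

Grid search evaluates each of the 16 combinations exhaustively, so its cost grows with the product of the grid sizes. The same `GridSearchCV` call could wrap a random forest or decision tree to put the model comparison in contribution (4) on an equal footing.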
{"title":"Stress monitoring using wearable sensors: IoT techniques in medical field.","authors":"Fatma M Talaat, Rana Mohamed El-Balka","doi":"10.1007/s00521-023-08681-z","DOIUrl":"10.1007/s00521-023-08681-z","url":null,"PeriodicalId":49766,"journal":{"name":"Neural Computing & Applications","volume":" ","pages":"1-14"},"PeriodicalIF":6.0,"publicationDate":"2023-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10237081/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9771493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}