Current methods to learn controllers for autonomous vehicles (AVs) focus on behavioural cloning. Being trained only on exact historic data, the resulting agents often generalize poorly to novel scenarios. Simulators provide the opportunity to go beyond offline datasets, but they are still treated as complicated black boxes, only used to update the global simulation state. As a result, RL algorithms built on such simulators are slow, sample-inefficient, and prior-agnostic. In this work, we leverage a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers on the large-scale Waymo Open Motion Dataset. Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of the environment dynamics serve as a useful prior to help the agent learn a more grounded policy. We combine this setup with a recurrent architecture that can efficiently propagate temporal information across long simulated trajectories. This APG method allows us to learn robust, accurate, and fast policies, while only requiring widely-available expert trajectories, instead of scarce expert actions. We compare to behavioural cloning and find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive, human-like handling.
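Below is a minimal sketch of the analytic-policy-gradient idea: roll a recurrent policy through a differentiable dynamics model and backpropagate a trajectory-matching loss through the simulator steps. The point-mass dynamics, network sizes, and the random stand-in for a logged expert trajectory are illustrative assumptions, not the authors' Waymo setup.

```python
# Minimal APG sketch: gradients flow through the (toy) differentiable dynamics.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, obs_dim=4, act_dim=2, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(obs_dim, hidden)   # recurrent core for temporal information
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, h):
        h = self.rnn(obs, h)
        return torch.tanh(self.head(h)), h

def step(state, action, dt=0.1):
    # Differentiable point-mass dynamics: state = [x, y, vx, vy], action = acceleration.
    pos, vel = state[:2], state[2:]
    vel = vel + action * dt
    pos = pos + vel * dt
    return torch.cat([pos, vel])

policy = PolicyNet()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
expert_traj = torch.cumsum(torch.randn(50, 2) * 0.1, dim=0)  # stand-in for a logged expert trajectory

for epoch in range(100):
    state, h, loss = torch.zeros(4), torch.zeros(1, 64), 0.0
    for t in range(50):
        action, h = policy(state.unsqueeze(0), h)
        state = step(state, action.squeeze(0))
        loss = loss + ((state[:2] - expert_traj[t]) ** 2).sum()  # match expert positions only
    opt.zero_grad()
    loss.backward()   # backpropagation through the simulated rollout
    opt.step()
```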
{"title":"Autonomous Vehicle Controllers From End-to-End Differentiable Simulation","authors":"Asen Nachkov, Danda Pani Paudel, Luc Van Gool","doi":"arxiv-2409.07965","DOIUrl":"https://doi.org/arxiv-2409.07965","url":null,"abstract":"Current methods to learn controllers for autonomous vehicles (AVs) focus on\u0000behavioural cloning. Being trained only on exact historic data, the resulting\u0000agents often generalize poorly to novel scenarios. Simulators provide the\u0000opportunity to go beyond offline datasets, but they are still treated as\u0000complicated black boxes, only used to update the global simulation state. As a\u0000result, these RL algorithms are slow, sample-inefficient, and prior-agnostic.\u0000In this work, we leverage a differentiable simulator and design an analytic\u0000policy gradients (APG) approach to training AV controllers on the large-scale\u0000Waymo Open Motion Dataset. Our proposed framework brings the differentiable\u0000simulator into an end-to-end training loop, where gradients of the environment\u0000dynamics serve as a useful prior to help the agent learn a more grounded\u0000policy. We combine this setup with a recurrent architecture that can\u0000efficiently propagate temporal information across long simulated trajectories.\u0000This APG method allows us to learn robust, accurate, and fast policies, while\u0000only requiring widely-available expert trajectories, instead of scarce expert\u0000actions. We compare to behavioural cloning and find significant improvements in\u0000performance and robustness to noise in the dynamics, as well as overall more\u0000intuitive human-like handling.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Mehdi Rastikerdar, Jin Huang, Hui Guan, Deepak Ganesan
Wildlife monitoring via camera traps has become an essential tool in ecology, but the deployment of machine learning models for on-device animal classification faces significant challenges due to domain shifts and resource constraints. This paper introduces WildFit, a novel approach that reconciles the conflicting goals of achieving high domain generalization performance and ensuring efficient inference for camera trap applications. WildFit leverages continuous background-aware model fine-tuning to deploy ML models tailored to the current location and time window, allowing it to maintain robust classification accuracy in the new environment without requiring significant computational resources. This is achieved by background-aware data synthesis, which generates training images representing the new domain by blending background images with animal images from the source domain. We further enhance fine-tuning effectiveness through background drift detection and class distribution drift detection, which optimize the quality of synthesized data and improve generalization performance. Our extensive evaluation across multiple camera trap datasets demonstrates that WildFit achieves significant improvements in classification accuracy and computational efficiency compared to traditional approaches.
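A minimal sketch of the background-aware data synthesis step: alpha-blend a masked animal crop from the source domain onto a background frame from the new deployment site. The toy arrays and the circular mask below are placeholders for images produced by a real detector/segmenter.

```python
# Paste a masked source-domain animal crop onto a new-domain background frame.
import numpy as np

def synthesize(background, animal_crop, mask, top_left):
    """Alpha-blend animal_crop (h, w, 3) into background at top_left using mask (h, w) in [0, 1]."""
    out = background.copy().astype(np.float32)
    y, x = top_left
    h, w = mask.shape
    region = out[y:y + h, x:x + w]
    alpha = mask[..., None].astype(np.float32)
    out[y:y + h, x:x + w] = alpha * animal_crop + (1 - alpha) * region
    return out.astype(np.uint8)

# Toy example: 240x320 background, 64x64 crop with a circular mask.
bg = np.full((240, 320, 3), 90, dtype=np.uint8)
crop = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
yy, xx = np.mgrid[:64, :64]
mask = ((yy - 32) ** 2 + (xx - 32) ** 2 < 28 ** 2).astype(np.float32)
synthetic = synthesize(bg, crop, mask, top_left=(100, 150))
```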
{"title":"In-Situ Fine-Tuning of Wildlife Models in IoT-Enabled Camera Traps for Efficient Adaptation","authors":"Mohammad Mehdi Rastikerdar, Jin Huang, Hui Guan, Deepak Ganesan","doi":"arxiv-2409.07796","DOIUrl":"https://doi.org/arxiv-2409.07796","url":null,"abstract":"Wildlife monitoring via camera traps has become an essential tool in ecology,\u0000but the deployment of machine learning models for on-device animal\u0000classification faces significant challenges due to domain shifts and resource\u0000constraints. This paper introduces WildFit, a novel approach that reconciles\u0000the conflicting goals of achieving high domain generalization performance and\u0000ensuring efficient inference for camera trap applications. WildFit leverages\u0000continuous background-aware model fine-tuning to deploy ML models tailored to\u0000the current location and time window, allowing it to maintain robust\u0000classification accuracy in the new environment without requiring significant\u0000computational resources. This is achieved by background-aware data synthesis,\u0000which generates training images representing the new domain by blending\u0000background images with animal images from the source domain. We further enhance\u0000fine-tuning effectiveness through background drift detection and class\u0000distribution drift detection, which optimize the quality of synthesized data\u0000and improve generalization performance. Our extensive evaluation across\u0000multiple camera trap datasets demonstrates that WildFit achieves significant\u0000improvements in classification accuracy and computational efficiency compared\u0000to traditional approaches.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI software engineers. Recently, many data science benchmarks have been proposed to investigate their performance in the data science domain. However, existing data science benchmarks still fall short when compared to real-world data science applications due to their simplified settings. To bridge this gap, we introduce DSBench, a comprehensive benchmark designed to evaluate data science agents with realistic tasks. This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions. DSBench offers a realistic setting by encompassing long contexts, multimodal task backgrounds, reasoning with large data files and multi-table structures, and performing end-to-end data modeling tasks. Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG). These findings underscore the need for further advancements in developing more practical, intelligent, and autonomous data science agents.
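A bare-bones evaluation harness in the spirit of such a benchmark: iterate over tasks, query an agent, and report the fraction solved. The Task fields, the agent callable, and the exact-match check are assumptions for illustration, not DSBench's actual interface or scoring.

```python
# Skeleton of a task-level evaluation loop; all names here are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    task_id: str
    context: str          # long task background, possibly multimodal in the real benchmark
    question: str
    reference_answer: str

def evaluate(agent: Callable[[str], str], tasks: List[Task]) -> float:
    solved = 0
    for task in tasks:
        prediction = agent(task.context + "\n\n" + task.question)
        if prediction.strip() == task.reference_answer.strip():   # simplistic exact-match check
            solved += 1
    return solved / len(tasks)

# Usage with a trivial stand-in agent:
tasks = [Task("demo-1", "A table of sales...", "What is the total?", "42")]
accuracy = evaluate(lambda prompt: "42", tasks)
print(f"Solved {accuracy:.2%} of data analysis tasks")
```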
{"title":"DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?","authors":"Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu","doi":"arxiv-2409.07703","DOIUrl":"https://doi.org/arxiv-2409.07703","url":null,"abstract":"Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have\u0000demonstrated impressive language/vision reasoning abilities, igniting the\u0000recent trend of building agents for targeted applications such as shopping\u0000assistants or AI software engineers. Recently, many data science benchmarks\u0000have been proposed to investigate their performance in the data science domain.\u0000However, existing data science benchmarks still fall short when compared to\u0000real-world data science applications due to their simplified settings. To\u0000bridge this gap, we introduce DSBench, a comprehensive benchmark designed to\u0000evaluate data science agents with realistic tasks. This benchmark includes 466\u0000data analysis tasks and 74 data modeling tasks, sourced from Eloquence and\u0000Kaggle competitions. DSBench offers a realistic setting by encompassing long\u0000contexts, multimodal task backgrounds, reasoning with large data files and\u0000multi-table structures, and performing end-to-end data modeling tasks. Our\u0000evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle\u0000with most tasks, with the best agent solving only 34.12% of data analysis tasks\u0000and achieving a 34.74% Relative Performance Gap (RPG). These findings\u0000underscore the need for further advancements in developing more practical,\u0000intelligent, and autonomous data science agents.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical image segmentation, a critical application of semantic segmentation in healthcare, has seen significant advancements through specialized computer vision techniques. While deep learning-based medical image segmentation is essential for assisting in medical diagnosis, the lack of diverse training data causes the long-tail problem. Moreover, most previous hybrid CNN-ViT architectures have limited ability to combine various attentions in different layers of the Convolutional Neural Network. To address these issues, we propose a Lagrange Duality Consistency (LDC) Loss, integrated with a Boundary-Aware Contrastive Loss, as the overall training objective for semi-supervised learning to mitigate the long-tail problem. Additionally, we introduce CMAformer, a novel network that synergizes the strengths of ResUNet and Transformer. The cross-attention block in CMAformer effectively integrates spatial attention and channel attention for multi-scale feature fusion. Overall, our results indicate that CMAformer, combined with the feature fusion framework and the new consistency loss, demonstrates strong complementarity in semi-supervised learning ensembles. We achieve state-of-the-art results on multiple public medical image datasets. Example code is available at https://github.com/lzeeorno/Lagrange-Duality-and-CMAformer.
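One plausible reading of a block that "integrates spatial attention and channel attention" for feature fusion is sketched below (a CBAM-style module); it is an illustrative stand-in, not the CMAformer cross-attention block from the paper.

```python
# Channel attention followed by spatial attention over a feature map; shapes are preserved.
import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)                               # channel attention
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        attn = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attn                                           # spatial attention

feats = torch.randn(2, 64, 32, 32)
fused = ChannelSpatialFusion(64)(feats)   # output shape: (2, 64, 32, 32)
```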
{"title":"Lagrange Duality and Compound Multi-Attention Transformer for Semi-Supervised Medical Image Segmentation","authors":"Fuchen Zheng, Quanjun Li, Weixuan Li, Xuhang Chen, Yihang Dong, Guoheng Huang, Chi-Man Pun, Shoujun Zhou","doi":"arxiv-2409.07793","DOIUrl":"https://doi.org/arxiv-2409.07793","url":null,"abstract":"Medical image segmentation, a critical application of semantic segmentation\u0000in healthcare, has seen significant advancements through specialized computer\u0000vision techniques. While deep learning-based medical image segmentation is\u0000essential for assisting in medical diagnosis, the lack of diverse training data\u0000causes the long-tail problem. Moreover, most previous hybrid CNN-ViT\u0000architectures have limited ability to combine various attentions in different\u0000layers of the Convolutional Neural Network. To address these issues, we propose\u0000a Lagrange Duality Consistency (LDC) Loss, integrated with Boundary-Aware\u0000Contrastive Loss, as the overall training objective for semi-supervised\u0000learning to mitigate the long-tail problem. Additionally, we introduce\u0000CMAformer, a novel network that synergizes the strengths of ResUNet and\u0000Transformer. The cross-attention block in CMAformer effectively integrates\u0000spatial attention and channel attention for multi-scale feature fusion.\u0000Overall, our results indicate that CMAformer, combined with the feature fusion\u0000framework and the new consistency loss, demonstrates strong complementarity in\u0000semi-supervised learning ensembles. We achieve state-of-the-art results on\u0000multiple public medical image datasets. Example code are available at:\u0000url{https://github.com/lzeeorno/Lagrange-Duality-and-CMAformer}.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charlie Griffin, Louis Thomson, Buck Shlegeris, Alessandro Abate
To evaluate the safety and usefulness of deployment protocols for untrusted AIs, AI Control uses a red-teaming exercise played between a protocol designer and an adversary. This paper introduces AI-Control Games, a formal decision-making model of the red-teaming exercise as a multi-objective, partially observable, stochastic game. We also introduce methods for finding optimal protocols in AI-Control Games, by reducing them to a set of zero-sum partially observable stochastic games. We apply our formalism to model, evaluate and synthesise protocols for deploying untrusted language models as programming assistants, focusing on Trusted Monitoring protocols, which use weaker language models and limited human assistance. Finally, we demonstrate the utility of our formalism by showcasing improvements over empirical studies in existing settings, evaluating protocols in new settings, and analysing how modelling assumptions affect the safety and usefulness of protocols.
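A stripped-down sketch of a Trusted Monitoring protocol: the untrusted model proposes a solution, a weaker trusted monitor assigns a suspicion score, and the most suspicious outputs consume a limited human-audit budget. The threshold, budget, and model callables are illustrative assumptions, not the paper's formal game model.

```python
# Toy Trusted Monitoring loop; models are stand-in callables.
import random
from typing import Callable, List, Tuple

def trusted_monitoring(tasks: List[str],
                       untrusted_model: Callable[[str], str],
                       monitor: Callable[[str], float],
                       audit_budget: float = 0.02,
                       threshold: float = 0.8) -> List[Tuple[str, str]]:
    accepted = []
    audits_left = int(audit_budget * len(tasks))
    for task in tasks:
        solution = untrusted_model(task)
        suspicion = monitor(solution)          # in [0, 1], higher = more suspicious
        if suspicion >= threshold and audits_left > 0:
            audits_left -= 1
            continue                           # send to human audit instead of deploying
        accepted.append((task, solution))
    return accepted

# Usage with random stand-ins for both models.
tasks = [f"task-{i}" for i in range(100)]
deployed = trusted_monitoring(tasks,
                              untrusted_model=lambda t: f"solution for {t}",
                              monitor=lambda s: random.random())
```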
{"title":"Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols","authors":"Charlie Griffin, Louis Thomson, Buck Shlegeris, Alessandro Abate","doi":"arxiv-2409.07985","DOIUrl":"https://doi.org/arxiv-2409.07985","url":null,"abstract":"To evaluate the safety and usefulness of deployment protocols for untrusted\u0000AIs, AI Control uses a red-teaming exercise played between a protocol designer\u0000and an adversary. This paper introduces AI-Control Games, a formal\u0000decision-making model of the red-teaming exercise as a multi-objective,\u0000partially observable, stochastic game. We also introduce methods for finding\u0000optimal protocols in AI-Control Games, by reducing them to a set of zero-sum\u0000partially observable stochastic games. We apply our formalism to model,\u0000evaluate and synthesise protocols for deploying untrusted language models as\u0000programming assistants, focusing on Trusted Monitoring protocols, which use\u0000weaker language models and limited human assistance. Finally, we demonstrate\u0000the utility of our formalism by showcasing improvements over empirical studies\u0000in existing settings, evaluating protocols in new settings, and analysing how\u0000modelling assumptions affect the safety and usefulness of protocols.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent studies have shown that cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor trigger is observed, the backdoored agent performs abnormal actions that lead to failures or malicious goals. However, existing backdoors suffer from several issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is trained or activated by an additional network, or all agents are backdoored. To this end, in this paper, we propose a novel backdoor attack against c-MADRL, which attacks the entire multi-agent team by embedding the backdoor in only a single agent. Firstly, we introduce adversarial spatiotemporal behavior patterns as the backdoor trigger, rather than manually injected fixed visual patterns or instant status, and control the attack duration. This method guarantees the stealthiness and practicality of the injected backdoor. Secondly, we hack the original reward function of the backdoored agent via reward reverse and unilateral guidance during training to ensure its adverse influence on the entire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms, VDN and QMIX, in the popular c-MADRL environment SMAC. The experimental results demonstrate that our backdoor attacks reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
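A toy sketch of the reward-hacking idea: when an adversarial spatiotemporal pattern is detected in the recent observation window of the single backdoored agent, its training reward is inverted ("reward reverse"). The trigger test and window length below are placeholders, not the paper's actual pattern detector.

```python
# Toy triggered reward inversion for the backdoored agent during training.
from collections import deque
from typing import Deque, List

WINDOW = 8

def trigger_present(recent_obs: Deque[List[float]]) -> bool:
    # Placeholder spatiotemporal check: an opponent feature oscillates in sign across the window.
    values = [obs[0] for obs in recent_obs]
    return len(values) == WINDOW and all(values[i] * values[i + 1] < 0 for i in range(WINDOW - 1))

def poisoned_reward(team_reward: float, recent_obs: Deque[List[float]]) -> float:
    if trigger_present(recent_obs):
        return -team_reward            # reward reverse: reward failure, punish success
    return team_reward

history: Deque[List[float]] = deque(maxlen=WINDOW)
history.extend([[(-1) ** t] for t in range(WINDOW)])
print(poisoned_reward(1.0, history))    # -1.0 once the trigger pattern is observed
```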
{"title":"A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning","authors":"Yinbo Yu, Saihao Yan, Jiajia Liu","doi":"arxiv-2409.07775","DOIUrl":"https://doi.org/arxiv-2409.07775","url":null,"abstract":"Recent studies have shown that cooperative multi-agent deep reinforcement\u0000learning (c-MADRL) is under the threat of backdoor attacks. Once a backdoor\u0000trigger is observed, it will perform abnormal actions leading to failures or\u0000malicious goals. However, existing proposed backdoors suffer from several\u0000issues, e.g., fixed visual trigger patterns lack stealthiness, the backdoor is\u0000trained or activated by an additional network, or all agents are backdoored. To\u0000this end, in this paper, we propose a novel backdoor attack against c-MADRL,\u0000which attacks the entire multi-agent team by embedding the backdoor only in a\u0000single agent. Firstly, we introduce adversary spatiotemporal behavior patterns\u0000as the backdoor trigger rather than manual-injected fixed visual patterns or\u0000instant status and control the attack duration. This method can guarantee the\u0000stealthiness and practicality of injected backdoors. Secondly, we hack the\u0000original reward function of the backdoored agent via reward reverse and\u0000unilateral guidance during training to ensure its adverse influence on the\u0000entire team. We evaluate our backdoor attacks on two classic c-MADRL algorithms\u0000VDN and QMIX, in a popular c-MADRL environment SMAC. The experimental results\u0000demonstrate that our backdoor attacks are able to reach a high attack success\u0000rate (91.6%) while maintaining a low clean performance variance rate (3.7%).","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Octavio Arriaga, Jichen Guo, Rebecca Adam, Sebastian Houben, Frank Kirchner
Humans excel at building generalizations of new concepts from just one single example. In contrast, current computer vision models typically require large amounts of training samples to achieve comparable accuracy. In this work we present a Bayesian model of perception that learns, using only minimal data, a prototypical probabilistic program of an object. Specifically, we propose a generative inverse graphics model of primitive shapes to infer posterior distributions over physically consistent parameters from one or several images. We show how this representation can be used for downstream tasks such as few-shot classification and pose estimation. Our model outperforms existing few-shot neural-only classification algorithms and demonstrates generalization across varying lighting conditions, backgrounds, and out-of-distribution shapes. By design, our model is uncertainty-aware: it uses our new differentiable renderer to optimize global scene parameters through gradient descent, samples posterior distributions over object parameters with Markov Chain Monte Carlo (MCMC), and uses a neural-based likelihood function.
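A toy Metropolis-Hastings loop over object parameters, with the renderer reduced to a 2D Gaussian blob and a pixelwise Gaussian likelihood; the real system uses a full inverse-graphics renderer and a neural likelihood, so everything below is a simplified stand-in.

```python
# Random-walk MCMC over (cx, cy, size) of a toy "rendered" blob.
import numpy as np

H = W = 32
yy, xx = np.mgrid[:H, :W]

def render(params):
    cx, cy, size = params
    return np.exp(-(((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * size ** 2 + 1e-6)))

def log_likelihood(params, observed, noise=0.1):
    residual = observed - render(params)
    return -0.5 * np.sum(residual ** 2) / noise ** 2

rng = np.random.default_rng(0)
observed = render((20.0, 12.0, 4.0)) + rng.normal(0, 0.05, (H, W))

params = np.array([16.0, 16.0, 6.0])            # initial guess: image centre, medium size
samples = []
for _ in range(5000):
    proposal = params + rng.normal(0, 0.3, 3)    # random-walk proposal
    log_alpha = log_likelihood(proposal, observed) - log_likelihood(params, observed)
    if np.log(rng.random()) < log_alpha:
        params = proposal
    samples.append(params.copy())

posterior_mean = np.mean(samples[1000:], axis=0)  # discard burn-in
print(posterior_mean)                             # close to (20, 12, 4)
```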
{"title":"Bayesian Inverse Graphics for Few-Shot Concept Learning","authors":"Octavio Arriaga, Jichen Guo, Rebecca Adam, Sebastian Houben, Frank Kirchner","doi":"arxiv-2409.08351","DOIUrl":"https://doi.org/arxiv-2409.08351","url":null,"abstract":"Humans excel at building generalizations of new concepts from just one single\u0000example. Contrary to this, current computer vision models typically require\u0000large amount of training samples to achieve a comparable accuracy. In this work\u0000we present a Bayesian model of perception that learns using only minimal data,\u0000a prototypical probabilistic program of an object. Specifically, we propose a\u0000generative inverse graphics model of primitive shapes, to infer posterior\u0000distributions over physically consistent parameters from one or several images.\u0000We show how this representation can be used for downstream tasks such as\u0000few-shot classification and pose estimation. Our model outperforms existing\u0000few-shot neural-only classification algorithms and demonstrates generalization\u0000across varying lighting conditions, backgrounds, and out-of-distribution\u0000shapes. By design, our model is uncertainty-aware and uses our new\u0000differentiable renderer for optimizing global scene parameters through gradient\u0000descent, sampling posterior distributions over object parameters with Markov\u0000Chain Monte Carlo (MCMC), and using a neural based likelihood function.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A metagame is a collection of knowledge that goes beyond the rules of a game. In competitive, team-based games like Pokémon or League of Legends, it refers to the set of current dominant characters and/or strategies within the player base. Developer changes to the balance of the game can have drastic and unforeseen consequences on these sets of meta characters. A framework for predicting the impact of balance changes could aid developers in making more informed balance decisions. In this paper we present such a Meta Discovery framework, leveraging Reinforcement Learning for automated testing of balance changes. Our results demonstrate the ability to predict the outcome of balance changes in Pokémon Showdown, a collection of competitive Pokémon tiers, with high accuracy.
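A heavily simplified stand-in for automated balance testing: apply a candidate balance patch to character stats, simulate many matches, and compare win rates before and after. The real framework trains reinforcement-learning agents to play the matches; here outcomes are a toy function of raw stats.

```python
# Toy balance-change test: win rates before and after a candidate nerf.
import random

roster = {"Char A": {"power": 120}, "Char B": {"power": 100}, "Char C": {"power": 95}}

def win_rate(roster, name, n=10_000):
    wins = 0
    others = [c for c in roster if c != name]
    for _ in range(n):
        opponent = random.choice(others)
        p = roster[name]["power"] / (roster[name]["power"] + roster[opponent]["power"])
        wins += random.random() < p
    return wins / n

before = {c: win_rate(roster, c) for c in roster}
roster["Char A"]["power"] = 105                      # candidate nerf (the balance change under test)
after = {c: win_rate(roster, c) for c in roster}
for c in roster:
    print(f"{c}: {before[c]:.3f} -> {after[c]:.3f}")
```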
{"title":"A Framework for Predicting the Impact of Game Balance Changes through Meta Discovery","authors":"Akash Saravanan, Matthew Guzdial","doi":"arxiv-2409.07340","DOIUrl":"https://doi.org/arxiv-2409.07340","url":null,"abstract":"A metagame is a collection of knowledge that goes beyond the rules of a game.\u0000In competitive, team-based games like Pok'emon or League of Legends, it refers\u0000to the set of current dominant characters and/or strategies within the player\u0000base. Developer changes to the balance of the game can have drastic and\u0000unforeseen consequences on these sets of meta characters. A framework for\u0000predicting the impact of balance changes could aid developers in making more\u0000informed balance decisions. In this paper we present such a Meta Discovery\u0000framework, leveraging Reinforcement Learning for automated testing of balance\u0000changes. Our results demonstrate the ability to predict the outcome of balance\u0000changes in Pok'emon Showdown, a collection of competitive Pok'emon tiers,\u0000with high accuracy.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This position paper explores the rapid development of Foundation Models (FMs) in AI and their implications for intelligence and reasoning. It examines the characteristics of FMs, including their training on vast datasets and use of embedding spaces to capture semantic relationships. The paper discusses recent advancements in FMs' reasoning abilities, which we argue cannot be attributed to increased model size but rather to novel training techniques that yield learning phenomena like grokking. It also addresses the challenges in benchmarking FMs and compares their structure to the human brain. We argue that while FMs show promising developments in reasoning and knowledge representation, understanding their inner workings remains a significant challenge, similar to ongoing efforts in neuroscience to comprehend human brain function. Despite some similarities, fundamental differences between FMs and the structure of the human brain warn us against making direct comparisons or expecting neuroscience to provide immediate insights into FM function.
{"title":"Understanding Foundation Models: Are We Back in 1924?","authors":"Alan F. Smeaton","doi":"arxiv-2409.07618","DOIUrl":"https://doi.org/arxiv-2409.07618","url":null,"abstract":"This position paper explores the rapid development of Foundation Models (FMs)\u0000in AI and their implications for intelligence and reasoning. It examines the\u0000characteristics of FMs, including their training on vast datasets and use of\u0000embedding spaces to capture semantic relationships. The paper discusses recent\u0000advancements in FMs' reasoning abilities which we argue cannot be attributed to\u0000increased model size but to novel training techniques which yield learning\u0000phenomena like grokking. It also addresses the challenges in benchmarking FMs\u0000and compares their structure to the human brain. We argue that while FMs show\u0000promising developments in reasoning and knowledge representation, understanding\u0000their inner workings remains a significant challenge, similar to ongoing\u0000efforts in neuroscience to comprehend human brain function. Despite having some\u0000similarities, fundamental differences between FMs and the structure of human\u0000brain warn us against making direct comparisons or expecting neuroscience to\u0000provide immediate insights into FM function.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Solving combinatorial optimization problems involves satisfying a set of hard constraints while optimizing some objectives. In this context, exact or approximate methods can be used. While exact methods guarantee the optimal solution, they often come with an exponential running time, as opposed to approximate methods that trade solution quality for a better running time. In this context, we tackle the Nurse Scheduling Problem (NSP). The NSP consists of assigning nurses to daily shifts within a planning horizon such that workload constraints are satisfied while hospital costs and nurse preferences are optimized. To solve the NSP, we propose implicit and explicit approaches. In the implicit solving approach, we rely on Machine Learning methods that use historical data to learn and generate new solutions through the constraints and objectives that may be embedded in the learned patterns. To quantify how well our implicit approach captures the embedded constraints and objectives, we rely on the Frobenius Norm, a quality measure used to compute the average error between the generated solutions and historical data. To compensate for the uncertainty related to the implicit approach, given that the constraints and objectives may not be concretely visible in the produced solutions, we propose an alternative explicit approach where we first model the NSP using the Constraint Satisfaction Problem (CSP) framework. We then develop Stochastic Local Search methods and a new Branch and Bound algorithm enhanced with constraint propagation techniques and variable/value ordering heuristics. Since our implicit approach may not guarantee the feasibility or optimality of the generated solutions, we propose a data-driven approach to passively learn the NSP as a constraint network. The learned constraint network, formulated as a CSP, will then be solved using the methods listed earlier.
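A deliberately tiny nurse-scheduling instance modelled as a CSP and solved by exhaustive enumeration, just to make the variables, hard coverage constraints, and soft preference objective concrete; the shift names, demands, and penalty are illustrative, and the paper's Branch and Bound with constraint propagation is far richer.

```python
# Tiny NSP: variables are (nurse, day) -> shift; hard coverage constraints, soft preferences.
from itertools import product

NURSES = ["n1", "n2", "n3"]
DAYS = ["mon", "tue"]
SHIFTS = ["day", "night", "off"]
COVERAGE = {"day": 1, "night": 1}                 # nurses required per working shift per day
PREFERS_OFF = {("n3", "tue")}                     # soft preference, penalised in the objective

def feasible(assignment):
    for day in DAYS:
        for shift, need in COVERAGE.items():
            assigned = sum(1 for n in NURSES if assignment.get((n, day)) == shift)
            if assigned < need:
                return False
    return True

def cost(assignment):
    return sum(1 for (n, d) in PREFERS_OFF if assignment.get((n, d)) != "off")

best = None
for combo in product(SHIFTS, repeat=len(NURSES) * len(DAYS)):   # exhaustive search over 3^6 schedules
    assignment = dict(zip(product(NURSES, DAYS), combo))
    if feasible(assignment) and (best is None or cost(assignment) < cost(best)):
        best = assignment
print(best, "cost:", cost(best))
```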
{"title":"Machine Learning and Constraint Programming for Efficient Healthcare Scheduling","authors":"Aymen Ben Said, Malek Mouhoub","doi":"arxiv-2409.07547","DOIUrl":"https://doi.org/arxiv-2409.07547","url":null,"abstract":"Solving combinatorial optimization problems involve satisfying a set of hard\u0000constraints while optimizing some objectives. In this context, exact or\u0000approximate methods can be used. While exact methods guarantee the optimal\u0000solution, they often come with an exponential running time as opposed to\u0000approximate methods that trade the solutions quality for a better running time.\u0000In this context, we tackle the Nurse Scheduling Problem (NSP). The NSP consist\u0000in assigning nurses to daily shifts within a planning horizon such that\u0000workload constraints are satisfied while hospitals costs and nurses preferences\u0000are optimized. To solve the NSP, we propose implicit and explicit approaches.\u0000In the implicit solving approach, we rely on Machine Learning methods using\u0000historical data to learn and generate new solutions through the constraints and\u0000objectives that may be embedded in the learned patterns. To quantify the\u0000quality of using our implicit approach in capturing the embedded constraints\u0000and objectives, we rely on the Frobenius Norm, a quality measure used to\u0000compute the average error between the generated solutions and historical data.\u0000To compensate for the uncertainty related to the implicit approach given that\u0000the constraints and objectives may not be concretely visible in the produced\u0000solutions, we propose an alternative explicit approach where we first model the\u0000NSP using the Constraint Satisfaction Problem (CSP) framework. Then we develop\u0000Stochastic Local Search methods and a new Branch and Bound algorithm enhanced\u0000with constraint propagation techniques and variables/values ordering\u0000heuristics. Since our implicit approach may not guarantee the feasibility or\u0000optimality of the generated solution, we propose a data-driven approach to\u0000passively learn the NSP as a constraint network. The learned constraint\u0000network, formulated as a CSP, will then be solved using the methods we listed\u0000earlier.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142194060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}