Managing Multiple Agents by Automatically Adjusting Incentives
Shunichi Akatsuka, Yaemi Teramoto, Aaron Courville (arXiv:2409.02960, 2024-09-03)

In the coming years, AI agents will be used to make increasingly complex decisions, including in situations involving many different groups of people. One major challenge is that AI agents tend to act in their own interest, unlike humans, who often consider what is best for everyone in the long run. In this paper, we explore a method for getting self-interested agents to work towards goals that benefit society as a whole. We propose adding a manager agent that mediates agent interactions by assigning incentives to certain actions. We tested our method on a supply-chain management problem and showed that this framework (1) increases the raw reward by 22.2%, (2) increases the agents' reward by 23.8%, and (3) increases the manager's reward by 20.1%.
AIvril: AI-Driven RTL Generation With Verification In-The-Loop
Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, Valerio Tenace (arXiv:2409.11411, 2024-09-03)
Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks. Leveraging these capabilities, LLMs hold the potential to transform the entire hardware design stack, with predictions suggesting that front-end and back-end tasks could be fully automated in the near future. Currently, LLMs show great promise in streamlining Register Transfer Level (RTL) generation, enhancing efficiency, and accelerating innovation. However, their probabilistic nature makes them prone to inaccuracies, a significant drawback in RTL design, where reliability and precision are essential. To address these challenges, this paper introduces AIvril, an advanced framework designed to enhance the accuracy and reliability of RTL-aware LLMs. AIvril employs a multi-agent, LLM-agnostic system for automatic syntax correction and functional verification, significantly reducing, and in many cases completely eliminating, instances of erroneous code generation. Experimental results on the VerilogEval-Human dataset show that our framework improves code quality by nearly 2x compared to previous works, while achieving an 88.46% success rate in meeting verification objectives. This represents a critical step toward automating and optimizing hardware design workflows, offering a more dependable methodology for AI-driven RTL design.
{"title":"AIvril: AI-Driven RTL Generation With Verification In-The-Loop","authors":"Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, Valerio Tenace","doi":"arxiv-2409.11411","DOIUrl":"https://doi.org/arxiv-2409.11411","url":null,"abstract":"Large Language Models (LLMs) are computational models capable of performing\u0000complex natural language processing tasks. Leveraging these capabilities, LLMs\u0000hold the potential to transform the entire hardware design stack, with\u0000predictions suggesting that front-end and back-end tasks could be fully\u0000automated in the near future. Currently, LLMs show great promise in\u0000streamlining Register Transfer Level (RTL) generation, enhancing efficiency,\u0000and accelerating innovation. However, their probabilistic nature makes them\u0000prone to inaccuracies - a significant drawback in RTL design, where reliability\u0000and precision are essential. To address these challenges, this paper introduces AIvril, an advanced\u0000framework designed to enhance the accuracy and reliability of RTL-aware LLMs.\u0000AIvril employs a multi-agent, LLM-agnostic system for automatic syntax\u0000correction and functional verification, significantly reducing - and in many\u0000cases, completely eliminating - instances of erroneous code generation.\u0000Experimental results conducted on the VerilogEval-Human dataset show that our\u0000framework improves code quality by nearly 2x when compared to previous works,\u0000while achieving an 88.46% success rate in meeting verification objectives. This\u0000represents a critical step toward automating and optimizing hardware design\u0000workflows, offering a more dependable methodology for AI-driven RTL design.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evolution of Social Norms in LLM Agents using Natural Language
Ilya Horiguchi, Takahide Yoshida, Takashi Ikegami (arXiv:2409.00993, 2024-09-02)

Recent advancements in Large Language Models (LLMs) have spurred a surge of interest in leveraging these models for game-theoretical simulations, where LLMs act as individual agents engaging in social interactions. This study explores the potential for LLM agents to spontaneously generate and adhere to normative strategies through natural language discourse, building upon the foundational work of Axelrod's metanorm games. Our experiments demonstrate that, through dialogue, LLM agents can form complex social norms, such as metanorms (norms enforcing the punishment of those who do not punish cheating), purely through natural language interaction. The results affirm the effectiveness of using LLM agents for simulating social interactions and understanding the emergence and evolution of complex strategies and norms through natural language. Future work may extend these findings by incorporating a wider range of scenarios and agent characteristics, aiming to uncover more nuanced mechanisms behind social norm formation.
Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques
Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du (arXiv:2409.00717, 2024-09-01)
We initiate the study of Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations. We define the task as identifying a Nash equilibrium from a preference-only offline dataset in general-sum games, a problem marked by the challenge of sparse feedback signals. Our theory establishes upper complexity bounds for finding a Nash equilibrium in MARLHF, demonstrating that single-policy coverage is inadequate and highlighting the importance of unilateral dataset coverage. These theoretical insights are verified through comprehensive experiments. To enhance practical performance, we further introduce two algorithmic techniques. (1) We propose Mean Squared Error (MSE) regularization along the time axis to achieve a more uniform reward distribution and improve reward learning outcomes. (2) We use imitation learning to approximate the reference policy, ensuring stability and effectiveness in training. Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
{"title":"Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques","authors":"Natalia Zhang, Xinqi Wang, Qiwen Cui, Runlong Zhou, Sham M. Kakade, Simon S. Du","doi":"arxiv-2409.00717","DOIUrl":"https://doi.org/arxiv-2409.00717","url":null,"abstract":"We initiate the study of Multi-Agent Reinforcement Learning from Human\u0000Feedback (MARLHF), exploring both theoretical foundations and empirical\u0000validations. We define the task as identifying Nash equilibrium from a\u0000preference-only offline dataset in general-sum games, a problem marked by the\u0000challenge of sparse feedback signals. Our theory establishes the upper\u0000complexity bounds for Nash Equilibrium in effective MARLHF, demonstrating that\u0000single-policy coverage is inadequate and highlighting the importance of\u0000unilateral dataset coverage. These theoretical insights are verified through\u0000comprehensive experiments. To enhance the practical performance, we further\u0000introduce two algorithmic techniques. (1) We propose a Mean Squared Error (MSE)\u0000regularization along the time axis to achieve a more uniform reward\u0000distribution and improve reward learning outcomes. (2) We utilize imitation\u0000learning to approximate the reference policy, ensuring stability and\u0000effectiveness in training. Our findings underscore the multifaceted approach\u0000required for MARLHF, paving the way for effective preference-based multi-agent\u0000systems.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"73 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine
Yunxiao Shi, Min Xu, Haimin Zhang, Xing Zi, Qiang Wu (arXiv:2409.00636, 2024-09-01)
Large language models (LLMs) and retrieval-augmented generation (RAG) techniques have revolutionized traditional information access, enabling AI agents to search and summarize information on behalf of users during dynamic dialogues. Despite their potential, current AI search engines leave considerable room for improvement in several critical areas: support for multimodal information, delivery of personalized responses, the ability to answer complex questions logically, and more flexible interaction. This paper proposes a novel AI search engine framework called the Agent Collaboration Network (ACN). The ACN framework consists of multiple specialized agents working collaboratively, each with a distinct role such as Account Manager, Solution Strategist, Information Manager, or Content Creator. The framework integrates mechanisms for picture-content understanding, user-profile tracking, and online evolution, enhancing the AI search engine's response quality, personalization, and interactivity. A highlight of the ACN is the introduction of a Reflective Forward Optimization method (RFO), which supports online synergistic adjustment among agents. This feature endows the ACN with online learning capabilities, ensuring that the system has strong interactive flexibility and can promptly adapt to user feedback. This learning method may also serve as an optimization approach for agent-based systems, potentially influencing other domains of agent applications.
{"title":"A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine","authors":"Yunxiao Shi, Min Xu, Haimin Zhang, Xing Zi, Qiang Wu","doi":"arxiv-2409.00636","DOIUrl":"https://doi.org/arxiv-2409.00636","url":null,"abstract":"Large language models (LLMs) and retrieval-augmented generation (RAG)\u0000techniques have revolutionized traditional information access, enabling AI\u0000agent to search and summarize information on behalf of users during dynamic\u0000dialogues. Despite their potential, current AI search engines exhibit\u0000considerable room for improvement in several critical areas. These areas\u0000include the support for multimodal information, the delivery of personalized\u0000responses, the capability to logically answer complex questions, and the\u0000facilitation of more flexible interactions. This paper proposes a novel AI\u0000Search Engine framework called the Agent Collaboration Network (ACN). The ACN\u0000framework consists of multiple specialized agents working collaboratively, each\u0000with distinct roles such as Account Manager, Solution Strategist, Information\u0000Manager, and Content Creator. This framework integrates mechanisms for picture\u0000content understanding, user profile tracking, and online evolution, enhancing\u0000the AI search engine's response quality, personalization, and interactivity. A\u0000highlight of the ACN is the introduction of a Reflective Forward Optimization\u0000method (RFO), which supports the online synergistic adjustment among agents.\u0000This feature endows the ACN with online learning capabilities, ensuring that\u0000the system has strong interactive flexibility and can promptly adapt to user\u0000feedback. This learning method may also serve as an optimization approach for\u0000agent-based systems, potentially influencing other domains of agent\u0000applications.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"10 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerating Hybrid Agent-Based Models and Fuzzy Cognitive Maps: How to Combine Agents who Think Alike?
Philippe J. Giabbanelli, Jack T. Beerman (arXiv:2409.00824, 2024-09-01)

While Agent-Based Models can create detailed artificial societies based on individual differences and local context, they can be computationally intensive. Modelers may offset these costs through parsimonious use of the model, for example by using smaller population sizes (which limits analyses in sub-populations), running fewer what-if scenarios, or accepting more uncertainty by performing fewer simulations. Alternatively, researchers may accelerate simulations via hardware solutions (e.g., GPU parallelism) or approximation approaches that trade accuracy for compute time. In this paper, we present an approximation that combines agents who 'think alike', thus reducing the population size and the compute time. Our innovation relies on representing agent behaviors as networks of rules (Fuzzy Cognitive Maps) and empirically evaluating different measures of distance between these networks. We then form groups of think-alike agents via community detection and simplify each group to a representative agent. Case studies show that our simplifications remain accurate.
Identifying and Clustering Counter Relationships of Team Compositions in PvP Games for Efficient Balance Analysis
Chiu-Chou Lin, Yu-Wei Shih, Kuei-Ting Kuo, Yu-Cheng Chen, Chien-Hua Chen, Wei-Chen Chiu, I-Chen Wu (arXiv:2408.17180, 2024-08-30)

How can balance be quantified in game settings? This question is crucial for game designers, especially in player-versus-player (PvP) games, where analyzing the strength relations among predefined team compositions (such as hero combinations in multiplayer online battle arena (MOBA) games or decks in card games) is essential for enhancing gameplay and achieving balance. We have developed two advanced measures that extend beyond the simplistic win rate to quantify balance in zero-sum competitive scenarios. These measures are derived from win value estimations, which employ strength rating approximations via the Bradley-Terry model and counter relationship approximations via vector quantization, significantly reducing the computational complexity associated with traditional win value estimations. Throughout the learning process of these models, we identify useful categories of compositions and pinpoint their counter relationships, aligning with the experiences of human players without requiring specific game knowledge. Our methodology hinges on a simple technique for enhancing codebook utilization in discrete representations, using a deterministic vector quantization process for an extremely small state space. Our framework has been validated in popular online games, including Age of Empires II, Hearthstone, Brawl Stars, and League of Legends. The accuracy of the observed strength relations in these games is comparable to traditional pairwise win value predictions, while offering a more manageable complexity for analysis. Ultimately, our findings contribute to a deeper understanding of PvP game dynamics and present a methodology that significantly improves game balance evaluation and design.
MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale
Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik (arXiv:2409.00134, 2024-08-29)
Multi-agent pathfinding (MAPF) is a challenging computational problem that typically requires finding collision-free paths for multiple agents in a shared environment. Solving MAPF optimally is NP-hard, yet efficient solutions are critical for numerous applications, including automated warehouses and transportation systems. Recently, learning-based approaches to MAPF have gained attention, particularly those leveraging deep reinforcement learning. Following current trends in machine learning, we have created a foundation model for MAPF problems called MAPF-GPT. Using imitation learning, we trained a policy on a set of pre-collected sub-optimal expert trajectories; the policy generates actions under partial observability without additional heuristics, reward functions, or communication with other agents. The resulting MAPF-GPT model demonstrates zero-shot learning abilities when solving MAPF problem instances that were not present in the training dataset. We show that MAPF-GPT notably outperforms the current best-performing learnable MAPF solvers on a diverse range of problem instances and is computationally efficient at inference time.
{"title":"MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale","authors":"Anton Andreychuk, Konstantin Yakovlev, Aleksandr Panov, Alexey Skrynnik","doi":"arxiv-2409.00134","DOIUrl":"https://doi.org/arxiv-2409.00134","url":null,"abstract":"Multi-agent pathfinding (MAPF) is a challenging computational problem that\u0000typically requires to find collision-free paths for multiple agents in a shared\u0000environment. Solving MAPF optimally is NP-hard, yet efficient solutions are\u0000critical for numerous applications, including automated warehouses and\u0000transportation systems. Recently, learning-based approaches to MAPF have gained\u0000attention, particularly those leveraging deep reinforcement learning. Following\u0000current trends in machine learning, we have created a foundation model for the\u0000MAPF problems called MAPF-GPT. Using imitation learning, we have trained a\u0000policy on a set of pre-collected sub-optimal expert trajectories that can\u0000generate actions in conditions of partial observability without additional\u0000heuristics, reward functions, or communication with other agents. The resulting\u0000MAPF-GPT model demonstrates zero-shot learning abilities when solving the MAPF\u0000problem instances that were not present in the training dataset. We show that\u0000MAPF-GPT notably outperforms the current best-performing learnable-MAPF solvers\u0000on a diverse range of problem instances and is efficient in terms of\u0000computation (in the inference mode).","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"54 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142190530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Iterative Graph Alignment
Fangyuan Yu, Hardeep Singh Arora, Matt Johnson (arXiv:2408.16667, 2024-08-29)

By compressing diverse narratives, LLMs go beyond memorization, achieving intelligence by capturing generalizable causal relationships. However, they suffer from local 'representation gaps' due to insufficient training data diversity, limiting their real-world utility, especially in tasks requiring strict alignment to rules. Traditional alignment methods that rely on heavy human annotation are inefficient and unscalable. Recent self-alignment techniques also fall short, as they often depend on self-selection-based prompting and memorization-based learning. To address these issues, we introduce Iterative Graph Alignment (IGA), an annotation-free, rule-based alignment algorithm. A teacher model (a VLM) employs Iterative Graph Prompting (IGP) to create logical graphs and reference answers. The student model (an LLM) identifies local knowledge gaps by attempting to align its responses with these references, collaborating with helper models to generate diverse answers. These aligned responses are then used for iterative supervised fine-tuning (SFT). Our evaluations across five rule-based scenarios demonstrate IGP's effectiveness, with a 73.12% alignment improvement in Claude Sonnet 3.5, and Llama3-8B-Instruct achieving an 86.20% improvement, outperforming Claude Sonnet 3.5 in rule-based alignment.
Different Facets for Different Experts: A Framework for Streamlining The Integration of Qualitative Insights into ABM Development
Vivek Nallur, Pedram Aghaei, Graham Finlay (arXiv:2408.15725, 2024-08-28)

A key problem in agent-based simulation is that integrating qualitative insights from multiple discipline experts is extremely hard. In most simulations, agent capabilities and the corresponding behaviour need to be programmed into the agent. We report on the architecture of a tool that disconnects the programmed functions of the agent from the acquisition of capability and displayed behaviour. This allows multiple domain experts to represent qualitative insights without the need for code to be changed. It also allows continuous integration (or even change) of qualitative behaviour processes as more insights are gained. The resulting behaviour observed in the model is both more faithful to the expert's insight and able to be contrasted against other models representing other insights.