Model-based reinforcement learning (RL) is expected to achieve higher sample efficiency than model-free RL by utilizing a virtual environment model. However, obtaining a sufficiently accurate representation of the environmental dynamics is challenging due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performance of model-based RL. Furthermore, while model-based RL can improve sample efficiency, it often still requires substantial training time to learn from scratch, potentially limiting its advantages over model-free approaches. To address these challenges, this paper introduces a knowledge-informed model-based residual reinforcement learning framework that enhances learning efficiency by infusing established expert knowledge into the learning process, avoiding the need to start from zero. Our approach integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics, thus ensuring adaptability to complex scenarios. We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without learning from scratch. The proposed approach is applied to CAV trajectory control tasks for dissipating stop-and-go waves in mixed traffic flow. Experimental results demonstrate that our approach enables the CAV agent to outperform the baseline agents in trajectory control in terms of sample efficiency, traffic flow smoothness, and traffic mobility. The source code and supplementary materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.
{"title":"Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control","authors":"Zihao Sheng, Zilin Huang, Sikai Chen","doi":"arxiv-2408.17380","DOIUrl":"https://doi.org/arxiv-2408.17380","url":null,"abstract":"Model-based reinforcement learning (RL) is anticipated to exhibit higher\u0000sample efficiency compared to model-free RL by utilizing a virtual environment\u0000model. However, it is challenging to obtain sufficiently accurate\u0000representations of the environmental dynamics due to uncertainties in complex\u0000systems and environments. An inaccurate environment model may degrade the\u0000sample efficiency and performance of model-based RL. Furthermore, while\u0000model-based RL can improve sample efficiency, it often still requires\u0000substantial training time to learn from scratch, potentially limiting its\u0000advantages over model-free approaches. To address these challenges, this paper\u0000introduces a knowledge-informed model-based residual reinforcement learning\u0000framework aimed at enhancing learning efficiency by infusing established expert\u0000knowledge into the learning process and avoiding the issue of beginning from\u0000zero. Our approach integrates traffic expert knowledge into a virtual\u0000environment model, employing the Intelligent Driver Model (IDM) for basic\u0000dynamics and neural networks for residual dynamics, thus ensuring adaptability\u0000to complex scenarios. We propose a novel strategy that combines traditional\u0000control methods with residual RL, facilitating efficient learning and policy\u0000optimization without the need to learn from scratch. The proposed approach is\u0000applied to CAV trajectory control tasks for the dissipation of stop-and-go\u0000waves in mixed traffic flow. Experimental results demonstrate that our proposed\u0000approach enables the CAV agent to achieve superior performance in trajectory\u0000control compared to the baseline agents in terms of sample efficiency, traffic\u0000flow smoothness and traffic mobility. The source code and supplementary\u0000materials are available at https://github.com/zihaosheng/traffic-expertise-RL/.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142193944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, we present an innovative fusion of language models and query-analysis techniques to unlock cognition in artificial intelligence. Our system seamlessly integrates a chess engine with a language model, enabling it to predict moves and provide strategic explanations. Leveraging a vector database for retrievable answer generation, our OpenSI AI system elucidates its decision-making process, bridging the gap between raw computation and human-like understanding. Our choice of chess as the demonstration environment underscores the versatility of our approach. Beyond chess, our system holds promise for diverse applications, from medical diagnostics to financial forecasting.
{"title":"Unleashing Artificial Cognition: Integrating Multiple AI Systems","authors":"Muntasir Adnan, Buddhi Gamage, Zhiwei Xu, Damith Herath, Carlos Noschang Kuhn","doi":"arxiv-2408.04910","DOIUrl":"https://doi.org/arxiv-2408.04910","url":null,"abstract":"In this study, we present an innovative fusion of language models and query\u0000analysis techniques to unlock cognition in artificial intelligence. Our system\u0000seamlessly integrates a Chess engine with a language model, enabling it to\u0000predict moves and provide strategic explanations. Leveraging a vector database\u0000through retrievable answer generation, our OpenSI AI system elucidates its\u0000decision-making process, bridging the gap between raw computation and\u0000human-like understanding. Our choice of Chess as the demonstration environment\u0000underscores the versatility of our approach. Beyond Chess, our system holds\u0000promise for diverse applications, from medical diagnostics to financial\u0000forecasting.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explaining the decisions of black-box classifiers is both important and computationally challenging. In this paper, we scrutinize explainers that generate feature-based explanations from samples or datasets. We start by presenting a set of desirable properties that explainers would ideally satisfy, delve into their relationships, and highlight incompatibilities among some of them. We identify the entire family of explainers that satisfy two key properties which are compatible with all the others. Its instances provide sufficient reasons, called weak abductive explanations. We then unravel its various subfamilies that satisfy subsets of compatible properties. Indeed, we fully characterize all the explainers that satisfy any subset of compatible properties. In particular, we introduce the first (broad family of) explainers that guarantee the existence of explanations and their global consistency. We discuss some of its instances, including the irrefutable explainer and the surrogate explainer, whose explanations can be found in polynomial time.
{"title":"Axiomatic Characterisations of Sample-based Explainers","authors":"Leila Amgouda, Martin C. Cooper, Salim Debbaoui","doi":"arxiv-2408.04903","DOIUrl":"https://doi.org/arxiv-2408.04903","url":null,"abstract":"Explaining decisions of black-box classifiers is both important and\u0000computationally challenging. In this paper, we scrutinize explainers that\u0000generate feature-based explanations from samples or datasets. We start by\u0000presenting a set of desirable properties that explainers would ideally satisfy,\u0000delve into their relationships, and highlight incompatibilities of some of\u0000them. We identify the entire family of explainers that satisfy two key\u0000properties which are compatible with all the others. Its instances provide\u0000sufficient reasons, called weak abductive explanations.We then unravel its\u0000various subfamilies that satisfy subsets of compatible properties. Indeed, we\u0000fully characterize all the explainers that satisfy any subset of compatible\u0000properties. In particular, we introduce the first (broad family of) explainers\u0000that guarantee the existence of explanations and their global consistency.We\u0000discuss some of its instances including the irrefutable explainer and the\u0000surrogate explainer whose explanations can be found in polynomial time.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality and cannot simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs) and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method. Specifically, we construct a large-scale, high-quality dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single- or multi-turn dialogues. Additionally, we present a robust evaluation method, MMRole-Eval, encompassing eight metrics across three dimensions, where a reward model is trained to score MRPAs against the constructed ground-truth data for comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent. Extensive evaluation results demonstrate the improved performance of MMRole-Agent and highlight the primary challenges in developing MRPAs, emphasizing the need for enhanced multimodal understanding and role-playing consistency. The data, code, and models will be available at https://github.com/YanqiDai/MMRole.
{"title":"MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents","authors":"Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu","doi":"arxiv-2408.04203","DOIUrl":"https://doi.org/arxiv-2408.04203","url":null,"abstract":"Recently, Role-Playing Agents (RPAs) have garnered increasing attention for\u0000their potential to deliver emotional value and facilitate sociological\u0000research. However, existing studies are primarily confined to the textual\u0000modality, unable to simulate humans' multimodal perceptual capabilities. To\u0000bridge this gap, we introduce the concept of Multimodal Role-Playing Agents\u0000(MRPAs), and propose a comprehensive framework, MMRole, for their development\u0000and evaluation, which comprises a personalized multimodal dataset and a robust\u0000evaluation method. Specifically, we construct a large-scale, high-quality\u0000dataset, MMRole-Data, consisting of 85 characters, 11K images, and 14K single\u0000or multi-turn dialogues. Additionally, we present a robust evaluation method,\u0000MMRole-Eval, encompassing eight metrics across three dimensions, where a reward\u0000model is trained to score MRPAs with the constructed ground-truth data for\u0000comparison. Moreover, we develop the first specialized MRPA, MMRole-Agent.\u0000Extensive evaluation results demonstrate the improved performance of\u0000MMRole-Agent and highlight the primary challenges in developing MRPAs,\u0000emphasizing the need for enhanced multimodal understanding and role-playing\u0000consistency. The data, code, and models will be available at\u0000https://github.com/YanqiDai/MMRole.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We are interested in automating reasoning with and about study regulations, catering to various stakeholders, ranging from administrators and faculty to students at different stages. Our work builds on an extensive analysis of various study programs at the University of Potsdam. The conceptualization of the underlying principles provides us with a formal account of study regulations. In particular, the formalization reveals the properties of admissible study plans. With these at hand, we propose an encoding of study regulations in Answer Set Programming that produces corresponding study plans. Finally, we show how this approach can be extended to a generic user interface for exploring study plans.
{"title":"Reasoning about Study Regulations in Answer Set Programming","authors":"Susana Hahn, Cedric Martens, Amade Nemes, Henry Otunuya, Javier Romero, Torsten Schaub, Sebastian Schellhorn","doi":"arxiv-2408.04528","DOIUrl":"https://doi.org/arxiv-2408.04528","url":null,"abstract":"We are interested in automating reasoning with and about study regulations,\u0000catering to various stakeholders, ranging from administrators, over faculty, to\u0000students at different stages. Our work builds on an extensive analysis of\u0000various study programs at the University of Potsdam. The conceptualization of\u0000the underlying principles provides us with a formal account of study\u0000regulations. In particular, the formalization reveals the properties of\u0000admissible study plans. With these at end, we propose an encoding of study\u0000regulations in Answer Set Programming that produces corresponding study plans.\u0000Finally, we show how this approach can be extended to a generic user interface\u0000for exploring study plans.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-shot coordination (ZSC) remains a major challenge in cooperative AI; it aims to train an agent to cooperate with unseen partners in the training environments or even in novel environments. In recent years, a popular ZSC solution paradigm has been deep reinforcement learning (DRL) combined with advanced self-play or population-based methods to enhance the neural policy's ability to handle unseen partners. Despite some success, these approaches usually rely on black-box neural networks as the policy function. However, neural networks typically lack interpretability and logic, making the learned policies difficult for partners (e.g., humans) to understand and limiting their generalization ability. These shortcomings hinder the application of reinforcement learning methods in diverse cooperative scenarios. We instead propose representing the agent's policy with an interpretable program. Unlike neural networks, programs contain stable logic, but they are non-differentiable and difficult to optimize. To automatically learn such programs, we introduce Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination (KnowPC). We first define a foundational Domain-Specific Language (DSL), including program structures, conditional primitives, and action primitives. A significant challenge is the vast program search space, which makes it difficult to find high-performing programs efficiently. To address this, KnowPC integrates an extractor and a reasoner. The extractor discovers environmental transition knowledge from multi-agent interaction trajectories, while the reasoner deduces the preconditions of each action primitive based on the transition knowledge.
{"title":"KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination","authors":"Yin Gu, Qi Liu, Zhi Li, Kai Zhang","doi":"arxiv-2408.04336","DOIUrl":"https://doi.org/arxiv-2408.04336","url":null,"abstract":"Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI\u0000field, which aims to learn an agent to cooperate with an unseen partner in\u0000training environments or even novel environments. In recent years, a popular\u0000ZSC solution paradigm has been deep reinforcement learning (DRL) combined with\u0000advanced self-play or population-based methods to enhance the neural policy's\u0000ability to handle unseen partners. Despite some success, these approaches\u0000usually rely on black-box neural networks as the policy function. However,\u0000neural networks typically lack interpretability and logic, making the learned\u0000policies difficult for partners (e.g., humans) to understand and limiting their\u0000generalization ability. These shortcomings hinder the application of\u0000reinforcement learning methods in diverse cooperative scenarios.We suggest to\u0000represent the agent's policy with an interpretable program. Unlike neural\u0000networks, programs contain stable logic, but they are non-differentiable and\u0000difficult to optimize.To automatically learn such programs, we introduce\u0000Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination\u0000(KnowPC). We first define a foundational Domain-Specific Language (DSL),\u0000including program structures, conditional primitives, and action primitives. A\u0000significant challenge is the vast program search space, making it difficult to\u0000find high-performing programs efficiently. To address this, KnowPC integrates\u0000an extractor and an reasoner. The extractor discovers environmental transition\u0000knowledge from multi-agent interaction trajectories, while the reasoner deduces\u0000the preconditions of each action primitive based on the transition knowledge.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper considers a scenario in city navigation: an AI agent is provided with language descriptions of the goal location with respect to some well-known landmarks; by observing only the surrounding scene, including recognizing landmarks and road network connections, the agent has to make decisions to navigate to the goal location without instructions. This problem is very challenging because it requires the agent to establish its self-position and acquire a spatial representation of a complex urban environment in which landmarks are often invisible. In the absence of navigation instructions, such abilities are vital for the agent to make high-quality decisions in long-range city navigation. With the emergent reasoning ability of large language models (LLMs), a tempting baseline is to prompt LLMs to "react" to each observation and make decisions accordingly. However, this baseline performs very poorly: the agent often repeatedly visits the same locations and makes short-sighted, inconsistent decisions. To address these issues, this paper introduces a novel agentic workflow featuring the abilities to perceive, reflect, and plan. Specifically, we find that LLaVA-7B can be fine-tuned to perceive the direction and distance of landmarks with sufficient accuracy for city navigation. Moreover, reflection is achieved through a memory mechanism, where past experiences are stored and can be retrieved with the current perception to support decision-making. Planning uses reflection results to produce long-term plans, which avoid short-sighted decisions in long-range navigation. We show that the designed workflow significantly improves the navigation ability of the LLM agent compared with state-of-the-art baselines.
{"title":"Perceive, Reflect, and Plan: Designing LLM Agent for Goal-Directed City Navigation without Instructions","authors":"Qingbin Zeng, Qinglong Yang, Shunan Dong, Heming Du, Liang Zheng, Fengli Xu, Yong Li","doi":"arxiv-2408.04168","DOIUrl":"https://doi.org/arxiv-2408.04168","url":null,"abstract":"This paper considers a scenario in city navigation: an AI agent is provided\u0000with language descriptions of the goal location with respect to some well-known\u0000landmarks; By only observing the scene around, including recognizing landmarks\u0000and road network connections, the agent has to make decisions to navigate to\u0000the goal location without instructions. This problem is very challenging,\u0000because it requires agent to establish self-position and acquire spatial\u0000representation of complex urban environment, where landmarks are often\u0000invisible. In the absence of navigation instructions, such abilities are vital\u0000for the agent to make high-quality decisions in long-range city navigation.\u0000With the emergent reasoning ability of large language models (LLMs), a tempting\u0000baseline is to prompt LLMs to \"react\" on each observation and make decisions\u0000accordingly. However, this baseline has very poor performance that the agent\u0000often repeatedly visits same locations and make short-sighted, inconsistent\u0000decisions. To address these issues, this paper introduces a novel agentic\u0000workflow featured by its abilities to perceive, reflect and plan. Specifically,\u0000we find LLaVA-7B can be fine-tuned to perceive the direction and distance of\u0000landmarks with sufficient accuracy for city navigation. Moreover, reflection is\u0000achieved through a memory mechanism, where past experiences are stored and can\u0000be retrieved with current perception for effective decision argumentation.\u0000Planning uses reflection results to produce long-term plans, which can avoid\u0000short-sighted decisions in long-range navigation. We show the designed workflow\u0000significantly improves navigation ability of the LLM agent compared with the\u0000state-of-the-art baselines.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The integration of large language models (LLMs) into robotics significantly enhances the capabilities of embodied agents in understanding and executing complex natural language instructions. However, the unmitigated deployment of LLM-based embodied systems in real-world environments may pose potential physical risks, such as property damage and personal injury. Existing security benchmarks for LLMs overlook risk awareness for LLM-based embodied agents. To address this gap, we propose RiskAwareBench, an automated framework designed to assess physical risk awareness in LLM-based embodied agents. RiskAwareBench consists of four modules: safety-tip generation, risky-scene generation, plan generation, and evaluation, enabling comprehensive risk assessment with minimal manual intervention. Using this framework, we compile the PhysicalRisk dataset, encompassing diverse scenarios with associated safety tips, observations, and instructions. Extensive experiments reveal that most LLMs exhibit insufficient physical risk awareness and that baseline risk-mitigation strategies yield limited improvement, which underscores the urgency and importance of improving risk awareness in LLM-based embodied agents in the future.
{"title":"RiskAwareBench: Towards Evaluating Physical Risk Awareness for High-level Planning of LLM-based Embodied Agents","authors":"Zihao Zhu, Bingzhe Wu, Zhengyou Zhang, Baoyuan Wu","doi":"arxiv-2408.04449","DOIUrl":"https://doi.org/arxiv-2408.04449","url":null,"abstract":"The integration of large language models (LLMs) into robotics significantly\u0000enhances the capabilities of embodied agents in understanding and executing\u0000complex natural language instructions. However, the unmitigated deployment of\u0000LLM-based embodied systems in real-world environments may pose potential\u0000physical risks, such as property damage and personal injury. Existing security\u0000benchmarks for LLMs overlook risk awareness for LLM-based embodied agents. To\u0000address this gap, we propose RiskAwareBench, an automated framework designed to\u0000assess physical risks awareness in LLM-based embodied agents. RiskAwareBench\u0000consists of four modules: safety tips generation, risky scene generation, plan\u0000generation, and evaluation, enabling comprehensive risk assessment with minimal\u0000manual intervention. Utilizing this framework, we compile the PhysicalRisk\u0000dataset, encompassing diverse scenarios with associated safety tips,\u0000observations, and instructions. Extensive experiments reveal that most LLMs\u0000exhibit insufficient physical risk awareness, and baseline risk mitigation\u0000strategies yield limited enhancement, which emphasizes the urgency and\u0000cruciality of improving risk awareness in LLM-based embodied agents in the\u0000future.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We present a novel prompting strategy for artificial-intelligence-driven digital avatars. To better quantify how our prompting strategy affects anthropomorphic features like humor, authenticity, and favorability, we present Crowd Vote, an adaptation of Crowd Score that allows judges to elect a large language model (LLM) candidate over competitors answering the same or similar prompts. To visualize the responses of our LLM and the effectiveness of our prompting strategy, we propose an end-to-end framework for creating high-fidelity artificial intelligence (AI) driven digital avatars. This pipeline effectively captures an individual's essence for interaction, and our streaming algorithm delivers a high-quality digital avatar with real-time audio-video streaming from server to mobile device. Both our visualization tool and our Crowd Vote metrics demonstrate that our AI-driven digital avatars have state-of-the-art humor, authenticity, and favorability, outperforming all competitors and baselines. In the case of our Donald Trump and Joe Biden avatars, their authenticity and favorability are rated higher than even those of their real-world equivalents.
{"title":"Digital Avatars: Framework Development and Their Evaluation","authors":"Timothy Rupprecht, Sung-En Chang, Yushu Wu, Lei Lu, Enfu Nan, Chih-hsiang Li, Caiyue Lai, Zhimin Li, Zhijun Hu, Yumei He, David Kaeli, Yanzhi Wang","doi":"arxiv-2408.04068","DOIUrl":"https://doi.org/arxiv-2408.04068","url":null,"abstract":"We present a novel prompting strategy for artificial intelligence driven\u0000digital avatars. To better quantify how our prompting strategy affects\u0000anthropomorphic features like humor, authenticity, and favorability we present\u0000Crowd Vote - an adaptation of Crowd Score that allows for judges to elect a\u0000large language model (LLM) candidate over competitors answering the same or\u0000similar prompts. To visualize the responses of our LLM, and the effectiveness\u0000of our prompting strategy we propose an end-to-end framework for creating\u0000high-fidelity artificial intelligence (AI) driven digital avatars. This\u0000pipeline effectively captures an individual's essence for interaction and our\u0000streaming algorithm delivers a high-quality digital avatar with real-time\u0000audio-video streaming from server to mobile device. Both our visualization\u0000tool, and our Crowd Vote metrics demonstrate our AI driven digital avatars have\u0000state-of-the-art humor, authenticity, and favorability outperforming all\u0000competitors and baselines. In the case of our Donald Trump and Joe Biden\u0000avatars, their authenticity and favorability are rated higher than even their\u0000real-world equivalents.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Starting from the Boolean notion of logical proportion in Piaget's sense, which turns out to be equivalent to analogical proportion, this note proposes a definition of analogical proportion between numerical values based on triangular norms (and dual co-norms). Frank's family of triangular norms is particularly interesting from this perspective. The note concludes with a comparative discussion of another very recent proposal for defining analogical proportions between numerical values, based on the family of generalized means.
{"title":"Frank's triangular norms in Piaget's logical proportions","authors":"Henri Prade, Gilles Richard","doi":"arxiv-2408.03795","DOIUrl":"https://doi.org/arxiv-2408.03795","url":null,"abstract":"Starting from the Boolean notion of logical proportion in Piaget's sense,\u0000which turns out to be equivalent to analogical proportion, this note proposes a\u0000definition of analogical proportion between numerical values based on\u0000triangular norms (and dual co-norms). Frank's family of triangular norms is\u0000particularly interesting from this perspective. The article concludes with a\u0000comparative discussion with another very recent proposal for defining\u0000analogical proportions between numerical values based on the family of\u0000generalized means.","PeriodicalId":501479,"journal":{"name":"arXiv - CS - Artificial Intelligence","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141949507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}