{"title":"基于GAN网络的强化学习多跳推理方法","authors":"Zhicai Gao, Xiaoze Gong, Yongli Wang","doi":"10.1117/12.2671176","DOIUrl":null,"url":null,"abstract":"At present, the academic community has carried out some research on knowledge reasoning using Reinforcement Learning (RL), which has achieved good results in multi-hop reasoning. However, these methods often need to manually design the reward function to adapt to a specific dataset. For different datasets, the reward function in RL-based methods needs to be manually adjusted to obtain good performance. To solve this problem, an agent training model combined with Generative Adversarial Networks (GAN) is proposed. The model consists of two modules: a generative adversarial inference engine and a sampler. The sampler uses a policy-based bidirectional breadth-first search method to find the demonstration path, and the agent uses the reward considering the information of the neighborhood entities as the initial reward function. After sufficient adversarial training between the agent and the discriminator, the policy-based agent can find evidence paths that match the demonstration distribution and synthesize these evidence paths to make predictions. Experiments show that the model achieves better results in both fact prediction and link prediction tasks.","PeriodicalId":227528,"journal":{"name":"International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Reinforcement learning multi-hop reasoning method with GAN network\",\"authors\":\"Zhicai Gao, Xiaoze Gong, Yongli Wang\",\"doi\":\"10.1117/12.2671176\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"At present, the academic community has carried out some research on knowledge reasoning using Reinforcement Learning (RL), which has achieved good results in multi-hop reasoning. However, these methods often need to manually design the reward function to adapt to a specific dataset. For different datasets, the reward function in RL-based methods needs to be manually adjusted to obtain good performance. To solve this problem, an agent training model combined with Generative Adversarial Networks (GAN) is proposed. The model consists of two modules: a generative adversarial inference engine and a sampler. The sampler uses a policy-based bidirectional breadth-first search method to find the demonstration path, and the agent uses the reward considering the information of the neighborhood entities as the initial reward function. After sufficient adversarial training between the agent and the discriminator, the policy-based agent can find evidence paths that match the demonstration distribution and synthesize these evidence paths to make predictions. 
Experiments show that the model achieves better results in both fact prediction and link prediction tasks.\",\"PeriodicalId\":227528,\"journal\":{\"name\":\"International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1117/12.2671176\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Artificial Intelligence and Computer Engineering (ICAICE 2022)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2671176","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Research on knowledge reasoning with Reinforcement Learning (RL) has achieved good results in multi-hop reasoning. However, these methods typically rely on a manually designed reward function tuned to a specific dataset: for each new dataset, the reward in an RL-based method must be adjusted by hand to obtain good performance. To address this problem, an agent training model that incorporates Generative Adversarial Networks (GANs) is proposed. The model consists of two modules, a generative adversarial inference engine and a sampler. The sampler finds demonstration paths with a policy-based bidirectional breadth-first search, and the agent starts from an initial reward function that takes the information of neighboring entities into account. After sufficient adversarial training between the agent and the discriminator, the policy-based agent can find evidence paths that match the demonstration distribution and synthesize these evidence paths to make predictions. Experiments show that the model achieves better results on both fact prediction and link prediction tasks.
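To make the sampler concrete, here is a minimal sketch of how a bidirectional breadth-first search could collect demonstration paths between a query's head and tail entities. It is a sketch under stated assumptions, not the authors' implementation: the adjacency-map representation of the knowledge graph (with inverse relations pre-added so the backward search can reuse it) and the helper names `bfs_tree` and `demonstration_paths` are illustrative, and the policy-based pruning that would rank or trim each frontier in the actual method is omitted.

```python
# Hypothetical sketch of the sampler's bidirectional breadth-first search.
# Assumes `graph` maps entity -> list of (relation, neighbor) pairs and that
# inverse relations are stored, so the backward search can reuse `graph`.
from collections import deque

def bfs_tree(graph, root, depth):
    """BFS from `root` up to `depth` hops; returns a map from each reached
    entity to its shortest entity path from `root` (both endpoints included)."""
    paths = {root: [root]}
    frontier = deque([root])
    for _ in range(depth):
        for _ in range(len(frontier)):      # expand one full level per pass
            node = frontier.popleft()
            for _, nbr in graph.get(node, []):
                if nbr not in paths:        # first visit = shortest path
                    paths[nbr] = paths[node] + [nbr]
                    frontier.append(nbr)
    return paths

def demonstration_paths(graph, source, target, max_hops=4):
    """Meet-in-the-middle search: expand from both ends, then join the two
    shortest half-paths at every entity reached by both searches."""
    fwd = bfs_tree(graph, source, max_hops // 2)
    bwd = bfs_tree(graph, target, max_hops - max_hops // 2)
    demos = []
    for meet in fwd.keys() & bwd.keys():
        # bwd[meet] runs target -> meet; reverse it and drop the shared node.
        demos.append(fwd[meet] + bwd[meet][::-1][1:])
    return demos
```

Each returned path routes through one entity reachable from both ends; such paths serve as the positive (demonstration) examples the discriminator is trained against.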
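The adversarial training between agent and discriminator can be read as a form of generative adversarial imitation learning: the discriminator learns to separate demonstration paths from agent-generated ones, and its score stands in for the hand-designed, dataset-specific reward once the initial neighborhood-aware reward has warmed the agent up. The sketch below illustrates that loop under explicit assumptions; the path encoding (fixed-size pooled embeddings), the network shape, and the exact reward form are guesses for illustration, not details taken from the paper.

```python
# Hypothetical GAIL-style discriminator and reward, assuming each reasoning
# path has already been encoded as a fixed-size vector (e.g. pooled entity
# and relation embeddings).
import torch
import torch.nn as nn

class PathDiscriminator(nn.Module):
    """Outputs the probability that an encoded path is a demonstration."""
    def __init__(self, path_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(path_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, path_emb):
        return self.net(path_emb).squeeze(-1)

def discriminator_loss(disc, demo_embs, agent_embs, eps=1e-8):
    """Binary cross-entropy pushing demonstration paths toward a score of 1
    and agent-generated paths toward 0."""
    return -(torch.log(disc(demo_embs) + eps).mean()
             + torch.log(1.0 - disc(agent_embs) + eps).mean())

def adversarial_reward(disc, agent_embs, eps=1e-8):
    """Reward the agent for paths the discriminator mistakes for
    demonstrations; this replaces the manually tuned reward that plain
    RL-based reasoning methods require."""
    with torch.no_grad():
        d = disc(agent_embs)
    return torch.log(d + eps) - torch.log(1.0 - d + eps)
```

Training would alternate the two sides: update the discriminator with `discriminator_loss` on fresh batches of demonstration and agent paths, then update the policy-based agent by policy gradient with `adversarial_reward` as its return signal, until the agent's evidence paths match the demonstration distribution.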