{"title":"基于任务完成评价的开发率动态调整研究","authors":"Masashi Sugimoto","doi":"10.17781/P002402","DOIUrl":null,"url":null,"abstract":"Until now, in reinforcement learning, a ratio of a random action as known as exploration often has not been adjusted dynamically. However, this ratio will be an index of performance in the reinforcement learning. In this study, agents learn using information from the evaluation of achievement for task of another agent, will be suggested. From this proposed method, the exploration ratio will be adjusted from other agents’ behavior, dynamically. In Human Life, an “atmosphere” will be existed as a communication method. For example, empirically, people will be influenced by “serious atmosphere,” such as in the situation of working, or take an examination. In this study, this atmosphere as motivation for task achievement of agent will be defined. Moreover, in this study, agent’s action decision when another agent will be solved the task, will be focused on. In other words, an agent will be trying to find an optimal solution if other agents have been found an optimal solution. In this paper, we propose the action decision based on other agent’s behavior. Moreover, in this study, we discuss effectiveness using the maze problem as an example. In particular, “number of task achievement” and “influence for task achievement,” and how to achieve the task quantitative will be focused. 
As a result, we confirmed that the proposed method is well influenced from other agent’s behavior.","PeriodicalId":211757,"journal":{"name":"International journal of new computer architectures and their applications","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Study for Dynamically Adjustmentation for Exploitation Rate using Evaluation of Task Achievement\",\"authors\":\"Masashi Sugimoto\",\"doi\":\"10.17781/P002402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Until now, in reinforcement learning, a ratio of a random action as known as exploration often has not been adjusted dynamically. However, this ratio will be an index of performance in the reinforcement learning. In this study, agents learn using information from the evaluation of achievement for task of another agent, will be suggested. From this proposed method, the exploration ratio will be adjusted from other agents’ behavior, dynamically. In Human Life, an “atmosphere” will be existed as a communication method. For example, empirically, people will be influenced by “serious atmosphere,” such as in the situation of working, or take an examination. In this study, this atmosphere as motivation for task achievement of agent will be defined. Moreover, in this study, agent’s action decision when another agent will be solved the task, will be focused on. In other words, an agent will be trying to find an optimal solution if other agents have been found an optimal solution. In this paper, we propose the action decision based on other agent’s behavior. Moreover, in this study, we discuss effectiveness using the maze problem as an example. In particular, “number of task achievement” and “influence for task achievement,” and how to achieve the task quantitative will be focused. 
As a result, we confirmed that the proposed method is well influenced from other agent’s behavior.\",\"PeriodicalId\":211757,\"journal\":{\"name\":\"International journal of new computer architectures and their applications\",\"volume\":\"9 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of new computer architectures and their applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17781/P002402\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of new computer architectures and their applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17781/P002402","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Study of Dynamic Adjustment of the Exploitation Rate Using Evaluation of Task Achievement
Until now, in reinforcement learning, the ratio of random actions, known as the exploration rate, has rarely been adjusted dynamically. However, this ratio serves as an index of performance in reinforcement learning. In this study, we propose a method in which an agent learns using information from the evaluation of another agent's task achievement. With the proposed method, the exploration rate is adjusted dynamically based on other agents' behavior. In human life, an "atmosphere" exists as a means of communication. For example, people are empirically influenced by a "serious atmosphere," such as when working or taking an examination. In this study, we define this atmosphere as the motivation for an agent's task achievement. Moreover, we focus on an agent's action decisions when another agent has solved the task. In other words, an agent tries to find an optimal solution once other agents have found one. In this paper, we propose an action-decision method based on other agents' behavior, and we discuss its effectiveness using a maze problem as an example. In particular, we focus on the "number of task achievements," the "influence on task achievement," and how the task is achieved quantitatively. As a result, we confirmed that the proposed method is effectively influenced by other agents' behavior.
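The core idea of the abstract, adjusting an agent's exploration rate in response to other agents' task achievement, can be sketched in a minimal form. The following is an illustrative sketch only, not the paper's actual algorithm: the update rule, the `influence` parameter, and the `observe_peer_success` hook are all assumptions introduced here to show how a peer's success could shift an ε-greedy learner toward exploitation.

```python
import random

class EpsilonGreedyAgent:
    """Tabular Q-learning agent whose exploration rate (epsilon) shrinks
    when other agents report task achievement. A sketch of the general
    idea; the paper's concrete update rule is not given in the abstract."""

    def __init__(self, n_states, n_actions, epsilon=0.9,
                 alpha=0.1, gamma=0.9, influence=0.5):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.epsilon = epsilon      # exploration rate
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor
        self.influence = influence  # hypothetical "atmosphere" strength

    def act(self, state):
        # epsilon-greedy: explore with probability epsilon, else exploit
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def learn(self, s, a, r, s_next):
        # standard one-step Q-learning update
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

    def observe_peer_success(self):
        # Hypothetical rule: when another agent achieves the task, the
        # "serious atmosphere" lowers this agent's exploration rate so
        # it spends more of its actions exploiting (seeking the optimum).
        self.epsilon *= (1.0 - self.influence)

agent = EpsilonGreedyAgent(n_states=4, n_actions=2)
agent.observe_peer_success()  # a peer solved the maze: 0.9 -> 0.45
```

In a maze setting like the one the paper evaluates, `observe_peer_success` would fire whenever another agent reaches the goal, so agents in a "successful atmosphere" converge from exploration toward exploitation faster than a fixed-ε baseline.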