Different Scenarios and Query Strategies in Active Learning for Document Classification
Zeynep Yetiştiren, Can Özbey, Hakki Eren Arkangil
2021 6th International Conference on Computer Science and Engineering (UBMK), 15 September 2021
DOI: 10.1109/UBMK52708.2021.9558925
Nowadays, machine learning and deep learning models are used in many fields and give promising results. Improving the performance of these models, which grow larger and more complex as technology advances, requires large amounts of labeled data. Although a great deal of data is produced every day, labeling it is a major challenge in developing these models because it is time-consuming and costly. Active learning is a semi-supervised learning method that helps overcome this problem. Its goal is to select the most informative examples from the unlabeled data and label only those, so that the same performance can be reached with less labeled data. Query strategies have been observed to strongly affect how much accuracy improves, which suggests that accuracy may improve further if new query strategies are used. In this study, we evaluate the cosine similarity strategy we propose under different scenarios and compare it with classical query strategies that measure the informativeness of the data. However, no larger accuracy gain than with the classical query strategies was observed.
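For readers unfamiliar with query strategies, the sketch below shows how classical informativeness measures for pool-based active learning can be computed from a classifier's predicted class probabilities. It assumes NumPy and the three common uncertainty measures (least confidence, margin, entropy); the abstract does not name which classical strategies were used. The abstract also does not specify how the proposed cosine similarity strategy scores documents, so `cosine_dissimilarity` is a hypothetical diversity-style heuristic added for illustration only, not the authors' method.

```python
import numpy as np

# Each scorer takes predictions for the unlabeled pool and returns one score
# per document; a higher score means "more informative, query it first".

def least_confidence(probs: np.ndarray) -> np.ndarray:
    # 1 minus the probability of the most likely class.
    return 1.0 - probs.max(axis=1)

def margin(probs: np.ndarray) -> np.ndarray:
    # Negated gap between the two most likely classes: a small gap gives a high score.
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])

def entropy(probs: np.ndarray) -> np.ndarray:
    # Shannon entropy of the predicted distribution (clipped to avoid log(0)).
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def cosine_dissimilarity(pool_vecs: np.ndarray, labeled_vecs: np.ndarray) -> np.ndarray:
    # HYPOTHETICAL heuristic, not the paper's strategy: score each unlabeled
    # document by 1 - (max cosine similarity to any already-labeled document),
    # so documents unlike anything labeled so far are queried first.
    def normalize(m: np.ndarray) -> np.ndarray:
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.maximum(norms, 1e-12)  # guard against zero vectors
    sims = normalize(pool_vecs) @ normalize(labeled_vecs).T  # (n_pool, n_labeled)
    return 1.0 - sims.max(axis=1)

def query(scores: np.ndarray, batch_size: int) -> np.ndarray:
    # Indices of the batch_size highest-scoring documents, most informative first.
    return np.argsort(-scores, kind="stable")[:batch_size]
```

In a full active learning loop, the selected documents would be sent to an annotator, added to the labeled set, the classifier retrained, and the scores recomputed on the remaining pool until the labeling budget is spent. For example, with `probs = np.array([[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.34, 0.33, 0.33]])`, all three uncertainty scorers rank the near-uniform third document first.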