I. Chugunkov, Dmitry V. Kabak, Viktor N. Vyunnikov, R. E. Aslanov
Creation of datasets from open sources
2018 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), pp. 295-297, 2018
DOI: https://doi.org/10.1109/EICONRUS.2018.8317091
Machine learning is one of the fastest-growing areas of IT, but it still faces some fundamental problems. Training a neural network first requires a large dataset of labeled entries, and collecting that data manually takes a great deal of time and resources. As a result, obtaining the right data with the correct labels is one of the hardest problems in deep learning. This paper presents methods for automatically creating or updating a labeled dataset for training a car model classifier: a parser collects entries from known Internet sources, and a simple classifier removes incorrect data. The main goal of the article is to show that public sources can be used to collect correctly selected and labeled data.
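The abstract gives no implementation details, so the following is only a minimal sketch of the filtering step it describes: entries gathered by a parser are passed through a simple classifier, and those it rejects are dropped. Every name here (`ScrapedEntry`, `filter_entries`, `toy_car_score`, the 0.5 threshold) is a hypothetical stand-in, and the scorer is a placeholder rather than the classifier used by the authors.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ScrapedEntry:
    """One item returned by a (hypothetical) parser of a public source."""
    image_id: str   # identifier or URL of the downloaded image
    label: str      # car model tag taken from the source page
    pixels: tuple   # stand-in for image data; a real pipeline would hold an array


def filter_entries(
    entries: Iterable[ScrapedEntry],
    car_score: Callable[[ScrapedEntry], float],
    threshold: float = 0.5,  # assumed cut-off, not a value from the paper
) -> List[ScrapedEntry]:
    """Keep only entries the simple classifier scores at or above the threshold."""
    return [e for e in entries if car_score(e) >= threshold]


def toy_car_score(entry: ScrapedEntry) -> float:
    """Placeholder scorer: mean pixel value. NOT a real car detector."""
    if not entry.pixels:
        return 0.0
    return sum(entry.pixels) / len(entry.pixels)


scraped = [
    ScrapedEntry("img-001", "model-a", (0.9, 0.8, 0.7)),
    ScrapedEntry("img-002", "model-a", (0.1, 0.2, 0.0)),  # e.g. an unrelated picture
    ScrapedEntry("img-003", "model-b", ()),               # e.g. a failed download
]
clean = filter_entries(scraped, toy_car_score)
print([e.image_id for e in clean])
```

Passing the scorer in as a callable keeps the parser and the filter independent, so the placeholder can be replaced by a trained model without changing the rest of the pipeline.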