Masahiro Senda, Daiji Iwasa, Teruaki Hayashi, Y. Ohsawa
Title: Data Classification by Reducing Bias of Domain-Oriented Knowledge Based on Data Jackets
Venue: 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN)
Publication date: 2019-03-07
DOI: 10.1109/SPIN.2019.8711715 (https://doi.org/10.1109/SPIN.2019.8711715)
Citations: 1
Abstract
In recent years, the worldwide trend toward big data and AI has made cross-disciplinary data exchange and collaboration a social demand. However, data users do not always have sufficient knowledge about the data, which hinders its exchange and use. Even when the same words are used, their meaning depends on context, because the parties involved have different background knowledge. It is therefore necessary to bridge the gap between the expertise of data owners and the requests of data users. To bridge this contextual gap, we propose a classification system, learned from semantic knowledge, that helps data users discover the categories related to a given dataset. We use Data Jackets as summaries of data, together with the Wikipedia knowledge base and word2vec, to reduce the influence of domain-oriented knowledge. In our experiment, the proposed method achieved a higher accuracy rate on the classification tasks, and its classifications were similar to human recognition.
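The abstract does not specify the classifier, so the following is only a minimal sketch of the general idea: a short textual summary of a dataset (standing in for a Data Jacket) is assigned to the category whose keywords are closest in a word-embedding space. Everything here is an assumption made for illustration: the tiny hand-written vectors in `TOY_VECTORS` replace real word2vec embeddings trained on Wikipedia, and the names `mean_vector`, `classify`, and `CATEGORIES` are hypothetical, not the authors' implementation.

```python
import math

# Hypothetical 3-dimensional vectors standing in for word2vec embeddings
# trained on a general corpus such as Wikipedia (values are invented).
TOY_VECTORS = {
    "weather": [0.9, 0.1, 0.0],
    "rainfall": [0.8, 0.2, 0.1],
    "temperature": [0.85, 0.05, 0.1],
    "sales": [0.1, 0.9, 0.1],
    "store": [0.2, 0.8, 0.0],
    "price": [0.05, 0.95, 0.1],
    "patient": [0.0, 0.1, 0.9],
    "hospital": [0.1, 0.0, 0.95],
}

# Hypothetical categories, each described by a few general-purpose keywords.
CATEGORIES = {
    "meteorology": ["weather", "rainfall", "temperature"],
    "retail": ["sales", "store", "price"],
    "healthcare": ["patient", "hospital"],
}


def mean_vector(words):
    """Average the vectors of the known words; return None if none are known."""
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(summary, categories):
    """Return the category whose keyword centroid is most similar to the summary."""
    doc = mean_vector(summary.lower().split())
    if doc is None:
        return None  # no word of the summary has a vector
    scores = {}
    for name, keywords in categories.items():
        centroid = mean_vector(keywords)
        if centroid is not None:
            scores[name] = cosine(doc, centroid)
    return max(scores, key=scores.get) if scores else None


label = classify("daily rainfall and temperature by region", CATEGORIES)
```

Because the embeddings come from a general corpus rather than from one field's vocabulary, a summary written in one domain's terms can still land near a category described in everyday words, which is the intuition behind reducing domain-oriented bias. A real system would load pretrained vectors instead of the toy table.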