A scientific expertise classification model based on experts’ self-claims using the semantic and the TF-IDF approach
Andre Sihombing, Ariani Indrawati, Aris Yaman, Cahyo Trianggoro, L. Manik, Zaenal Akbar
Proceedings of the 2022 International Conference on Computer, Control, Informatics and Its Applications, 22 November 2022. DOI: 10.1145/3575882.3575940
Abstract
Understanding the structure of a scientific domain and extracting specific information from it is difficult and requires considerable human effort. In previous studies, most datasets used to identify the scientific expertise of academics were derived from paper metadata and from the contents of the papers those academics had written. Machine learning tools should therefore be used to represent accurately how this knowledge has been organized and presented to date. In this research, we compare semantic analysis approaches, namely Latent Dirichlet Allocation (LDA) and knowledge graphs (KG), with a non-explainable-variable approach (TF-IDF) for identifying categories of scientific expertise. The dataset consists of scientific-expertise self-claims written organically by academics, a data source that has not been widely studied before. The TF-IDF approach yields higher classification accuracy because it considers only how important (relevant) each word is. However, it does not assign meaning to the independent variables. This result is also supported by the dataset, whose entries consist of a single part of speech. Meanwhile, the semantic analysis approaches provide meaning and relations that form topics or cluster graphs, albeit with lower accuracy.
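To make "word importance" concrete, the following is a minimal sketch of TF-IDF-based expertise classification. The abstract does not state which preprocessing, classifier, or expertise categories were used, so everything here is an assumption: the self-claims and category labels are hypothetical, and a nearest-centroid classifier with cosine similarity stands in for whatever model the authors actually trained. The sketch shows only the mechanism: a word that appears in fewer self-claims receives a higher IDF weight, and a new self-claim is assigned to the category whose average TF-IDF vector it most resembles.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization (assumption: no stemming or stop-word removal).
    return text.lower().split()


def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    """TF-IDF vector as a dict; terms not seen during fitting are ignored."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of each category (hypothetical nearest-centroid model)."""
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in tfidf(doc, idf).items():
            acc[t] += v
    return {lab: {t: v / counts[lab] for t, v in acc.items()} for lab, acc in sums.items()}


def predict(doc, centroids, idf):
    """Return the category whose centroid is most similar to the document."""
    vec = tfidf(doc, idf)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))


# Hypothetical self-claims and expertise categories, for illustration only.
train_docs = [
    "machine learning deep learning neural networks",
    "natural language processing text mining",
    "organic chemistry catalysis synthesis",
    "polymer chemistry material synthesis",
]
train_labels = ["computer science", "computer science", "chemistry", "chemistry"]

if __name__ == "__main__":
    idf = fit_idf(train_docs)
    centroids = train_centroids(train_docs, train_labels, idf)
    print(predict("text classification with neural networks", centroids, idf))
    # → computer science
```

Each feature here is simply a weighted word, which suits short, noun-heavy self-claims; the weights, however, say nothing about how terms relate to one another, which is the kind of meaning the LDA and knowledge-graph approaches are meant to supply.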