Semantic Keywords Extraction from Paper Abstract in the Domain of Educational Big Data to support Topic Clustering
Ali Arshad, Wanghu Chen, Yang Liu, Nauman Ali Khan
2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2), 2022-05-24
DOI: 10.1109/ICoDT255437.2022.9787427
Abstract
Keywords are the salient words in a paragraph that help a reader quickly understand its context. They carry the general, overall meaning of the paragraph. Extracting valid and meaningful keywords from scientific documents has become an active research topic. Such research not only facilitates better comprehension of individual articles but also explores systematic ways of understanding large repositories of scientific documents. In this study, we propose semantic keyword extraction with a new feature that combines domain-specific grammar rules and the deduction of adjectives. Our algorithm also incorporates the frequencies of keywords that appear repeatedly. The proposed framework extracts keywords from the abstracts of scientific papers to support topic clustering. Such topic clustering helps new researchers find a research topic in the field of educational big data easily and quickly. We selected an educational big data dataset of 1028 published research papers on education learning, education management, student information systems, and related topics. To evaluate the results and performance of the Semantic Keyword Extractor, we used a general dataset. The proposed keyword extractor achieves a precision of 76.8%, which outperforms the other keyword extractors. The proposed framework grouped the scientific papers into 3 meaningful clusters using k-means, an unsupervised machine learning clustering technique.
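The abstract does not give the grammar rules or the exact scoring, so the following is only a minimal sketch of the general idea: rank candidate words by frequency after filtering, then score the result with precision (the share of extracted keywords that also appear in a reference set). The word lists `STOPWORDS` and `ADJECTIVES`, the function names, and the reading of "deduction of adjectives" as simply dropping adjectives are all assumptions made for illustration, not the authors' method.

```python
import re
from collections import Counter

# Hypothetical, tiny word lists for illustration only; the paper's actual
# domain-specific grammar rules and adjective handling are not specified.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "for", "is", "are",
             "on", "with", "from", "that", "this", "we"}
ADJECTIVES = {"new", "big", "better", "valid", "meaningful", "generic",
              "overall", "hot"}


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent candidate words, most frequent first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Assumption: adjectives are removed outright before counting.
    candidates = [t for t in tokens
                  if t not in STOPWORDS and t not in ADJECTIVES and len(t) > 2]
    counts = Counter(candidates)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def precision(extracted, gold):
    """Fraction of extracted keywords that appear in the gold keyword set."""
    if not extracted:
        return 0.0
    hits = sum(1 for k in extracted if k in gold)
    return hits / len(extracted)


if __name__ == "__main__":
    sample = ("Big data in education: clustering of student data supports "
              "new education management. Student clustering uses data.")
    keywords = extract_keywords(sample, top_n=3)
    print(keywords)
    print(precision(keywords, {"data", "education", "learning"}))
```

In a pipeline like the one the abstract describes, per-abstract keyword lists of this kind would then be turned into feature vectors and passed to a k-means clusterer with k = 3; that step is omitted here because the abstract does not say how the features are built.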