Exploring the Proportion of Content Represented by the Metadata of Research Articles
Shahzad Nazir, M. Asif, Shahbaz Ahmad
2020 3rd International Conference on Advancements in Computational Sciences (ICACS), published 2020-02-01
DOI: 10.1109/ICACS47775.2020.9055955 (https://doi.org/10.1109/ICACS47775.2020.9055955)
Finding relevant research articles is an important task for tracking the state of the art, and systems that support it are known as research paper recommender systems. Given the massive growth of research corpora, the research community has turned its attention to finding the most relevant research papers. Researchers have adopted techniques based on bibliographic information, on content, and on collaborative filtering. The content-based approach is the most common: according to a survey, 55% of research paper recommender systems use it. However, due to the unavailability of the full text of research papers, researchers have started using metadata instead. It remains unclear what proportion of the full content the metadata can represent. This research explores how much of the full content is captured by the metadata of research articles. We applied two techniques. In the first, we applied TF-IDF to the metadata and to the full content and considered the intersection of key terms. In the second, we calculated similarity scores between metadata and full content using cosine similarity. The approach was assessed on a dataset of 271 research articles that were automatically downloaded from CiteseerX. The results revealed that the metadata of research articles could effectively represent 47% of the full content.
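
The two measurements named in the abstract, key-term intersection over TF-IDF and cosine similarity, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the tokenizer, the IDF variant, the number of key terms, or how the intersection is normalized, so the smoothed IDF, the cutoff `k`, the division by the number of full-text key terms, and the toy strings are all assumptions made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using a smoothed IDF (assumed variant)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def top_terms(vec, k):
    """The k highest-weighted terms; ties are broken alphabetically for determinism."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def key_term_overlap(meta_vec, full_vec, k):
    """Share of the full text's top-k key terms that also appear in the metadata's top-k."""
    full_terms = top_terms(full_vec, k)
    if not full_terms:
        return 0.0
    return len(top_terms(meta_vec, k) & full_terms) / len(full_terms)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy strings standing in for one article's metadata and full text.
metadata = "Metadata of research articles for content-based recommender systems"
full_text = (
    "Research paper recommender systems often rely on content. "
    "When full text is missing, metadata such as title and abstract is used. "
    "We compare metadata and full content with TF-IDF and cosine similarity."
)

meta_vec, full_vec = tfidf_vectors([metadata, full_text])
print("key-term overlap:", key_term_overlap(meta_vec, full_vec, k=5))
print("cosine similarity:", cosine_similarity(meta_vec, full_vec))
```

Both functions return a value between 0 and 1 for a single article. How the per-article values are aggregated into a figure such as the reported 47% is not stated in the abstract, so the sketch stops at the per-article scores.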