Hassanin M. Al-Barhamtoshy, Hanen Himdi, Mohamad Alyahya
Arabic Pilgrim Services Dataset: Creating and Analysis
2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), published 2023-01-23
DOI: 10.1109/ICAISC56366.2023.10085561 (https://doi.org/10.1109/ICAISC56366.2023.10085561)
Abstract
With countless Arabic news articles published daily, users have become increasingly concerned about obtaining news from credible sources. For individual users, however, credibility is often tied to the specific countries whose outlets they trust. Detecting the source of a news article is therefore important for fake news detection and helps users trust the news they consume. This paper describes the creation, filtering, analysis, and evaluation of a domain-specific Arabic dataset on services for pilgrims. The Arabic Pilgrim Services (ArPiS) dataset is a collection of approximately 30,000 news articles gathered from three different Arabic countries and regions. The dataset is intended for measuring pilgrims’ opinions of services and supports text mining, text classification, clustering, and text summarization. The initial search covers 124 Arabic news websites. Several filters are then applied to restrict the dataset to pilgrim-related services. Many topics are covered, and further filtering was carried out with a discussion group that contributed opinions and additional comments. The large volume of collected data requires additional effort and analysis to yield a valuable dataset, and balancing the dataset is one such effort that we plan to undertake. Because the collected and annotated dataset consists of real news about pilgrims’ services, an additional quantity of fake news items must be built. Accordingly, a precondition procedure is invoked as the methodology to create and then annotate such a dataset.
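The filtering and balancing steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the keyword list, the `text` and `label` field names, and the `real`/`fake` label values are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical keyword list (Hajj, Umrah, pilgrims); the paper's actual
# filter terms are not given in the abstract.
SERVICE_KEYWORDS = ["الحج", "العمرة", "الحجاج", "المعتمرين"]


def is_pilgrim_related(text, keywords=SERVICE_KEYWORDS):
    """Return True if the article text contains any of the keywords."""
    return any(kw in text for kw in keywords)


def filter_articles(articles, keywords=SERVICE_KEYWORDS):
    """Keep only articles whose text matches at least one keyword."""
    return [a for a in articles if is_pilgrim_related(a["text"], keywords)]


def items_needed_to_balance(articles, labels=("real", "fake")):
    """For each label, count how many more items are needed to match
    the size of the largest class."""
    counts = Counter(a["label"] for a in articles)
    target = max(counts[label] for label in labels)
    return {label: target - counts[label] for label in labels}
```

Calling `items_needed_to_balance` on a collection that contains only real news reports how many fake items would have to be added to reach a balanced dataset.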