阿拉伯文文本分类方法:主要研究的系统文献综述

Waleed Alabbas, Haider M. Al-Khateeb, Ali Mansour
{"title":"阿拉伯文文本分类方法:主要研究的系统文献综述","authors":"Waleed Alabbas, Haider M. Al-Khateeb, Ali Mansour","doi":"10.1109/CIST.2016.7805072","DOIUrl":null,"url":null,"abstract":"Recent research on Big Data proposed and evaluated a number of advanced techniques to gain meaningful information from the complex and large volume of data available on the World Wide Web. To achieve accurate text analysis, a process is usually initiated with a Text Classification (TC) method. Reviewing the very recent literature in this area shows that most studies are focused on English (and other scripts) while attempts on classifying Arabic texts remain relatively very limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) utilizing a search protocol strictly to summarize key characteristics of the different TC techniques and methods used to classify Arabic text, this work also aims to identify and share a scientific evidence of the gap in current literature to help suggesting areas for further research. Our SLR explicitly investigates empirical evidence as a decision factor to include studies, then conclude which classifier produced more accurate results. Further, our findings identify the lack of standardized corpuses for Arabic text; authors compile their own, and most of the work is focused on Modern Arabic with very little done on Colloquial Arabic despite its wide use in Social Media Networks such as Twitter. In total, 1464 papers were surveyed from which 48 primary studies were included and analyzed.","PeriodicalId":196827,"journal":{"name":"2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Arabic text classification methods: Systematic literature review of primary studies\",\"authors\":\"Waleed Alabbas, Haider M. Al-Khateeb, Ali Mansour\",\"doi\":\"10.1109/CIST.2016.7805072\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent research on Big Data proposed and evaluated a number of advanced techniques to gain meaningful information from the complex and large volume of data available on the World Wide Web. To achieve accurate text analysis, a process is usually initiated with a Text Classification (TC) method. Reviewing the very recent literature in this area shows that most studies are focused on English (and other scripts) while attempts on classifying Arabic texts remain relatively very limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) utilizing a search protocol strictly to summarize key characteristics of the different TC techniques and methods used to classify Arabic text, this work also aims to identify and share a scientific evidence of the gap in current literature to help suggesting areas for further research. Our SLR explicitly investigates empirical evidence as a decision factor to include studies, then conclude which classifier produced more accurate results. Further, our findings identify the lack of standardized corpuses for Arabic text; authors compile their own, and most of the work is focused on Modern Arabic with very little done on Colloquial Arabic despite its wide use in Social Media Networks such as Twitter. In total, 1464 papers were surveyed from which 48 primary studies were included and analyzed.\",\"PeriodicalId\":196827,\"journal\":{\"name\":\"2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CIST.2016.7805072\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 4th IEEE International Colloquium on Information Science and Technology (CiSt)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIST.2016.7805072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20

摘要

最近关于大数据的研究提出并评估了一些先进的技术,以从万维网上复杂而大量的数据中获得有意义的信息。为了实现准确的文本分析,通常使用文本分类(TC)方法启动一个过程。回顾这一领域的最新文献表明,大多数研究都集中在英语(和其他文字)上,而对阿拉伯文本进行分类的尝试仍然相对非常有限。因此,我们打算撰写第一篇系统文献综述(SLR),严格使用搜索协议来总结用于分类阿拉伯文本的不同TC技术和方法的关键特征,这项工作还旨在识别和分享当前文献差距的科学证据,以帮助提出进一步研究的领域。我们的SLR明确调查经验证据作为包括研究的决定因素,然后得出哪个分类器产生更准确的结果。此外,我们的研究结果表明,阿拉伯语文本缺乏标准化的语料库;作者们自己编撰,大部分工作都集中在现代阿拉伯语上,很少涉及口语阿拉伯语,尽管它在社交媒体网络(如Twitter)中广泛使用。总共调查了1464篇论文,其中包括48项主要研究并进行了分析。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Arabic text classification methods: Systematic literature review of primary studies
Recent research on Big Data proposed and evaluated a number of advanced techniques to gain meaningful information from the complex and large volume of data available on the World Wide Web. To achieve accurate text analysis, a process is usually initiated with a Text Classification (TC) method. Reviewing the very recent literature in this area shows that most studies are focused on English (and other scripts) while attempts on classifying Arabic texts remain relatively very limited. Hence, we intend to contribute the first Systematic Literature Review (SLR) utilizing a search protocol strictly to summarize key characteristics of the different TC techniques and methods used to classify Arabic text, this work also aims to identify and share a scientific evidence of the gap in current literature to help suggesting areas for further research. Our SLR explicitly investigates empirical evidence as a decision factor to include studies, then conclude which classifier produced more accurate results. Further, our findings identify the lack of standardized corpuses for Arabic text; authors compile their own, and most of the work is focused on Modern Arabic with very little done on Colloquial Arabic despite its wide use in Social Media Networks such as Twitter. In total, 1464 papers were surveyed from which 48 primary studies were included and analyzed.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Semantically enhanced term frequency based on word embeddings for Arabic information retrieval Automatic generation of TestNG tests cases from UML sequence diagrams in Scrum process Coordination by sharing demand forecasts in a supply chain using game theoretic approach Robust approach for textured image clustering High speed efficient FPGA implementation of pipelined AES S-Box
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1