Improved Letter Weighting Feature Selection on Arabic Script Language Identification
Choon-Ching Ng, A. Selamat
2009 First Asian Conference on Intelligent Information and Database Systems, 2009-04-01
DOI: 10.1109/ACIIDS.2009.33 (https://doi.org/10.1109/ACIIDS.2009.33)
Citations: 5
Abstract
Language identification is the process of automatically identifying which of a set of predefined languages a document is written in; this paper focuses on web documents. We initially applied letter frequencies as features, combined with neural networks, to Arabic script language identification. However, the reliability of the letters selected as features is a major issue to be overcome. Therefore, we propose an improved letter weighting feature selection method to enhance the effectiveness of language identification. It is based on the concept of letter frequency document frequency. In our experiments, the improved letter weighting feature selection achieved the highest accuracy, 99.75%, on Arabic script language identification.
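
The abstract does not give the weighting formula, so the following is only a rough sketch of what a letter frequency document frequency score might look like. It assumes the weight of a letter is its relative frequency across a language's documents multiplied by the fraction of those documents that contain it. The function names `letter_weights` and `select_letters`, the product form of the score, and the top-`k` selection rule are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def letter_weights(documents):
    """Score each letter of one language's documents.

    Assumed score (not from the paper):
    (share of all letters that are this letter)
    x (share of documents containing this letter).
    """
    letter_counts = Counter()  # total occurrences of each letter
    doc_counts = Counter()     # number of documents containing each letter
    for text in documents:
        # str.isalpha() keeps alphabetic characters, including Arabic-script
        # letters, and drops digits, punctuation, and whitespace.
        letters = [ch for ch in text if ch.isalpha()]
        letter_counts.update(letters)
        doc_counts.update(set(letters))

    total_letters = sum(letter_counts.values())
    n_docs = len(documents)
    if total_letters == 0 or n_docs == 0:
        return {}

    return {
        ch: (letter_counts[ch] / total_letters) * (doc_counts[ch] / n_docs)
        for ch in letter_counts
    }


def select_letters(documents, k):
    """Return the k highest-weighted letters, ties broken by code point."""
    weights = letter_weights(documents)
    ranked = sorted(weights, key=lambda ch: (-weights[ch], ch))
    return ranked[:k]


if __name__ == "__main__":
    # Toy Latin-script documents stand in for real Arabic-script web pages.
    sample = ["aab", "ab", "ac"]
    print(select_letters(sample, 2))
```

Under this assumption, a letter that is frequent in only a few documents is weighted below a letter that is moderately frequent in most of them, which is one plausible way to address the reliability concern the abstract raises about raw letter frequency. The selected letters' frequencies could then serve as the input vector for a classifier such as the neural network mentioned above.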