Automatic Spell Checker and Correction for Under-represented Spoken Languages: Case Study on Wolof
Authors: Thierno Ibrahima Cissé, F. Sadat
DOI: 10.48550/arXiv.2305.12694 (https://doi.org/10.48550/arXiv.2305.12694)
Publication date: 2023-05-22
Journal: International Journal of Rail Transportation
Abstract
This paper presents a spell checking and correction tool designed specifically for Wolof, an under-represented spoken language in Africa. The proposed spell checker combines a trie data structure, dynamic programming, and the weighted Levenshtein distance to generate suggestions for misspelled words. We created novel linguistic resources for Wolof, including a lexicon and a corpus of misspelled words, using a semi-automatic approach that combines manual and automatic annotation. Despite the limited data available for Wolof, the spell checker achieved a predictive accuracy of 98.31% and a suggestion accuracy of 93.33%. Our primary focus remains the revitalization and preservation of Wolof as an Indigenous and spoken language in Africa, and our efforts to develop novel linguistic resources serve that goal. This work contributes to the growth of computational tools and resources for the Wolof language and provides a foundation for future studies in automatic spell checking and correction.
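The abstract names three ingredients: a trie, dynamic programming, and a weighted Levenshtein distance. The sketch below shows one common way these can fit together and is not the authors' implementation. The lexicon is stored in a trie, and one dynamic-programming row is computed per trie node, so words sharing a prefix share the work. The edit costs, the cheaper substitution cost for diacritic variants, the tiny lexicon, and all names (`Trie`, `suggest`, `CHEAP_SUBS`, `max_cost`) are assumptions made for illustration; the abstract does not state the weights or data used in the paper.

```python
import unicodedata
from typing import Dict, List, Optional, Tuple


def nfc(text: str) -> str:
    """Normalize to NFC so accented letters are single code points."""
    return unicodedata.normalize("NFC", text)


# Hypothetical edit costs; the paper's actual weights are not given in the abstract.
INSERT_COST = 1.0
DELETE_COST = 1.0
DEFAULT_SUB_COST = 1.0
# Assumed cheaper substitutions for a letter and its diacritic variant (illustrative only).
CHEAP_SUBS = {
    frozenset((nfc(a), nfc(b))): cost
    for a, b, cost in [("a", "à", 0.25), ("e", "ë", 0.25), ("o", "ó", 0.25)]
}


def sub_cost(a: str, b: str) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    return CHEAP_SUBS.get(frozenset((a, b)), DEFAULT_SUB_COST)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set only on nodes that end a lexicon word


class Trie:
    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        word = nfc(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in nfc(word):
            node = node.children.get(ch)
            if node is None:
                return False
        return node.word is not None

    def suggest(self, word: str, max_cost: float = 1.0) -> List[Tuple[str, float]]:
        """Return (lexicon word, cost) pairs within max_cost, cheapest first."""
        word = nfc(word)
        # Row for the empty trie prefix: delete every typed character.
        first_row = [j * DELETE_COST for j in range(len(word) + 1)]
        results: List[Tuple[str, float]] = []
        for ch, child in self.root.children.items():
            self._search(child, ch, word, first_row, max_cost, results)
        return sorted(results, key=lambda pair: (pair[1], pair[0]))

    def _search(self, node, ch, word, prev_row, max_cost, results) -> None:
        # One DP row per trie node: row[j] is the cost of editing word[:j]
        # into the prefix spelled by the path from the root to this node.
        row = [prev_row[0] + INSERT_COST]
        for j in range(1, len(word) + 1):
            row.append(min(
                row[j - 1] + DELETE_COST,                     # drop typed char
                prev_row[j] + INSERT_COST,                    # add trie char
                prev_row[j - 1] + sub_cost(word[j - 1], ch),  # match or replace
            ))
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Costs are non-negative, so min(row) bounds every deeper row from below.
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                self._search(child, next_ch, word, row, max_cost, results)


# Toy lexicon for illustration only; not the lexicon built in the paper.
lexicon = Trie(["jàmm", "jàng", "nit", "xale", "ndox", "kër", "bëgg"])
suggestions = lexicon.suggest("jamm", max_cost=1.0)
print(suggestions)
```

With these assumed weights, typing `jamm` without the diacritic is flagged as unknown by `contains`, and `suggest` ranks `jàmm` first because the `a`/`à` substitution is cheap. Pruning on `min(row)` lets the search abandon whole subtrees of the trie as soon as no word below them can fall within `max_cost`.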