Syllabification and parameter optimisation in Zulu to English machine translation
G. Kotzé, Friedel Wolff
South African Computer Journal 57(1), pp. 1-23, published 10 December 2015
DOI: 10.18489/SACJ.V0I57.323
Citations: 24
Abstract
We present a series of experiments involving the machine translation of Zulu to English using a well-known statistical software system. Due to morphological complexity and relative scarcity of resources, the case of Zulu is challenging. Against a selection of baseline models, we show that a relatively naive approach of dividing Zulu words into syllables leads to a surprising improvement. We further improve on this model through manual configuration changes. Our best model significantly outperforms the baseline models (BLEU measure, at p < 0.001) even when they are optimised to a similar degree, only falling short of the well-known Morfessor morphological analyser that makes use of relatively sophisticated algorithms. These experiments suggest that even a simple optimisation procedure can improve the quality of this approach to a significant degree. This is promising particularly because it improves on a mostly language-independent approach, at least within the same language family. Our work also drives the point home that sub-lexical alignment for Zulu is crucial for improved translation quality.
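The abstract does not spell out the syllabification rules. The sketch below illustrates what a "relatively naive" syllable split might look like. It assumes that Zulu syllables are predominantly open, so a syllable can be approximated as any run of non-vowels followed by a single vowel. Word-final consonants, as in some loanwords, form their own piece. This is a hypothetical illustration, not the authors' procedure. The rule ignores cases such as syllabic nasals, so it produces `mfu` in `umfundisi` rather than splitting off the `m`. The `@@` continuation marker and the function names are also assumptions, chosen so that word boundaries survive segmentation.

```python
import re

# Hypothetical rule (not the paper's exact method): a syllable is any run of
# non-vowels ending in one vowel; leftover trailing consonants form a final piece.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou]|[^aeiou]+$")


def syllabify(word: str) -> list[str]:
    """Split a single (lower-cased) Zulu word into naive open syllables."""
    return SYLLABLE.findall(word.lower())


def segment_sentence(sentence: str, marker: str = "@@") -> str:
    """Syllabify every word, tagging word-internal syllables with a marker
    (an assumed convention) so that the original word boundaries are recoverable."""
    tokens = []
    for word in sentence.split():
        sylls = syllabify(word)
        tokens.extend(s + marker for s in sylls[:-1])  # non-final syllables
        tokens.append(sylls[-1])                       # final syllable, unmarked
    return " ".join(tokens)


if __name__ == "__main__":
    print(syllabify("ngiyabonga"))
    print(segment_sentence("sawubona baba"))
```

Because every character falls into some piece, concatenating the syllables of a word always reproduces the lower-cased word. The segmented source text could then be passed to a word-alignment and phrase-based pipeline in place of whole words.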
Journal description:
The South African Computer Journal is a specialist ICT academic journal, accredited by the South African Department of Higher Education and Training. SACJ publishes research articles, viewpoints and communications in English in Computer Science and Information Systems.