DEVELOPMENT OF A MACHINE TRANSLATION SYSTEM FOR ENGLISH, IGBO AND YORÙBÁ LANGUAGES

AYOGU, IGNATIUS IKECHUKWU

dc.contributor.author	AYOGU, IGNATIUS IKECHUKWU
dc.date.accessioned	2021-08-09T09:46:18Z
dc.date.available	2021-08-09T09:46:18Z
dc.date.issued	2018-07
dc.identifier.uri	http://196.220.128.81:8080/xmlui/handle/123456789/4439
dc.description	PhD THESIS	en_US
dc.description.abstract	Machine translation (MT) is the process by which computer software expresses the meaning of text or utterances in one human language in another human language. Though MT research has produced high-quality translation systems for English and other privileged languages, research into translation systems for Nigerian languages has not yielded demonstrable practical results. The aim of this research was therefore to develop prototype MT systems for the English-Igbo, English-Yorùbá and Igbo-Yorùbá language pairs using data-driven, scalable, language-independent techniques. To realize the MT objectives, this research also developed part-of-speech tagging systems for the Igbo and Yorùbá languages using algorithms from the Markov random family. A research-size parallel corpus was created for each language pair and used to train and evaluate the respective MT system in a two-stage experimental setup: baseline Phrase-based Statistical Machine Translation (Pb-SMT) systems were first developed, tested and evaluated, and were then improved through linguistic enrichment by incorporating part-of-speech information into the translation model. The baseline Pb-SMT systems were trained on plain, unannotated parallel text, attaining BLEU scores of 30.17, 29.64 and 19.02 for the English-Igbo, English-Yorùbá and Igbo-Yorùbá MT systems respectively. Error analyses of these baseline systems revealed that a greater percentage of errors occurred at the lexical, grammatical and semantic levels for each of the three translation systems. To improve upon the baseline translation performance, the Pb-SMT models were enriched with part-of-speech information using the factored Pb-SMT framework in a number of investigative experimental configurations involving direct translation and single-step procedures that combine a generation step.
The best BLEU score gains for the factored models utilizing only the direct translation step were 0.45 and 0.40 for the English-Igbo and English-Yorùbá systems respectively, while the best gain for models combining translation and generation steps with multiple decoding paths was 0.56 BLEU points for the English-Igbo system. An evaluation of the best models was made against Google Translate; the results showed	en_US
dc.description.sponsorship	FEDERAL UNIVERSITY OF TECHNOLOGY AKURE	en_US
dc.language.iso	en	en_US
dc.publisher	FEDERAL UNIVERSITY OF TECHNOLOGY AKURE	en_US
dc.subject	MACHINE TRANSLATION SYSTEM	en_US
dc.subject	Linguistic communication	en_US
dc.title	DEVELOPMENT OF A MACHINE TRANSLATION SYSTEM FOR ENGLISH, IGBO AND YORÙBÁ LANGUAGES	en_US
dc.type	Thesis	en_US