DEVELOPMENT OF HYBRID VOICE CLONING ALGORITHM USING NEURAL NETWORK AND COEFFICIENT MAPPING

dc.contributor.author AGBOLADE, OLAIDE AYODEJI
dc.date.accessioned 2020-12-02T11:28:11Z
dc.date.available 2020-12-02T11:28:11Z
dc.date.issued 2016-05
dc.identifier.uri http://196.220.128.81:8080/xmlui/handle/123456789/2087
dc.description.abstract A review of most voice conversion algorithms in use today shows that new methods are needed to solve the problem of oversmoothing. Oversmoothing in parametric speech synthesis results from the loss of spectral detail in the synthesized speech, which makes it sound muffled, unnatural and too distant from the target speech. This loss of spectral detail is attributed to deficiencies in the representation of the speech signal. The aim of this research is to develop an algorithm that alleviates oversmoothing in voice conversion. Both linear predictive coding (LPC) and line spectral frequency (LSF) coefficients were used to parametrize the source speaker’s speech signals before they were mapped into the acoustic vector space of the target speaker. The non-linear mapping ability of a neural network was employed in mapping the speech coefficients. The LPC representation of the speech proved sufficient for direct coefficient mapping for both parallel and non-parallel utterances. The results for the non-parallel mapping revealed that vowel sounds are the major contributor to the success of the conversion process. For the system to generalize to utterances that have not been made by the source speaker, a neural network was adopted. Training the neural network directly on LPC coefficients yielded very poor results because of the instability of the LPC filter poles. To guarantee the stability of the filter poles and the retention of spectral detail, the LPC coefficients were converted to line spectral frequency coefficients before being used to train a 2-layer neural network. The result of the neural network mapping for a single word shows a significant 25 percent improvement over the coefficient mapping technique. The algorithm was further tested with words that were not part of the training data. Cepstral distance measurement for these words also shows a 35.7 percent reduction in the spectral distance between the target and the converted speech, a slight improvement over most previously reviewed algorithms. en_US
dc.description.sponsorship FUTA en_US
dc.language.iso en en_US
dc.publisher The Federal University of Technology, Akure. en_US
dc.subject Voice conversion (voice cloning) en_US
dc.subject communication process en_US
dc.subject voice manipulation technique en_US
dc.title DEVELOPMENT OF HYBRID VOICE CLONING ALGORITHM USING NEURAL NETWORK AND COEFFICIENT MAPPING en_US
dc.type Thesis en_US
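
The abstract states that LPC coefficients were converted to line spectral frequencies (LSFs) so that the filter poles stay stable. The following is a minimal sketch of that conversion and its inverse, using the textbook sum and difference polynomial construction. It is not taken from the thesis: the function names `lpc_to_lsf` and `lsf_to_lpc`, the use of NumPy, and the restriction of the inverse to even filter orders are assumptions made here for illustration.

```python
import numpy as np


def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to p line spectral frequencies.

    Returns the LSFs in radians, sorted ascending, each strictly inside (0, pi).
    Assumes A(z) = 1 + a1 z^-1 + ... + ap z^-p is minimum phase (stable filter).
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])          # A(z), padded to order p + 1
    a_rev = np.concatenate([[0.0], a[::-1]])    # z^-(p+1) * A(1/z)
    p_poly = a_ext + a_rev                      # symmetric (sum) polynomial P(z)
    q_poly = a_ext - a_rev                      # antisymmetric (difference) polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    angles = np.angle(roots)
    eps = 1e-6                                  # drops the trivial roots at z = +1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def lsf_to_lpc(lsf):
    """Rebuild LPC coefficients [1, a1, ..., ap] from sorted LSFs (even p only)."""
    lsf = np.asarray(lsf, dtype=float)
    if len(lsf) % 2 != 0:
        raise ValueError("this sketch handles even filter orders only")
    p_poly = np.array([1.0, 1.0])               # trivial root of P(z) at z = -1
    q_poly = np.array([1.0, -1.0])              # trivial root of Q(z) at z = +1
    for w in lsf[0::2]:                         # 1st, 3rd, ... LSFs belong to P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:                         # 2nd, 4th, ... LSFs belong to Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    # A(z) = (P(z) + Q(z)) / 2; the highest-order term cancels, so drop it.
    return (0.5 * (p_poly + q_poly))[:-1]


if __name__ == "__main__":
    # Hypothetical 4th-order stable filter built from two pole pairs.
    poles = [0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j),
             0.8 * np.exp(1.5j), 0.8 * np.exp(-1.5j)]
    a = np.real(np.poly(poles))
    lsf = lpc_to_lsf(a)
    print("LSFs (rad):", lsf)
    print("round-trip error:", np.max(np.abs(lsf_to_lpc(lsf) - a)))
```

The property that makes LSFs attractive as a mapping target is that any set of frequencies that stays ordered inside (0, π) converts back to a filter with all poles inside the unit circle. A predicted LSF vector therefore only needs to keep its ordering to yield a stable synthesis filter, whereas a small error in raw LPC coefficients can push a pole outside the unit circle.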

