| dc.contributor.author | AGBOLADE, OLAIDE AYODEJI | |
| dc.date.accessioned | 2020-12-02T11:28:11Z | |
| dc.date.available | 2020-12-02T11:28:11Z | |
| dc.date.issued | 2016-05 | |
| dc.identifier.uri | http://196.220.128.81:8080/xmlui/handle/123456789/2087 | |
| dc.description.abstract | A review of most voice conversion algorithms in use today shows that new methods are needed to solve the problem of oversmoothing. Oversmoothing in parametric speech synthesis results from losses in the spectral details of the synthesized speech, making it sound muffled, unnatural and too distant from the target speech. These losses are attributed to deficiencies in the representation of the speech signal. The aim of this research is to develop an algorithm that alleviates the oversmoothing challenge in voice conversion. Both linear predictive coding (LPC) and line spectral frequency (LSF) coefficients were used to parametrize the source speaker's speech signals before they were mapped into the acoustic vector space of the target speaker. The non-linear mapping ability of a neural network was employed in mapping the speech coefficients. The LPC representation of the speech proved sufficient for direct coefficient mapping for both parallel and non-parallel utterances. The results for the non-parallel mapping revealed that vowel sounds are the major contributor to the success of the conversion process. To enable the system to generalize to utterances that have not been made by the source speaker, a neural network was adopted. Training the LPC coefficients directly with a neural network yielded very poor results owing to the instability of the LPC filter poles. To guarantee the stability of the filter poles and the retention of spectral details, the LPC coefficients were converted to line spectral frequency coefficients before being trained with a two-layer neural network. The result of the neural network mapping for a single word shows a significant 25 percent improvement over the coefficient mapping technique. The algorithm was further tested with words that were not part of the training data. Cepstral distance measurement for these also shows a 35.7 percent reduction in the spectral distance between the target and the converted speech, a slight improvement over most previously reviewed algorithms. | en_US |
| dc.description.sponsorship | FUTA | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | The Federal University of Technology, Akure | en_US |
| dc.subject | Voice conversion (voice cloning) | en_US |
| dc.subject | communication process | en_US |
| dc.subject | voice manipulation technique | en_US |
| dc.title | DEVELOPMENT OF HYBRID VOICE CLONING ALGORITHM USING NEURAL NETWORK AND COEFFICIENT MAPPING | en_US |
| dc.type | Thesis | en_US |
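The abstract's key move is replacing direct LPC mapping (unstable filter poles) with LSF coefficients before neural-network training. That step rests on the standard LPC-to-LSF transform: split the LPC polynomial A(z) into a symmetric polynomial P(z) and an antisymmetric polynomial Q(z); for a stable A(z) their roots lie on the unit circle and interlace, and their angles are the LSFs. A minimal NumPy sketch of this transform, for illustration only (the function name and tolerance are my own, not from the thesis):

```python
import numpy as np

def lpc_to_lsf(a, tol=1e-6):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies (radians, strictly inside (0, pi)).

    P(z) = A(z) + z^{-(p+1)} A(z^{-1})   (symmetric)
    Q(z) = A(z) - z^{-(p+1)} A(z^{-1})   (antisymmetric)
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])   # append the z^{-(p+1)} slot
    p_poly = a_ext + a_ext[::-1]         # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]         # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    ang = np.angle(roots)
    # keep one angle per conjugate pair; drop the trivial roots at z = +/-1
    return np.sort(ang[(ang > tol) & (ang < np.pi - tol)])
```

This illustrates why LSFs suit network training better than raw LPC: a valid LSF vector is just a monotonically increasing sequence in (0, pi), so small perturbations introduced by a trained network can be re-sorted or clipped without destabilizing the synthesis filter, which is the stability property the abstract relies on.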