| dc.contributor.author | AGBOLADE, OLAIDE AYODEJI | |
| dc.date.accessioned | 2020-12-02T11:28:11Z | |
| dc.date.available | 2020-12-02T11:28:11Z | |
| dc.date.issued | 2016-05 | |
| dc.identifier.uri | http://196.220.128.81:8080/xmlui/handle/123456789/2087 | |
| dc.description.abstract | A review of most voice conversion algorithms in use today shows that new methods are needed to solve the problem of oversmoothing. Oversmoothing in parametric speech synthesis results from losses in the spectral details of the synthesized speech, making it sound muffled, unnatural and too distant from the target speech. These losses are attributed to deficiencies in the representation of the speech signal. The aim of this research is to develop an algorithm that alleviates the oversmoothing challenge in voice conversion. Both linear predictive coding (LPC) and line spectral frequency (LSF) coefficients were used to parametrize the source speaker's speech signals before they were mapped into the acoustic vector space of the target speaker. The non-linear mapping ability of a neural network was employed in mapping the speech coefficients. The LPC representation of the speech proved sufficient for direct coefficient mapping for both parallel and non-parallel utterances. The results for the non-parallel mapping revealed that vowel sounds are the major contributor to the success of the conversion process. To enable the system to generalize to utterances that have not been made by the source speaker, a neural network was adopted. Training the LPC coefficients directly with a neural network yielded very poor results owing to the instability of the LPC filter poles. To guarantee the stability of the filter poles and the retention of spectral details, the LPC coefficients were converted to line spectral frequency coefficients before being trained with a two-layer neural network. The result of the neural network mapping for a single word shows a significant 25 percent improvement over the coefficient mapping technique. The algorithm was further tested with words that were not part of the training data. Cepstral distance measurement for these also shows a 35.7 percent reduction in the spectral distance between the target and the converted speech, a slight improvement over most previously reviewed algorithms. | en_US |
| dc.description.sponsorship | FUTA | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | The Federal University of Technology, Akure | en_US |
| dc.subject | Voice conversion (voice cloning) | en_US |
| dc.subject | communication process | en_US |
| dc.subject | voice manipulation technique | en_US |
| dc.title | DEVELOPMENT OF HYBRID VOICE CLONING ALGORITHM USING NEURAL NETWORK AND COEFFICIENT MAPPING | en_US |
| dc.type | Thesis | en_US |
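The abstract's key move is replacing direct LPC mapping (unstable filter poles) with LSF coefficients before neural-network training. That step rests on the standard LPC-to-LSF transform: split the LPC polynomial A(z) into a symmetric polynomial P(z) and an antisymmetric polynomial Q(z); for a stable A(z) their roots lie on the unit circle and interlace, and their angles are the LSFs. A minimal NumPy sketch of this transform, for illustration only (the function name and tolerance are my own, not from the thesis):

```python
import numpy as np

def lpc_to_lsf(a, tol=1e-6):
    """Convert LPC coefficients a = [1, a1, ..., ap] to line spectral
    frequencies (radians, strictly inside (0, pi)).

    P(z) = A(z) + z^{-(p+1)} A(z^{-1})   (symmetric)
    Q(z) = A(z) - z^{-(p+1)} A(z^{-1})   (antisymmetric)
    """
    a = np.asarray(a, dtype=float)
    a_ext = np.concatenate([a, [0.0]])   # append the z^{-(p+1)} slot
    p_poly = a_ext + a_ext[::-1]         # symmetric polynomial P(z)
    q_poly = a_ext - a_ext[::-1]         # antisymmetric polynomial Q(z)
    roots = np.concatenate([np.roots(p_poly), np.roots(q_poly)])
    ang = np.angle(roots)
    # keep one angle per conjugate pair; drop the trivial roots at z = +/-1
    return np.sort(ang[(ang > tol) & (ang < np.pi - tol)])
```

This illustrates why LSFs suit network training better than raw LPC: a valid LSF vector is just a monotonically increasing sequence in (0, pi), so small perturbations introduced by a trained network can be re-sorted or clipped without destabilizing the synthesis filter, which is the stability property the abstract relies on.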