This project builds a speech-to-text engine for Chinese and delivers it as a working application.
There are two leading candidate implementations:
- A TensorFlow implementation of Chinese speech recognition based on DeepMind’s WaveNet. Although WaveNet was designed as a generative model, it can be adapted straightforwardly to discriminative audio tasks such as speech recognition. The paper omits specific implementation details, so we can fill in the gaps in our own way in this project.
- A TensorFlow implementation of Chinese speech recognition based on Mozilla's DeepSpeech, an open-source speech-to-text engine that follows Baidu's Deep Speech research and uses a model trained with machine learning techniques. It is a well-known open-source project on GitHub, so we can build our own improvements on top of its existing framework in this project.
Since both architectures are implemented in TensorFlow, we can draw on both of them when designing our own network. To train our model, we can use THCHS-30 together with Chinese news broadcasts from two CCTV channels, two Hunan regional channels, and one Changsha local channel.
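
Whichever architecture is chosen, the Chinese transcripts have to be turned into integer labels before training. The following is a minimal sketch of one way to do that, assuming character-level targets (neither candidate is confirmed here to use exactly this scheme). The function names `build_vocab` and `encode` are hypothetical, and the sample transcripts are illustrative only, not drawn from THCHS-30 or the news recordings.

```python
from collections import Counter


def build_vocab(transcripts):
    """Map each distinct non-space character to an integer id (hypothetical helper)."""
    counts = Counter(
        ch for line in transcripts for ch in line if not ch.isspace()
    )
    # Sort by descending frequency, then by code point, so ids are deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered)}


def encode(text, vocab):
    """Convert a transcript to a list of label ids, skipping whitespace.

    Raises KeyError for characters that are not in the vocabulary.
    """
    return [vocab[ch] for ch in text if not ch.isspace()]


# Illustrative transcripts only; real data would come from the training corpora.
samples = ["今天 天气 很 好", "今天 新闻"]
vocab = build_vocab(samples)
labels = encode("今天 天气", vocab)
print(labels)
```

In a real pipeline the vocabulary would be built once from the full training set and saved alongside the model, so that training and inference use the same character-to-id mapping.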