Details of deep learning models and their performance are, unfortunately, often published without accompanying implementation code. Those models that come from speech recognition seem to be particularly susceptible to this phenomenon. The present project proposes to create a working implementation of a speech recognition model using the Flux library for the Julia programming language and contribute its code to the Flux model zoo. The model to be implemented is Zhang et al.'s (2017) model from their paper "Towards end-to-end speech recognition with deep convolutional neural networks." Due to being implemented using only convolutional layers, this model will be lighter to train than previous models that have used heavier recurrent layers, while still achieving state-of-the-art performance. Having a working implementation of this network will be a step forward in opening the culture of automatic speech recognition. As a result, newcomers to the field will have a recent example to look at for inspiration, which is paramount because there are not many novice-friendly resources available for doing speech recognition research.



Matthew C. Kelley


  • Mike Innes
  • Christopher Rackauckas