In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing. Among different types of deep neural networks, convolutional neural networks have been most extensively studied. Due to the lack of training data and computing power in early days, it is hard to train a large high-capacity convolutional neural network without overfitting. After the rapid growth in the amount of the annotated data and the recent improvements in the strengths of graphics processor units, the research on convolutional neural networks has been emerged swiftly and achieved state-of-the-art results on various tasks. GPUs are much more effective in utilizing parallelism and pipelining than general purpose CPUs, as they are designed for high performance rendering where repeated operations are common. This proposal proposes quick and efficient implementation of CNN on both GPU and multi-core CPU. I’ll use CUDA (compute unified device architecture) that can be easily programmed due to its simple C language-like style.