This project develops a pipeline for emotion detection from video frames: faces appearing in the video are detected and analyzed with deep neural networks for emotion recognition. The architecture combines a CNN and an RNN, following papers submitted to the Emotion Recognition in the Wild challenge. An input video is split into short segments; for each segment, faces are detected, cropped, and aligned, yielding a sequence of face images. A CNN extracts features from each image in the sequence, and these features are fed sequentially into an RNN that encodes motion and facial-expression dynamics to predict the emotion. The complete pipeline is exposed as a Python API that takes a video as input and returns JSON annotations. TensorFlow, dlib, MTCNN, and ffmpeg handle the various stages of the pipeline.
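To make the CNN-to-RNN handoff concrete, here is a minimal NumPy sketch of the recurrent stage and the JSON annotation it would emit. It assumes the CNN has already produced one feature vector per frame; the `ElmanRNN` class, the emotion label list, the feature dimension, and the annotation schema are all illustrative stand-ins, not the project's actual trained model or output format.

```python
import json
import numpy as np

# Illustrative emotion classes; the real label set depends on the training data.
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ElmanRNN:
    """Toy recurrent encoder over per-frame CNN features (random, untrained weights)."""
    def __init__(self, feat_dim, hidden_dim, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.1, (feat_dim, hidden_dim))   # input -> hidden
        self.W_hh = rng.normal(0, 0.1, (hidden_dim, hidden_dim)) # hidden -> hidden
        self.W_hy = rng.normal(0, 0.1, (hidden_dim, n_classes))  # hidden -> logits

    def predict(self, features):
        h = np.zeros(self.W_hh.shape[0])
        for x in features:  # one CNN feature vector per frame, in temporal order
            h = np.tanh(x @ self.W_xh + h @ self.W_hh)
        return softmax(h @ self.W_hy)  # emotion probabilities for the segment

def annotate_segment(segment_id, features, rnn):
    """Build a JSON-serializable annotation for one video segment."""
    probs = rnn.predict(features)
    return {
        "segment": segment_id,
        "emotion": EMOTIONS[int(np.argmax(probs))],
        "scores": {e: float(p) for e, p in zip(EMOTIONS, probs)},
    }

# Toy example: a 16-frame segment with 128-d CNN features per frame.
rnn = ElmanRNN(feat_dim=128, hidden_dim=64, n_classes=len(EMOTIONS))
features = np.random.default_rng(1).normal(size=(16, 128))
annotation = annotate_segment(0, features, rnn)
print(json.dumps(annotation, indent=2))
```

In the real pipeline the random weights would be replaced by a trained TensorFlow model, and `features` would come from the CNN applied to MTCNN-detected, dlib-aligned face crops, but the data flow — segment, per-frame features, recurrent encoding, JSON annotation — is the same.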