The FFmpeg DNN (deep neural network) module supports the DNN-based filters and has two backends. The TensorFlow backend invokes the TensorFlow C library for model loading and inference. The native backend is a CPU fallback for systems that do not support TensorFlow, so it cannot depend on any third-party library. The native backend is still at an early stage of development, and its performance has not yet been tuned.
This project focuses on optimizing the native conv2d layer with C and assembly on Intel CPUs. First, we will research how to implement the conv2d layer in a form that is convenient to execute in parallel. Second, we will write the corresponding C code as a preliminary optimization. Third, we will add x86 SIMD optimizations to the conv2d layer for better performance.
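
As a starting point for the research step, a plain C reference convolution might look like the sketch below. It is not the code of FFmpeg's native backend: the function name `conv2d_ref`, its parameters, the HWC data layout, "valid" padding, and stride 1 are all assumptions chosen for illustration. What it is meant to show is that every output element is computed independently, so the outer loops could be split across threads, and that the innermost loop is a dot product over contiguous memory, which is the natural candidate for SIMD.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference conv2d (not FFmpeg's implementation).
 * Assumptions: float data, HWC layout, "valid" padding, stride 1.
 *   input:  in_h x in_w x in_c
 *   kernel: out_c x k x k x in_c
 *   output: (in_h - k + 1) x (in_w - k + 1) x out_c
 */
static void conv2d_ref(const float *input, int in_h, int in_w, int in_c,
                       const float *kernel, int k, int out_c,
                       float *output)
{
    assert(k > 0 && in_h >= k && in_w >= k);

    const int out_h = in_h - k + 1;
    const int out_w = in_w - k + 1;

    /* Each (y, x, oc) output is independent of the others, so these
     * three loops can be partitioned freely among threads. */
    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            for (int oc = 0; oc < out_c; oc++) {
                float acc = 0.0f;
                for (int ky = 0; ky < k; ky++) {
                    for (int kx = 0; kx < k; kx++) {
                        const float *in_px =
                            input + ((size_t)(y + ky) * in_w + (x + kx)) * in_c;
                        const float *w =
                            kernel + (((size_t)oc * k + ky) * k + kx) * in_c;
                        /* Contiguous dot product over input channels:
                         * the obvious target for SIMD vectorization. */
                        for (int ic = 0; ic < in_c; ic++)
                            acc += in_px[ic] * w[ic];
                    }
                }
                output[((size_t)y * out_w + x) * out_c + oc] = acc;
            }
        }
    }
}
```

For example, convolving a 3x3 single-channel input holding the values 1 to 9 with a 2x2 kernel of ones gives the four window sums 12, 16, 24, and 28. A scalar version like this can also serve as the correctness baseline against which optimized C and SIMD variants are checked.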