WANG Lei, ZHAO Yinghai, YANG Guoshun, WANG Ruoqi
Driven by large-scale data acquisition, the key technologies of deep neural networks have been widely applied in image classification, object detection, speech recognition, natural language processing, and related fields. As the performance of deep neural network models improves, their size and required computation grow accordingly, making them reliant on high-power computing platforms. This paper focuses on deep neural network model compression for embedded applications, aiming to address the constraints on storage resources, memory access speed, and computing resources in embedded systems. The goal is to reduce model size and computational complexity while optimizing the computation process. The paper surveys state-of-the-art model compression techniques, including model pruning, compact model design, tensor decomposition, and model quantization. This summary of the development of such models provides a reference for further studies of deep neural network model compression.