Abstract: Most video representation methods in computer vision are supervised, requiring large amounts of labeled training video, which is expensive to scale to rapidly growing data. To address this problem, this paper proposes an unsupervised video representation method based on a deep convolutional neural network. Improved dense trajectories (iDT) are used to extract video blocks, on which the convolutional neural network and the cluster assignments are trained alternately. The deep convolutional neural network is trained by this iterative algorithm to obtain unsupervised video representations. The proposed model is applied to extract features on the HMDB51 and CCV datasets for action recognition and event detection, respectively. In the experiments, a mean accuracy of 62.6% and a mean average precision (mAP) of 43.6% are obtained, respectively, which demonstrates the effectiveness of the proposed method.
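The alternating scheme described above (cluster the current representations to get pseudo-labels, then train the network on those pseudo-labels, and repeat) can be illustrated with a minimal NumPy sketch. This is not the paper's actual architecture: the random `X` stands in for iDT-extracted video-block descriptors, and a single tanh layer `W` with a softmax head `C` stands in for the deep convolutional network; the k-means routine, dimensions, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(F, k, iters=10):
    """Plain k-means on feature rows F; returns cluster pseudo-labels."""
    centers = F[rng.choice(len(F), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([F[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels

X = rng.normal(size=(200, 16))          # stand-in for iDT video-block descriptors
k, d = 5, 8                             # illustrative cluster count / feature dim
W = rng.normal(scale=0.1, size=(16, d)) # "network": one tanh embedding layer
C = rng.normal(scale=0.1, size=(d, k))  # classifier head over pseudo-labels

for _ in range(5):                      # outer loop: alternate cluster / train
    F = np.tanh(X @ W)                  # current learned representation
    y = kmeans(F, k)                    # step 1: pseudo-labels from clustering
    for _ in range(50):                 # step 2: fit network to pseudo-labels
        F = np.tanh(X @ W)
        logits = F @ C
        p = np.exp(logits - logits.max(1, keepdims=True))
        p /= p.sum(1, keepdims=True)
        g = (p - np.eye(k)[y]) / len(X)             # softmax cross-entropy grad
        C -= 0.5 * F.T @ g                          # update head
        W -= 0.5 * X.T @ ((g @ C.T) * (1 - F ** 2)) # backprop through tanh

features = np.tanh(X @ W)               # final unsupervised representation
```

After convergence, `features` plays the role of the unsupervised video representation that would be fed to a downstream classifier for recognition or detection.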