Abstract
Objective: Image co-segmentation refers to segmenting the common objects (foregrounds) from a group of images that contain the same or similar objects. Deep neural networks are widely used for this task given their excellent segmentation results. The end-to-end Siamese network is one of the most effective networks for image co-segmentation. However, this network has a huge computational cost, which greatly limits its applications, so network compression is required. Although various network compression methods have been presented in the literature, they are mainly designed for single-branch networks and do not consider the characteristics of a Siamese network. To this end, we propose a novel network compression method specifically for Siamese networks.

Method: The proposed method transfers the important knowledge of a large network to a compressed small network in three steps. First, we acquire the important knowledge of the large network. To achieve this, we develop a binary attention mechanism that is applied to each stage of the encoder module of the Siamese network. This mechanism retains the features of the common objects and suppresses the features of the non-common objects in the two images. As a result, the response of each stage of the Siamese network is represented as a matrix with sparse channels. We map this sparse response matrix to a dense matrix with a smaller channel dimension through a convolution layer with a 1 × 1 kernel; this dense matrix represents the important knowledge of the large network. Second, we build the small network structure. As described in the first step, the number of channels used to represent the knowledge in each stage of the large network can be reduced; accordingly, the number of channels in each convolution and normalization layer within each stage can also be reduced. We therefore reconstruct each stage of the large network according to the channel dimension of the dense matrix obtained in the first step to determine the final small network structure. Third, we transfer the knowledge from the large network to the compressed small network through a two-step knowledge distillation method. In the first step, the output of each stage/deconvolutional layer of the large network is used as the supervision information: we use the Euclidean distance between the middle-layer outputs of the large and small networks as the loss function to guide the training of the small network, so that the middle-layer outputs of the two networks are as similar as possible at the end of this training stage. In the second step, we compute the Dice loss between the network output and the ground-truth label to guide the final refinement of the small network and further improve the segmentation accuracy.
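The three ingredients of the method above can be summarized in a minimal PyTorch sketch. This is illustrative only: the abstract does not specify the criterion behind the binary attention, so channel-wise cosine similarity between the two branches is used here as a stand-in, and all module, function, and variable names (`KnowledgeMapper`, `binary_attention`, `stage_distillation_loss`, `dice_loss`) are our own, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeMapper(nn.Module):
    """Maps a channel-sparse stage response of the large network to a dense
    matrix with fewer channels via a 1 x 1 convolution (method, step 1)."""
    def __init__(self, in_channels: int, dense_channels: int):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, dense_channels, kernel_size=1)

    def forward(self, sparse_response: torch.Tensor) -> torch.Tensor:
        return self.reduce(sparse_response)

def binary_attention(feat_a: torch.Tensor, feat_b: torch.Tensor,
                     threshold: float = 0.5):
    """Hypothetical binary attention: keep channels that respond to the
    common objects in both images and zero out the rest. The per-channel
    cosine similarity used here is an assumption, not the paper's rule."""
    # (N, C) channel descriptors via global average pooling
    desc_a = feat_a.mean(dim=(2, 3))
    desc_b = feat_b.mean(dim=(2, 3))
    sim = F.cosine_similarity(desc_a, desc_b, dim=0)     # (C,) similarity
    mask = (sim > threshold).float().view(1, -1, 1, 1)   # binary channel mask
    return feat_a * mask, feat_b * mask

def stage_distillation_loss(teacher_stages, student_stages):
    """Distillation step 1: Euclidean (MSE) distance between the
    middle-layer outputs of the large and small networks."""
    return sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_stages, teacher_stages))

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Distillation step 2: Dice loss between the network output and the
    ground-truth mask, used to refine the small network."""
    pred = torch.sigmoid(pred)
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1.0 - (2.0 * inter + eps) / (union + eps)).mean()
```

Under this reading, the small network would first be trained with `stage_distillation_loss` over every stage/deconvolutional output, then refined with `dice_loss` against the real labels.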
Result: We perform two groups of experiments on three datasets: MLMR-COS, Internet, and iCoseg. MLMR-COS provides a large number of images with pixel-wise ground truth, and an ablation study is performed on it to verify the rationality of the proposed method. Although Internet and iCoseg are commonly used co-segmentation datasets, they are too small to serve as training sets for deep-learning-based methods; we therefore train our network on a training set generated from Pascal VOC 2012 and MSRC before testing it on Internet and iCoseg to verify its effectiveness. Experimental results show that the proposed method reduces the size of the original Siamese network by a factor of 3.3, thereby significantly reducing the required amount of computation. Moreover, compared with existing deep-learning-based co-segmentation methods, the compressed network requires significantly less computation, while its segmentation accuracy on the three datasets is close to the state of the art. On the MLMR-COS dataset, the compressed small network obtains an average Jaccard index that is 0.07% higher than that of the original large network. On the Internet and iCoseg datasets, we compare the compressed network with 12 traditional supervised/unsupervised image co-segmentation methods and 3 deep-learning-based co-segmentation methods. On the Internet dataset, the compressed network achieves a Jaccard index roughly 5% higher than those of the traditional image segmentation methods and the existing deep-learning-based co-segmentation methods. On the iCoseg dataset, whose images are relatively complex, the segmentation accuracy of the compressed small network is slightly lower than those of the other methods.

Conclusion: We propose a network compression method that combines a binary attention mechanism with knowledge distillation and apply it to a Siamese network for image co-segmentation. The compressed network significantly reduces the amount of computation and the number of parameters of the Siamese network while achieving co-segmentation performance similar to that of the state-of-the-art methods.
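For reference, the Jaccard index reported above is the intersection-over-union of the predicted and ground-truth foreground masks. A minimal sketch (function name is ours):

```python
import torch

def jaccard_index(pred_mask: torch.Tensor, gt_mask: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Intersection over union between binary foreground masks."""
    pred = pred_mask.bool()
    gt = gt_mask.bool()
    inter = (pred & gt).float().sum()
    union = (pred | gt).float().sum()
    return (inter + eps) / (union + eps)
```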
| Translated title of the contribution | Combining attention mechanism and knowledge distillation for Siamese network compression |
| --- | --- |
| Original language | Chinese (Traditional) |
| Pages (from-to) | 2563-2577 |
| Number of pages | 15 |
| Journal | Journal of Image and Graphics |
| Volume | 25 |
| Issue number | 12 |
| DOIs | |
| Publication status | Published - 16 Dec 2020 |