1. Introduction
“Lazy snapping” is an interactive image segmentation technique. It separates coarse and fine scale processing, making object specification and detailed adjustment easy. Moreover, Lazy Snapping provides instant visual feedback, snapping the cutout contour to the true object boundary efficiently despite the presence of ambiguous or low contrast edges. Instant feedback is made possible by a novel image segmentation algorithm which combines graph cut with pre-computed over-segmentation. In this project, we will implement this algorithm and evaluate its performance.
2. Project Objective
1) Understand the graph cut algorithm as an efficient energy minimization tool.
2) Be able to segment the foreground of an image using the Lazy Snapping technique.
In this project, we are provided with several example pictures each with some given foreground pixels (marked in red) and background pixels (marked in blue). Our program is required to compute the precise segmentation boundary of the foreground object.
3. Algorithm and Implementation Details
In the Lazy Snapping paper, the image is initially over-segmented using a watershed algorithm. In our implementation, we simplified this step by simply downsampling the image with MATLAB's imresize function. After scaling the image down (we settled on a scale of 1/8), it is passed to the graph cut stage described below.
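To illustrate this pre-processing step: our actual code calls MATLAB's imresize, but the sketch below shows an equivalent block-averaging downsample in Python/NumPy (the function name is ours, for illustration only).

```python
import numpy as np

def downsample(img, factor=8):
    """Block-average downsampling, a stand-in for MATLAB's imresize.
    img: HxWxC float array; H and W are cropped to multiples of factor."""
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    img = img[:h, :w]
    # Reshape into (h/f, f, w/f, f, C) blocks and average each fxf block.
    blocks = img.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3))

img = np.random.rand(64, 80, 3)
small = downsample(img)  # shape (8, 10, 3)
```

Each output pixel is the mean of an 8x8 block, so the small image preserves average color while shrinking the graph by a factor of 64.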
We use the max-flow/min-cut algorithm, as in the previous graph cut project, but the weights plugged in are changed significantly. Ultimately, we are trying to solve the minimization problem from the Lazy Snapping paper:

E(X) = sum_i E1(x_i) + lambda * sum_{(i,j) in Edges} E2(x_i, x_j)

where the data term E1 measures how well pixel i's color matches the foreground and background color models, and the smoothness term E2(x_i, x_j) = |x_i - x_j| / (1 + ||C(i) - C(j)||^2) penalizes assigning different labels to similar-colored neighbors, so the cut prefers to fall on strong color edges.
In this project, we used GCMex, a MATLAB wrapper around a graph cut library, and implemented our algorithm in MATLAB.
Our algorithm is as follows:
1) Extract the marked foreground and background pixels.
2) Cluster the foreground and background pixels into several clusters with the K-means algorithm.
3) Fit a Gaussian model to each cluster, building a Gaussian mixture model for the foreground and the background, respectively.
4) Calculate the probability of each pixel belonging to the foreground and the background.
5) Calculate the data cost and smoothness cost parameters: 'class', 'unary', 'pairwise', and 'labelcost'.
6) Segment the graph using the graph cut function:
7) [labels, E, Eafter] = GCMex(double(class'), single(unary), pairwise, single(labelcost), 0)
CLASS: A 1xN vector specifying the initial label of each of the N nodes in the graph.
UNARY: A CxN matrix specifying the potentials (data term) for each of the C possible classes at each of the N nodes.
PAIRWISE: An NxN sparse matrix specifying the graph structure and the cost of each link between nodes in the graph.
LABELCOST: A CxC matrix specifying the fixed label cost for the labels of each pair of adjacent nodes in the graph.
EXPANSION: A 0-1 flag which determines whether the swap or the expansion method is used to solve the minimization; 0 == swap, 1 == expansion. If omitted, it defaults to swap.
8) Segment the image according to the labels, where label 0 denotes the foreground and label 1 the background.
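To make steps 2) and 4)-5) concrete, here is a minimal sketch (in Python/NumPy rather than our MATLAB code) of one simple variant of the data term: instead of evaluating full Gaussians per cluster, it uses normalized minimum distances to the K-means centers, as in the simplified E1 of the Lazy Snapping paper. The function names are ours, not from any library.

```python
import numpy as np

def kmeans(X, k=4, iters=20, seed=0):
    """Tiny K-means (step 2): cluster the stroke colors into k centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def unary_costs(pixels, fg_centers, bg_centers):
    """Data term (steps 4-5): for each pixel, the normalized minimum
    color distance to the foreground / background cluster centers."""
    dF = np.linalg.norm(pixels[:, None] - fg_centers[None], axis=2).min(axis=1)
    dB = np.linalg.norm(pixels[:, None] - bg_centers[None], axis=2).min(axis=1)
    eps = 1e-9
    e1_fg = dF / (dF + dB + eps)   # cost of labeling the pixel foreground
    e1_bg = dB / (dF + dB + eps)   # cost of labeling the pixel background
    return np.stack([e1_fg, e1_bg])  # 2 x N matrix, like GCMex's 'unary'
```

A pixel close in color to the foreground strokes gets a low foreground cost and a high background cost, so the graph cut prefers to label it foreground unless the smoothness term overrules it.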
4. Experiments
The following figures show the results of our implementation. For each example set, the first column contains the original images with the corresponding strokes on them, and the second column contains the results.
From the figures we can see that our algorithm works best on the dog and dance sets, worst on the Van Gogh and Mona Lisa sets, and moderately well on the lady set: in the first two sets, the difference between foreground and background is much larger than in the latter two.
In the dance, lady, and Mona Lisa sets, the results in the first row are much better than those in the second row. The first row has more strokes for both the foreground and the background, which leads to a more accurate Gaussian mixture model of the probability that a pixel belongs to the foreground or the background.
In the dog set, the hair of a person at the upper right corner of the image is mistakenly segmented as foreground because of its great similarity in color and texture to the dog region. In the other sets we can likewise observe that background regions similar to the foreground in color and texture are quite likely to be classified as foreground.
The performance on the Mona Lisa and Van Gogh sets is much worse because much of the background closely resembles the foreground, especially in the lower part of the images.
In conclusion, what really matters in the Lazy Snapping procedure is the size of the stroke samples and how distinct the foreground is in both color and texture. To get good performance, we must make sure that our strokes are rich enough to cover the different color patches in both the foreground and the background. For images with low contrast between foreground and background, we need to draw more strokes.
Since the initial node weights are based on color, segmentation is very easy for subjects whose color differs strongly from the background, as in the dog picture. It also helps when the foreground or background is a solid color; where the colors are highly discontinuous, more strokes are needed to segment the image correctly.
In addition, in the MATLAB implementation, the pairwise matrix, which encodes the graph structure of the image with the corresponding link (edge) costs, is an N-by-N sparse matrix, where N is the number of pixels in the image, easily in the hundreds of thousands. Computing this matrix slows down the whole Lazy Snapping process, taking about two minutes on average per segmentation task, which is far from the real-time interaction reported in the Microsoft paper.
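One way to attack this bottleneck is to build the edge list with fully vectorized array operations instead of looping over pixels. The sketch below (Python/NumPy for illustration; in our MATLAB setting the same row/column/value triplets could be passed straight to sparse) constructs a 4-connected grid with the smoothness cost E2 = lambda / (1 + ||C(i) - C(j)||^2).

```python
import numpy as np

def pairwise_edges(img, lam=1.0):
    """Vectorized 4-connected graph edges with smoothness cost
    E2 = lam / (1 + ||C(i) - C(j)||^2), built without a Python loop.
    Returns (rows, cols, vals) triplets suitable for a sparse matrix."""
    h, w, _ = img.shape
    idx = np.arange(h * w).reshape(h, w)   # linear index of each pixel
    flat = img.reshape(h * w, -1)

    # Horizontal neighbors (i, i+1) and vertical neighbors (i, i+w).
    rows = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])
    cols = np.concatenate([idx[:, 1:].ravel(),  idx[1:, :].ravel()])
    diff = np.sum((flat[rows] - flat[cols]) ** 2, axis=1)
    vals = lam / (1.0 + diff)              # large cost on flat regions
    return rows, cols, vals
```

Because every edge is computed in one array expression, the cost of building the graph grows linearly in N with small constants, rather than paying Python/MATLAB interpreter overhead per pixel.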
5. Conclusions and Possible Improvements
In this project, we learned how to segment a picture using Lazy Snapping, a very efficient and interesting technique. However, based on our implementation, there are several parts that could be improved.
Our edge cost is based only on color difference, so the algorithm is sensitive to color discontinuities, which leads to many noisy boundaries and holes in the result. In the future, we could therefore add a spatial distance to the purely color-based distance vector: the farther a pixel is from a marked foreground or background stroke, the less likely it is to belong to the same group. By combining color and spatial distance, we could get better results.
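A tiny sketch of the idea (Python/NumPy; combined_cost and the weight alpha are hypothetical names we introduce for illustration, not part of our MATLAB code):

```python
import numpy as np

def combined_cost(pixel_rgb, pixel_xy, stroke_rgb, stroke_xy, alpha=0.5):
    """Hypothetical combined distance: color distance plus a weighted,
    normalized spatial distance to the marked stroke pixels.
    alpha is a tuning weight trading off color against space."""
    d_color = np.linalg.norm(stroke_rgb - pixel_rgb, axis=1)
    d_space = np.linalg.norm(stroke_xy - pixel_xy, axis=1)
    d_space = d_space / (d_space.max() + 1e-9)  # normalize to [0, 1]
    # Cost is the best (smallest) combined distance over all stroke pixels.
    return (d_color + alpha * d_space).min()
```

With this cost, a background patch that merely shares the foreground's color is penalized if it lies far from every foreground stroke, which should suppress the isolated misclassified regions we observed.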
The boundary produced by our algorithm is coarse, so to obtain a smooth boundary we could perform some edge editing after Lazy Snapping.
As for the speed, we should find a more efficient way to compute that huge sparse matrix, or simply re-implement the whole algorithm in C/C++, calling the library functions directly rather than through MATLAB wrappers.