1. Introduction
Background modeling is used in many applications to detect moving objects, including video surveillance, optical motion capture, human-computer interaction, multimedia, and content-based video coding. One of the most popular methods for extracting moving objects from a video frame is background subtraction, which is among the first low-level processing operations in any intelligent video surveillance system. It identifies and segments moving objects in video frames by separating the still areas, called the background, from the moving objects, called the foreground. A background subtraction algorithm first constructs a representation of the background, called the background model; each subsequent frame is then subtracted from this model to give the resulting foreground. Adaptive background subtraction algorithms also update the model along the sequence to compensate for changes in the background. Background subtraction faces many challenges, such as dynamic backgrounds, illumination changes, motion changes, high-frequency background objects, shadows, camouflage, and non-stationary backgrounds. To cope with these problems and challenges, many background modeling methods have been developed.
Background modeling methods can be classified into different categories: basic background modeling, statistical background modeling, fuzzy background modeling, and background estimation. All of these approaches are used in the background subtraction context, which involves the steps of background modeling, initialization, maintenance, and foreground detection. Many works have been proposed in the literature over the past decades in search of a reliable and efficient method. The mixture of Gaussians is a widely used approach for background modeling to detect moving objects in video sequences taken from static cameras. A Gaussian mixture model (GMM) was proposed by Friedman and Russell [1] and refined for real-time tracking by Stauffer and Grimson [2-4]. Numerous improvements of the original method of Stauffer and Grimson have been proposed in recent years. Friedman and Russell [1] modeled each background pixel using a mixture of three Gaussians corresponding to road, vehicle, and shadow in a traffic surveillance system. The Gaussians are then manually labeled: the darkest component is labeled as shadow; of the remaining two components, the one with the largest variance is labeled as vehicle and the other as road. This labeling remains fixed throughout processing, so the method lacks adaptation to changes over time. For foreground detection, each pixel is compared with each Gaussian and classified according to its corresponding Gaussian. Maintenance is performed using an incremental expectation-maximization (EM) algorithm for real-time operation; an important problem of the EM algorithm is that it can converge to a poor local maximum if not properly initialized. Stauffer and Grimson generalized this concept by modeling the recent history of the color features of each pixel by a mixture of K Gaussians [3]. McKenna et al. proposed a derivation for training mixture models to track video objects; however, their method requires memory proportional to the temporal window size, making it unsuitable for applications where mixture modeling is pixel-based and performed over a long temporal window [5]. Sato and Ishii proposed a mechanism for adding, deleting, and splitting Gaussians to handle dynamic distributions, similar to Gaussian reassignment, and suggested that temporal adaptation could be achieved by manipulating a discount factor; however, it is unclear in the proposed method how to define the schedule of the discount factor to obtain the behavior required in surveillance applications [6].
The adaptive GMM has a few problems under special conditions. It often suffers from a trade-off between robustness to background changes and sensitivity to foreground abnormalities, and it is inefficient in managing this trade-off across various surveillance scenarios. When the likelihood factor ρ has a small value, parameter adjustment is slow, leading to low precision in the early frames. In addition, the GMM does not distinguish between background elements and their shadows. To overcome these problems, a modification of the adaptive GMM is necessary. This work deals with background subtraction based on a modified adaptive GMM with three temporal differencing (TTD). Results of background subtraction on several sequences in various testing environments show that the proposed method is efficient and robust in dynamic environments and achieves good accuracy. Related works are described in Section 2. The proposed background subtraction method is described in Section 3, and the modified adaptive GMM with TTD is described in Section 4. Experimental results are presented in Section 5, and conclusions are drawn in Section 6.
2. Related Works
Background modeling plays a vital role in foreground object detection for video surveillance systems. A simple approach is to represent the gray-level or color intensity of each pixel in the image as an independent, unimodal distribution [7-10]. Huang et al. proposed region-based background modeling using the partial directed Hausdorff distance and MRFs [11, 12]. Zivkovic et al. proposed equations to constantly update the parameters and select an appropriate number of components for each pixel [13]. Klare et al. proposed a method that uses many image features in addition to intensity to enhance background modeling, achieving a clear improvement over color intensities alone under varying illumination [14]. Claudio Rosito Jung proposed a background subtraction algorithm with shadow identification suited for monochromatic video sequences using luminance information: the background image is modeled using robust statistical descriptors, a noise estimate is obtained, foreground pixels are extracted, and a statistical approach combined with geometrical constraints is adopted to detect and remove shadows [15].
Lin et al. proposed a background subtraction method for gray-scale video, motivated by criteria for what a general and reasonable background model should be, and realized by a practical classification technique [16]. However, the methods of Jung and of Lin et al. are efficient for gray-scale video only and are not suitable for RGB video sequences. The main difficulty in designing a robust background subtraction algorithm is the selection of a detection threshold. McHugh et al. proposed a background subtraction method that adapts this threshold to varying video statistics; they suggested a foreground model based on a small spatial neighbourhood to improve discrimination sensitivity and a Markov model on the change labels to improve the spatial coherence of the detections [17]. A spatio-temporal saliency algorithm based on motion-based perceptual grouping was proposed to perform background subtraction and is applicable to scenes with highly dynamic backgrounds. Comparison with other techniques shows that it comes at the cost of prohibitive processing time, taking several seconds to process a single frame, which makes the algorithm unsuitable for real-time applications [18]. Saliency detection techniques have recently been employed to detect moving objects in video sequences while effectively suppressing irrelevant backgrounds [19, 20], but these methods often fail to adapt quickly to various background motions.
Chiu et al. suggested an algorithm to extract primary-color backgrounds from surveillance videos using a probability-based background extraction algorithm. Intrusive objects are segmented by a robust object segmentation algorithm that investigates the threshold values of the background subtraction from the prior frame to obtain good quality while minimizing execution time and maximizing detection accuracy [21]. Cheng et al. examined the problem of segmenting foreground objects in live video when background scene textures change over time. They proposed a series of discriminative online learning algorithms with kernels for modeling the spatio-temporal characteristics of the background and, by exploiting the parallel nature of the proposed algorithms, developed an implementation that runs efficiently on the highly parallel graphics processing unit (GPU) [22]. A robust vision-based system for vehicle tracking and classification was proposed by Unzueta et al. and tested under different weather conditions, including rainy and sunny days [23].
Hati et al. proposed an intensity-range-based object detection scheme for videos with a fixed background and static cameras. They suggested two different algorithms: the first models the background from the initial few frames, and the second extracts the objects based on local thresholding [24]. A multilayer codebook-based background subtraction (MCBS) model was proposed by Guo et al. for detecting moving objects in video sequences. Combining a multilayer block-based strategy with adaptive feature extraction from blocks of various sizes, the method can remove most of the dynamic background. Pixel-based classification is adopted to refine the results of the block-based background subtraction and can further classify pixels as foreground, shadow, and highlight [25]. The classic Gaussian mixture model is based on the statistical information of every pixel and is not robust to lighting changes. A method combining video coding with the Gaussian mixture model was proposed by Huang et al.: they use the intra mode and motion vectors to find foreground macroblocks and add an overhead flag in the compressed video to indicate them; at the decoder, possible foreground areas are decoded and moving objects are detected in these areas [26].
3. Background Subtraction
Moving object detection can be achieved in video surveillance by building a background model; any significant change in an image region relative to this model signifies a moving object. The most common paradigm for background subtraction is to build an explicit model of the background and then detect foreground objects by computing the difference between the current frame and this model. A binary foreground mask is produced by classifying each pixel whose absolute difference exceeds a threshold as belonging to a foreground object. The basic idea is thus to classify pixels as background or foreground from the difference between the background image and the current image. Given the importance of real-time computation in surveillance systems, the efficiency of the background subtraction method is significant.
The proposed method is illustrated in Fig. 1. The input video is first converted into frames. Each input frame is partitioned into small blocks, and a classification decision is made for each block as to whether it has changed with respect to the background. Invariant features are extracted from the image blocks, and the blocks are then classified into two categories, foreground and background, based on these features. When the background is dynamic, both the background model and the classifier must be adaptive. Foreground objects are obtained by comparing the background image with the current video frame; applying this approach to each frame of a video sequence effectively achieves tracking of any moving objects. This technique, commonly known as background subtraction or change detection, is very useful in video surveillance applications. Under noisy conditions, or when an object has the same color as the background, it is difficult to separate foreground objects from the background; while processing the image, such noise can be removed using suitable filters, and in our method we use a median filter. A minimal sketch of this block-based processing loop is given below. Background modeling using the GMM with TTD is explained in Section 4.
Fig. 1. Proposed method
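To make the processing loop concrete, the following minimal sketch outlines the block-based classification step of Fig. 1, assuming OpenCV and NumPy; the block size, the mean-difference block feature, and the threshold value are illustrative assumptions rather than the exact choices of our implementation.

```python
import cv2
import numpy as np

BLOCK = 16     # illustrative block size
THRESH = 25.0  # illustrative per-block decision threshold

def block_foreground_mask(frame, background):
    """Classify each BLOCK x BLOCK block of the frame as foreground or background."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.medianBlur(gray, 3)                  # median filter for noise removal
    gray = gray.astype(np.float32)
    bg = background.astype(np.float32)
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            # illustrative block feature: mean absolute difference to the background
            diff = np.abs(gray[y:y+BLOCK, x:x+BLOCK] - bg[y:y+BLOCK, x:x+BLOCK])
            if diff.mean() > THRESH:
                mask[y:y+BLOCK, x:x+BLOCK] = 255    # foreground block
    return mask
```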
4. Gaussian Mixture Model with TTD for Background Modeling
We use a modified adaptive Gaussian mixture model (GMM) with a three temporal differencing (TTD) method for background modeling. The Gaussian mixture model was proposed by Friedman and Russell [1] and refined for real-time tracking by Stauffer and Grimson [2-4]. The adaptive GMM approach can handle challenging situations such as sudden or gradual illumination changes, slow lighting changes, long-term scene changes, and periodic or repetitive motions in a cluttered background. In this method, a different threshold is selected for each pixel, and these pixel-wise thresholds adapt over time. Objects are allowed to become part of the background without destroying the existing background model. The Gaussians are initialized using median filtering: assuming that the background is the most likely to appear in a scene, we use the median of the previous n frames as the background model:
$$B(u,v,t) = \operatorname{median}\{\, I(u,v,t-i)\,\}, \quad i \in \{0,1,2,\ldots,n-1\},$$
where $I(u,v,t)$ is the image at time $t$ and $B(u,v,t)$ is the background image at time $t$.
The foreground mask is then generated by thresholding the absolute difference:
$$|I(u,v,t) - B(u,v,t)| > \mathrm{Th}.$$
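As a minimal sketch, assuming NumPy and grayscale frames stacked along the first axis, the median initialization and the thresholding step can be written as follows; the threshold value is an illustrative assumption.

```python
import numpy as np

def median_background(frames):
    """Background model B: pixel-wise median over the previous n frames.
    frames: array of shape (n, H, W), grayscale."""
    return np.median(frames, axis=0)

def foreground_mask(frame, background, th=30):
    """Mark pixels whose absolute difference |I - B| exceeds the threshold Th."""
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    return (diff > th).astype(np.uint8)
```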
The values of a particular pixel are modeled as a mixture of adaptive Gaussians. The Gaussians are evaluated at every iteration to determine which ones are most likely to correspond to the background; pixels that do not match the background Gaussians are classified as foreground, and foreground pixels are clustered using 2D connected-component analysis. At any time t, what is known about a specific pixel (u0, v0) is its history:
$$\{U_1, U_2, \ldots, U_t\} = \{\, I(u_0, v_0, i) : 1 \le i \le t \,\}.$$
This history is modeled by a mixture of K Gaussian distributions, and the probability of an observed pixel with value $U_t$ at time $t$ is modeled as
$$P(U_t) = \sum_{i=1}^{K} \omega_{i,t}\, \eta\big(U_t \mid \mu_{i,t}, \Sigma_{i,t}\big),$$
where $\omega_{i,t}$ is the $i$th Gaussian mixture weight and $\eta(U_t \mid \mu_{i,t}, \Sigma_{i,t})$ is the component Gaussian density
$$\eta\big(U_t \mid \mu, \Sigma\big) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (U_t - \mu)^{T} \Sigma^{-1} (U_t - \mu) \Big),$$
where $d$ is the dimensionality of the pixel value.
A K-means approximation is used to update the Gaussians. If a new pixel value $U_{t+1}$ can be matched to one of the existing Gaussians (within 2.5σ), that Gaussian's mean and variance are updated as follows:
$$\mu_{i,t+1} = (1-\rho)\,\mu_{i,t} + \rho\, U_{t+1},$$
$$\sigma_{i,t+1}^{2} = (1-\rho)\,\sigma_{i,t}^{2} + \rho\, (U_{t+1} - \mu_{i,t+1})^{T} (U_{t+1} - \mu_{i,t+1}),$$
where
$$\rho = \alpha\, \eta\big(U_{t+1} \mid \mu_{i,t}, \Sigma_{i,t}\big)$$
and α is a learning rate.
The prior weights of all Gaussians are adjusted as follows:
$$\omega_{i,t+1} = (1-\alpha)\,\omega_{i,t} + \alpha\, \mathcal{M}_{i,t+1},$$
where $\mathcal{M}_{i,t+1} = 1$ for the matching Gaussian and $\mathcal{M}_{i,t+1} = 0$ for all the others.
If $U_{t+1}$ does not match any of the K existing Gaussians, the least probable distribution is replaced with a new one having $\mu_{t+1} = U_{t+1}$, a high initial variance, and a low prior weight. The Gaussians with the most supporting evidence and the least variance should correspond to the background. With mean $\mu_{i,t}$ and covariance matrix $\Sigma_{i,t} = \sigma_{i,t}^{2} I$, the weight parameter $\omega_{i,t}$ reflects the time duration for which the $i$th distribution has existed in the background; the weights are positive and sum to one. The K distributions are ordered by the fitness value $\omega_{k}/\sigma_{k}$, and the number of active Gaussian components is determined by assuming that the background contains the B most probable colors: the first B distributions in this ordering are selected as the background model, with
$$B = \arg\min_{b} \Big( \sum_{k=1}^{b} \omega_{k} > T \Big),$$
where T is the minimum prior probability that the background is in the scene. Background subtraction is then performed by marking pixels that are more than 2.5σ away from all of the B distributions as foreground moving objects.
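For illustration, the following sketch implements one per-pixel update step of this standard adaptive GMM for a grayscale pixel, assuming NumPy; the parameter values, the simplification ρ ≈ α/ω for the likelihood factor, and the scalar single-channel form are assumptions of the sketch, not the exact implementation evaluated in Section 5.

```python
import numpy as np

K, ALPHA, T = 4, 0.01, 0.7        # illustrative number of Gaussians, learning rate, prior T
INIT_VAR, INIT_W = 900.0, 0.05    # illustrative parameters for a newly created Gaussian

def gmm_update(u, w, mu, var):
    """One Stauffer-Grimson-style update for a grayscale pixel value u.
    w, mu, var are length-K arrays; returns (is_foreground, w, mu, var)."""
    d = np.abs(u - mu)
    matched = d < 2.5 * np.sqrt(var)
    if matched.any():
        i = int(np.argmax(matched))            # first matching Gaussian
        rho = ALPHA / max(w[i], 1e-6)          # common simplification of rho = alpha * eta(.)
        mu[i] += rho * (u - mu[i])
        var[i] = (1.0 - rho) * var[i] + rho * (u - mu[i]) ** 2
        m = np.zeros(K); m[i] = 1.0
    else:
        i = int(np.argmin(w / np.sqrt(var)))   # replace the least probable distribution
        mu[i], var[i], w[i] = u, INIT_VAR, INIT_W
        m = np.zeros(K)
    w[:] = (1.0 - ALPHA) * w + ALPHA * m       # weight update
    w /= w.sum()
    order = np.argsort(-(w / np.sqrt(var)))    # order by fitness w / sigma
    B = int(np.searchsorted(np.cumsum(w[order]), T)) + 1
    background = order[:B]                     # first B distributions form the background
    return i not in background, w, mu, var
```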
However, the adaptive GMM has a few problems under special conditions. First, if the initial pixel value belongs to the foreground, there is only one distribution, with weight equal to unity; if subsequent pixels belong to the background with the same color, it takes $\log_{1-\alpha}(T)$ frames until this pixel is added to the background. Second, when the likelihood factor ρ has a small value, parameter adjustment is slow, leading to low precision in the early frames. Third, this method does not distinguish between background elements and their shadows. To overcome these problems, the updating equations of the distribution parameters are changed as follows:
where the new weights, the mean at time $t$, and the covariance at time $t$ are computed over $n_{rs}$, the number of recent frames.
To model significant variance in the background, the pixel intensity is modeled by a mixture of K Gaussian distributions. In most existing work on background subtraction, K is a fixed number from 3 to 7; in our approach, K is not fixed, and a flexible modified adaptive GMM is proposed for background subtraction. Because the Gaussian mixture model alone does not cope well with illumination changes and noise, we propose an algorithm that links the GMM with TTD to achieve good accuracy in background subtraction.
Three temporal differencing continuously subtracts image pixels over time. Traditional two-frame differencing produces internal cavities in the detected regions, so the moving object's shape is incomplete and cannot provide complete information for subsequent tracking and identification. The traditional method obtains motion information by subtracting the previous image from the current image; in this paper, we instead use three consecutive images. If the three successive images are $I_{n-1}(u,v)$, $I_{n}(u,v)$, and $I_{n+1}(u,v)$, two difference images are computed:
$$D_1(u,v) = |I_{n}(u,v) - I_{n-1}(u,v)|, \qquad D_2(u,v) = |I_{n+1}(u,v) - I_{n}(u,v)|.$$
To obtain $I_c(u,v)$, both differences are thresholded and combined, with $I_c(u,v) = 1$ where $D_1(u,v) > \mathrm{Th}$ and $D_2(u,v) > \mathrm{Th}$, and $I_c(u,v) = 0$ otherwise; this threshold removes noise and can be set for different lighting conditions.
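A minimal sketch of this three-frame differencing step, assuming NumPy, 8-bit grayscale frames, and an illustrative threshold:

```python
import numpy as np

def ttd_mask(prev_f, cur_f, next_f, th=20):
    """Three temporal differencing: AND of two thresholded frame differences."""
    d1 = np.abs(cur_f.astype(np.int16) - prev_f.astype(np.int16)) > th
    d2 = np.abs(next_f.astype(np.int16) - cur_f.astype(np.int16)) > th
    return (d1 & d2).astype(np.uint8)   # motion must appear in both differences
```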
5. Experimental Results
We evaluated the proposed background subtraction method mainly on the following public video datasets: Waving Trees, Fountain, Campus, and Water Surface. For each pixel, the outcome of background subtraction was counted as a true positive (TP) for a correctly classified foreground pixel, a true negative (TN) for a correctly classified background pixel, a false positive (FP) for a background pixel incorrectly classified as foreground, and a false negative (FN) for a foreground pixel incorrectly classified as background. After every pixel had been assigned to one of these four groups, sensitivity, precision, F1, and similarity were calculated using Eqs. (17)-(20):
$$\mathrm{Sensitivity\ (Recall)} = \frac{TP}{TP + FN} \qquad (17)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (18)$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (19)$$
$$\mathrm{Similarity} = \frac{TP}{TP + FP + FN} \qquad (20)$$
Sensitivity, also known as recall or detection rate, measures the proportion of actual positives that are correctly identified, i.e., the percentage of detected true positives relative to the total number of true positives in the ground truth. Precision measures the proportion of detected foreground pixels that are correct. F1 is the weighted harmonic mean of precision and recall.
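These measures can be computed from binary masks as in the following sketch, assuming NumPy and non-degenerate masks (nonzero denominators):

```python
import numpy as np

def evaluate(mask, truth):
    """Pixel-wise Recall, Precision, F1 and Similarity of a binary foreground
    mask against the ground truth, per Eqs. (17)-(20)."""
    mask, truth = mask.astype(bool), truth.astype(bool)
    tp = np.sum(mask & truth)        # foreground correctly detected
    fp = np.sum(mask & ~truth)       # background marked as foreground
    fn = np.sum(~mask & truth)       # foreground missed
    recall = tp / (tp + fn)          # sensitivity / detection rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    similarity = tp / (tp + fp + fn)
    return recall, precision, f1, similarity
```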
Experiments were conducted in this way for background subtraction using several existing methods and the proposed method. Qualitative results of the proposed method on the different test sequences are shown in Figs. 2-5, and quantitative comparisons with other background subtraction methods are given in Tables 1-4. The results on several sequences in various environments show that the proposed method achieves good accuracy and is efficient and robust in dynamic environments.
Fig. 2. Results with the Waving Trees test sequence: (a) background reference image; (b) current frame; (c) proposed method; (d) ground truth image
Fig. 3. Results with the Fountain test sequence: (a) background reference image; (b) current frame; (c) proposed method; (d) ground truth image
Fig. 4. Results with the Campus test sequence: (a) background reference image; (b) current frame; (c) proposed method; (d) ground truth image
Fig. 5. Results with the Water Surface test sequence: (a) background reference image; (b) current frame; (c) proposed method; (d) ground truth image
Table 1. Performance comparisons using the Waving Trees test sequence
Table 2. Performance comparisons using the Fountain test sequence
Table 3. Performance comparisons using the Campus test sequence
Table 4. Performance comparisons using the Water Surface test sequence
6. Conclusion
The modified adaptive GMM with TTD method is applied to detect foreground objects in video sequences and gives good, stable detection. Experimental results show that our algorithm leads to better background subtraction: on several sequences it is efficient and robust in dynamic environments with new objects, and comparison with traditional methods on real-world data shows that the proposed approach is more accurate than other classical algorithms. For future work, we intend to evaluate new background models with updating capability, allowing the system to adapt to luminosity changes, sudden scene configuration changes, shadow, and camouflage. We also intend to implement the proposed background subtraction method on an FPGA.
References
- N. Friedman and S. Russell, “Image segmentation in video sequences: a probabilistic approach,” in Proc. 13th Conf. on Uncertainty in Artificial Intelligence, pp. 175-181, 1997.
- W. Grimson, C. Stauffer, R. Romano, and L. Lee, “Using Adaptive Tracking to Classify and Monitor Activities in a Site,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1998.
- C. Stauffer and W. Grimson, “Adaptive Background Mixture Models for Real-time Tracking,” IEEE Conf. Comput. Vis. Pattern Recognit., pp. 246-252, Jun. 1999.
- C. Stauffer and W. Grimson, “Learning Patterns of Activity Using Real-time Tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 747-757, 2000. https://doi.org/10.1109/34.868677
- S.J. McKenna, Y. Raja, and S. Gong, “Object Tracking Using Adaptive Color Mixture Models,” Proc. Asian Conf. Computer Vision, vol. 1, pp. 615-622, Jan. 1998.
- M. A. Sato and S. Ishii, “On-line EM Algorithm for the Normalized Gaussian Network,” Neural Computation, vol. 12, no. 2, pp. 407-432, 2000.
- I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: Real-time surveillance of people and their activities,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809-830, August 2000. https://doi.org/10.1109/34.868683
- E. Stringa and C. S. Regazzoni, “Real-time videoshot detection for scene surveillance applications,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 69-79, January 2000. https://doi.org/10.1109/83.817599
- S. Gupte, O. Masoud, R. F. K. Martin, and N. P. Papanikolopoulos, “Detection and classification of vehicles,” IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 1, pp. 37-47, Mar. 2002. https://doi.org/10.1109/6979.994794
- S. Y. Chien, S. Y. Ma, and L. G. Chen, “Efficient moving object segmentation algorithm using background registration technique,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 7, pp. 577-586, Jul. 2002. https://doi.org/10.1109/TCSVT.2002.800516
- S. S. Huang, L. C. Fu, and P. Y. Hsiao, “A region-based background modeling and subtraction using partial directed Hausdorff distance,” in Proc. IEEE Int. Conf. on Robotics and Automation, 2004.
- S. S. Huang, L. C. Fu, and P. Y. Hsiao, “A region-level motion-based background modeling and subtraction using MRFs,” in Proc. IEEE Int. Conf. on Robotics and Automation, 2005.
- Z. Zivkovic and F. V. der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognit. Lett., vol. 27, no. 7, pp. 773-780, 2006. https://doi.org/10.1016/j.patrec.2005.11.005
- B. Klare and S. Sarkar, “Background subtraction in varying illuminations using an ensemble based on an enlarged feature set,” in IEEE Comput. Soc. Conf. on Comput. Vision and Pattern Recognit. Workshops, June 2009.
- C. R. Jung, “Efficient Background Subtraction and Shadow Removal for Monochromatic Video Sequences,” IEEE Transactions on Multimedia, vol. 11, no. 3, April 2009.
- H.-H. Lin, T.-L. Liu, and J.-H. Chuang, “Learning a Scene Background Model via Classification,” IEEE Transactions on Signal Processing, vol. 57, no. 5, May 2009.
- J. M. McHugh, J. Konrad, V. Saligrama, and P.-M. Jodoin, “Foreground-Adaptive Background Subtraction,” IEEE Signal Processing Letters, vol. 16, no. 5, May 2009.
- V. Mahadevan, and N. Vasconcelos, “Spatiotemporal Saliency in Dynamic Scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 171-177, January 2010. https://doi.org/10.1109/TPAMI.2009.112
- C. Guo and L. Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression,” IEEE Transactions on Image Processing, vol. 19, no. 1, pp. 185-198, January 2010. https://doi.org/10.1109/TIP.2009.2030969
- V. Mahadevan, N. Vasconcelos, N. Jacobson, Y.-L. Lee, and T. Q. Nguyen, “A Novel Approach to FRUC using Discriminant Saliency and Frame Segmentation,” IEEE Transactions on Image Processing, vol. 19, no. 11, pp. 2924-2934, November 2010. https://doi.org/10.1109/TIP.2010.2050928
- C.-C. Chiu, M.-Y. Ku, and L.-W. Liang, “A Robust Object Segmentation System Using a Probability-Based Background Extraction Algorithm,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, April 2010.
- L. Cheng, M. Gong, and D. Schuurmans, “Real-time discriminative background subtraction,” IEEE Trans. Image Process., vol. 20, no. 5, pp. 1401-1414, May 2011. https://doi.org/10.1109/TIP.2010.2087764
- L. Unzueta, M. Nieto, A. Cortés, J. Barandiaran, O. Otaegui, and P. Sánchez, “Adaptive Multicue Background Subtraction for Robust Vehicle Counting and Classification,” IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, June 2012.
- K. K. Hati, P. K. Sa, and B. Majhi, “Intensity Range Based Background Subtraction for Effective Object Detection,” IEEE Signal Processing Letters, vol. 20, no. 8, August 2013.
- J.-M. Guo, C.-H. Hsia, Y.-F. Liu, M.-H. Shih, C.-H. Chang, and J.-Y. Wu, “Fast Background Subtraction Based on a Multilayer Codebook Model for Moving Object Detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 10, October 2013.
- Z. Huang, R. Hu, and Z. Wang, “Background Subtraction With Video Coding,” IEEE Signal Processing Letters, vol. 20, no. 11, November 2013.