I. INTRODUCTION
Recently, we have witnessed a growing interest in laser pointer interaction (LPI), which allows users to interact directly from a distance through a laser pointer. In laser pointer-based interaction systems, the captured laser spot is recognized and used for interactions by using various image processing techniques. The advantage of ensuring movement flexibility for users has led to the widespread use of this method for multimedia presentations [1-4], robot navigation [5-7], medical purposes [8], virtual reality systems [9,10], and smart houses [11].
Kim et al. [2] summarized three fundamental problems with LPI: laser spot detection, interaction function, and coordinate mapping. In [11-13], the researchers focused on the development of laser spot detection algorithms, which directly influence the performance of LPI systems. The most difficult challenges in laser spot detection are strong light environments, real-time implementation, and dynamic backgrounds. For example, in a practical presentation, the background changes whenever the speaker changes slides.
To overcome the above-mentioned problems, two types of algorithms, namely target search (TS) and background subtraction (BGS), have been developed to detect a laser spot. The TS method directly searches for the laser spot without considering the background. Shin et al. [12] simply searched for the pixels with maximum intensity to locate the laser spot. Chávez et al. [11] used a combination of template matching and fuzzy rule-based systems to improve the success rate of laser spot detection. Geys and Van Gool [13] determined the laser spot by clustering, exploiting the fact that hand jitter causes a group effect on laser spots. However, the TS method tends to fail in strong light environments and when the appearance of the moving laser spot changes. On the other hand, BGS covers a set of methods that aim to distinguish between the foreground and the background areas by utilizing a background model. The traditional models used to represent the background include statistical models, neural networks, and estimation models; more recent models include fuzzy models, subspace models, transform domain models, and sparse models [14]. Among them, sparse models have been successfully applied in compressive sensing [15]. Cevher et al. [16] considered background subtraction as a sparse approximation problem and provided different solutions based on convex optimization. In their approach, the background is learned and adapted in a low-dimensional compressed representation, which is sufficient to determine spatial innovations. Huang et al. [17] proposed a new learning algorithm called dynamic group sparsity (DGS). The idea is that the nonzero coefficients in sparse data are often not random but tend to be clustered, as in the case of foreground detection. However, the dictionary of backgrounds is constructed simply from video frames, which makes this model sensitive to noise and background changes.
In order to solve the problem of background changes and outliers in training samples, Zhao et al. [18] formulated background modeling as a dictionary learning problem. However, the learning process is time consuming and needs all the background information, which makes it difficult to apply in practice. Therefore, to overcome these limitations of [18], we propose a novel robust algorithm for the construction and update of a dictionary for laser spot detection. As a result, the proposed model can handle varying backgrounds while maintaining real-time performance.
The remainder of this paper is organized as follows: Section II briefly explains the proposed method of background modeling and foreground detection. In Section III, we show the experimental results in comparison with those of the existing methods, and some conclusions of the proposed method are presented in Section IV.
II. THE PROPOSED SYSTEM MODEL
Suppose that we have an image Y of size n1 × n2 and we vectorize it into a column vector y of size n × 1 (n = n1 × n2) by concatenating the individual columns of Y in order from first to last. We formulate background subtraction as a linear decomposition problem, i.e., to find a background component yB and a foreground component yF that together constitute a given frame y:
y = yB + yF, (1)
where yB and yF denote the column vectors of the background and foreground, respectively.
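The vectorization and decomposition described above can be illustrated with a minimal sketch. The 2 × 3 image values and the helper name `vectorize` are hypothetical and chosen only for illustration; the background is assumed to be known here, whereas the method itself has to estimate it.

```python
def vectorize(image):
    """Stack the columns of an n1 x n2 image (given as a list of rows) into one list."""
    n_rows, n_cols = len(image), len(image[0])
    return [image[r][c] for c in range(n_cols) for r in range(n_rows)]


# Hypothetical 2 x 3 background and a frame with one bright "laser" pixel.
background = [[10, 20, 30],
              [40, 50, 60]]
frame = [[10, 20, 30],
         [40, 250, 60]]

y = vectorize(frame)          # frame as a column vector
y_b = vectorize(background)   # background component
y_f = [a - b for a, b in zip(y, y_b)]  # foreground component, so that y = y_b + y_f
```

In this toy case the foreground vector is zero everywhere except at the laser pixel, which is the sparsity that the later sections rely on.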
A. Sparse Representation
Suppose that we have K different backgrounds yB1, yB2, ..., yBK ∈ Rn; then, we can build K configurations for dynamic backgrounds, with each configuration standing for one background. Therefore, at a specific frame, the background yB can be chosen from one of these configurations. We define a new matrix D = [d1,d2,...,dK] as the concatenation of all the configurations; here, di denotes the ith configuration. Then, we say that the background yB has the linear representation yB = dixi, where xi denotes a coefficient representing the relationship between yB and di. Thus, the background can be modeled as a sparse linear combination of atoms from a dictionary D, each atom of which characterizes one of the configurations. Next, we rewrite yB in terms of D as follows:
yB = Dx, (2)
where x = [0,...,0,xi,0,...,0]T denotes a sparse coefficient vector whose entries are ideally zero except at the position associated with xi.
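As a small illustration of this representation, the sketch below builds a background from a dictionary of K = 3 atoms with a single active coefficient. The atom values, the coefficient 0.5, and the helper name `combine` are assumptions made for the example only.

```python
def combine(atoms, x):
    """Return D x, where the columns of D are the equal-length lists in `atoms`."""
    n = len(atoms[0])
    return [sum(x[k] * atoms[k][i] for k in range(len(atoms))) for i in range(n)]


# K = 3 hypothetical background configurations of n = 4 pixels each.
atoms = [
    [10.0, 10.0, 10.0, 10.0],
    [200.0, 180.0, 200.0, 180.0],
    [0.0, 90.0, 0.0, 90.0],
]
x = [0.0, 0.5, 0.0]  # only the second configuration is active (at half brightness)

y_b = combine(atoms, x)
sparsity = sum(1 for v in x if v != 0)  # number of nonzero entries of x
```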
Zhao et al. [18] summarized two assumptions for this sparse model:
Assumption 1. Background yB of a specific frame y has a sparse representation over a dictionary D.
Assumption 2. The candidate foreground yF of a frame is sparse after background subtraction.
On the basis of these two assumptions, the BGS problem can be interpreted as follows: given a frame y, find a decomposition that has the sparse coded background yB = Dx and the sparse foreground yF = y - Dx:
x̂ = arg min_x ║y - Dx║0 + λ║x║0, (3)
where ║x║0 denotes the ℓ0-norm counting the number of nonzero elements of x, D indicates the dictionary capturing all the background configurations, and λ represents the weighting parameter balancing the two terms.
Since Eq. (3) is an NP-hard problem because of the non-convexity of the ℓ0-norm, Zhao et al. [18] replaced the ℓ0-norm with the ℓ1-norm and obtained the ℓ1-measured and ℓ1-regularized convex optimization problem:
x̂ = arg min_x ║y - Dx║1 + λ║x║1. (4)
Considering the LPI application, the foreground (laser spot) generally occupies a far smaller spatial area than the background. Therefore, we can simply treat the foreground as noise and obtain a Lasso problem:
x̂ = arg min_x ║y - Dx║2² + λ║x║1. (5)
This problem can be solved easily and rapidly using least angle regression (LARS) [19]; then, we can obtain the foreground using
yF = y - Dx̂, (6)
where x̂ denotes the solution of Eq. (5).
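The following sketch illustrates the Lasso step and the subsequent foreground extraction on toy data. It is not LARS: for brevity and to avoid external dependencies, it uses plain coordinate descent with soft-thresholding, which targets the same kind of objective. The objective scaling (squared ℓ2 error plus λ times the ℓ1-norm, without a factor of 1/2), the striped toy backgrounds, the spot position, and the display threshold of 50 are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (proximal step of the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(atoms, y, lam, sweeps=50):
    """Approximately minimize ||y - D x||_2^2 + lam * ||x||_1 (columns of D = atoms)."""
    x = [0.0] * len(atoms)
    residual = list(y)  # residual = y - D x, kept up to date as x changes
    for _ in range(sweeps):
        for k, atom in enumerate(atoms):
            energy = sum(a * a for a in atom)
            if energy == 0:
                continue
            # Correlation of the atom with the residual, ignoring its own contribution.
            rho = sum(a * (r + a * x[k]) for a, r in zip(atom, residual))
            updated = soft_threshold(rho, lam / 2) / energy
            step = updated - x[k]
            residual = [r - a * step for a, r in zip(atom, residual)]
            x[k] = updated
    return x, residual  # the residual is the foreground estimate y - D x


# Two hypothetical 64-pixel backgrounds and a frame showing the first one plus a spot.
n = 64
atoms = [
    [100.0 if i % 2 == 0 else 0.0 for i in range(n)],
    [0.0 if i % 2 == 0 else 100.0 for i in range(n)],
]
y = list(atoms[0])
y[20] += 150.0  # laser spot on top of the first background

x, foreground = lasso_coordinate_descent(atoms, y, lam=5.0)
spot_pixels = [i for i, v in enumerate(foreground) if abs(v) > 50.0]
```

In this toy run the coefficient of the first atom stays close to 1, the second atom is not used, and only the spot pixel exceeds the threshold. Because the squared-error term is pulled slightly by the spot, the remaining foreground entries are small but not exactly zero.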
B. Dictionary Construction
To make the sparse model robust against dynamic backgrounds, the dictionary must be able to represent all the backgrounds. Huang et al. [17] assumed that background subtraction has already been performed on the first K frames of the video sequences and let D = [y1,y2,...,yK] ∈ Rn×K. It is noteworthy that this method is sensitive to noise and cannot be used in practice. Zhao et al. [18] collected all background training samples and developed a robust dictionary learning approach to construct the dictionary:
However, in LPI applications, we are unable to collect a sufficient number of training samples. For example, we are unable to capture a large number of backgrounds in a presentation application since we do not know the content of the next slide until the user gives the ‘PageDown’ or ‘PageUp’ command. Besides, solving this optimization problem is time consuming, and the solution is difficult to implement in real time.
Since the use of video sequences as a dictionary is sensitive to noise, we use information from multiple frames for ensuring robustness. Therefore, the strategy is to apply an exponentially decaying weight to run an online cumulative average on the backgrounds:
where α denotes the decay rate, often chosen as a tradeoff between stability and quick update, and K represents the parameter that controls the number of backgrounds. Apart from its simplicity, the advantage of this approach is that it can suppress noise and, to some extent, handle low-frequency background changes. We assume that the background changes at a high frequency at the dictionary update stage but not at the dictionary construction stage, which is often true in an LPI application.
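A running average with an exponentially decaying weight can be sketched as follows. The sketch assumes the common form d ← αd + (1 − α)y, with the weight α placed on the previous atom; this choice is consistent with the remark in Section III that a small α is sensitive to noise, but it may differ in detail from Eq. (8). The function name and the sample frames are hypothetical.

```python
def update_atom(atom, frame, alpha):
    """Exponentially weighted running average: alpha * old atom + (1 - alpha) * new frame."""
    return [alpha * a + (1.0 - alpha) * f for a, f in zip(atom, frame)]


# Hypothetical noisy observations of a constant 4-pixel background of value 100.
frames = [
    [104.0, 97.0, 101.0, 99.0],
    [96.0, 102.0, 100.0, 103.0],
    [101.0, 99.0, 98.0, 100.0],
]

atom = list(frames[0])  # initialize the atom with the first frame
for frame in frames[1:]:
    atom = update_atom(atom, frame, alpha=0.5)

max_deviation = max(abs(v - 100.0) for v in atom)
```

After three frames the largest deviation from the true background is smaller than in any single noisy frame, which is the noise suppression effect described above.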
C. Dictionary Update
The dictionary needs to be updated quickly in order to handle the occurrence of a new background. Huang et al. [17] set a time window to update the dictionary. For frame t, the dictionary is updated by D = [yt-K,...,yt-2,yt-1]. However, this method is still sensitive to noise, which makes the model unstable. Zhao et al. [18] updated the dictionary D by solving the following optimization problem, with the updated coefficients considered constant:
Zhao et al. [18] assumed that the atoms in D are independent of each other and thus, updated each of them separately. However, solving this optimization problem is still time consuming.
When a new background occurs, the foreground yF obtained from Eqs. (5) and (6) is no longer sparse. Therefore, we can determine whether a new background has occurred by setting a threshold on the ℓ0-norm of yF. Whenever a new background occurs, we add the new background configuration to the dictionary; otherwise, we directly update the dictionary by using Eq. (8). This method can be formulated as follows:
where Th can be set to the size of the laser spot.
The proposed strategy is made sensitive to changing backgrounds by adding new background configurations, and robust against noise by using the online cumulative average of the backgrounds. The proposed dictionary construction and update algorithm is summarized in Table 1.
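The decision rule described above can be sketched as follows. Several details are assumptions that the text does not specify: nonzero foreground pixels are counted with a small tolerance `tol` because a Lasso residual is rarely exactly zero; the atom that is refreshed is the one with the largest coefficient magnitude; and when the dictionary exceeds `max_atoms` entries, the oldest atoms are dropped. All names are hypothetical.

```python
def update_dictionary(atoms, frame, x, foreground, th, alpha, max_atoms, tol=1.0):
    """Return the list of atoms after processing one frame."""
    # Practical stand-in for the l0-norm of the foreground (assumed tolerance).
    changed = sum(1 for v in foreground if abs(v) > tol)
    if changed > th:
        # Foreground is not sparse: store the frame as a new background configuration.
        return (atoms + [list(frame)])[-max_atoms:]
    # Otherwise refresh the atom that best explains the frame (assumed selection rule).
    k = max(range(len(atoms)), key=lambda i: abs(x[i]))
    refreshed = [alpha * a + (1.0 - alpha) * f for a, f in zip(atoms[k], frame)]
    return atoms[:k] + [refreshed] + atoms[k + 1:]


atoms = [[100.0] * 8]

# Case 1: same background with a one-pixel laser spot, so the foreground is sparse.
spot_frame = [100.0, 100.0, 100.0, 250.0, 100.0, 100.0, 100.0, 100.0]
spot_foreground = [0.0, 0.0, 0.0, 150.0, 0.0, 0.0, 0.0, 0.0]
after_spot = update_dictionary(atoms, spot_frame, [1.0], spot_foreground,
                               th=2, alpha=0.5, max_atoms=20)

# Case 2: a new slide appears, so every pixel differs from the modeled background.
new_slide = [20.0] * 8
slide_foreground = [-80.0] * 8
after_slide = update_dictionary(after_spot, new_slide, [0.2], slide_foreground,
                                th=2, alpha=0.5, max_atoms=20)
```

In the first case the existing atom is averaged with the frame; in the second case the new slide is appended as an additional atom. Note that averaging with the raw frame blends part of the laser spot into the atom, so a practical implementation may prefer to average with a spot-free estimate.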
Table 1. Description of the proposed dictionary construction and update algorithm
III. EXPERIMENTS AND RESULTS
To validate the ability of the proposed algorithm to handle the above-mentioned high-frequency background changes and to evaluate its real-time performance, we discuss two LPI experiments in this section. Through these experiments, we evaluated the performance of the proposed algorithm under different parameter settings, measured the detection error under dynamic backgrounds, and compared its running time with those of other algorithms.
A. Laser Pointer-Operated Windows
A typical example of LPI in practice is the interactive demonstration of software on a computer whose screen content is sent to a video beamer, with a common laser pointer tracked by a video camera serving as the input device. Algorithms use the behavior of the laser spot to realize the functions of Button Press, Button Release, and Mouse Move. When Button Press is recognized, the corresponding file or dialog may show up, which immediately leads to a background change. We recorded three videos of sizes 160×120, 320×240, and 640×480 to simulate this process on such a system.
In LPI, the laser spot cannot be static because of hand jitter; thus, instead of measuring the detection error against the ground truth, we evaluate it using the rate of falsely detected frames as follows:
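A frame-level error measure of this kind can be computed as in the sketch below. It assumes that the error is the ratio of falsely detected frames to the total number of frames and that each frame has already been labeled as correct or false; both the labeling and the function name are hypothetical.

```python
def detection_error(false_flags):
    """Fraction of frames whose detection was judged false (True = falsely detected)."""
    if not false_flags:
        raise ValueError("at least one frame is required")
    return sum(false_flags) / len(false_flags)


# Hypothetical labels for a 10-frame sequence with two false detections.
flags = [False, False, True, False, False, False, True, False, False, False]
error = detection_error(flags)
```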
The performance of the proposed algorithm is compared with that of two algorithms representing state-of-the-art sparse model approaches [17,18]. Note that we use LARS [19] to solve Eq. (5) for all these methods so that the comparison isolates the dictionary construction and update approach. Fig. 1 illustrates some results of the above-mentioned algorithms.
Fig. 1. Results on laser pointer-operated Windows. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [18]. (d) Proposed method.
Image sequences of size 320×240 are used to test how the parameters λ and α affect the detection performance. The detection errors for different parameter values are shown in Fig. 2. As we can see from Fig. 2, a larger weighting parameter λ is helpful for the detection since the sparsity of the background is the key assumption of the proposed algorithm. However, an excessively large λ value increases the reconstruction error, which leads to relatively low performance. Thus, the value of λ can be chosen from 5 to 10 in order to obtain good performance. The decay rate α is used to suppress noise; a small α value makes the model sensitive to noise, and a large one cannot adapt to low-frequency background changes.
Fig. 2. Detection error with different parameters λ and α.
As can be observed in Fig. 2, a moderate α value of 0.5 can lead to better performance. In our experiments, the weighting parameter was set at λ = 5 and the decay rate at α = 0.5.
For the other parameters used in these tests, we select K = 20 to build the dictionary and Th = 50 to control the sparsity of the laser spot. A standard PC with a 2.0-GHz Intel CPU and 3 GB of memory is used in our experiments. As can be seen from Fig. 1, our algorithm can handle dynamic backgrounds and is robust against noise. The final results of the detection error defined by Eq. (11) and the running time per frame are illustrated in Fig. 3. As can be observed, our algorithm achieves detection errors as low as those of the dictionary learning approach [18] and consumes as little time as the method that uses video images as the dictionary [17]. Note that the detection error of the latter method [17] is considerably higher than that of our algorithm, and that dictionary learning [18] consumes a considerably large amount of time and thus cannot be implemented in real time.
Fig. 3. Results on laser pointer-operated Windows. (a) Detection errors. (b) Running time.
B. Multimedia Presentation
In a presentation application, we can use the laser pointer to change slides and draw lines. It should be noted that high-frequency background changes occur when the user changes slides. Further, each slide may be totally different from the others. For this application, we manually change the slides to obtain dynamic backgrounds and use the above-mentioned algorithms to detect the laser spot. The final results are shown in Figs. 4 and 5.
Fig. 4. Results of multimedia presentation. (a) Original image (size: 320×240). (b) Using video images as dictionary [17]. (c) Dictionary learning method [18]. (d) Proposed method.
Fig. 5. Results of multimedia presentation. (a) Detection errors. (b) Running time.
From Figs. 4 and 5, we can see that the proposed algorithm achieves a lower detection error at a low time cost, which is similar to the results of the laser pointer-operated Windows experiment. Thus, the proposed algorithm is robust across different scenarios with dynamic backgrounds. From Table 2, we can see that the detection error is the highest at the 160×120 resolution, while similarly low detection errors are obtained at the 320×240 and 640×480 resolutions. However, the time cost at 640×480 is considerably higher than that at 320×240. Thus, we recommend the use of the 320×240 resolution in practice.
Table 2. Performance comparison of different image resolutions
IV. CONCLUSION
In this paper, we focus on the laser spot detection algorithm and model it as a background subtraction problem. Further, we propose a robust dictionary construction and update algorithm based on the sparse model for laser spot detection. To test the performance of the proposed method, a large number of experiments are conducted from the perspectives of detection error and real-time performance. The experimental results confirm that the proposed method outperforms the existing methods with a lower detection error and better real-time performance when the background exhibits a high frequency of changes.
Finally, the proposed robust algorithm can also be applied to solve other practical problems, such as traffic monitoring [18] where the background switches among several configurations controlled by the status of traffic lights.
References
[1] C. Kirstein and H. Muller, "Interaction with a projection screen using a camera-tracked laser pointer," in Proceedings of Multimedia Modeling (MMM'98), Lausanne, Switzerland, pp. 191-192, 1998.
[2] N. W. Kim, S. J. Lee, B. G. Lee, and J. J. Lee, "Vision based laser pointer interaction for flexible screens," in Proceedings of the 12th International Conference on Human-Computer Interaction, Beijing, China, pp. 845-853, 2007.
[3] L. Zhang, Y. Shi, and B. Chen, "NALP: navigating assistant for large display presentation using laser pointer," in Proceedings of the 1st International Conference on Advances in Computer-Human Interaction, Sainte Luce, Martinique, pp. 39-44, 2008.
[4] R. B. Widodo, W. Chen, and T. Matsumaru, "Interaction using the projector screen and spot-light from a laser pointer: handling some fundamentals requirements," in Proceedings of SICE Annual Conference (SICE), Akita, Japan, pp. 1392-1397, 2012.
[5] S. Shojaeipour, S. M. Haris, A. Shojaeipour, R. K. Shirvan, and M. K. Zakaria, "Robot path obstacle locator using webcam and laser emitter," Physics Procedia, vol. 5, pp. 187-192, 2010. https://doi.org/10.1016/j.phpro.2010.08.136
[6] Y. Minato, T. Tsujimura, and K. Izumi, "Sign-at-ease: robot navigation system operated by connoted shapes drawn with laser beam," in Proceedings of SICE Annual Conference (SICE), Tokyo, Japan, pp. 2158-2163, 2011.
[7] S. Shibata, T. Yamamoto, and M. Jindai, "Human-robot interface with instruction of neck movement using laser pointer," in Proceedings of IEEE/SICE International Symposium on System Integration (SII), Kyoto, Japan, pp. 1226-1231, 2011.
[8] Y. Fukuda, Y. Kurihara, K. Kobayashi, and K. Watanabe, "Development of electric wheelchair interface based on laser pointer," in ICCAS-SICE International Joint Conference, Fukuoka, Japan, pp. 1148-1151, 2009.
[9] N. W. Kim and H. Lee, "Developing of vision-based virtual combat simulator," in Proceedings of International Conference on IT Convergence and Security (ICITCS), Macao, China, pp. 1-4, 2013.
[10] S. J. Kim, M. S. Jang, and T. Y. Kuc, "An interactive user interface for computer-based education: the laser shot system," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, Lugano, Switzerland, pp. 4174-4178, 2004.
[11] F. Chávez, F. Fernandez, R. Alcala, J. Alcala-Fdez, G. Olague, and F. Herrera, "Hybrid laser pointer detection algorithm based on template matching and fuzzy rule-based systems for domotic control in real home environments," Applied Intelligence, vol. 36, no. 2, pp. 407-423, 2012. https://doi.org/10.1007/s10489-010-0268-6
[12] J. Shin, S. Kim, and S. Yi, "Development of multi-functional laser pointer mouse through image processing," in Proceedings of International Conference on Multimedia, Computer Graphics and Broadcasting (MulGraB), Jeju, Korea, pp. 290-298, 2011.
[13] I. Geys and L. Van Gool, "Virtual post-its: visual label extraction, attachment, and tracking for teleconferencing," in Proceedings of the 3rd International Conference on Computer Vision Systems (ICVS), Graz, Austria, pp. 121-130, 2003.
[14] T. Bouwmans, "Traditional and recent approaches in background modeling for foreground detection: an overview," Computer Science Review, vol. 11-12, pp. 31-66, 2014. https://doi.org/10.1016/j.cosrev.2014.04.001
[15] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, 2008. https://doi.org/10.1109/MSP.2007.914731
[16] V. Cevher, A. Sankaranarayanan, M. F. Duarte, D. Reddy, R. G. Baraniuk, and R. Chellappa, "Compressive sensing for background subtraction," in Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, pp. 155-168, 2008.
[17] J. Huang, X. Huang, and D. Metaxas, "Learning with dynamic group sparsity," in Proceedings of IEEE 12th International Conference on Computer Vision, Kyoto, Japan, pp. 64-71, 2009.
[18] C. Zhao, X. Wang, and W. K. Cham, "Background subtraction via robust dictionary learning," EURASIP Journal on Image and Video Processing, vol. 2011, pp. 1-12, 2011.
[19] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," The Annals of Statistics, vol. 32, no. 2, pp. 407-499, 2004. https://doi.org/10.1214/009053604000000067