1. Introduction
Over the past decades, face recognition has opened up numerous research directions in computer vision. It has a plethora of applications ranging from human-computer interaction, surveillance, telecommunication and access control [1] to the Internet of Things (IoT) [2]. Broadly speaking, face recognition is valued for its ability to establish an individual’s identity from his or her facial features. However, despite the breakthroughs and improvements in the face recognition paradigm, several shortcomings remain [3]. They are attributable to the wide intra-class facial variations, such as illumination, pose, aging and expression, and to the small inter-class dissimilarities, both of which can degrade face recognition performance.
Face recognition methods can be classified into two categories [3]: image-based methods use a vector that represents the entire face rather than the most significant facial features, while feature-based methods extract features from the image and match them against prior knowledge of the facial features.
Initially, feature-based methods developed for face recognition relied mainly on individual facial features, such as the mouth, eyes or nose, to perform identification [4]. However, such methods did not yield good results, given the variability of pose, illumination and facial expression and the small amount of information used. Recently, a range of schemes has been proposed to overcome the difficulties of face recognition. Among the recent research works on face recognition based on filtering, dimension reduction and classification, we can mention the following:
Zhenhua Chai et al. [5] proposed a local feature analysis method, namely Gabor Ordinal Measures (GOM), which combines Gabor features with ordinal measures as a powerful way to deal with intra-personal variations in face images and inter-person similarity. Nevertheless, this method relies on the non-flexible Linear Discriminant Analysis (LDA) classifier, which uses a linear boundary to discriminate between facial classes. Its effectiveness is lacking on problems that require nonlinear decision boundaries. Furthermore, the efficiency of LDA degrades when the underlying class distribution is not normal.
Zhifeng Li et al. [6] advanced a method for face recognition based on nonparametric discriminant analysis (NDA) and multi-classifier integration. They introduced a new formulation of the scatter matrices to extend the two-class NDA to the multiclass NDA-based methods NSA (Nonparametric Subspace Analysis) and NFA (Nonparametric Feature Analysis). They also formulated a dual NFA-based multi-classifier fusion framework using the overcomplete Gabor representation of face images to boost recognition efficiency.
Recently, Arbia Soula et al. presented in [7] and [8] novel face recognition systems based on Gabor and ordinal filters for feature extraction, and on Kernel Fisher Discriminant Analysis (KFD) and Kernel Nonparametric Discriminant Analysis (KNDA), respectively, for dimension reduction and classification. Both dimension reduction and classification methods rely on a flexible non-linear separation of facial classes. More particularly, KNDA integrates the most relevant local variation information near class boundaries, to handle heteroscedastic face classes.
More recently, the Convolutional Neural Network (CNN), with its convolution stages and fully connected layers, has been widely used for face feature extraction and classification [9]. However, this model is based on neither a convex nor a concave optimization problem, so finding the global optimal solution is not an easy task. For this reason, several research works have used deep features to represent face images and then deployed a Support Vector Machine (SVM) classifier to perform face recognition. Among them, we can mention:
Dat Tien Nguyen et al. [10] presented an approach to improve the security of face recognition systems based on merging deep learning and handcrafted features extracted from face images. This system relies on a hybrid feature descriptor: it uses a CNN to extract deep image attributes, and the multi-level local binary pattern (MLBP) to capture the detailed skin features of face images. Finally, an SVM is used to classify the image features.
Typical deployments of existing face recognition techniques assume that all the data is given beforehand and that learning is done once, all at a time. In this respect, such techniques are considered batch learning methods. However, these systems still show shortcomings in several real-life applications where data are obtained sequentially, such as video-based face recognition. In a shifting scene, pose, illumination and other image capture conditions may change quickly, leading to accuracy degradation, and it is impractical to re-train the classifiers on a continuous basis [11].
Because of this shortcoming, batch techniques require a large memory and an expensive training time for large datasets. Moreover, batch learning shows a decline in performance when data are not available from the start. A different learning strategy is therefore required.
Incremental learning has been deemed more adequate than batch learning when new samples are added asynchronously, at different timestamps, or when dealing with a great deal of data. So far, many face recognition methods based on incremental learning and dimension reduction have been proposed in the literature ([12] and [13]).
For instance, B. Raducanu et al. [12] introduced a subspace learning formulation based on an incremental nonparametric discriminant analysis (INDA), in which new examples can be added asynchronously, at different intervals. As new individuals are added, incremental updates are applied without computing the full-scale within-class and between-class scatter matrices. INDA handles general data distributions properly and correctly captures the information between class boundaries. We recently used INDA to build an incremental face recognition system in [13].
However, all the above-mentioned incremental face recognition methods are based on non-flexible linear classifiers: they use linear boundaries to distinguish between face classes. Their effectiveness falls short on problems that call for nonlinear decision boundaries or where face datasets are nonlinearly separable. Furthermore, the efficiency of these methods degrades when the underlying class distribution is not normal and face classes are heteroscedastic. Thus, these approaches are inefficient for a variety of face recognition problems, since pose, illumination and high variability in facial features induce strong non-linearity and heteroscedasticity in the representation space.
As an alternative, Hsien-Ting Cheng et al. [14] presented an approach to multimodal person identity verification based on an incremental Kernel Fisher Discriminant (IKFD) for training image dimension reduction and relevant feature extraction, with a Support Vector Machine (SVM) for data fusion and classification. As IKFD is kernel-based, it separates face classes using a flexible decision boundary. However, IKFD does not handle heteroscedastic data and assumes that face classes are normally distributed, which is not common in real-world applications.
Naimul Mefraz Khan et al. [11] proposed a video-based face recognition method built on an adaptive sparse dictionary to overcome shifts in illumination, pose, occlusion and alignment in face images. It dynamically updates the training matrix with the current probe image, using a novel confidence criterion and a Bayesian inference scheme to recognize faces from unconstrained videos.
Recently, Pawel Karczmarek et al. [15] presented a face recognition system based on the Chain Code-Based Local Descriptor (CCBLD) for face feature extraction, which applies a Bag-of-Visual-Words approach over a dictionary of chain codes.
More recently, Lufan Li et al. [16] advanced an incremental face recognition method based on deep learning, capable of updating the classification model as new samples are added during use. This system relies on an intelligent training principle, namely S-DDL (self detection, decision and learning), employing an incremental version of the Support Vector Machine (SVM) algorithm to realize self-learning and enhance classification accuracy while keeping the computational time low. The feature extraction module is based on a CNN, and the incremental SVM is then trained on the extracted features to perform recognition. Although incremental SVM makes no distribution assumption on the data, it can be misled by the data spread, since its solution is formulated on a finite number of support vectors.
In the present paper, we address this problem by proposing an adaptive face recognition system based on a novel Incremental Kernel Nonparametric Discriminant Analysis (IKNDA). IKNDA is advantageous because it incrementally reduces data dimension and performs classification while dealing with general data distributions. Moreover, unlike incremental SVM, it aptly captures the structural information between class boundaries and relies on kernelization to perform a flexible non-linear separation between face classes, thereby improving classification performance compared to face recognition systems based on classical incremental parametric models.
Our proposition also integrates the advantages of combining the distinctiveness of the Gabor response with the flexibility of ordinal filters [5]. It consists of several steps: first, multichannel Gabor filters are applied to the input image; second, several ordinal measures are applied to the obtained Gabor images and encoded to generate visual primitives in facial zones; third, the spatial histograms of these primitives are concatenated into a feature vector whose size is minimized using PCA; finally, the novel IKNDA is used to further reduce dimension and classify the feature vectors. The proposed adaptive face recognition system investigates the effectiveness of the IKNDA method, and we show that it is an appropriate tool to deal with non-stationary learning environments.
The present paper is organized as follows. In the next section, we describe in detail the main phases of our adaptive face recognition system, including the feature extraction phase and the recognition phase using the novel IKNDA; the algorithm and a comprehensive block diagram of the face recognition method are also provided. In Section III, a decontextualized evaluation compares the IKNDA technique to the classical batch version of KNDA, as well as to other relevant state-of-the-art incremental discriminant algorithms, on real datasets; a contextualized comparative evaluation of the adaptive face recognition method based on IKNDA is then performed on several face datasets. Finally, in the last section, we provide concluding remarks and perspectives.
2. Face Recognition Method Phases
In this section, we give a full account of our face recognition method. The latter is made up of two main phases: feature extraction, and recognition using a novel Incremental Kernel Nonparametric Discriminant Analysis (IKNDA). We then provide the face recognition algorithm and a comprehensive block diagram.
2.1 Facial Feature Extraction
Feature extraction is a pivotal step in the face recognition process. It consists in finding a specific representation of the data that highlights relevant information, which helps overcome human facial complications such as the direction of illumination, differences in facial expression, variation of pose and aging.
In our method we use a local feature analysis technique, namely Gabor Ordinal Measures (GOM), for the representation of face features [5]. It inherits the advantages of combining the distinctiveness of Gabor features with the robustness of certain types of ordinal wavelets, as a promising way to reduce intra-person variations as well as to maximize the dissimilarity between persons. The 2D Gabor filters produce prominent local discriminating features that are appropriate for face recognition. The Gabor filters are expressed as follows [17]:
\(\psi_{\mu, \nu}(z)=\frac{\left\|k_{\mu, \nu}\right\|^{2}}{\sigma^{2}} e^{\left(-\frac{\left\|k_{\mu, \nu}\right\|^{2}\|z\|^{2}}{2 \sigma^{2}}\right)}\left[e^{i k_{\mu, \nu} \cdot z}-e^{-\frac{\sigma^{2}}{2}}\right]\) (1)
Where \(\nu \in\{0, \ldots, 4\}\) and \(\mu \in\{0, \ldots, 7\}\) are the scale and orientation of the Gabor wavelets, respectively, and \(z=(x, y)\) denotes the spatial position. The wave vector \(k_{\mu, \nu}=k_{\nu} e^{i \phi_{\mu}}\) has magnitude \(k_{\nu}=\frac{k_{\max }}{\lambda^{\nu}}\), where \(\lambda\) is the frequency ratio between filters and \(\phi_{\mu}=\frac{\pi \mu}{8}, \phi_{\mu} \in[0, \pi]\).
In practice, the Gabor decomposition and representation of a facial image is the convolution of the image I with the set of Gabor kernels \(\psi_{\mu, \nu}(z)\), defined as:
\(G_{\mu, v}(z)=I(z) \times \psi_{\mu, v}(z)\) (2)
So, the Gabor filter response produced for a given frequency and orientation in Equation (2) is a complex number, given by the following equation [18]:
\(G_{\mu, \nu}(z)=A_{\mu, \nu}(z)\, e^{i \theta_{\mu, \nu}(z)}\) (3)
Where \(A_{\mu,\nu}(z)\) and \(\theta_{\mu,\nu}(z)\) denote the magnitude response and the phase of the Gabor kernel at each image position z, respectively.
The complex Gabor filter is a powerful descriptor and a strong tool to characterize image texture. It can capture the local regions matching a specific frequency, spatial locality and orientation, which are demonstrably discriminative and robust to expression changes and illumination.
The complex Gabor response can also be described by its real and imaginary parts. Thus, we obtain four features for each face image: the phase, magnitude, real and imaginary Gabor feature images.
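For illustration, a minimal Python sketch of this Gabor decomposition is given below. It assumes \(k_{\max}=\pi/2\) and \(\lambda=\sqrt{2}\), which are common choices in the literature rather than values stated here; the kernel size and function names are likewise illustrative, not the paper's implementation.

```python
# Minimal sketch of Eqs. (1)-(3): a Gabor filter bank and its four response components.
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, sigma=2 * np.pi, k_max=np.pi / 2, lam=np.sqrt(2), size=31):
    """Complex Gabor kernel psi_{mu,nu} of Eq. (1), sampled on a size x size grid."""
    k = k_max / lam ** nu                      # magnitude k_nu
    phi = np.pi * mu / 8.0                     # orientation phi_mu
    kx, ky = k * np.cos(phi), k * np.sin(phi)  # wave vector k_{mu,nu}
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    norm_k2 = kx ** 2 + ky ** 2
    envelope = (norm_k2 / sigma ** 2) * np.exp(-norm_k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)  # DC-compensated carrier
    return envelope * carrier

def gabor_responses(image):
    """Magnitude, phase, real and imaginary responses for 5 scales x 8 orientations, Eq. (2)."""
    feats = {"mag": [], "phase": [], "real": [], "imag": []}
    for nu in range(5):
        for mu in range(8):
            g = fftconvolve(image, gabor_kernel(mu, nu), mode="same")
            feats["mag"].append(np.abs(g))
            feats["phase"].append(np.angle(g))
            feats["real"].append(g.real)
            feats["imag"].append(g.imag)
    return feats
```

In practice, the 5 × 8 responses would then be passed to the ordinal filtering stage described next.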
As for ordinal, or multi-lobe differential, filtering for ordinal feature extraction, it provides a richer representation of facial features and is well-conditioned to uniform noise. Beyond the image space, ordinal features have the advantage of describing the neighboring relationships at various orientations and scales of the Gabor images.
From a mathematical perspective, the multi-lobe differential filter (MLDF) is formed by several positive and negative lobes, which allow the comparison of dissociated image regions at the intensity and feature levels. At the intensity level, it determines the relational order of the average intensity values of two image zones, whereas at the feature level, qualitative details of the image features are computed. Thus, the MLDF has the benefit of invariance to monotonic illumination variation and robustness to noise. The ordinal values are, respectively, “0” or “1” when the filtering results are negative or positive. The MLDF can be written with Gaussian kernels as follows:
\(M L D F=c_{p} \sum_{i=1}^{N_{p}} \frac{1}{\sqrt{2 \pi}\, \delta_{p i}} e^{\left[\frac{-\left(x-\omega_{p i}\right)^{2}}{2 \delta_{p i}^{2}}\right]}-c_{n} \sum_{j=1}^{N_{n}} \frac{1}{\sqrt{2 \pi}\, \delta_{n j}} e^{\left[\frac{-\left(x-\omega_{n j}\right)^{2}}{2 \delta_{n j}^{2}}\right]}\) (4)
Where \(\omega\) denotes the central position and \(\delta\) the scale of the 2D Gaussian lobe, \(N_{n}\) is the number of negative lobes, \(N_{p}\) the number of positive lobes, and \(c_{n}\) and \(c_{p}\) are two constants.
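The following minimal sketch illustrates a di-lobe MLDF built from two 2D Gaussian lobes and the binarization of its response into an ordinal code. The 4-pixel inter-lobe distance and 5×5 lobe size follow the settings reported later in Section 3.2; the zero-sum normalization stands in for the constants \(c_p\) and \(c_n\), and the function names are illustrative.

```python
# Minimal sketch of a di-lobe MLDF (Eq. (4)) and of ordinal (binary) encoding.
import numpy as np
from scipy.signal import fftconvolve

def gaussian_lobe(center, shape, sigma=1.0):
    """2D Gaussian lobe centred at `center` inside a filter of the given shape."""
    y, x = np.mgrid[0:shape[0], 0:shape[1]]
    cy, cx = center
    return np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def di_lobe_mldf(orientation_deg=0, inter_lobe=4, lobe_size=5, sigma=1.0):
    """Di-lobe MLDF: one positive and one negative Gaussian lobe along an orientation."""
    size = lobe_size + inter_lobe + lobe_size        # enough room for both lobes
    c = (size - 1) / 2.0
    theta = np.deg2rad(orientation_deg)
    dy, dx = np.sin(theta) * inter_lobe / 2, np.cos(theta) * inter_lobe / 2
    pos = gaussian_lobe((c - dy, c - dx), (size, size), sigma)
    neg = gaussian_lobe((c + dy, c + dx), (size, size), sigma)
    return pos / pos.sum() - neg / neg.sum()         # zero-sum filter (c_p, c_n implicit)

def ordinal_code(gabor_component, orientation_deg=0):
    """Binarize the MLDF response: 1 where positive, 0 where negative (the ordinal value)."""
    response = fftconvolve(gabor_component, di_lobe_mldf(orientation_deg), mode="same")
    return (response > 0).astype(np.uint8)
```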
Distinct ordinal feature representation techniques are applied to the Gabor features, namely the Gabor magnitude, phase, real and imaginary components, so as to capture robust ordinal features in diverse directions.
A face image can be analyzed on two levels: the local intensity level and the local feature level. Local intensity variation is weak, since facial skin tends to have the same reflection ratio. Given this limited role, ordinal measures derived from the feature level become more powerful, as they have a clear discriminatory power in face recognition. Consequently, the use of Gabor filters aims at obtaining more discriminative features as well as enhancing the local details of the face texture. As a result, integrating the Gabor feature images with ordinal filters leads to a better recognition rate.
The ordinal measures acquired from the different components of the Gabor images significantly expand the feature vector of a face image. Thus, it is essential to combine several binary codes of the GOM facial features into a characteristic texture parameter and to minimize the length of the GOM feature.
The Gabor Ordinal Measures (GOM) facial feature description is summarized in the following algorithm:
Algorithm 1. Face feature extraction based on Gabor and ordinal filters.
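Since the algorithm box itself is not reproduced here, the sketch below illustrates the remaining steps of the GOM construction under stated assumptions: the binary ordinal maps are supposed to have been obtained as in the sketches above, the 6-bit packing is an illustrative choice consistent with the 64-bin histograms mentioned in Section 3.2, and the block size and PCA dimension are placeholders rather than the paper's exact values.

```python
# Minimal sketch: pack ordinal maps into codes, build per-block histograms, reduce with PCA.
import numpy as np
from sklearn.decomposition import PCA

def gom_feature_vector(ordinal_maps, block=16, n_bits=6):
    """Pack n_bits binary ordinal maps into codes, then concatenate per-block histograms."""
    maps = np.stack(ordinal_maps[:n_bits], axis=0)           # n_bits binary maps, each H x W
    codes = np.zeros(maps.shape[1:], dtype=np.int32)
    for bit, m in enumerate(maps):
        codes |= (m.astype(np.int32) << bit)                 # code values in 0 .. 2**n_bits - 1
    hists = []
    H, W = codes.shape
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            patch = codes[r:r + block, c:c + block]
            hists.append(np.bincount(patch.ravel(), minlength=2 ** n_bits))  # 64-bin histogram
    return np.concatenate(hists).astype(np.float64)

def reduce_dimension(feature_matrix, n_components=100):
    """PCA step that shrinks the concatenated GOM histograms (one row per image)."""
    pca = PCA(n_components=n_components)
    return pca.fit_transform(feature_matrix), pca
```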
2.2 The Novel Incremental Kernel Nonparametric Discriminant Analysis (IKNDA)
In this part, we first present the classical batch Kernel Nonparametric Discriminant Analysis (BKNDA), as it is the basis of our incremental model; the incrementality consists of sequential updates of the KNDA eigenspace representation. We then describe the novel Incremental KNDA (IKNDA) in detail.
2.2.1 Batch Kernel Nonparametric Discriminant Analysis (BKNDA)
In this part, we describe the formulation of the batch Kernel Nonparametric Discriminant Analysis (BKNDA) that is employed to reduce dimension and classify feature vectors. It introduces a nonparametric form of the between-class scatter matrix and, rather than assuming a Gaussian distribution for the points of each class, it normalizes the distances between each point and its nearest neighbours, which in effect exploits the advantage of the nearest-neighbour rule.
We assume that we have \(C_{b}, b=1,2, \ldots, L\) classes making up an input space of \(N=\sum_{b=1}^{L} n_{C_{b}}\) examples, where each class \(C_{b}\) is formed by \(n_{C_{b}}\) samples in \(\mathbb{R}^{M}\), namely \(C_{b}=\left\{x_{1}^{b}, x_{2}^{b}, \ldots, x_{n_{C_{b}}}^{b}\right\}\).
The BKNDA consists of two stages: it first applies a nonlinear mapping that transforms the data samples into a higher-dimensional feature space Ƒ, where linear discrimination can then be performed.
Let the function \(\varphi\) map the classes \(C_{b}, b=1,2, \ldots, L\), to the higher-dimensional feature classes \(F_{b}=\left\{\varphi\left(x_{i}^{b}\right)\right\}_{i=1}^{n C_{b}}, b=1,2, \ldots, L\), respectively.
However, if Ƒ is of very high dimension, it is unfeasible to perform the mapping explicitly. In this regard, the kernel trick [19] is applied to compute the dot products of the higher-dimensional data rather than the mapped examples themselves. In mathematical terms, it is defined as:
\(K\left(x_{i}, x_{j}\right)=\left\langle\varphi\left(x_{i}\right), \varphi\left(x_{j}\right)\right\rangle, \forall \mathrm{i}, \mathrm{j} \in\{1,2, \ldots, \mathrm{N}\}\) (5)
The decision function is described as follows:
\(y\left(x_{i}, \omega\right)=\sum_{i=1}^{N} f_{i}^{x} \omega_{i}+\omega_{0}\) (6)
Where \(\left(f_{1}^{x}, f_{2}^{x}, \ldots, f_{N}^{x}\right): \mathcal{X} \rightarrow F\) depicts a non-linear mapping from the input space to a feature space for the input variable x, \(f_{i}^{x}=K\left(x, x_{i}\right), \forall i \in\{1,2, \ldots, N\}\), and \(\left\{\omega_{i}\right\}_{i=1}^{N}\) are the weights to be estimated.
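As a minimal sketch, the empirical kernel map and decision function of Eqs. (5) and (6) can be written as follows, assuming the Gaussian RBF kernel used later in Section 3; the weight vector w would be obtained from the discriminant analysis described below, and the function names are illustrative.

```python
# Minimal sketch of the kernel trick (Eq. (5)) and the decision function (Eq. (6)).
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """K(a, b) = exp(-||a - b||^2 / sigma) between all rows of a and b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma)

def decision_function(x, X_train, w, w0=0.0, sigma=1.0):
    """y(x, w) = sum_i K(x, x_i) w_i + w_0, Eq. (6)."""
    f_x = rbf_kernel(x[None, :], X_train, sigma).ravel()   # empirical kernel map of x
    return float(f_x @ w + w0)
```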
BKNDA seeks an optimal subspace that maximizes the separability between classes. This is achieved by simultaneously minimizing the within-class distances of the kernel feature classes and maximizing the between-class scatter distance, through local eigenvectors [20]. Its objective is to find the projection direction ω that maximizes the following objective function:
\(J(\omega)=\frac{\omega^{T} S_{B K} \omega}{\omega^{T} S_{W K} \omega}\) (7)
Where \(S_{WK}\) is the within-class scatter matrix, given as follows:
\(S_{W K}=\sum_{b=1}^{L} K_{b}\left(I-1_{n C_{b}}\right) K_{b}^{T}\) (8)
where \(K_{b}, b=1,2, \ldots, L\), are the kernel matrices of the classes \(C_{b}, b=1,2, \ldots, L\), respectively, I is the identity matrix, and \(1_{n C_{b}}, b=1,2, \ldots, L\), are the matrices with all entries equal to \(\frac{1}{n_{C_{b}}}\), respectively.
\(S_{BK}\) is the between-class scatter matrix, defined as:
\(\begin{equation} S_{B K}=\frac{1}{N} \sum_{b=1}^{L} \sum_{c=1, c \neq b}^{L} \sum_{i=1}^{n C_{b}} \Psi_{i}\left(C_{b}, C_{c}\right) L_{b}\left(\varphi\left(x_{i}^{c}\right)\right) L_{b}\left(\varphi\left(x_{i}^{c}\right)\right)^{T} \end{equation}\) (9)
Where \(\Psi_{i}\) are weighting functions that de-emphasize samples far from the class boundary [21]. They are defined as follows:
\(\begin{equation} \Psi_{i}\left(C_{b}, C_{c}\right)=\frac{\min \left\{d\left(\varphi\left(x_{i}^{b}\right), \varphi\left(x N N_{b i}^{\kappa}\right)\right)^{\gamma}, d\left(\varphi\left(x_{i}^{b}\right), \varphi\left(x N N_{c i}^{\kappa}\right)\right)^{\gamma}\right\}}{d\left(\varphi\left(x_{i}\right), \varphi\left(x N N_{b i}^{\kappa}\right)\right)^{\gamma}+d\left(\varphi\left(x_{i}\right), \varphi\left(x N N_{c i}^{\kappa}\right)\right)^{\gamma}} \end{equation}\) (10)
Where \(\begin{equation} \gamma \end{equation}\) is a control parameter which can range from zero to infinity, and \(\begin{equation} d\left(\varphi\left(x_{i}^{b}\right), \varphi\left(x N N_{c i}^{\kappa}\right)\right) \end{equation}\)is the Euclidean distance from \(\begin{equation} x_{i}^{b} \text { to its } \kappa-N N \end{equation}\)'s from class \(\begin{equation} C_{C} \end{equation}\)in the kernel space.
\(\begin{equation} \begin{aligned} \left(L_{b}\left(\varphi\left(x_{i}^{c}\right)\right)\right)_{j}=& K\left(x_{i}^{b}, x_{j}^{b}\right)-\left(M_{c}^{\kappa}\left(\varphi\left(x_{i}^{c}\right)\right)\right)_{j} \cdot \forall i \in\left\{1,2, \ldots, n C_{b}\right\}, \forall j \\ & \in\{1,2, \ldots, N\} \end{aligned} \end{equation}\) (11)
Where \(\begin{equation} M_{c}^{\kappa}\left(\varphi\left(x_{i}^{c}\right)\right)=\frac{1}{\kappa} \sum_{h=1}^{\kappa} \varphi\left(x_{i}^{c}\right)_{N N}(h) \end{equation}\) is the mean of the κ nearest neighbors and \(\begin{equation} \varphi\left(x_{i}^{c}\right)_{N N}(h) \end{equation}\) denotes the \(\begin{equation} h^{t h} \end{equation}\) nearest neighbor of the sample \(\begin{equation} x_{i}^{c} \end{equation}\) from class c. More precisely, κ is the free parameter which prescribes the number of neighbors under consideration; this parameter should be tuned for each database. Eq. (11) represents the direction of the gradients of the corresponding class density functions in the feature space [22].
Expression (7) is maximized by taking the leading eigenvectors of \(\begin{equation} S_{W K}^{-1} S_{B K} \end{equation}\). Since the higher-dimensional feature space Ƒ is of size N, numerical difficulties may arise because the matrix \(\begin{equation} S_{W K} \end{equation}\) may not be positive definite. Hence, \(\begin{equation} S_{W K} \end{equation}\) must be regularized before computing the inverse; this is achieved by adding a small multiple β of the identity matrix I [23]. Thus, \(\omega\), the eigenvector corresponding to the largest eigenvalue of \(\begin{equation} \left(S_{W K}+\beta I\right)^{-1} S_{B K} \end{equation}\), constitutes the optimal decision hyperplane.
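A minimal batch sketch of this computation is given below, assuming an RBF kernel; the neighbour weighting is a simplified reading of Eq. (10) (average distance to the κ nearest neighbours), each class is assumed to contain more than κ samples, and the helper names are illustrative rather than the paper's implementation.

```python
# Minimal BKNDA sketch: Eqs. (7)-(11) with a regularized generalized eigenproblem.
import numpy as np
from scipy.linalg import eigh

def rbf_gram(X, sigma=1.0):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / sigma)

def kernel_dist2(G, i, j):
    """Squared distance between phi(x_i) and phi(x_j), from the Gram matrix only."""
    return G[i, i] + G[j, j] - 2.0 * G[i, j]

def bknda(X, y, sigma=1.0, kappa=3, gamma=1.0, beta=1e-3, n_components=2):
    G = rbf_gram(X, sigma)
    N = len(y)
    classes = np.unique(y)

    # Within-class scatter S_WK, Eq. (8)
    S_wk = np.zeros((N, N))
    for b in classes:
        idx = np.where(y == b)[0]
        Kb = G[:, idx]
        J = np.full((len(idx), len(idx)), 1.0 / len(idx))
        S_wk += Kb @ (np.eye(len(idx)) - J) @ Kb.T

    # Nonparametric between-class scatter S_BK, Eqs. (9)-(11) (simplified weighting)
    S_bk = np.zeros((N, N))
    for b in classes:
        idx_b = np.where(y == b)[0]
        for c in classes:
            if c == b:
                continue
            idx_c = np.where(y == c)[0]
            for i in idx_b:
                d2_c = np.array([kernel_dist2(G, i, j) for j in idx_c])
                d2_b = np.array([kernel_dist2(G, i, j) for j in idx_b if j != i])
                nn_c = idx_c[np.argsort(d2_c)[:kappa]]             # kappa-NN in class c
                d_c = np.sqrt(np.sort(d2_c)[:kappa].mean())        # avg. dist. to kappa-NN in c
                d_b = np.sqrt(np.sort(d2_b)[:kappa].mean())        # avg. dist. to kappa-NN in b
                w = min(d_b ** gamma, d_c ** gamma) / (d_b ** gamma + d_c ** gamma + 1e-12)
                L = G[:, i] - G[:, nn_c].mean(axis=1)              # Eq. (11): local NN mean gap
                S_bk += w * np.outer(L, L)
    S_bk /= N

    # Leading eigenvectors of (S_WK + beta*I)^{-1} S_BK, Eq. (7)
    eigvals, eigvecs = eigh(S_bk, S_wk + beta * np.eye(N))
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order[:n_components]], G
```

The returned eigenvectors are the expansion coefficients ω of Eq. (6); projecting a new sample then amounts to evaluating its kernel vector against the training set and multiplying by these coefficients.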
However, BKNDA shows difficulties and serious performance degradation in real-world applications where data are acquired sequentially or are not available from the outset. Moreover, it requires a large storage space and leads to increased training time, chiefly for large-scale datasets. Thus, an incremental learning strategy is required.
2.2.2 The Novel Incremental version of the BKNDA
In this subsection, we describe the main principle of our novel Incremental KNDA (IKNDA) method, which can process new samples sequentially as they are added at any time, without recomputing the scatter matrices \(S_{WK}\) and \(S_{BK}\) from scratch. More precisely, IKNDA builds the discriminant eigenspace representation without calculating the full-scale scatter matrices \(S_{WK}\) and \(S_{BK}\); instead, it applies incremental updates of \(S_{WK}\) and \(S_{BK}\) as each new sample is added.
Besides, IKNDA performs dimension reduction and classification of the data incrementally by drawing upon the nearest-neighbour rule to compute the local means, and it is able to handle heteroscedastic and non-normal data jointly, while initially employing a limited number of training samples.
Furthermore, the proposed IKNDA maps the data non-linearly into some feature space and then carries out the incremental nonparametric discriminant analysis there, thereby implicitly performing a non-linear separation based on local nearest neighbours in the input space. This improves classification accuracy when data classes are nonlinearly separable.
In the implementation of the proposed IKNDA method, we suppose that the \(S_{WK}\) and \(S_{BK}\) scatter matrices have already been computed from at least two classes.
Now, assuming that a new training pattern arrives, we distinguish between two cases:
Case 1: The new training pattern \(\begin{equation} y^{C_{E}} \end{equation}\) belongs to an existing class. This case is illustrated in Fig. 1. Let us assume that the new pattern \(\begin{equation} y^{C_{E}} \end{equation}\) pertains to one of the current classes \(C_{E}\), where \(1 \leq E \leq L\).
The links between the different classes in Fig. 1 represent the old and new states of the classes, before and after appending the new pattern. After the introduction of the new element, the expression used to recursively update the \(\begin{equation} S_{B K} \end{equation}\) matrix is given by:
\(\begin{equation} S_{B K}^{\prime}=S_{B K}-S_{B K}^{i n}\left(C_{E}\right)+S_{B K}^{i n}\left(C_{E^{\prime}}\right)+S_{B K}^{o u t}\left(y^{C_{E}}\right) \end{equation}\) (12)
Where \(\begin{equation} C_{E^{\prime}}=C_{E} \cup\left\{y^{C_{E}}\right\} \end{equation}\), \(y^{C_{E}}\) is the new pattern and \(C_{E^{\prime}}\) is its updated class. \(\begin{equation} S_{B K}^{i n}\left(C_{E}\right) \end{equation}\) denotes the between-class covariance contribution involving the existing class \(C_{E}\) that is liable to be modified, \(\begin{equation} S_{B K}^{i n}\left(C_{E^{\prime}}\right) \end{equation}\) represents the covariance matrix between the current classes and the refreshed class \(C_{E^{\prime}}\), and \(\begin{equation} S_{B K}^{o u t}\left(y^{C_{E}}\right) \end{equation}\) is the covariance matrix between the new vector \(y^{C_E}\) and the current classes. The expressions to compute these matrices are given as follows:
\(\begin{equation} S_{B K}^{i n}\left(C_{E}\right)=\sum_{j=1, j \neq E}^{L} \sum_{i=1}^{n c_{j}} \Psi_{i}\left(C_{j}, C_{E}\right) L_{j}\left(\varphi\left(x_{i}^{j}\right)\right) L_{j}\left(\varphi\left(x_{i}^{j}\right)\right)^{T} \end{equation}\) (13)
\(\begin{equation} S_{B K}^{o u t}\left(y^{C_{E}}\right)=\sum_{j=1, j \neq E}^{L} L_{j}\left(\varphi\left(y^{C_{E}}\right)\right) L_{j}\left(\varphi\left(y^{C_{E}}\right)\right)^{T} \end{equation}\) (14)
Hence, in this case the within-class scatter matrix \(\begin{equation} S_{W K}^{\prime} \end{equation}\) is updated using the following formula:
\(\begin{equation} S_{W K}^{\prime}=\sum_{j=1, j \neq E}^{L} S_{W K}\left(C_{j}\right)+S_{W K}\left(C_{E^{\prime}}\right) \end{equation}\) (15)
Where,
\(\begin{equation} S_{W K}\left(C_{E^{\prime}}\right)=S_{W K}\left(C_{E}\right)+\frac{n_{C_{E}}}{n_{C_{E}}+1}\left(y^{C_{E}}-\overline{\varphi\left(x^{C_{E}}\right)}\right)\left(y^{C_{E}}-\overline{\varphi\left(x^{C_{E}}\right)}\right)^{T} \end{equation}\) (16)
Fig. 1. Case 1: the newly added sample belongs to an existing class.
Case 2: The new training sample \(\begin{equation} y^{C_{L+1}} \end{equation}\) belongs to a new class \(\begin{equation} C_{L+1} \end{equation}\).
This situation is described in Fig. 2. A new element \(\begin{equation} y^{C_{L+1}} \end{equation}\) belongs to a new class \(\begin{equation} C_{L+1} \end{equation}\). The class links now only concern the contributions to the global \(\begin{equation} S_{B K} \end{equation}\) matrix that are modified by the update expression. After the inclusion of the new element, the equations used to recursively update the \(\begin{equation} S_{B K} \end{equation}\) matrix are expressed by:
\(\begin{equation} S_{B K}^{\prime}=S_{B K}+S_{B K}^{o u t}\left(C_{L+1}\right)+S_{B K}^{i n}\left(C_{L+1}\right) \end{equation}\) (17)
Where \(\begin{equation} S_{B K}^{o u t}\left(C_{L+1}\right) \end{equation}\) and \(\begin{equation} S_{B K}^{i n}\left(C_{L+1}\right) \end{equation}\) are defined as follows:
\(\begin{equation} S_{B K}^{o u t}\left(C_{L+1}\right)=\sum_{j=1}^{L} L_{j}\left(\varphi\left(y^{C_{L+1}}\right)\right) L_{j}\left(\varphi\left(y^{C_{L+1}}\right)\right)^{T} \end{equation}\) (18)
\(\begin{equation} S_{B K}^{i n}\left(C_{L+1}\right)=\sum_{j=1}^{L} \sum_{i=1}^{n C_{j}} \Psi_{i}\left(C_{j}, C_{L+1}\right) L_{j}\left(\varphi\left(x_{i}^{j}\right)\right) L_{j}\left(\varphi\left(x_{i}^{j}\right)\right)^{T} \end{equation}\) (19)
Concerning the new \(\begin{equation} S_{W K}^{\prime} \end{equation}\) matrix, the latter remains unchanged: \(\begin{equation} S_{W K}^{\prime}=S_{W K} \end{equation}\).
Fig. 2. Case 2: the newly added sample belongs to a new class.
The proposed IKNDA could be described by the following algorithm:
Algorithm 2. Incremental Kernel Nonparametric Discriminant Analysis Algorithm.
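Because the algorithm box is not reproduced here, the following bookkeeping-level sketch shows how the two update cases could be organised, assuming the component matrices of Eqs. (13), (14), (18) and (19) have already been computed with the batch routines; argument names are illustrative and not the paper's implementation.

```python
# Minimal sketch of the IKNDA update step (Eqs. (12), (15)-(17)).
import numpy as np

def update_existing_class(S_bk, S_wk, S_bk_in_old, S_bk_in_new, S_bk_out,
                          y_new, class_mean, n_class):
    """Case 1: the new sample y_new belongs to an existing class C_E."""
    # Eq. (12): swap the contributions that involve the modified class.
    S_bk_new = S_bk - S_bk_in_old + S_bk_in_new + S_bk_out
    # Eq. (16): rank-one update of the within-class scatter of C_E.
    diff = (y_new - class_mean).reshape(-1, 1)
    S_wk_new = S_wk + (n_class / (n_class + 1.0)) * (diff @ diff.T)
    return S_bk_new, S_wk_new

def update_new_class(S_bk, S_wk, S_bk_out, S_bk_in):
    """Case 2: the new sample opens a new class C_{L+1}."""
    # Eq. (17): only the between-class scatter grows; S_WK is unchanged.
    return S_bk + S_bk_out + S_bk_in, S_wk
```

The key point is that Eq. (16) is a rank-one update and Eqs. (12) and (17) only add or swap a handful of precomputed terms, so no full recomputation of \(S_{WK}\) or \(S_{BK}\) is needed when a new sample arrives.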
In this part, we present the proposed face recognition system in detail.
2.2.3 Face Recognition Method Algorithm
The process of the face recognition system is presented below.
Algorithm 3. Face Recognition method.
2.2.4 Block Diagram of the Face Recognition Method
In this part, we present the process of our face recognition system in detail:
Fig. 3. Block Diagram of the Face Recognition Method.
3. Experimental Results
In this part, we first carry out a decontextualized comparison and assessment of the IKNDA against the BKNDA in terms of classification accuracy, to highlight the advantage of our incremental model over the batch model, as well as against other relevant state-of-the-art incremental classifiers. We then perform a comparative evaluation of the IKNDA in the face recognition context, to show the advantage of incrementally updating the eigenspace after each sample is presented.
3.1 Decontextualized Evaluation and Comparison
In this subsection, we employ real-world datasets to evaluate the IKNDA and compare it to the BKNDA, the Incremental Kernel Fisher Discriminant Analysis (IKFD) [14], the Incremental Linear Discriminant Analysis (ILDA), and the Incremental Nonparametric Discriminant Analysis (INDA) [12], in terms of classification performance. For data kernelization, we use the Gaussian RBF kernel \(K(x,x_i)=\begin{equation} e^{-\left\|x-x_{i}\right\|^{2} / \sigma} \end{equation}\), which has been shown to be flexible and robust [24]; here σ denotes a positive “width” parameter. Since the purpose of the decontextualized evaluation is to showcase the performance of the IKNDA in a general context, the selected databases have no connection to Gabor and ordinal analysis. These datasets are extracted from the UCI machine learning repository and were selected carefully so that we have heterogeneity of dimensions and sizes, allowing us to examine the robustness of our IKNDA against different feature sizes. More details regarding the original provenance of these databases are available in a highly comprehensive online repository [25]. A thorough description of the datasets used is given in Table 1.
To make certain that the attained results are reliable and unbiased, we use a 10-fold cross-validation process [26]. More precisely, each dataset was randomly divided into 10 subsets of identical size. To establish a model, one of the 10 subsets was held out as the test set, and the remainder was used as the training data. Finally, the accuracy over all models is calculated by averaging the 10 obtained accuracy estimates.
For the BKNDA and IKNDA, cross-validation and grid search are used to find the combination of hyperparameters, namely σ and κ, that yields the best classification performance. As far as the IKFD is concerned, the tuning of σ is achieved using cross-validation. Concerning INDA, we performed 10 separate runs with nearest-neighbour numbers κ ∈ {1, 2, … , 10}, respectively, and then chose the value of κ that affords the best classification efficiency. In order to evaluate and compare two-class classifiers in terms of performance, the Receiver Operating Characteristic (ROC) curves [27] are generally used. The ROC curve provides a powerful assessment of the efficiency of the studied classifiers: it does not depend on the number of testing or training data points, but only requires the rates of correct and incorrect sample classification. The ROC curve is obtained by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR). To evaluate the different methods, we use the Area Under the Curve (AUC) [28] produced by the ROC curves, and present the results in Table 2.
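The protocol can be summarised by the following minimal sketch, in which a scikit-learn RBF SVM is only a stand-in for the compared classifiers (IKNDA, BKNDA, ILDA, IKFD and INDA are not library models); the fold count, seed and function name are illustrative.

```python
# Minimal sketch of the evaluation protocol: stratified 10-fold CV with averaged AUC.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC  # placeholder classifier, not one of the compared methods

def mean_auc(X, y, sigma=1.0, n_splits=10, seed=0):
    """Average AUC over a stratified 10-fold cross-validation (two-class setting)."""
    aucs = []
    for train_idx, test_idx in StratifiedKFold(n_splits, shuffle=True,
                                               random_state=seed).split(X, y):
        clf = SVC(kernel="rbf", gamma=1.0 / sigma, probability=True)
        clf.fit(X[train_idx], y[train_idx])
        scores = clf.predict_proba(X[test_idx])[:, 1]      # scores for the positive class
        aucs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aucs))
```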
Table 1. Details of the real databases
We can clearly notice from Table 2 that, by averaging over 10 different models, IKNDA outperforms ILDA, IKFD, INDA, IKSVM and BKNDA on all the datasets used, in terms of computed AUCs. Unsurprisingly, ILDA gives almost the worst accuracy values: it uses a non-flexible linear boundary to discriminate between classes, its performance falls short on problems that call for nonlinear decision boundaries, and its efficiency dwindles when the underlying class distribution is not normal. The IKFD provides better results than ILDA, since it uses the kernel trick to create a flexible non-linear decision boundary between the different classes. Generally speaking, in terms of AUC, there is a greater difference between ILDA and the INDA, BKNDA and IKNDA than there is between ILDA and IKFD. This was expected, as INDA, BKNDA and IKNDA are built on the principle that the vectors normal to the decision boundary are more relevant for discrimination; in feature space, these normal vectors are used to compute the between-class scatter matrix locally, in the neighbourhood of the decision boundary. As a pleasing consequence, they allow the relaxation of the normality assumption [39]. The IKNDA outperforms the IKSVM, since the latter can be misled by the data spread, as its solution boils down to a limited number of support vectors. Obviously, the BKNDA and IKNDA outperform the INDA thanks to kernelization. Finally, according to Table 2, the IKNDA and BKNDA perform almost the same on the different datasets in terms of classification performance; more precisely, the IKNDA slightly outperforms the BKNDA on the majority of datasets. This is explained by the sample introduction order, which affects the estimation of the nearest neighbours and hence the computation of the between-class scatter matrix; the estimation of the nearest neighbours in the incremental setting can afford a slightly cleaner solution.
To further showcase the improvement in efficiency brought by IKNDA, we introduce a number of statistical measurements. We apply paired t-tests on the AUC values, pairing the IKNDA technique with each other technique in turn. The paired t-test discloses whether the two sets of measured values are significantly different. The last row of Table 2 provides the confidence values (in %) obtained from the performed t-tests. This confidence assesses whether the paired distributions are the same: when the confidence is high, it is unlikely that the underlying distributions are statistically similar. Noticeably, all confidence values are high (very close to 100%), which makes plain that IKNDA yields statistically significant accuracy improvements.
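A minimal sketch of this significance test is shown below, assuming one AUC value per dataset for IKNDA and for the competing method; reporting 100(1 − p) as the confidence is our reading of the values in the last row of Table 2, and the function name is illustrative.

```python
# Minimal sketch of the paired t-test on per-dataset AUC values.
import numpy as np
from scipy.stats import ttest_rel

def paired_confidence(auc_iknda, auc_other):
    """Confidence (in %) that the two AUC distributions differ, from a paired t-test."""
    _, p_value = ttest_rel(auc_iknda, auc_other)
    return 100.0 * (1.0 - p_value)
```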
Table 2. Average AUC of each method on the 13 real-world datasets (best method in bold, second best emphasized)
3.2 Contextualized Evaluation and Comparison
In this subsection, we evaluate our novel adaptive face recognition method. Moreover, we compare the IKNDA to the BKNDA, ILDA, IKFD, IKSVM and INDA to showcase the efficiency of the proposed method, in terms of classification performance and training runtime, in the face recognition context. For this purpose, we used the Yale [32], ORL [33], UMIST [34], AR [35] and Indian [36] face databases. Moreover, for the same datasets, we compared the different face recognition methods using the Local Normalization-Local Gabor Binary Pattern (LN-LGBP) descriptor [30], [31] instead of the GOM descriptor for face image representation, in order to show the advantage of the GOM descriptor. A detailed description of the face databases used is given in Table 3.
Table 3. Description of the face databases
For each Yale, ORL, UMIST, AR and Indian face image, we applied a set of 2D Gabor wavelets made up of 5 frequencies and 8 orientations. Ordinal measures are extracted from the various components of the Gabor images, applying di-lobe and four-lobe Multi-Lobe Differential Filters (MLDF), with orientation values equal to 0˚, 45˚, 90˚ and 135˚, a 4-pixel inter-lobe distance and a 5×5 lobe size. Subsequently, each obtained image is split into blocks that correspond to feature vectors. Then, we compute the histograms of the feature vectors, which are shrunk to encapsulate only 64 bins.
Finally, we apply PCA to reduce the dimension of the feature vector.
To evaluate and compare each method’s performance, we employ the classification accuracy, determined by \(\begin{equation} 100 \frac{N_{C C}}{N} \end{equation}\)%, where N is the number of test data points and \(N_{CC}\) is the number of points classified correctly. Also, for kernelization we use the Gaussian RBF kernel. Concerning the computation of the hyperparameters σ and κ, we use the same settings as in the decontextualized comparative evaluation.
Table 4 reports the training times, while Table 5 and Table 6 report the classification accuracies of the compared classifiers on the face databases, using the GOM and LN-LGBP descriptors, respectively.
According to Table 5 and Table 6, the IKNDA outperforms the other classifiers in terms of recognition accuracy. This was expected, as the IKNDA is an incremental model based on a flexible non-linear boundary, and it relaxes the normality assumption. Also, we can notice that the face recognition methods based on the GOM descriptor outperform the ones based on the LN-LGBP descriptor. In fact, unlike the GOM descriptor, the LN-LGBP descriptor is not robust to noise and face occlusions caused by the image acquisition environment. Moreover, the LN-LGBP descriptor considers only the qualitative relationship between two pixels, whereas the GOM descriptor deploys ordinal measures among multiple image regions at the intensity and feature levels.
As for computational complexity, the most time-consuming phase consists of computing the \(\begin{equation} S_{B K} \end{equation}\) matrix (i.e., finding the nearest neighbours). To show the performance of IKNDA, we measured the computational time required to update \(\begin{equation} S_{B K} \end{equation}\) (using Eqs. (12) and (17)) and compared it with the time required by the batch KNDA to recompute \(\begin{equation} S_{B K} \end{equation}\) (using Eq. (9)) from scratch each time. More precisely, for each face database used, we successively updated the \(\begin{equation} S_{B K} \end{equation}\) matrix with all the face images. According to Table 4, the time needed by IKNDA to update the \(\begin{equation} S_{B K} \end{equation}\) matrix incrementally, as each new sample is added, is significantly lower than recomputing it from scratch using BKNDA. This makes the IKNDA approach more relevant for real-time applications.
Table 4. Training times in seconds for BKNDA and IKNDA on the face databases (using GOM descriptor).
Table 5. Classification accuracy of each classifier on the face databases, using GOM descriptor (best method in bold, second best emphasized).
Table 6. Classification accuracy of each classifier on the face databases, using LN-LGBP descriptor (best method in bold, second best emphasized).
4. Conclusion
In the present paper, we advance a novel adaptive face recognition system based on Incremental Kernel Nonparametric Discriminant Analysis (IKNDA). The latter relies on an incremental learning process, where new items may be added asynchronously, over distinct timestamps, the moment they become available. The IKNDA draws upon the nearest-neighbour rule. We carried out a decontextualized comparison of the IKNDA to the relevant state-of-the-art classification algorithms ILDA, IKFD, IKSVM and INDA on real-world datasets. Experiments showed that the IKNDA outperformed these algorithms in classification efficiency. Moreover, a contextualized comparison and evaluation showed the robustness of the presented adaptive face recognition method and the superiority of the IKNDA in terms of face recognition performance and computational running time. In future work, since IKNDA was shown to perform better than the incremental SVM, we will investigate its effectiveness in classifying deep face features and improving face recognition performance.
References
- A. K. Jain and S. Z. Li, "Handbook of Face Recognition," New York, Springer, 2011.
- Patel Anjali and Verma, Ashok, "IOT based Facial Recognition Door Access Control Home Security System," International Journal of Computer Applications, vol.172, no.7, pp.11-17, 2017. https://doi.org/10.5120/ijca2017915177
- Zhao Wenyi, Chellappa Rama, Phillips P. Jonathon and Azriel Rosenfeld, "Face recognition: a literature survey," ACM Computing Surveys. New York. NY. USA, vol. 35, no. 4, pp. 399-458, 2003. https://doi.org/10.1145/954339.954342
- Bonnen Kathryn, Klare Brendan F and Jain Anil K, "Component-based representation in automated face recognition," IEEE Transactions on Information Forensics and Security (IFS), vol. 8, no. 1, pp. 239-253, 2013. https://doi.org/10.1109/TIFS.2012.2226580
- Chai Zhenhua, Sun Zhenan, Mendez-Vazquez Heydi, Ran He and Tieniu Tan, "Gabor Ordinal Measures for Face Recognition," IEEE Transactions on Information Forensics and Security, vol. 9, no. 1, pp. 14-26, 2014. https://doi.org/10.1109/TIFS.2013.2290064
- Li Zhifeng, Lin Dahua and Tang Xiaoou, "Nonparametric Discriminant Analysis for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no.4, pp. 755-761, 2009. https://doi.org/10.1109/TPAMI.2008.174
- Arbia Soula, Salma Ben Said, Riadh Ksantini and Zied Lachiri, "A Novel Kernelized Face Recognition System," in Proc. of IEEE conf. on Control Engineering & Information Technology, pp.1-5, 2016.
- Arbia Soula, Salma Ben Said, Riadh Ksantini and Zied Lachiri, "A Novel Face Recognition System Base on Nonparametric Discriminant Analysis," in Proc. of IEEE conf. on Control, Automation and Diagnosis, pp. 1-5, 2017.
- Parchami Mostafa, Bashbaghi Saman and Granger Eric, "Video-based face recognition using ensemble of haar-like deep convolutional neural networks," in Proc. of International Joint Conference in Neural Networks (IJCNN), pp. 4625-4632, 2017.
- Dat Tien Nguyen, Tuyen Danh Pham, Na Rae Baek and Kang Ryoung Park, "Combining Deep and Handcrafted Image Features for Presentation Attack Detection in Face Recognition Systems Using Visible-Light Camera Sensors," Sensors, vol.18, no. 3, pp.699, 2018. https://doi.org/10.3390/s18030699
- Naimul Mefraz Khan, Xiaoming Nan, Azhar Quddus, Edward Rosales and Ling Guan, "On video based face recognition through adaptive sparse dictionary," in Proc. Of 11th IEEE International Conference on Automatic Face and Gesture Recognition (FG), pp. 1-6, 2015.
- Raducanu Bogdan and Vitria Jordi, "Online nonparametric discriminant analysis for incremental subspace learning and recognition," Pattern Analysis and Application, vol. 11, no. 3-4, pp. 259-268, 2008. https://doi.org/10.1007/s10044-008-0131-0
- Arbia Soula, Salma Ben Said, Riadh Ksantini and Zied Lachiri, "A Novel Incremental Face Recognition Method Based on Nonparametric Discriminant Model," in proc. Of 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pp.1-5, 2018.
- Hsien-Ting Cheng, Yi-Hsiang Chao, Shih-Liang Yeh, Chu-Song Chen, Hsin-Min Wang and Yi-Ping Hung "An efficient approach to multimodal person identity verification by fusing face and voice information," in proc. Of IEEE International Conference on Multimedia and Expo (ICME), pp.542-545, 2005.
- Pawel Karczmarek, Adam Kiersztyn, Witold Pedrycz and Michal Dolecki, "An application of chain code-based local descriptor and its extension to face recognition," Pattern Recognition, vol. 65, pp. 26-34, 2017. https://doi.org/10.1016/j.patcog.2016.12.008
- Lufan Li, Zhang Jun, Jiawei Fei and Shuohao Li, "An incremental face recognition system based on deep learning," in Proc. of 15th IAPR International Conference on Machine Vision Applications (MVA), 2017.
- Wiskott Laurenz, Fellous Jean-Marc, Kruger Norbert and Christoph von der Malsburg, "Face recognition by elastic bunch graph matching," PAMI, vol. 19, pp. 775-779, 1997. https://doi.org/10.1109/34.598235
- Angel Serrano, Isaac Martin de Diego, Cristina Conde and Enrique Cabello, "Recent advances in face biometrics with Gabor wavelets: a review," Pattern Recognition Letters (PRL), vol.31, no.5, pp. 372-381, 2010. https://doi.org/10.1016/j.patrec.2009.11.002
- V. Vapnik, "Statistical Learning Theory," Wiley, New York, 1998.
- R.O. Duda, P.E Hart and D.G Stork, "Pattern Classification," 2nd edition, Wiley, New York, 2000.
- K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition. Academic Press, London, 2000.
- Keinosuke Fukunaga and Hostetler Larry, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32-40, 1975.
- S. Mika, G. Ratsch, J. Weston, B. Scholkopf and K. Mullers, "Fisher discriminant analysis with kernels," in Proc. of IEEE signal processing society workshop In Neural networks for signal processing, pp. 41-48, 1999.
- M. Cristianini and J. Shawe-Taylor, "An Introduction to Support Vector Machines," Cambridge University Press, Cambridge, 2000.
- M. Lichman, "Machine Learning Repository," University of California, Irvine, School of Information and Computer Sciences, 2013.
- Cesare Alippi and Manuel Roveri, "Virtual k-fold cross validation: an effective method for accuracy assessment," in Proc. of International Joint Conference on Neural Networks, pp.1-6, 2010.
- Tom Fawcett, "An introduction to ROC analysis," Pattern Recognition letters, vol. 27, pp. 861-874, 2006. https://doi.org/10.1016/j.patrec.2005.10.010
- Hanley James A and Mcneil Barbara J, "A method of comparing the areas under receiver operating characteristic curves derived from the same cases," Radiology, vol. 148, no. 3, pp. 839-843, 1983. https://doi.org/10.1148/radiology.148.3.6878708
- K. Fukunaga, "Introduction to Statistical Pattern Recognition," 2nd edition. Academic Press, London, 2000.
- Xie Xudong and Lam Kin-Man, "An efficient illumination normalization method for face recognition," Pattern Recognition Letters, vol. 27, no.6, pp. 609-617, 2006. https://doi.org/10.1016/j.patrec.2005.09.026
- Jiang Yanxia and Ren Bo, "Face Recognition using Local Gabor Phase Characteristics," in Proc. of IEEE International Conference on Intelligence and Software Engineering, pp. 1-4, 10-12 December 2010.
- A. Georghiades, "Yale face database," 1997.
- Yu Hua and Yang Jie, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognition, vol.34, pp. 2067-2070, 2001. https://doi.org/10.1016/S0031-3203(00)00162-X
- Face Database.
- AM. Martinez, R. Benavente, "The AR face database," Computer Vision Center, 1998.
- V. Jain, A. Mukherjee, "The indian face database," 2002.