Vehicle Face Re-identification Based on Nonnegative Matrix Factorization with Time Difference Constraint

  • Ma, Na (School of Business Administration, Liaoning Technical University) ;
  • Wen, Tingxin (School of Business Administration, Liaoning Technical University)
  • Received : 2021.01.19
  • Accepted : 2021.05.31
  • Published : 2021.06.30

Abstract

Light intensity variation is one of the key factors affecting the accuracy of vehicle face re-identification. To improve the robustness of vehicle face features to light intensity variation, a Nonnegative Matrix Factorization model constrained by the image acquisition time difference is proposed. First, the original feature vectors of all pairs of positive training samples are placed in two original feature matrices, where the same columns of the two matrices represent the same vehicle. Then, the new features obtained after decomposition are divided proportionally into stable and variable features, where intra-class similarity and inter-class difference constraints are imposed on the stable features, and the image acquisition time difference constraint is imposed on the variable features. Finally, vehicle face matching is achieved by calculating the cosine distance between stable features. Experimental results show that the average False Reject Rate and the average False Accept Rate of the proposed algorithm can be reduced to 0.14 and 0.11 respectively on five different datasets, and even under large differences in light intensity the vehicle face image can still be recognized accurately, which verifies that the extracted features are robust to light variation.

1. Introduction

Traditionally, vehicles with fake plates are mainly detected by viewing surveillance videos manually; however, with the rapid increase in the number of vehicles, manual detection faces great difficulties due to the massive amount of video data. Therefore, it is of great significance to propose an automatic and effective vehicle re-identification algorithm. For vehicle re-identification, the biggest challenge is that, affected by differences in light intensity, the captured images of the same vehicle may differ greatly, as shown in Fig. 1, which makes re-identification very difficult. It is therefore critical to obtain stable and effective vehicle face features under various lighting conditions [1].


Fig. 1. The captured images of the same vehicle at different times

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed NMF model for vehicle face re-identification. In Section 4, a projected gradient algorithm is used to solve the objective function of the proposed model, and the matching method for vehicle face features is given in Section 5. Section 6 verifies the effectiveness of the proposed algorithm through experiments. Finally, the conclusion is drawn in Section 7.

2. Related Work

The existing vehicle recognition methods fall mainly into two categories: those based on hand-crafted features and those based on features learned automatically through deep learning. Hand-crafted features can be divided into low-level and high-level features, where low-level image features include color features [2-3], edge features [4], texture features [5], and shape features [6], while scale key-point features [7-8] and 3D model features [9-12] can be considered high-level features. However, hand-crafted features depend largely on human experience, and the deep information in the image is not easy to mine, so their effectiveness is hard to ensure. Therefore, deep learning based vehicle recognition algorithms have received more attention in recent years, including traditional deep learning models such as the Convolutional Neural Network [13-15], Deep Belief Network [16-17], transfer learning models [18-20], and the Restricted Boltzmann Machine [21-23], as well as improved models such as Conv5 [24], the Teacher-Student Network [25], the Parsing-based View-aware Embedding Network [26], the Semantics-guided Part Attention Network [27], models fused from multiple networks [28], and reconstruction-based networks [29]. For supervised vehicle classification, these deep learning methods have achieved good results, but for vehicle face matching, where each vehicle is captured only a few times and the number of training samples is small, these models do not generalize well. Therefore, with a limited number of vehicle face samples, it is very meaningful to propose a vehicle re-identification algorithm with good robustness and generalization.

Nonnegative Matrix Factorization (NMF) can obtain effective basis images for image classification, and it has also achieved good results in vehicle face recognition in recent years [1,30]. Therefore, in this paper we propose a new vehicle face re-identification method based on an improved NMF, which takes into account the image differences caused by the capturing times. In the proposed algorithm, the variable features and the stable features, which are easily and not easily affected by light variation respectively, are obtained after model training, and the stable features are then used to judge whether two vehicle face images match.

3. Proposed Model

After a vehicle image is captured by a surveillance camera on the highway, the vehicle face region must first be segmented from the captured image in order to remove useless information and obtain effective vehicle face features. Yolo models have been proven to be very effective in object detection, and since the Yolov5 model has a small network structure and fast processing speed, it is selected to segment the vehicle face region in the proposed algorithm, where the code of the Yolov5 model and the weight files can be downloaded from https://github.com/ultralytics/yolov5 and https://github.com/ultralytics/yolov5/releases/tag/v5.0 respectively. The segmentation results are shown in Fig. 2, and a small usage sketch is given after the figure.


Fig. 2. Vehicle face segmentation results
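
For illustration, the following is a minimal sketch of how the pretrained Yolov5 detector from the repository above could be used to crop the vehicle region. The confidence threshold, the use of the pretrained COCO vehicle classes to filter detections, and the choice of the highest-confidence box are illustrative assumptions rather than the exact settings used in this paper.

```python
import cv2
import numpy as np
import torch

# Load the pretrained Yolov5s weights from the ultralytics repository (downloaded on first call).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model.conf = 0.4  # detection confidence threshold (an assumed value, not from the paper)

def crop_vehicle_face(image_path):
    """Return the crop of the highest-confidence vehicle detection, or None if nothing is found."""
    img = cv2.imread(image_path)
    results = model(img[..., ::-1])              # the hub model expects RGB input
    det = results.xyxy[0].cpu().numpy()          # rows: [x1, y1, x2, y2, confidence, class]
    det = det[np.isin(det[:, 5].astype(int), [2, 5, 7])]   # keep COCO car / bus / truck classes
    if len(det) == 0:
        return None
    x1, y1, x2, y2 = det[det[:, 4].argmax(), :4].astype(int)
    return img[y1:y2, x1:x2]
```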

As is well known, obtaining effective feature basis vectors through dimensionality reduction is very important for object recognition, where common dimensionality reduction methods include principal component analysis (PCA), linear discriminant analysis (LDA), and locality preserving projection (LPP). In these methods, the elements of the basis vectors and coefficient vectors can be either positive or negative, and negative elements are reasonable in terms of the mathematical operations. However, for image processing the negative elements are difficult to interpret: for example, neither the pixels of the basis images nor the weights can be negative. Therefore, we use the NMF model to ensure the non-negativity of the decomposed matrices, where the original NMF model is shown in (1),

\(\boldsymbol{F}_{m \times n} \approx \boldsymbol{U}_{m \times r} \boldsymbol{V}_{r \times n}, \quad u_{i k}, v_{k j} \geq 0\)       (1)

where the columns of F, U and V represent original feature vectors, basis vectors and coefficient vectors respectively, and each coefficient vector is usually regarded as the new feature vector [31].
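
As a point of reference, the following is a minimal numpy sketch of the classical multiplicative-update solution of (1), i.e., the unconstrained NMF before the constraints introduced below are added:

```python
import numpy as np

def nmf(F, r, n_iter=200, eps=1e-9):
    """Factor a nonnegative matrix F (m x n) into U (m x r) and V (r x n), both nonnegative."""
    m, n = F.shape
    rng = np.random.default_rng(0)
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(n_iter):
        V *= (U.T @ F) / (U.T @ U @ V + eps)   # multiplicative update keeps V nonnegative
        U *= (F @ V.T) / (U @ V @ V.T + eps)   # multiplicative update keeps U nonnegative
    return U, V
```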

In the proposed algorithm, two images representing the same vehicle are regarded as a pair of training samples, and all pairs of training samples are placed in two original feature matrices F1 and F2, where the column vectors at the same position in F1 and F2 represent the same vehicle. Therefore, the whole decomposition error can be obtained by fusing the decomposition errors of F1 and F2 as shown in (2),

\(J_{e r r}=\frac{1}{2} \sum_{i=1}^{2}\left\|\boldsymbol{F}_{i}-\boldsymbol{U} \boldsymbol{V}_{i}\right\|_{2}\)       (2)

where the smaller the decomposition error, the more accurate the vehicle face feature.

From the perspective of feature stability, each vehicle face image can be considered to contain two types of features, as shown in Fig. 3: stable features, which are robust to illumination variation, such as the shapes of the vehicle windows and grille, and variable features, such as the vehicle color and the brightness of the vehicle lights, which change with the light intensity.


Fig. 3. Two types of features in vehicle face image

From the above analysis, distinguishing stable features from variable features in the vehicle face image is very important for vehicle re-identification. Therefore, in addition to ensuring the non-negativity of the decomposition results, some constraints that are conducive to accurate identification should be imposed on the stable and variable features, as follows:

1) The orthogonality constraint on stable features. After matrix factorization, the jth column of Vi can be regarded as the new feature of the jth image in Fi, i.e., Vi(j). In the proposed algorithm, the first k dimensions of the feature vector Vi(j) are taken as the stable feature Si(j), which is obtained as shown in (3),

\(\boldsymbol{S}_{i}^{(j)}=\boldsymbol{Z} \boldsymbol{V}_{i}^{(j)}\)       (3)

\(\boldsymbol{Z}=\left[\begin{array}{cc} \boldsymbol{I}_{k \times k} & \mathbf{0}_{k \times(r-k)} \\ \mathbf{0}_{(r-k) \times k} & \mathbf{0}_{(r-k) \times(r-k)} \end{array}\right]_{r \times r}\)       (4)

where Ik×k is the k×k identity matrix, 0k×(r−k) is the k×(r−k) zero matrix, and r is the dimension of the coefficient vector.
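
The selector matrix Z in (4) and the extraction of the stable part in (3) can be sketched as follows (a minimal numpy illustration):

```python
import numpy as np

def make_Z(r, k):
    """Eq. (4): identity on the first k dimensions, zero elsewhere (r x r)."""
    Z = np.zeros((r, r))
    Z[:k, :k] = np.eye(k)
    return Z

def stable_part(V, Z):
    """Eq. (3): stable features S = Z V for a coefficient matrix V (r x m)."""
    return Z @ V
```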

The stable features of vehicle face images should have the following characteristic: even if there is a large illumination difference between two captures of the same vehicle, the stable features of the two captured images should still be strongly similar; on the contrary, the stable features of different vehicles should differ from each other, i.e., they should be as orthogonal as possible, as shown in (5),

\(\left\{\begin{array}{l} \left\langle\boldsymbol{S}_{1}^{(j)}, \boldsymbol{S}_{2}^{(g)}\right\rangle \rightarrow 1, \quad j=g \\ \left\langle\boldsymbol{S}_{1}^{(j)}, \boldsymbol{S}_{2}^{(g)}\right\rangle \rightarrow 0, \quad j \neq g \end{array}\right.\)       (5)

\(\left(\boldsymbol{S}_{1}\right)^{\mathrm{T}} \boldsymbol{S}_{2} \rightarrow \boldsymbol{I}_{m \times m}\)       (6)

\(\boldsymbol{S}_{1}=\left[\begin{array}{llll} \boldsymbol{S}_{1}^{(1)} & \boldsymbol{S}_{1}^{(2)} & \cdots & \boldsymbol{S}_{1}^{(m)} \end{array}\right]\)       (7)

where m is the number of columns in Fi , i.e., the number of the pairs of training samples.

From the above analysis, the function of measuring the orthogonality of stable features can be obtained as shown in (8).

\(J_{s t a}=\left\|\left(\boldsymbol{S}_{1}\right)^{\mathrm{T}} \boldsymbol{S}_{2}-\boldsymbol{I}_{m \times m}\right\|_{2}\)       (8)

2) The similarity constraint on weighted variable features. Since the feature vector Vi consists of stable and variable features after dimensionality reduction, the last r−k dimensions of Vi are regarded as the variable features, obtained as shown in (9),

\(\boldsymbol{C}_{i}^{(j)}=\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{i}^{(j)}\)       (9)

The light intensity changes continuously with time, so the larger the interval between the capture times of two images, the more uncertain the difference between their variable features; conversely, if the time interval is small, the variable features will be relatively similar. Therefore, if the interval between the capture times of the same vehicle is large, the negative effect of the variable features on re-identification will be obvious. In order to reduce the impact of the variable features, we assign different weights to them according to the time difference of image acquisition, as shown in (10), where the capture times are obtained from the surveillance cameras,

\(T_{j}=1-\frac{\left|T_{j 1}-T_{j 2}\right|}{T_{d}}\)       (10)

where Tj represents the similarity of the capture times of the jth pair of vehicle face images, i.e., the weight, Tj1 and Tj2 are the capture times of the two images, Td is the largest time difference, and 0 ≤ Tj ≤ 1. Therefore, the difference between the variable features of the jth pair of vehicle face images can be measured by (11),

\(J_{j-\operatorname{var}}=T_{j}\left(\boldsymbol{C}_{1}^{(j)}\right)^{\mathrm{T}} \boldsymbol{C}_{2}^{(j)}\)       (11)

and the differences between the variable features of all pairs of vehicle face images can be obtained through (12).

\(J_{v a r}=\operatorname{tr}\left(\boldsymbol{T}\left(\boldsymbol{C}_{1}\right)^{\mathrm{T}} \boldsymbol{C}_{2}\right)\)       (12)

\(\boldsymbol{T}=\left[\begin{array}{cccc} T_{1} & 0 & \cdots & 0 \\ 0 & T_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T_{m} \end{array}\right]_{m \times m}\)       (13)

\(\boldsymbol{C}_{1}=\left[\begin{array}{llll} \boldsymbol{C}_{1}^{(1)} & \boldsymbol{C}_{1}^{(2)} & \cdots & \boldsymbol{C}_{1}^{(m)} \end{array}\right]_{r \times m}\)       (14)

When the capture times of two images of the same vehicle are close enough, the variable features of the two images should be similar, i.e., if Tj → 1, then \(\left(\boldsymbol{C}_{1}^{(j)}\right)^{\mathrm{T}} \boldsymbol{C}_{2}^{(j)} \rightarrow 1\); on the contrary, if the difference in capture times is large, the similarity between the variable features of the two images is uncertain, i.e., if Tj → 0, then \(0 \leq\left(\boldsymbol{C}_{1}^{(j)}\right)^{\mathrm{T}} \boldsymbol{C}_{2}^{(j)} \leq 1\). From the above analysis, the greater the difference measure function Jvar, the more helpful it is for re-identification.
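
A small numpy sketch of the acquisition-time weighting in (10) and the weighted variable-feature similarity in (12) follows; the capture times are assumed to be given as numbers on a common scale (e.g. hours), with Td the largest time difference.

```python
import numpy as np

def time_weights(t1, t2, Td):
    """Eq. (10): T_j = 1 - |T_j1 - T_j2| / T_d for every pair of capture times."""
    return 1.0 - np.abs(np.asarray(t1, float) - np.asarray(t2, float)) / Td

def J_var(V1, V2, Z, T):
    """Eq. (12): tr(T C1^T C2), with the variable features C_i = (I - Z) V_i of (9).
    T is the vector of per-pair weights, placed on the diagonal as in (13)."""
    I_Z = np.eye(Z.shape[0]) - Z
    C1, C2 = I_Z @ V1, I_Z @ V2
    return np.trace(np.diag(T) @ C1.T @ C2)
```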

In summary, the objective function of the proposed model is shown in (15), and the optimal parameters U, V1, V2 will be obtained by (16).

\(J=J_{e r r}+\alpha J_{s t a}-\beta J_{v a r}\)       (15)

\(\boldsymbol{U}^{*}, \boldsymbol{V}_{1}^{*}, \boldsymbol{V}_{2}^{*}=\underset{U, V_{1}, V_{2}}{\arg \min }\left\{J_{\text {err }}+\alpha J_{\text {sta }}-\beta J_{\text {var }}\right\}\)       (16)

4. Objective Function Solution Based on Gradient Descent Method

First, in order to solve the objective function more conveniently, Equation (2), Equation (8) and Equation (12) can be further transformed into (17), (18) and (19),

\(\begin{aligned} J_{e r r} &=\frac{1}{2} \sum_{i=1}^{2}\left\|\boldsymbol{F}_{i}-\boldsymbol{U} \boldsymbol{V}_{i}\right\|_{2} \\ &=\frac{1}{2} \sum_{i=1}^{2} \operatorname{tr}\left(\left(\boldsymbol{F}_{i}-\boldsymbol{U} \boldsymbol{V}_{i}\right)^{\mathrm{T}}\left(\boldsymbol{F}_{i}-\boldsymbol{U} \boldsymbol{V}_{i}\right)\right) \\ &=\frac{1}{2} \sum_{i=1}^{2} \operatorname{tr}\left(\boldsymbol{F}_{i}^{\mathrm{T}} \boldsymbol{F}_{i}-\boldsymbol{V}_{i}^{\mathrm{T}} \boldsymbol{U}^{\mathrm{T}} \boldsymbol{F}_{i}-\boldsymbol{F}_{i}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{V}_{i}+\boldsymbol{V}_{i}^{\mathrm{T}} \boldsymbol{U}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{V}_{i}\right) \end{aligned}\)       (17)

\(\begin{aligned} J_{s t a} &=\left\|\left(\boldsymbol{V}_{1}^{s t a}\right)^{\mathrm{T}} \boldsymbol{V}_{2}^{s t a}-\boldsymbol{I}_{m \times m}\right\|_{2} \\ &=\left\|\left(\boldsymbol{Z} \boldsymbol{V}_{1}\right)^{\mathrm{T}}\left(\boldsymbol{Z} \boldsymbol{V}_{2}\right)-\boldsymbol{I}_{m \times m}\right\|_{2} \\ &=\operatorname{tr}\left(\left(\boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2}-\boldsymbol{I}_{m \times m}\right)^{\mathrm{T}}\left(\boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2}-\boldsymbol{I}_{m \times m}\right)\right) \\ &=\operatorname{tr}\left(\boldsymbol{V}_{2}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{1} \boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2}-\boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2}-\boldsymbol{V}_{2}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{1}+\boldsymbol{I}_{m \times m}\right) \end{aligned}\)       (18)

\(\begin{aligned} J_{v a r} &=\operatorname{tr}\left(\boldsymbol{T}\left(\boldsymbol{V}_{1}^{v a r}\right)^{\mathrm{T}} \boldsymbol{V}_{2}^{v a r}\right) \\ &=\operatorname{tr}\left(\boldsymbol{T}\left(\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{1}\right)^{\mathrm{T}}\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{2}\right) \\ &=\operatorname{tr}\left(\boldsymbol{T} \boldsymbol{V}_{1}^{\mathrm{T}}\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{2}\right) \end{aligned}\)       (19)

and the derivatives of (15) with respect to U, V1 and V2 are solved as shown in (20), (21) and (22).

\(\frac{\partial J}{\partial \boldsymbol{U}}=\sum_{i=1}^{2}\left(-\boldsymbol{F}_{i} \boldsymbol{V}_{i}^{\mathrm{T}}+\boldsymbol{U} \boldsymbol{V}_{i} \boldsymbol{V}_{i}^{\mathrm{T}}\right)\)       (20)

\(\frac{\partial J}{\partial V_{1}}=U^{\mathrm{T}} \boldsymbol{F}_{1}+\boldsymbol{U}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{V}_{1}+2 \alpha\left(\boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2} \boldsymbol{V}_{2}^{\mathrm{T}} \boldsymbol{Z}-\boldsymbol{Z} \boldsymbol{V}_{2}\right)-\beta\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{2}\)       (21)

\(\frac{\partial J}{\partial \boldsymbol{V}_{2}}=\boldsymbol{U}^{\mathrm{T}} \boldsymbol{F}_{2}+\boldsymbol{U}^{\mathrm{T}} \boldsymbol{U} \boldsymbol{V}_{2}+2 \alpha\left(\boldsymbol{V}_{2}^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{1} \boldsymbol{V}_{1}^{\mathrm{T}} \boldsymbol{Z}-\boldsymbol{Z} \boldsymbol{V}_{1}\right)-\beta\left(\boldsymbol{I}_{r \times r}-\boldsymbol{Z}\right) \boldsymbol{V}_{1}\)       (22)

Then, according to (20), (21) and (22), the iterative rules of U, V1 and V2 can be obtained as shown in (23), (24) and (25).

\(u^{(x, y), t+1} \leftarrow u^{(x, y), t}\left(\frac{\sum_{i=1}^{2} \boldsymbol{F}_{i}\left(\boldsymbol{V}_{i}^{t}\right)^{\mathrm{T}}}{\sum_{i=1}^{2} \boldsymbol{U}^{t} \boldsymbol{V}_{i}^{t}\left(\boldsymbol{V}_{i}^{t}\right)^{\mathrm{T}}}\right)^{(x, y)}\)       (23)

\(v_{1}^{(x, y), t+1} \leftarrow v_{1}^{(x, y), t}\left(\frac{2 \alpha \boldsymbol{Z} \boldsymbol{V}_{2}^{t}+\beta \boldsymbol{V}_{2}^{t}}{\left(\boldsymbol{U}^{t}\right)^{\mathrm{T}} \boldsymbol{F}_{1}+\left(\boldsymbol{U}^{t}\right)^{\mathrm{T}} \boldsymbol{U}^{t} \boldsymbol{V}_{1}^{t}+2 \alpha\left(\boldsymbol{V}_{1}^{t}\right)^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{2}^{t}\left(\boldsymbol{V}_{2}^{t}\right)^{\mathrm{T}} \boldsymbol{Z}+\beta \boldsymbol{Z} \boldsymbol{V}_{2}^{t}}\right)^{(x, y)}\)       (24)

\(v_{2}^{(x, y), t+1} \leftarrow v_{2}^{(x, y), t}\left(\frac{2 \alpha \boldsymbol{Z} \boldsymbol{V}_{1}^{t}+\beta \boldsymbol{V}_{1}^{t}}{\left(\boldsymbol{U}^{t}\right)^{\mathrm{T}} \boldsymbol{F}_{2}+\left(\boldsymbol{U}^{t}\right)^{\mathrm{T}} \boldsymbol{U}^{t} \boldsymbol{V}_{2}^{t}+2 \alpha\left(\boldsymbol{V}_{2}^{t}\right)^{\mathrm{T}} \boldsymbol{Z} \boldsymbol{V}_{1}^{t}\left(\boldsymbol{V}_{1}^{t}\right)^{\mathrm{T}} \boldsymbol{Z}+\beta \boldsymbol{Z} \boldsymbol{V}_{1}^{t}}\right)^{(x, y)}\)       (25)

Finally, the parameters U, V1 and V2 are optimized according to their iterative rules, where the optimization process is as follows:

Step.1 Given the training data F1 and F2, the balance coefficients α and β, the error threshold ξ, and the maximum number of iterations Nmax;

Step.2 Initializing the parameters U0, V10, V20 and the number of iterations t = 0;

Step.3 t = t+1, and update the parameters U, V1 and V2 according to (23), (24) and (25);

Step.4

\(\text { if }\left|J\left(\boldsymbol{U}^{t+1}, \boldsymbol{V}_{1}^{t+1}, \boldsymbol{V}_{2}^{t+1}\right)-J\left(\boldsymbol{U}^{t}, \boldsymbol{V}_{1}^{t}, \boldsymbol{V}_{2}^{t}\right)\right|<\xi \text { or } t>N_{\max }:\)  

goto Step.5;

else:

goto Step.3;

Step.5 Output the optimal parameters U, V1, V2.
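
To make the optimization procedure concrete, the following is a minimal numpy sketch of Steps 1-5. It is a sketch under stated assumptions rather than the exact implementation used in the experiments: squared Frobenius norms are assumed for (2) and (8), and the multiplicative updates are derived in the standard fashion (the negative part of each gradient of (15) goes to the numerator, the positive part to the denominator) from Jerr, Jsta and Jvar as written in (17)-(19), so the arrangement of the terms differs slightly from the printed (24)-(25).

```python
import numpy as np

def objective(F1, F2, U, V1, V2, Z, T, alpha, beta):
    """J = J_err + alpha*J_sta - beta*J_var, with squared Frobenius norms assumed for (2) and (8).
    T is the m x m diagonal weight matrix of (13)."""
    m = V1.shape[1]
    I_Z = np.eye(Z.shape[0]) - Z
    J_err = 0.5 * (np.linalg.norm(F1 - U @ V1, 'fro')**2 + np.linalg.norm(F2 - U @ V2, 'fro')**2)
    J_sta = np.linalg.norm((Z @ V1).T @ (Z @ V2) - np.eye(m), 'fro')**2
    J_var = np.trace(T @ (I_Z @ V1).T @ (I_Z @ V2))
    return J_err + alpha * J_sta - beta * J_var

def train(F1, F2, r, k, T, alpha=1.0, beta=0.5, xi=1e-4, n_max=500, eps=1e-9):
    """Steps 1-5: heuristic multiplicative updates of U, V1, V2 until J stops changing."""
    m = F1.shape[1]
    Z = np.zeros((r, r)); Z[:k, :k] = np.eye(k)
    I_Z = np.eye(r) - Z
    rng = np.random.default_rng(0)
    U = rng.random((F1.shape[0], r)); V1 = rng.random((r, m)); V2 = rng.random((r, m))
    J_old = objective(F1, F2, U, V1, V2, Z, T, alpha, beta)
    for _ in range(n_max):
        # shared basis update, as in (23)
        U *= (F1 @ V1.T + F2 @ V2.T) / (U @ V1 @ V1.T + U @ V2 @ V2.T + eps)
        # coefficient updates (negative gradient parts in the numerator, positive parts below)
        V1 *= (U.T @ F1 + 2*alpha*Z @ V2 + beta*I_Z @ V2 @ T) / \
              (U.T @ U @ V1 + 2*alpha*Z @ V2 @ V2.T @ Z @ V1 + eps)
        V2 *= (U.T @ F2 + 2*alpha*Z @ V1 + beta*I_Z @ V1 @ T) / \
              (U.T @ U @ V2 + 2*alpha*Z @ V1 @ V1.T @ Z @ V2 + eps)
        J_new = objective(F1, F2, U, V1, V2, Z, T, alpha, beta)
        if abs(J_new - J_old) < xi:   # stopping rule of Step.4
            break
        J_old = J_new
    return U, V1, V2
```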

5. Feature Matching Based on Cosine Distance

When judging whether two images Itest1 and Itest2 represent the same vehicle, we first extract their stable features Stest1 and Stest2 by decomposing their original feature vectors Ftest1 and Ftest2 as shown in (26) [32],

\(\boldsymbol{S}_{\text {new }}=\boldsymbol{Z}\left(\left(\boldsymbol{U}^{*}\right)^{\mathrm{T}} \boldsymbol{U}^{*}\right)^{-1}\left(\boldsymbol{U}^{*}\right)^{\mathrm{T}} \boldsymbol{F}_{\text {new }}\)       (26)

then, we will measure the similarity of two vehicle face features by using (27),

\(d\left(\boldsymbol{S}_{\text {test1 }}, \boldsymbol{S}_{\text {test } 2}\right)=\frac{\left\langle\boldsymbol{S}_{\text {test } 1}, \boldsymbol{S}_{\text {test } 2}\right\rangle}{\left\|\boldsymbol{S}_{\text {test } 1}\right\|_{2}\left\|\boldsymbol{S}_{\text {test } 2}\right\|_{2}}\)       (27)

If d > η, the two images are considered to represent the same vehicle; otherwise, they represent different vehicles, where η is the similarity threshold.
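
A short sketch of this matching step: (26) recovers the stable feature of a test image from its original feature vector via the least-squares projection onto the learned basis U*, and (27) compares two stable features by cosine similarity.

```python
import numpy as np

def stable_feature(F_new, U_opt, Z):
    """Eq. (26): S_new = Z (U*^T U*)^{-1} U*^T F_new."""
    return Z @ np.linalg.solve(U_opt.T @ U_opt, U_opt.T @ F_new)

def same_vehicle(F_test1, F_test2, U_opt, Z, eta):
    """Eq. (27): cosine similarity of the stable features; match if it exceeds the
    threshold eta (Section 6.2 selects eta = 0.92)."""
    s1 = stable_feature(F_test1, U_opt, Z)
    s2 = stable_feature(F_test2, U_opt, Z)
    d = float(s1 @ s2) / (np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-12)
    return d > eta, d
```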

6. Experimental Results and Analysis

6.1 Dataset and Experimental Environment

Five datasets are used in the experiments, namely the “BITVehicle” dataset [33], the “VeRI776” dataset [34], the “VeRI-Wild” dataset [35], the “VehicleID” dataset [36] and a self-built dataset.

1) “BITVehicle” dataset. There are 9850 vehicle images in the “BITVehicle” dataset, but not every vehicle was captured more than once. Therefore, we selected 1500 pairs of vehicle images as positive samples, where each pair of images represents the same vehicle, and 2000 pairs of vehicle images as negative samples, where each pair of images represents different vehicles. Some pairs of positive and negative samples are shown in Fig. 4.


Fig. 4. Some positive samples and negative samples in the “BITVehicle” dataset

2) “VeRI776” dataset. In the “VeRI776” dataset, a total of 776 vehicles are annotated, and each vehicle has approximately 60 samples captured from different angles. Since the problem we study is based on vehicle face images, we removed all images captured from invalid perspectives when selecting samples. Finally, 200 pairs of positive samples and 200 pairs of negative samples were selected, some of which are shown in Fig. 5.


Fig. 5. Some positive samples and negative samples in the “VeRI776” dataset

3) “VeRI-Wild” dataset. This dataset contains 40,672 vehicles and a total of 416,314 images captured from different perspectives. To meet the test requirements of the vehicle face recognition algorithm, 7,056 pairs of vehicle face images representing the same vehicles and meeting the acquisition angle requirement were selected as positive samples, and 9,000 pairs of vehicle face images representing different vehicles were selected as negative samples; some samples are shown in Fig. 6.


Fig. 6. Some positive samples and negative samples in the “VeRI-Wild” dataset

4) “VehicleID” dataset. There are 221,763 images of 26,267 vehicles in the dataset, and the vehicle in each image is captured from either the front or the back. Since only vehicle face images are useful for us, 4200 pairs of vehicle face images representing the same vehicles were selected as positive samples, and 4000 pairs of images were selected as negative samples; some positive and negative samples are shown in Fig. 7.


Fig. 7. Some positive samples and negative samples in the “VehicleID” dataset

5) Self-built dataset. The self-built dataset consists of vehicle face images captured by 22 surveillance cameras; it contains 4898 pairs of positive samples and 6204 pairs of negative samples, some of which are shown in Fig. 8.


Fig. 8. Some positive samples and negative samples in the self-built dataset

In addition, in the experiments the algorithms based on NMF are run on a PC with an Intel i5-10300H CPU, 16 GB RAM and Matlab 2017b, and the algorithms based on deep learning are run on a server with 16 GB RAM, two GeForce 1080Ti GPUs, and Tensorflow 1.0.

6.2 Parameter Setting

In the proposed algorithm, some parameters need to be set appropriately, including the dimension m of the original feature F, the number of training samples n, the dimension r of the coefficient vector, the dimension k of the stable feature, the balance factors α and β, and the similarity threshold η. Some of these parameters are set according to experience, and the others are obtained through experiments.

1) Among the above parameters, m, n and r are set according to experience. The vehicle face image is resized to 160 × 120 × 3 after normalization, and the original feature vector F is obtained by stacking all columns, i.e., m = 57600. In addition, n and r are set to 3000 and 500 respectively.

2) According to clustering theory in pattern recognition, if the parameters k, α, β and η are appropriate, the positive samples should be very similar, while there should be large differences between the negative samples. Therefore, we propose a measurement function for the clustering property as shown in (28),

\(H=\frac{1}{N_{p o s}} \sum_{i=1}^{N_{p o s}} d\left(\boldsymbol{S}_{p o s-i, 1}, \boldsymbol{S}_{p o s-i, 2}\right)-\frac{1}{N_{n e g}} \sum_{i=1}^{N_{n e g}} d\left(\boldsymbol{S}_{n e g-i, 1}, \boldsymbol{S}_{n e g-i, 2}\right)\)       (28)

where 1000 pairs of positive samples and 1000 pairs of negative samples are selected randomly from the self-built dataset, i.e., Npos = 1000 and Nneg = 1000, and the clustering properties under different parameters are shown in Table 1.

Table 1. The clustering properties under different parameters


From Table 1, we can see that the best clustering property is obtained when k = 0.8r, α = 1 and β = 0.5; a small sketch of how the measure H is evaluated is given below.
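
For clarity, the following is a minimal sketch of evaluating (28) for one parameter setting, given the stable features of the selected positive and negative pairs:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two stable feature vectors, as in (27)."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def clustering_measure(pos_pairs, neg_pairs):
    """Eq. (28): mean similarity of positive pairs minus mean similarity of negative pairs.
    pos_pairs / neg_pairs are lists of (S_1, S_2) stable-feature tuples."""
    h_pos = np.mean([cosine(s1, s2) for s1, s2 in pos_pairs])
    h_neg = np.mean([cosine(s1, s2) for s1, s2 in neg_pairs])
    return h_pos - h_neg
```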

In addition, the similarity threshold η needs to be optimized to achieve the best recognition performance; the purpose is to make the False Reject Rate (FRR) and the False Accept Rate (FAR) as low as possible, where the recognition performance is measured by (29),

\(\begin{aligned} \eta^{*} &=\underset{\eta}{\arg \min } P(\eta) \\ &=\underset{\eta}{\arg \min }\left[\frac{N_{\text {neg }}}{N_{\text {pos }}+N_{\text {neg }}} F R R(\eta)+\frac{N_{\text {pos }}}{N_{\text {pos }}+N_{\text {neg }}} F A R(\eta)\right] \end{aligned}\)       (29)

The curve of P(η) for different η is shown in Fig. 9, and it can be seen that the best recognition performance of the proposed algorithm is achieved when η = 0.92. A small sketch of this threshold sweep follows Fig. 9.


Fig. 9. The recognition performances under different similarity thresholds
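
A minimal sketch of the threshold sweep behind (29) and Fig. 9: given the cosine similarities of the positive and negative test pairs, the class-weighted sum of FRR and FAR is evaluated over a grid of thresholds and the minimizer is returned.

```python
import numpy as np

def best_threshold(pos_scores, neg_scores):
    """Eq. (29): sweep eta, weight FRR and FAR by the class proportions, return the minimizer."""
    pos = np.asarray(pos_scores)   # cosine similarities of positive (same-vehicle) pairs
    neg = np.asarray(neg_scores)   # cosine similarities of negative (different-vehicle) pairs
    n_pos, n_neg = len(pos), len(neg)
    best_eta, best_p = None, np.inf
    for eta in np.linspace(0.0, 1.0, 1001):
        frr = np.mean(pos <= eta)              # genuine pairs rejected
        far = np.mean(neg > eta)               # impostor pairs accepted
        p = (n_neg * frr + n_pos * far) / (n_pos + n_neg)
        if p < best_p:
            best_eta, best_p = eta, p
    return best_eta, best_p
```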

6.3 Comparison of different algorithms

The algorithms selected for the comparison experiment fall into two categories: the first, like the proposed algorithm, are based on the NMF model [1,30]; the others are based on deep neural network models, i.e., AlexNet, Resnet50 and VGG19. Before the experiment, the training dataset is constructed as shown in Table 2.

Table 2. The numbers of the samples which are selected from different datasets when constructing the training dataset


Except for the images in the training dataset, the remaining pairs of samples are used as testing samples. Since the recognition principles of the two types of comparison algorithms are different, their training methods also differ. For the algorithms based on the NMF model, the same training method as the proposed algorithm is used. For the algorithms based on deep learning, since the number of vehicle face images representing each vehicle is very small, the traditional training and classification methods may not be suitable. Therefore, their training methods are modified in the experiment: each pair of three-channel samples is superimposed to form a six-channel sample, so all pairs of positive samples, negative samples and testing samples are superimposed to form new positive samples, new negative samples and new testing samples respectively, and the filters of the first layer of these deep neural networks are adjusted accordingly, as in the sketch below.
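
A minimal sketch of the pair-stacking step described above; the corresponding change to the networks themselves (widening the first-layer filters to 6 input channels) depends on the specific framework and is therefore only noted in the comment.

```python
import numpy as np

def stack_pair(img1, img2):
    """Superimpose a pair of H x W x 3 vehicle face images into one H x W x 6 sample.
    The first convolutional layer of the compared CNNs must be widened to accept
    6 input channels so the stacked pair can be classified as same / different vehicle."""
    assert img1.shape == img2.shape and img1.shape[-1] == 3
    return np.concatenate([img1, img2], axis=-1)
```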

In addition, the parameter Tj in the proposed algorithm is obtained from the difference between the acquisition times of the two images; however, except for the self-built dataset, the other datasets contain no image acquisition time information. Therefore, during training Tj is set differently depending on the dataset: if the training samples come from the self-built dataset, Tj is obtained according to (10); if the training samples come from the other datasets, Tj is set to 1 or 0, i.e., when the two images are both captured in the daytime or both at night, Tj is set to 1, and when one image is captured in the daytime and the other at night, Tj is set to 0.

After the models are trained, all the test samples from the different datasets are evaluated, and the results are shown in Table 3, where FRR and FAR represent the false reject rate and the false accept rate respectively.

Table 3. The recognition results of different algorithms


From Table 3, it can be seen that the proposed algorithm is slightly better than the other algorithms in terms of FRR and FAR. Since many pairs of samples in the self-built dataset were captured under large illumination differences, the performance of the comparison algorithms drops significantly on it, but the proposed algorithm can still achieve good recognition results. The reason is that during training the proposed algorithm ignores the variable features which are affected by the light intensity and pays more attention to the stable features of vehicle face images, which gives the proposed algorithm good robustness to light variation.

The conclusion that the proposed algorithm is robust to light variation is further verified by the results in Table 4.

Table 4. The recognition results of some test samples


7. Conclusion

To improve the robustness of vehicle face features to light variation at capture time, an NMF model with a time difference constraint is proposed. The innovation of this paper is that stable features which are less affected by light intensity variation can be obtained after model training, i.e., even if the same vehicle is captured twice under different light intensities, we can still conclude accurately that the two images represent the same vehicle. Although good recognition results have been achieved, some problems remain to be solved; for example, when there is a large difference in capture angle between two vehicle face images of the same vehicle, the recognition accuracy is still not very high. Therefore, the generality of the proposed algorithm needs to be further improved.

References

  1. X. Jia and F. M. Sun, "Vehicle face re-identification algorithm based on Siamese nonnegative matrix factorization," Chinese Journal of Scientific Instrument, vol. 41, no. 6, pp. 132-139, 2020.
  2. K. J. Kim, S. M. Park and Y. J. Choi, "Deciding the number of color histogram bins for vehicle color recognition," in Proc. of Asia-Pacific Services Computing Conference, pp. 134-138, December 9-12, 2008.
  3. N. Baek, S. M. Park, K. J. Kim and S. B. Park, "Vehicle color classification based on the support vector machine method," in Proc. of International Conference on Intelligent Computing, pp. 1133-1139, August 21-24, 2007.
  4. P. Negri, X. Clady, M. Milgram and R. Poulenard, "An oriented-contour point based voting algorithm for vehicle type classification," in Proc. of International Conference on Pattern Recognition, pp. 574-577, August 20-24, 2006.
  5. P. Chen, X. Bai and W. Liu, "Vehicle color recognition on urban road by feature context," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 5, pp. 2340-2346, 2014. https://doi.org/10.1109/TITS.2014.2308897
  6. B. Zhang, "Reliable classification of vehicle types based on cascade classifier ensembles," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 322-332, 2013. https://doi.org/10.1109/TITS.2012.2213814
  7. W. W. L. Lam, C. C. C. Pang and N. H. C. Yung, "Vehicle-Component Identification Based on Multiscale Textural Couriers," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 4, pp. 681-694, 2007. https://doi.org/10.1109/TITS.2007.908144
  8. A. P. Psyllos, C. N. E. Anagnostopoulos and E. Kayafas, "Vehicle Logo Recognition Using a SIFT-Based Enhanced Matching Scheme," IEEE Transactions on Intelligent Transportation Systems, vol. 11, no. 2, pp. 322-328, 2010. https://doi.org/10.1109/TITS.2010.2042714
  9. M. J. Leotta and J. L. Mundy, "Vehicle surveillance with a generic, adaptive, 3D vehicle model," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 7, pp. 1457-1469, 2011. https://doi.org/10.1109/TPAMI.2010.217
  10. J. Sochor, A. Herout, and J. Havel, "Boxcars: 3d boxes as cnn input for improved fine-grained vehicle recognition," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3006-3015, June 27-30, 2016.
  11. J. Prokaj and G. Medioni, "3-D model based vehicle recognition," in Proc. of 2009 Workshop on Applications of Computer Vision, pp. 1-7, December 7-8, 2009.
  12. X. C. Liu, W. Liu, J. K. Zheng, C. G. Yan and T. Mei, "Beyond the Parts: Learning Multi-view Cross-part Correlation for Vehicle Re-identification," in Proc. of ACM International Conference on Multimedia, pp. 907-915, October 12-16, 2020.
  13. Y. O. Adu-Gyamfi, S. K. Asare, A. Sharma and T. Tienaah, "Automated vehicle recognition with deep convolutional neural networks," Transportation Research Record, vol. 2645, no. 1, pp. 113-122, 2017. https://doi.org/10.3141/2645-13
  14. K. Huang and B. Zhang, "Fine-grained vehicle recognition by deep Convolutional Neural Network," in Proc. of 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics, pp. 465-470, October, 15-17, 2016.
  15. Q. Zhang, L. Zhuo, J. F. Li, J. Zhang, H. Zhang and X. G. Li, "Vehicle color recognition using Multiple-Layer Feature Representations of lightweight convolutional neural network," Signal Processing, vol. 147, pp. 146-153, 2018. https://doi.org/10.1016/j.sigpro.2018.01.021
  16. Y. Y. Wu and C. M. Tsai, "Pedestrian, bike, motorcycle, and vehicle classification via deep learning: deep belief network and small training set," in Proc. of International Conference on Applied System Innovation, pp. 1-4, May 26-31, 2016.
  17. W. Hai, Y. Cai and L. Chen, "A Vehicle Detection Algorithm Based on Deep Belief Network," The Scientific World Journal, pp. 1-7, 2014.
  18. J. T. Wang, H. Zheng, Y. Huang and X. H. Ding, "Vehicle type recognition in surveillance images from labeled web-nature data using deep transfer learning," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 9, pp. 2913-2922, 2018. https://doi.org/10.1109/tits.2017.2765676
  19. H. Wang, Y. J. Yu, Y. F. Chen and X. B. Chen, "A vehicle recognition algorithm based on deep transfer learning with a multiple feature subspace distribution," Sensors, vol. 18, no. 12, pp. 4109, 2018. https://doi.org/10.3390/s18124109
  20. Y. Chen, C. Yang and S. Y. Yang, "A method for special vehicle recognition based on deep-transfer model," in Proc. of International Conference on Instrumentation & Measurement, Computer, Communication and Control, pp. 167-170, July 21-23, 2016.
  21. A. Q. Hu, H. Li, F. Zhang and W. Zhang, "Deep Boltzmann machines based vehicle recognition," in Proc. of Chinese Control and Decision Conference, pp. 3033-3038, May 31-June 2, 2014.
  22. D. F. S. Santos, G. B. De Souza and A. N. Marana, "A 2D Deep Boltzmann Machine for robust and fast vehicle classification," in Proc. of SIBGRAPI Conference on Graphics, Patterns and Images, pp. 155-162, October 17-20, 2017.
  23. C. Gou, K. F. Wang, Y. J. Yao and Z. X. Li, "Vehicle license plate recognition based on extremal regions and restricted Boltzmann machines," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 1096-1107, 2015. https://doi.org/10.1109/TITS.2015.2496545
  24. X. B. Liu, S. L. Zhang, X. Y. Wang, R. C. Hong and Q. Tian, "Group-Group Loss-Based Global-Regional Feature Learning for Vehicle Re-Identification," IEEE Transactions on Image Processing, vol. 29, pp. 2638-2652, 2019. https://doi.org/10.1109/tip.2019.2950796
  25. X. Jin, C. L. Lan, W. J. Zeng and Z. B. Chen, "Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification," in Proc. of the AAAI Conference on Artificial Intelligence, pp. 11165-11172, February 7-12, 2020.
  26. D. C. Meng, L. Li, X. J. Liu, Y. D. Li, S. J. Yang, Z. J. Zha, X. Y. Gao, S. H. Wang and Q. M. Huang, "Parsing-based View-aware Embedding Network for Vehicle Re-Identification," in Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7103-7112, June 13-19, 2020.
  27. T. S. Chen, C. T. Liu, C. W. Wu and S. Y. Chien, "Orientation-aware Vehicle Re-identification with Semantics-guided Part Attention Network," in Proc. of European Conference on Computer Vision, pp. 330-346, August 23-28, 2020.
  28. Z. D. Zheng, T. Ruan, Y. C. Wei, Y. Yang and T. Mei, "VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification," IEEE Transactions on Multimedia, Early Access, 2020.
  29. P. Khorramshahi, N. Peri, J. C. Chen and R. Chellappa, "The Devil is in the Details: Self-Supervised Attention for Vehicle Re-Identification," in Proc. of European Conference on Computer Vision, pp. 369-386, August 23-28, 2020.
  30. C. H. Shi and C. D. Wu, "Vehicle Face Recognition Algorithm Based on Weighted Nonnegative Matrix Factorization with Double Regularization Terms," KSII Transactions on Internet and Information Systems, vol. 14, no. 5, pp. 2171-2185, 2020. https://doi.org/10.3837/tiis.2020.05.017
  31. X. Jia, F. M. Sun, H. J. Li, Y. D. Cao and X. Zhang, "Image Multi-Label Annotation Based on Supervised Nonnegative Matrix Factorization with New Matching Measurement," Neurocomputing, vol. 219, pp. 518-525, 2017. https://doi.org/10.1016/j.neucom.2016.09.052
  32. X. Jia, F. M. Sun, H. J. Li and Y. D. Cao, "Hand Vein Recognition Algorithm Based on NMF with Sparsity and Clustering Property Constraints in Feature Mapping Space," Chinese Journal of Electronics, vol. 28, no. 6, pp. 1184-1190, 2019. https://doi.org/10.1049/cje.2019.06.003
  33. Z. Dong, Y. W. Wu, M. T. Pei and Y. D. Jia, "Vehicle Type Classification Using a Semisupervised Convolutional Neural Network," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 4, pp. 2247-2256, 2015. https://doi.org/10.1109/TITS.2015.2402438
  34. X. C. Liu, W. Liu, H. D. Ma and H. Y. Fu, "Large-scale vehicle re-identification in urban surveillance videos," in Proc. of IEEE International Conference on Multimedia and Expo, pp. 1-6, July 11-15, 2016.
  35. Y. H. Lou, Y. Bai, J. Liu, S. Q. Wang and L. Y. Duan, "VERI-Wild: A Large Dataset and a New Method for Vehicle Re-Identification in the Wild," in Proc. of IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3230-3238, June 15-20, 2019.
  36. H. Y. Liu, Y. H. Tian, Y. W. Wang, L. Pang and T. J. Huang, "Deep Relative Distance Learning: Tell the Difference between Similar Vehicles," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2167-2175, June 27-30, 2016.