1. Introduction
Class attendance is important in higher education, where institutions require lecturers to record and monitor student attendance. Unfortunately, recording attendance promptly and accurately remains a critical challenge. Student attendance is also believed to be a predictor of student achievement. Traditional ways of recording attendance are inefficient, and technologies such as image processing and biometric attendance systems are more convenient.
Identity verification and recognition using biological characteristics have become ubiquitous in the digital era because of their convenience and unobtrusiveness [1]. Traditional verification methods, such as access cards, work cards, and ID cards, struggle to keep pace with the demands of modern society, resulting in unwanted congestion, wasted human resources, and time delays. Biometric techniques have advanced personal identification technology for access privileges and information systems [2].
Over the past few decades, face recognition has been the most imperative application among biometric-based systems in the field of computer vision [3, 4].
User authentication methods can be categorised as knowledge-based, physiological biometric-based, behavioural biometric-based, and two- or multi-factor authentication [5]. Knowledge-based authentication relies on identity-related information kept secret by the user, such as digit PINs or lock patterns. Physiological biometric-based authentication utilises human body characteristics such as fingerprints, hand geometry, the face, and iris patterns. Behavioural biometric-based authentication tracks user behaviour, such as voice, gait, and handwriting. Two- or multi-factor authentication merges two or more of these metrics, such as fingerprint with face or RFID with face [5].
Face recognition is a physiological biometric technique. The facial recognition process comprises image acquisition, face detection, feature extraction, face recognition, and finally verification or identification. Feature extraction techniques in face recognition include Principal Component Analysis (PCA) [6], Linear Discriminant Analysis (LDA), Local Binary Patterns (LBP) [7], Elastic Bunch Graph Matching (EBGM) [8], Gabor Wavelets [9], and Convolutional Neural Networks (CNN) [10].
A CNN is a Deep Learning (DL) algorithm belonging to the class of Artificial Neural Networks (ANN). A CNN takes an input image, assigns learnable weights and biases to various aspects of the image, and is thereby able to differentiate one image from another. This ability to extract detailed information from input images makes DL one of the most effective image processing approaches.
DeepFace [10] and FaceNet [11] exploit CNNs to achieve major breakthroughs in facial recognition under unconstrained environments such as pose variation, face occlusion, and poor illumination. On the LFW benchmark dataset, DeepFace reaches an accuracy of 97.25%, whereas FaceNet reaches 99.63%.
Current face recognition systems perform well under relatively controlled conditions but degrade under factors such as variations in pose, occlusion, lighting, low resolution, ageing, and make-up [12]. Therefore, improvements are needed to tackle these problems, especially pose variation. The difficulty of the face recognition problem derives from two aspects: the heavy reliance on frontal face images and the composition of the database images [12].
This paper proposes a web-based attendance system for university classrooms that integrates pre-trained deep learning face recognition models. Face recognition performance is measured under pose variation on the production site, with the participation of real users. The proposed attendance system is dynamic, integrating a database that stores the facial feature data of each student.
The rest of the paper is organised as follows. Section 2 overviews related works on the comparative study of face recognition techniques and the attendance applications. Section 3 provides a comprehensive description of the proposed attendance system. Section 4 presents the results and discussions of the proposed system. Finally, the paper is concluded in Section 5.
2. Related Work
The development of face recognition systems has progressed through four generations of approaches, from the oldest to the latest: statistical, holistic, machine learning, and deep learning [12]. Statistical approaches used Eigenvectors, Eigenfaces, the Hidden Markov Model (HMM), and so on. Holistic approaches focused on the whole face area, for example combinations of PCA and LDA, spectroface, and the Active Appearance Model (AAM). The evolution of 3D face recognition focused on machine learning approaches involving the Scale-Invariant Feature Transform (SIFT), Face-Specific Subspace (FSS), Generalized Multiview Analysis (GMA), and so on. Recent deep learning approaches include the Deep Neural Network (DNN), Discriminant Deep Metric Learning (DDML), Deep Correlation Feature Learning, Deep Canonical Correlation Analysis, and Deeply Coupled Auto-encoder Networks (DCAN). Table 1 summarises the most common face recognition techniques: PCA, LDA, EBGM, and CNN.
Table 1. Face Recognition Techniques
Table 2 shows recent developments in attendance management systems (from 2016 to 2019). It shows a bias toward IoT deployment and low use of biometric systems. Examples include a Bluetooth Low Energy attendance Android application [13], a speech-based attendance system [14], a bimodal biometric attendance system [15], a location-aware event attendance system using QR code and GPS technology [16], and an IoT-based smart attendance system using RFID [17]. The limited use of biometric-based systems may be due to their low accuracy and unreliable predictions in uncontrolled environments, whereas IoT-based attendance systems are convenient, fast, and reliable in any condition, hence their larger share of deployments.
Table 2. Attendance Management System Based On Different Types
3. Face Recognition Attendance System
This work uses pre-trained models for face detection, facial landmark detection, and face feature extraction. The face and facial landmark detection models employ a Single Shot Detector (SSD) network, with 68 detected landmark points used in the face alignment process. A Residual Network (ResNet) extracts 128 face feature values from the region of interest of the detected face. These features are normalised and compared against the user database features using the Euclidean distance measure for label matching. The following sections explain the face recognition attendance system in detail.
3.1 Choosing Required Deep Learning Face Recognition Models
face-api.js [18] is an open-source JavaScript face recognition API built on deep convolutional neural networks using TensorFlow.js, as shown in Fig. 1. It provides functions such as face detection, facial expression analysis, age and gender prediction, facial landmark detection, and face recognition. The “weights” directory of the API contains a total of 18 files, as shown in Fig. 2; a combination of face detection, facial landmark detection, and feature extraction models is required to enable face recognition. For the face detection task, the required files are “ssd_mobilenetv1_model-shard1”, “ssd_mobilenetv1_model-shard2”, and “ssd_mobilenetv1_model-weights_manifest.json”. For the facial landmark detection task, the required files are “face_landmark_68_tiny_model-shard1” and “face_landmark_68_tiny_model-weights_manifest.json”. For the feature extraction task, the required files are “face_recognition_model-shard1”, “face_recognition_model-shard2”, and “face_recognition_model-weights_manifest.json”. Table 3 gives a brief description of the three required deep learning pre-trained models (a loading sketch in JavaScript is given after Table 3). These models are based on existing models, built on the neural network architectures documented by the model creator, and integrated into the proposed system. Instead of training a classifier, which would require retraining whenever a new person is added to the dataset, the Euclidean distance between embeddings is measured directly. To produce the final result, a series of detection, alignment, and feature extraction steps is applied to generate a list of vector values, and an appropriate threshold value is selected to tune the accuracy. Further sections specify the details of the process.
Fig. 1. face-api.js [18] on Github.
Fig. 2. Deep Learning Pre-trained Models Weights & Metadata in “weights” directory of API [18]
Table 3. Description of Chosen Deep Learning Pre-trained Models
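As a minimal sketch, and assuming the face-api.js model-loading API as documented in [18] (the “/weights” path is an assumption about where the weight files of Fig. 2 are hosted), the three models can be loaded as follows:

```javascript
import * as faceapi from 'face-api.js';

// Load the three required pre-trained models from the hosted "weights" directory.
async function loadModels(weightsUri = '/weights') {
  await Promise.all([
    faceapi.nets.ssdMobilenetv1.loadFromUri(weightsUri),        // face detection (SSD MobileNetV1)
    faceapi.nets.faceLandmark68TinyNet.loadFromUri(weightsUri), // 68-point landmarks (tiny variant)
    faceapi.nets.faceRecognitionNet.loadFromUri(weightsUri),    // 128-value feature extraction (ResNet)
  ]);
}
```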
3.2 Procedure of Face Recognition
First, all the weight files of the three pre-trained models are loaded into system memory. Next, the image (of Blob type) is fed into the face detection model with a 0.8 confidence score threshold to separate positive from negative detections, followed by facial landmark detection and face alignment. The aligned image is cropped and undergoes feature extraction to generate a list of 128 feature vector values. A feature vector is a numerical representation of an object, derived for example from the pixel intensities, edges, and area of the face. Next, the feature vectors in the database are fetched and matched against the detected feature vector. A matching threshold is applied to the computed shortest Euclidean distance; its purpose is to separate the positive and negative classes. The closer the L2 value is to zero, the closer the match to the recognised face label, and vice versa. Other researchers, as in Fig. 3, use L2 values of 0.78, 0.49, and 0.34. To reduce false positive cases (a large L2 threshold would cause mismatches), a threshold of 0.45 was chosen for this scenario. Finally, the face ID from the dataset with the shortest Euclidean distance between feature vectors is taken as the recognised face and marked with its label on the image canvas. The face recognition procedure is visualised in Fig. 3, and a code sketch of the pipeline is given after the figure. The face recognition task is divided into two main processes: registration of face datasets and face matching.
Fig. 3. Visualisation of Face Recognition Procedure.
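The pipeline above can be sketched in face-api.js as follows; the `knownFaces` structure ({ label, descriptor } pairs) is illustrative rather than the system’s actual data layout:

```javascript
// Recognition-pipeline sketch: detect, align, extract a 128-value descriptor,
// then match by shortest Euclidean (L2) distance against registered faces.
async function recogniseFace(input, knownFaces, threshold = 0.45) {
  const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.8 });
  const result = await faceapi
    .detectSingleFace(input, options)
    .withFaceLandmarks(true)   // true selects the tiny 68-point landmark model
    .withFaceDescriptor();     // 128-value feature vector after alignment
  if (!result) return null;    // negative class: no face detected

  let best = { label: 'unknown', distance: Infinity };
  for (const { label, descriptor } of knownFaces) {
    const distance = faceapi.euclideanDistance(result.descriptor, descriptor);
    if (distance < best.distance) best = { label, distance };
  }
  // Accept only matches below the 0.45 threshold to limit false positives.
  return best.distance <= threshold ? best : null;
}
```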
3.3 Registration of Face Datasets
Each user with the student role is required to upload face photographs for face registration. During registration, face detection is performed on the input image to locate the face. If the confidence score of a detected face exceeds the minimum confidence score threshold of 0.8, a bounding box is drawn onto the image canvas to indicate the face location. Next, facial landmark detection finds 68 key points for face alignment, and feature extraction then generates the feature vector. If more than one feature vector is generated, indicating that more than one face was detected, the system rejects the registration to avoid matching errors during evaluation. The extracted feature vector consists of 128 normalised values, shown in Fig. 4, which are combined into a String to be stored in the database. Fig. 5 shows the pseudocode and workflow of face dataset registration, Fig. 6 shows the feature vectors stored in the MongoDB database, and a code sketch follows the figures.
Fig. 4. 128 values of normalised feature vectors.
Fig. 5. Pseudocode of Registration of Face Datasets.
Fig. 6. Feature vector strings in MongoDB database.
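A registration sketch under the same assumptions is shown below; the `/api/register-face` endpoint name is hypothetical:

```javascript
// Reject the photograph unless exactly one face is detected, then serialise
// the 128 descriptor values into a string for storage in the database.
async function registerFace(imageElement, studentId) {
  const options = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.8 });
  const detections = await faceapi
    .detectAllFaces(imageElement, options)
    .withFaceLandmarks(true)
    .withFaceDescriptors();

  if (detections.length !== 1) {
    throw new Error('Exactly one face must be visible in the photograph');
  }

  // Float32Array of 128 normalised values -> comma-separated string.
  const featureString = Array.from(detections[0].descriptor).join(',');
  await fetch('/api/register-face', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ studentId, featureString }),
  });
}
```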
3.4 Face Matching
The matching process is performed when taking attendance. During matching, all the feature vectors of the face datasets in the course are fetched from the server and stored in a JSON array, as in Fig. 7. Each retrieved feature vector string is parsed into a Float32Array. Next, the image capturing device is opened for image acquisition. Every 0.2 seconds, the system performs face detection on the current frame and locates the face, in the same way as during the registration of face datasets. After feature extraction, the detected feature vector is matched against the stored feature vectors using the Euclidean distance. As demonstrated in Fig. 8, the shortest distance below the matching threshold indicates the recognised face. The respective face is marked with the student’s name and matric number, and the attendance transaction (Fig. 9) is stored in the database. Fig. 10 illustrates the pseudocode of the face matching process during attendance taking, and a code sketch of the matching loop follows the figures.
Fig. 7. Sample JSON data of all participants fetched from the server
Fig. 8. Shortest Euclidean Distance (L2) between detected feature vectors and one of the feature vectors in the face datasets
Fig. 9. Code snippet of saving the attendance transaction of recognised participant’s ID
Fig. 10. Pseudocode of Face Matching to record attendance
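A sketch of the matching loop is given below, reusing the `recogniseFace` sketch from Section 3.2; the participant shape ({ name, matricNo, featureString }) and the `/api/attendance` endpoint are illustrative assumptions:

```javascript
// Every 0.2 s, run detection and matching on the current video frame and
// record an attendance transaction for any recognised participant.
function startAttendanceLoop(video, participants, threshold = 0.45) {
  // Parse the stored strings back into Float32Array descriptors.
  const known = participants.map((p) => ({
    label: p.matricNo,
    descriptor: new Float32Array(p.featureString.split(',').map(Number)),
  }));

  return setInterval(async () => {
    const match = await recogniseFace(video, known, threshold);
    if (match) {
      await fetch('/api/attendance', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ matricNo: match.label, checkedInAt: Date.now() }),
      });
    }
  }, 200); // 0.2-second detection cycle
}
```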
3.5 System Architecture
This section discusses the system architecture of the project, shown in Fig. 11. The project follows a client-server architecture and relies on a network connection to provide the application services. There are two main modules, the face dataset registration module and the face matching module, both of which require transferring the weight files of the deep learning pre-trained models to perform face recognition tasks. The application has three submodules: user management, course management, and attendance management. Each module is associated with an API on the server, and the server interacts with the database for the corresponding CRUD operations.
Fig. 11. System Architecture.
3.6 Database Design
The ERD in Fig. 12 outlines the collections and their relationships. A total of six collections are created in MongoDB Cloud, and a short description of each collection is presented in Table 4 (a hypothetical document shape is sketched after the table). A pre-computed feature vector for each face is used in the matching process; therefore, only a URL reference is stored, pointing to the actual face image held elsewhere. This implementation reduces database complexity and makes inference faster.
Fig. 12. ERD.
Table 4. Description of Database Collections.
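For illustration, a stored face document might take the following shape; all field names are hypothetical, and the actual schema follows the ERD in Fig. 12:

```javascript
// Hypothetical MongoDB document for a registered face: the feature vector is
// pre-computed and stored inline, while the photo is only a URL reference.
{
  "_id": "60d2f1c8e1b2a90017a3b4c5",
  "studentId": "60c8aa01f9d3e40012bc7d88",              // reference to the user collection
  "photoURL": "https://storage.example.com/face01.jpg", // face image stored elsewhere
  "featureString": "-0.1288,0.0745,...,0.0982"          // 128 normalised feature values
}
```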
3.7 System Functionality Description
The following are some important functions in the proposed attendance system:
a) Add Course by Lecturer: The lecturer creates a course by entering code, name, and session. A unique ten-digit course ID will be automatically assigned. Students need the course ID to enrol in the course.
b) Enrol Course by Student: Students need to enrol in the course by entering the course ID.
c) Upload Face Photograph by Student: Each user with the student role must upload at least two face photographs for the matching process. A feature vector is generated for each uploaded photo. The system first performs face detection to verify that only a single face is present; otherwise, the “Save” button is disabled.
d) Create Attendance by Lecturer: The lecturer creates the attendance by clicking the “Create Attendance” button, then redirecting to the attendance form page. After specifying the date and time, the attendance is created after submission.
e) Attendance Room: After the lecturer submits the attendance form, the enrolled students go to the attendance list and enter the attendance room. Two attendance settings, “mode” and “open”, are provided to the lecturer. In Remote Attendance mode the webcam is opened on the student’s web browser, whereas in F2F Attendance mode the webcam is opened on the lecturer’s web browser (Fig. 13). When “open” is on, attendance transactions are enabled, and vice versa.
Fig. 13. The interface of Remote Attendance from the student (A) and lecturer view (B).
f) Attendance Record: The attendance transaction details such as student name, matric number, status, check-in date, and check-in time are displayed. A visualisation chart is drawn to describe the overall class attendance (Fig. 14).
Fig. 14. The interface of Attendance Record.
4. Results and Discussion
4.1 Dataset Collection
Twenty respondents were invited for performance testing of the proposed system. Before testing, each respondent had to register an account with the student role and upload at least two frontal-view face images. A small dataset suffices because the chosen deep learning pre-trained models were trained on thousands of images; their learned parameters are transferred, so only a small number of samples per person is needed to achieve comparable results. Appendix 1 shows examples of uploaded face datasets from five of the respondents. The face datasets must be in frontal view, with no restriction on facial expression, and may be of any format and type as long as the frontal-view requirement is met.
4.2 Performance Testing
Three tests were conducted based on pose variation: frontal, right, and left. The tests measured the outputs for the respondents across the three pose variations; the outputs for five of the respondents are demonstrated in Fig. 15. Each recognised face is labelled with the name, matric number, and calculated distance (L2). The reported L2 value is the shortest among all the distances between the detected feature vector and the feature vectors in the datasets.
Fig. 15. Testing results (Frontal, Right, Left)
4.3 Evaluation Result
Table 5, Table 6, and Table 7 present the prediction results using the matching threshold of 0.45, recording True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) for the frontal, right-facing, and left-facing faces of the 20 respondents. Five new subjects whose faces were not in the database were also chosen to test the performance; hence, the testing dataset comprises a total of 25 faces. Each subject went through the attendance process three times.
Testing was performed on the frontal face, right-facing face (±30°), and left-facing face (±30°) positions at a distance of between 50 and 100 cm.
The results show that the proposed attendance system with deep learning models achieved high true prediction values. Eighteen respondents were correctly predicted (True Positive) across the frontal, right-facing, and left-facing poses, while only two respondents produced False Negatives. Table 8 presents the overall performance based on the confusion matrix: the proposed system achieved an overall accuracy of 92% (Eq. 1), precision of 100% (Eq. 3), and recall of 90% (Eq. 2).
\(\text { Accuracy }=\frac{T P+T N}{T P+F P+T N+F N} \times 100 \%\) (1)
\(\text { Recall }=\frac{T P}{T P+F N} \times 100 \%\) (2)
\(\text { Precision }=\frac{T P}{T P+F P} \times 100 \%\) (3)
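Substituting the recorded counts into Eqs. (1)-(3) reproduces the reported figures: with TP = 18 and FN = 2 per test, and the five unseen subjects all correctly rejected (TN = 5, FP = 0, consistent with the 100% precision), \(\text{Accuracy} = \frac{18+5}{18+0+5+2} \times 100\% = 92\%\), \(\text{Recall} = \frac{18}{18+2} \times 100\% = 90\%\), and \(\text{Precision} = \frac{18}{18+0} \times 100\% = 100\%\).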
Table 5. Testing Result (Frontal Face)
Table 6. Testing Result (Facing Right)
Table 7. Testing Result (Facing Left)
Table 8. Overall Result
As shown in Table 8, the precision rate is 100% because of the strict matching threshold used, which reduces the possibility of false positive cases at the cost of a few false negatives. However, the test results may vary with different thresholds and distances from the camera. Since testing was carried out individually at a distance of between 50 cm and 100 cm, the resolution of the face region was quite high, so the captured feature vectors contained more information and were more accurate. On the other hand, the testing was done remotely using different cameras and computer hardware, and these factors can slightly affect the measured performance. For future work, performance testing should be repeated with different matching thresholds to determine the most suitable value. More facial variation, such as other resolutions and head tilts, could be collected from the respondents to obtain a more diverse reference set and boost accuracy in different environments. A live demo of the web application is available at: https://attendlytical.netlify.app.
5. Conclusion
This study presented a web-based attendance system using deep learning face recognition. The proposed system utilises state-of-the-art pre-trained deep learning face recognition models, combines them in the web environment, and links them to an online database for feature storage. The system implements the face recognition procedure, comprising registration of face datasets and face matching, integrated into a client-server web architecture. Performance testing of the face recognition was conducted and measured on the production site to simulate a real scenario. For future improvement, the face datasets collected from respondents should have appropriate illumination and high resolution. In addition, several test cases should be carried out to determine a suitable threshold value and obtain optimal results, since the selected matching threshold is somewhat arbitrary.
References
- Zhu, Y., & Jiang, Y., "Optimisation of face recognition algorithm based on deep learning multi-feature fusion driven by big data," Image and Vision Computing, vol. 104, pp. 1-8, 2020.
- Wati, V., Kusrini, K., Al Fatta, H., & Kapoor, N., "Security of facial biometric authentication for attendance system," Multimedia Tools and Applications, vol. 80, pp. 23625-23646, 2021. https://doi.org/10.1007/s11042-020-10246-4
- Kak, S. F., Mustafa, F. M., & Valente, P., "A review of person recognition based on face model," Eurasian Journal of Science & Engineering, vol. 4, no. 1, pp. 157-168, 2018.
- Sinha, P., Balas, B., Ostrovsky, Y., & Russell, R., "Face recognition by humans: Nineteen results all computer vision researchers should know about," in Proc. of the IEEE, vol. 94, no. 11, pp. 1948-1962, 2006. https://doi.org/10.1109/JPROC.2006.884093
- Wang, C., Wang, Y., Chen, Y., Liu, H., & Liu, J., "User authentication on mobile devices: Approaches, threats and trends," Computer Networks, vol. 170, 2020.
- Yang, J., Zhang, D., Frangi, A. F., & Yang, J. Y., "Two-dimensional PCA: a new approach to appearance-based face representation and recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131-137, 2004. https://doi.org/10.1109/TPAMI.2004.1261097
- Gardezi, S. J. S., Faye, I., Adjed, F., Kamel, N., & Hussain, M., "Mammogram classification using chi-square distribution on local binary pattern features," Journal of Medical Imaging and Health Informatics, vol. 7, no. 1, pp. 30-34, 2017. https://doi.org/10.1166/jmihi.2017.1982
- Wiskott, L., Kruger, N., Kuiger, N., & Von Der Malsburg, C., "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997. https://doi.org/10.1109/34.598235
- Ahmed, S., Frikha, M., Hussein, T. D. H., & Rahebi, J., "Optimum feature selection with particle swarm optimization to face recognition system using gabor wavelet transform and deep learning," BioMed Research International, 2021.
- Taigman, Y., Yang, M., Ranzato, M. A., & Wolf, L., "Deepface: closing the gap to human-level performance in face verification," in Proc. of the IEEE conference on computer vision and pattern recognition, pp. 1701-1708, 2014.
- Schroff, F., Kalenichenko, D., & Philbin, J., "Facenet: A unified embedding for face recognition and clustering," in Proc. of The IEEE Conference On Computer Vision and Pattern Recognition, pp. 815-823, 2015.
- Ahmed, S. B., Ali, S. F., Ahmad, J., Adnan, M., & Fraz, M. M., "On the frontiers of pose invariant face recognition: a review," Artificial Intelligence Review, vol. 53, pp. 2571-2634, 2020. https://doi.org/10.1007/s10462-019-09742-3
- Apoorv, R., & Mathur, P., "Smart attendance management using bluetooth low energy and android," in Proc. of 2016 IEEE Region 10 Conference (TENCON), pp. 1048-1052, 2016.
- Amri, U. F., Hashim, N. N. W. N., & Hanif, N. H. H. M., "Speech-based class attendance," in Proc. of IOP Conference Series: Materials Science and Engineering, vol. 260, no. 1, p. 012008, 2017.
- Charity, A., Okokpujie, K., & Etinosa, N. O., "A bimodal biometric student attendance system," in Proc. of 2017 IEEE 3rd International Conference on Electro-Technology for National Development (NIGERCON), pp. 464-471, 2017.
- Ayop, Z., Lin, C. Y., Anawar, S., Hamid, E., & Azhar, M. S., "Location-aware event attendance system using QR code and GPS technology," International Journal of Advanced Computer Science and Applications, vol. 9, no. 9, pp. 466-473, 2018.
- Shah, S. N., & Abuzneid, A., "IoT based smart attendance system (SAS) using RFID," in Proc. of 2019 IEEE Long Island Systems, Applications and Technology Conference (LISAT), pp. 1-6, 2019.
- justadudewhohacks/face-api.js [Online]. Available: https://github.com/justadudewhohacks/face-api.js/, Accessed on June 22, 2021
- Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C., "SSD: Single shot multibox detector," in Proc. of European Conference on Computer Vision, pp. 21-37, 2016.
- He, K., Zhang, X., Ren, S., & Sun, J., "Deep residual learning for image recognition," in Proc. of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.