Real time character and speech commands recognition system

  • Dong-jin Kwon (Department of Computer Electronics Engineering, Seoil University)
  • Sang-hoon Lee (Korea Institute of Science and Technology)
  • Received : 2024.09.07
  • Accepted : 2024.09.18
  • Published : 2024.11.30

Abstract

With the advancement of modern AI technology, the field of computer vision has made significant progress. This study introduces a parking management system that integrates Optical Character Recognition (OCR) with a speech command recognition model. OCR digitizes handwritten or image-based text captured by image scanning so that computers can process it. When a vehicle enters the parking lot, the system recognizes its license plate using OCR: it locates the text regions in the image with bounding boxes and converts them into digital characters to identify the plate. The administrator controls the gate with simple voice commands. A microphone collects the user's voice data, the signal is converted into a spectrogram, and this 2D representation of the voice signal is used as input to a machine learning model. Based on the model's inference, the system opens or closes the gate and records the time in real time. The system can therefore manage vehicle entry and exit records via voice commands and automatically calculate paid services such as parking fees based on license plate recognition. By training the model on data from multiple users, we aim to improve its accuracy and offer a practical solution for parking management.
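
The record-keeping step described in the abstract can be illustrated with a minimal sketch. It assumes the OCR stage has already produced a plate string and the speech model has already produced a command label. The class name `ParkingLog`, the command labels `"open"` and `"close"`, the rule that an `"open"` command for an already-logged plate counts as an exit, and the tariff constants are all hypothetical; the abstract does not specify any of them.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
import math

# Hypothetical tariff: the abstract does not state any rates.
FREE_MINUTES = 10     # stays up to this length are free
BLOCK_MINUTES = 30    # billing granularity
FEE_PER_BLOCK = 500   # currency units per started block


def parking_fee(entered: datetime, left: datetime) -> int:
    """Return the fee for a stay, billed per started block."""
    minutes = (left - entered).total_seconds() / 60
    if minutes <= FREE_MINUTES:
        return 0
    return math.ceil(minutes / BLOCK_MINUTES) * FEE_PER_BLOCK


@dataclass
class ParkingLog:
    """Tracks gate state and entry times keyed by recognized plate."""
    entries: dict = field(default_factory=dict)  # plate -> entry time
    gate_open: bool = False

    def handle(self, plate: str, command: str, now: datetime):
        """Apply a recognized voice command for a recognized plate.

        Returns the fee when the command lets a logged vehicle out,
        otherwise None.
        """
        if command == "open":
            self.gate_open = True
            if plate in self.entries:
                # Assumed rule: a logged plate at the gate is leaving.
                return parking_fee(self.entries.pop(plate), now)
            self.entries[plate] = now  # new plate: record entry time
            return None
        if command == "close":
            self.gate_open = False
            return None
        raise ValueError(f"unknown command: {command!r}")
```

Under these assumed rates, a vehicle that enters at 09:00 and leaves 75 minutes later is billed for three started 30-minute blocks. A real deployment would also need to handle misread plates and low-confidence command predictions, which this sketch ignores.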

Acknowledgement

This research was supported by a Research Grant from Seoil University.
