• Title/Summary/Keyword: CRNN


Time-domain Sound Event Detection Algorithm Using Deep Neural Network (심층신경망을 이용한 시간 영역 음향 이벤트 검출 알고리즘)

  • Kim, Bum-Jun; Moon, Hyeongi; Park, Sung-Wook; Jeong, Youngho; Park, Young-Cheol
    • Journal of Broadcast Engineering / v.24 no.3 / pp.472-484 / 2019
  • This paper proposes a time-domain sound event detection algorithm using a deep neural network (DNN). In this system, time-domain waveform data, not converted into the frequency domain, is used as the input to the DNN. The overall structure is a CRNN, to which GLU, ResNet, and squeeze-and-excitation blocks are applied, and the proposed structure also combines features extracted from several layers. In addition, under the assumption that training data with strong labels is practically difficult to obtain, training was conducted with a small amount of weakly labeled data and a large amount of unlabeled data. To use the small amount of training data efficiently, data augmentation methods such as time stretching, pitch shifting, dynamic range compression (DRC), and block mixing were applied to the training data (two of these augmentations are sketched below). Unlabeled data supplemented the insufficient training data through pseudo-labeling. With the proposed network and data augmentation, sound event detection performance improves by about 6% (based on the F-score) compared with a CRNN trained in the conventional manner.
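
A minimal sketch (not the authors' code) of two of the waveform-level augmentations listed above, time stretching/pitch shifting and block mixing, assuming librosa; the parameter ranges are illustrative:

```python
import numpy as np
import librosa

def augment_waveform(y, sr, rng=np.random.default_rng()):
    """Randomly time-stretch and pitch-shift a 1-D waveform (illustrative ranges)."""
    y_aug = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    return librosa.effects.pitch_shift(y_aug, sr=sr, n_steps=rng.uniform(-2, 2))

def block_mix(y1, y2, block_len):
    """Block mixing: splice alternating fixed-length blocks from two clips."""
    n = min(len(y1), len(y2)) // block_len * block_len
    out = y1[:n].copy()
    for start in range(0, n, 2 * block_len):
        out[start:start + block_len] = y2[start:start + block_len]
    return out
```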

Nonlinear Adaptive Prediction using Locally and Globally Recurrent Neural Networks (지역 및 광역 리커런트 신경망을 이용한 비선형 적응예측)

  • 최한고
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.1 / pp.139-147 / 2003
  • Dynamic neural networks have been applied to diverse fields requiring temporal signal processing, such as signal prediction. This paper proposes a hybrid network composed of a locally recurrent neural network (LRNN) and a globally recurrent neural network (GRNN) to improve the dynamics of multilayered recurrent networks (RNNs), and then describes nonlinear adaptive prediction using the proposed network as an adaptive filter. The hybrid network consists of an IIR-MLP and an Elman RNN as the LRNN and GRNN, respectively. The proposed network is evaluated on nonlinear signal prediction and compared with the Elman RNN and IIR-MLP networks. Experimental results show that the hybrid network performs better with respect to convergence speed and accuracy, indicating that it can be a more effective prediction model than conventional multilayered recurrent networks for nonlinear prediction of nonstationary signals. (An Elman-style one-step predictor is sketched below.)
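
A hedged PyTorch sketch of the GRNN half of the hybrid above: an Elman network (torch.nn.RNN) used as a one-step-ahead predictor. Layer sizes are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ElmanPredictor(nn.Module):
    """One-step-ahead nonlinear predictor built on an Elman RNN."""
    def __init__(self, hidden_size=16):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                  # x: (batch, time, 1)
        h, _ = self.rnn(x)
        return self.out(h)                 # predict x[t+1] from x[..t]

model = ElmanPredictor()
x = torch.randn(8, 100, 1)                 # toy nonstationary signal batch
pred = model(x[:, :-1])                    # one-step-ahead prediction
loss = nn.functional.mse_loss(pred, x[:, 1:])
```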

Performance Improvement of Mean-Teacher Models in Audio Event Detection Using Derivative Features (차분 특징을 이용한 평균-교사 모델의 음향 이벤트 검출 성능 향상)

  • Kwak, Jin-Yeol; Chung, Yong-Joo
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.16 no.3 / pp.401-406 / 2021
  • Recently, mean-teacher models based on convolutional recurrent neural networks have become widely used in audio event detection. The mean-teacher model is an architecture consisting of two parallel CRNNs that can be trained effectively on weakly labeled and unlabeled audio data by using a consistency criterion at the outputs of the two networks. In this study, we tried to improve the performance of the mean-teacher model by adding derivative features of the log-mel spectrum (sketched below). In audio event detection experiments using the training and test data from Task 4 of the DCASE 2018/2019 Challenges, the proposed derivative features yielded a maximum relative decrease of 8.1% in the error rate (ER) of the mean-teacher model.
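
A sketch of the derivative features described above: first- and second-order deltas of a log-mel spectrogram stacked as extra input channels. librosa is an assumed toolkit, and the synthetic clip stands in for real audio:

```python
import numpy as np
import librosa

sr = 16000
y = np.random.randn(sr).astype(np.float32)        # stand-in for a real clip

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
logmel = librosa.power_to_db(mel)
delta1 = librosa.feature.delta(logmel, order=1)   # first-order derivative
delta2 = librosa.feature.delta(logmel, order=2)   # second-order derivative
features = np.stack([logmel, delta1, delta2])     # (3, n_mels, frames) input
```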

SCLC-Edge Detection Algorithm for Skin Cancer Classification (피부암 병변 분류를 위한 SCLC-Edge 검출 알고리즘)

  • June-Young Park; Chang-Min Kim; Roy C. Park
    • Journal of the Institute of Convergence Signal Processing / v.23 no.4 / pp.256-263 / 2022
  • Skin cancer is one of the most common diseases in the world, and its incidence in Korea has increased by about 100% over the past five years. In the United States, more than 5 million people are diagnosed with skin cancer every year. Skin cancer mainly occurs when skin tissue is damaged by long-term exposure to ultraviolet rays. Melanoma, a malignant skin tumor, is similar in appearance to the atypical melanocytic nevi that occur on the skin, making it difficult for the general public to notice unless secondary signs appear. In this paper, we propose a skin cancer lesion edge detection algorithm and a deep learning model, a CRNN, which classifies skin cancer lesions for early detection. In the experiments, the contour detection algorithm proposed in this paper gave the highest classification accuracy, at 97%; the Canny algorithm achieved 78%, Sobel 55%, and Laplacian 46% (the three baseline detectors are sketched below).
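
The abstract does not specify SCLC-Edge itself, so this OpenCV sketch reproduces only the three baselines it is compared against, on a synthetic stand-in image:

```python
import numpy as np
import cv2

# Synthetic stand-in for a grayscale lesion image.
img = np.zeros((128, 128), dtype=np.uint8)
cv2.circle(img, (64, 64), 30, 255, -1)

edges_canny = cv2.Canny(img, 100, 200)                    # 78% in the paper
edges_sobel = cv2.Sobel(img, cv2.CV_64F, 1, 1, ksize=3)   # 55%
edges_laplacian = cv2.Laplacian(img, cv2.CV_64F)          # 46%
```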

CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm (CTC를 적용한 CRNN 기반 한국어 음소인식 모델 연구)

  • Hong, Yoonseok; Ki, Kyungseo; Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering / v.8 no.3 / pp.115-122 / 2019
  • For Korean phoneme recognition, hidden Markov-Gaussian mixture models (HMM-GMM) or hybrid models combining artificial neural networks with HMMs have mainly been used. However, such models require force-aligned corpus training data manually annotated by experts. Recently, researchers have used neural-network-based phoneme recognition models that combine recurrent neural network (RNN) structures with the connectionist temporal classification (CTC) algorithm to avoid the need for manually annotated training data. Yet these RNN-based models have another difficulty: the amount of data required grows as the structure becomes more sophisticated. This is particularly problematic for Korean, which lacks refined corpora. In this study, we use the CTC algorithm, which does not require forced alignment (a training sketch follows below), to create a Korean phoneme recognition model. Specifically, the model is based on a convolutional neural network (CNN), which requires relatively little data and trains faster than RNN-based models. We present the results of two experiments and the resulting best-performing model, which distinguishes 49 Korean phonemes. It combines a CNN with a 3-hop bidirectional LSTM and achieves a final phoneme error rate (PER) of 3.26, a considerable improvement over existing Korean phoneme recognition models, which report PERs ranging from 10 to 12.
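
A hedged sketch of CTC training as described above: frame-wise log-probabilities from the network are aligned to unsegmented phoneme labels by torch.nn.CTCLoss, so no forced alignment is needed. Shapes are illustrative:

```python
import torch
import torch.nn as nn

num_phonemes = 49                        # the paper's 49 Korean phonemes
T, N, C = 120, 4, num_phonemes + 1       # time steps, batch, classes (+blank)

log_probs = torch.randn(T, N, C).log_softmax(2)   # stand-in network output
targets = torch.randint(1, C, (N, 30), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 30, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                # index 0 reserved for the CTC blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
```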

Isolated Digit Recognition Combined with Recurrent Neural Prediction Models and Chaotic Neural Networks (회귀예측 신경모델과 카오스 신경회로망을 결합한 고립 숫자음 인식)

  • Kim, Seok-Hyun; Ryeo, Ji-Hwan
    • Journal of the Korean Institute of Intelligent Systems / v.8 no.6 / pp.129-135 / 1998
  • In this paper, the recognition rate for isolated digits is improved using multiple neural networks that combine chaotic recurrent neural networks with an MLP. Overall, the recognition rate increased by 1.2% to 2.5%. The experiments show that the recognition rate improves because the MLP and the chaotic recurrent neural network (CRNN) compensate for each other, and the chaotic dynamic properties further aid speech recognition. The best recognition rate was obtained with the algorithm combining the MLP and multiple chaotic recurrent neural networks; however, in terms of algorithmic simplicity and reliability, the combination of the MLP with a single chaotic recurrent neural network has better properties. Broadly, the MLP achieves very good recognition rates on the Korean digits "il" and "oh", while the chaotic recurrent neural network performs best on "young", "sam", and "chil". (A sketch of classification by prediction error follows below.)
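
A hedged NumPy sketch of recognition by prediction models, the scheme the title implies: one predictor per digit class, and the class whose model yields the smallest prediction error wins. The lambda predictors are placeholders for the paper's trained MLP/CRNN models:

```python
import numpy as np

def classify_by_prediction_error(signal, predictors):
    """predictors: dict mapping a digit label to a callable that predicts signal."""
    errors = {label: np.mean((signal - f(signal)) ** 2)
              for label, f in predictors.items()}
    return min(errors, key=errors.get)

# Dummy per-class predictors standing in for trained prediction models.
predictors = {'il': lambda s: 0.9 * s, 'oh': lambda s: -s}
print(classify_by_prediction_error(np.sin(np.linspace(0, 6, 50)), predictors))
```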


Recognition of Characters Printed on PCB Components Using Deep Neural Networks (심층신경망을 이용한 PCB 부품의 인쇄문자 인식)

  • Cho, Tai-Hoon
    • Journal of the Semiconductor & Display Technology / v.20 no.3 / pp.6-10 / 2021
  • Recognizing characters printed or marked on PCB components from camera images is an important task in PCB component inspection systems. Previous optical character recognition (OCR) of PCB components typically consists of two stages: character segmentation and classification of each segmented character. However, character segmentation often fails due to corrupted characters, low image contrast, etc. Thus, OCR without character segmentation is desirable and is increasingly performed with deep neural networks. A typical segmentation-free implementation uses a convolutional neural network followed by a recurrent neural network (RNN); one disadvantage of this approach is slow execution due to the RNN layers. LPRNet is a segmentation-free character recognition network with excellent accuracy, proven in license plate recognition. LPRNet uses a wide convolution instead of an RNN (sketched below), enabling fast inference. In this paper, LPRNet was adapted to recognize characters printed on PCB components with fast execution and high accuracy. Initial training with synthetic images followed by fine-tuning on real text images yielded accurate recognition. The network can be further optimized for Intel CPUs using the OpenVINO toolkit; the optimized version runs in real time, faster even than on a GPU.
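
A sketch of the idea named above: replacing the recurrent layers of a CRNN recognizer with a wide 1-D convolution over the time axis, as in LPRNet. Channel sizes and kernel width are illustrative assumptions, not LPRNet's actual configuration:

```python
import torch
import torch.nn as nn

class WideConvHead(nn.Module):
    """Aggregates temporal context with a wide convolution instead of an RNN."""
    def __init__(self, channels=256, num_classes=37, context=13):
        super().__init__()
        self.wide = nn.Conv1d(channels, channels, kernel_size=context,
                              padding=context // 2)
        self.cls = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):                   # x: (batch, channels, time)
        return self.cls(torch.relu(self.wide(x)))   # per-frame class scores

head = WideConvHead()
scores = head(torch.randn(2, 256, 88))      # (2, 37, 88) frame-wise logits
```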

A Study on Real-time Traffic Situation Analysis and Fire Type Artificial Intelligence Application When 119 Fire Trucks Are Dispatched (119 소방차 출동 시 실시간 교통상황 분석 및 화재유형 인공지능 적용 연구)

  • Lee, Han-young; Park, Dea-woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.222-224 / 2022
  • Korea has more than 2,000 fires and more than 2,000 casualties every year. This study proposes measures to ease the passage of 119 fire trucks by detecting vehicles and standing signs through real-time image analysis with YOLOv5 before the fire trucks arrive at the fire site (an inference sketch follows below). The time to extinguish a fire can be shortened by photographing the fire site, transmitting the on-site situation, and analyzing the components of the smoke to determine the type of fire. As a result, the system is expected to minimize casualties by staying within the golden time.
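
A hedged sketch of the detection step using the public YOLOv5 hub API; the model variant ('yolov5s') and the image path are assumptions, not details from the paper:

```python
import torch

# Load a small pretrained YOLOv5 model from the public hub (downloads weights).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# 'road_frame.jpg' is a hypothetical frame captured ahead of the fire truck.
results = model('road_frame.jpg')
detections = results.pandas().xyxy[0]                 # one row per detection
vehicles = detections[detections['name'].isin(['car', 'bus', 'truck'])]
print(vehicles[['xmin', 'ymin', 'xmax', 'ymax', 'confidence', 'name']])
```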


The Edge Computing System for the Detection of Water Usage Activities with Sound Classification (음향 기반 물 사용 활동 감지용 엣지 컴퓨팅 시스템)

  • Seung-Ho Hyun; Youngjoon Chee
    • Journal of Biomedical Engineering Research / v.44 no.2 / pp.147-156 / 2023
  • Efforts have been made to use smart home sensors to monitor the indoor activities of elderly people living alone and to assess the feasibility of a safe and healthy lifestyle, but the bathroom remains a blind spot. In this study, we developed and evaluated a new edge computing device that automatically detects water usage activities in the bathroom and records the activity log on a cloud server. Three kinds of sounds generated during water usage, toilet flushing, showering, and washing at the wash basin, were recorded and cut into 1-second segments. These sound clips were then converted into 2-dimensional images using mel-spectrograms (sketched below). Sound data augmentation techniques, some applied in the time domain and others in the frequency domain, were adopted to obtain a better learning effect from a small dataset, increasing the training set 30-fold. A deep learning model called a CRNN, combining a convolutional neural network and a recurrent neural network, was employed. The edge device was implemented on a Raspberry Pi 4 equipped with a condenser microphone and amplifier to run the pre-trained model in real time. Detected activities were recorded as text-based activity logs on a Firebase server. Performance was evaluated in two bathrooms for the three water usage activities, yielding accuracies of 96.1% and 88.2% and F1 scores of 96.1% and 87.8%, respectively. Most classification errors involved the washing sound. In conclusion, this system shows potential for long-term recording of the activities of elderly single residents as a lifelog on a cloud server.
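
A sketch of the preprocessing described above: a 1-second clip converted to a 2-D log-mel-spectrogram image for the CRNN. The sample rate and mel-band count are illustrative assumptions:

```python
import numpy as np
import librosa

def clip_to_melspec(y, sr=16000, n_mels=64):
    """Convert a 1-second waveform clip to a log-scaled mel-spectrogram image."""
    y = y[:sr]                                     # enforce a 1-second window
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)    # 2-D input for the CRNN

clip = np.random.randn(16000).astype(np.float32)   # stand-in for a real clip
image = clip_to_melspec(clip)                      # shape: (64, frames)
```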

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk; Kim, Taeyeon; Kim, Wooju
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.1-25 / 2020
  • In this paper, we present an application system architecture that provides accurate, fast, and efficient automatic gasometer reading. The system captures a gasometer image using a mobile device camera, transmits the image to a cloud server over a private LTE network, and analyzes the image to extract the device ID and gas usage amount by selective optical character recognition based on deep learning. In general, an image contains many types of characters, and optical character recognition extracts all of them, but some applications must ignore character types that are not of interest and focus only on specific ones. An automatic gasometer reading system, for example, needs to extract only the device ID and gas usage amount from gasometer images in order to bill users; strings such as the device type, manufacturer, manufacturing date, and specification are not valuable to the application. The application therefore has to analyze only the regions of interest and specific character types to extract valuable information. We adopted CNN (convolutional neural network) based object detection and CRNN (convolutional recurrent neural network) technology for this selective optical character recognition. We built three neural networks for the system: the first is a convolutional neural network that detects the regions of interest containing the gas usage amount and device ID strings; the second is another convolutional neural network that transforms the spatial information of a region of interest into sequential feature vectors; and the third is a bidirectional long short-term memory network that converts the sequential information into character strings through time-series mapping from feature vectors to characters. The strings of interest are the device ID, which consists of 12 Arabic numerals, and the gas usage amount, which consists of 4-5 Arabic numerals. All system components are implemented in the Amazon Web Services cloud with Intel Xeon E5-2686 v4 CPUs and an NVIDIA Tesla V100 GPU. The architecture adopts a master-slave processing structure (sketched below) for efficient, fast parallel processing, coping with about 700,000 requests per day. The mobile device captures a gasometer image and transmits it to the master process in the AWS cloud. The master process runs on the Intel Xeon CPU and pushes reading requests from mobile devices onto an input queue with a FIFO (first in, first out) structure. The slave process consists of the three deep neural networks that conduct character recognition and runs on the NVIDIA GPU. The slave process continually polls the input queue for recognition requests; when a request arrives, it converts the queued image into the device ID string, the gas usage amount string, and the positions of the strings, returns this information to an output queue, and switches back to polling. The master process gets the final information from the output queue and delivers it to the mobile device. We used a total of 27,120 gasometer images for training, validation, and testing of the three deep neural networks: 22,985 images for training and validation, and 4,135 for testing. The 22,985 images were randomly split 8:2 into training and validation sets for each training epoch. The 4,135 test images were categorized into 5 types (normal, noise, reflex, scale, and slant): normal means clean image data; noise means an image with noise; reflex means an image with light reflection in the gasometer region; scale means an image with a small object size due to long-distance capture; and slant means an image that is not horizontally level. The final character string recognition accuracies for the device ID and gas usage amount on normal data are 0.960 and 0.864, respectively.
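
A hedged sketch of the master-slave queue structure described above: the master enqueues reading requests FIFO, a slave worker polls the input queue, runs recognition, and returns results on an output queue. The recognize() stub stands in for the three neural networks:

```python
import queue
import threading

input_q, output_q = queue.Queue(), queue.Queue()

def recognize(image):
    """Placeholder for the CNN detector + CRNN + BiLSTM recognition pipeline."""
    return {'device_id': '000000000000', 'usage': '0000'}

def slave_worker():
    while True:
        image = input_q.get()          # blocks until a request arrives (FIFO)
        output_q.put(recognize(image)) # return result on the output queue
        input_q.task_done()

threading.Thread(target=slave_worker, daemon=True).start()
input_q.put('gasometer.jpg')           # master: push a captured image
print(output_q.get())                  # master: deliver result to the device
```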