• Title/Summary/Keyword: Model recognition

The development of food image detection and recognition model of Korean food for mobile dietary management

  • Park, Seon-Joo;Palvanov, Akmaljon;Lee, Chang-Ho;Jeong, Nanoom;Cho, Young-Im;Lee, Hae-Jeung
    • Nutrition Research and Practice / v.13 no.6 / pp.521-528 / 2019
  • BACKGROUND/OBJECTIVES: The aim of this study was to develop a Korean food image detection and recognition model for use on mobile devices for accurate estimation of dietary intake. MATERIALS/METHODS: We collected food images by taking pictures or by searching web images and built an image dataset for training a complex recognition model for Korean food. Augmentation techniques were applied to increase the dataset size. The training dataset contained more than 92,000 images categorized into 23 groups of Korean food. All images were down-sampled to a fixed resolution of $150 \times 150$ and then randomly divided into training and testing groups at a ratio of 3:1, resulting in 69,000 training images and 23,000 test images. We used a Deep Convolutional Neural Network (DCNN) for the complex recognition model and compared the results with those of other networks for large-scale image recognition: AlexNet, GoogLeNet, Very Deep Convolutional Neural Network, VGG and ResNet. RESULTS: Our complex food recognition model, K-foodNet, had higher test accuracy (91.3%) and faster recognition time (0.4 ms) than the other networks. CONCLUSION: The results showed that K-foodNet achieved better performance in detecting and recognizing Korean food than other state-of-the-art models.
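
The 3:1 random split described in this abstract can be illustrated with a minimal sketch. The function name `split_three_to_one`, the fixed seed, and the integer IDs standing in for image files are assumptions made for illustration; the abstract does not say how the authors implemented the split, and it gives the dataset size only as "more than 92,000".

```python
import random

def split_three_to_one(items, seed=0):
    """Shuffle a copy of items and split it 3:1 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * 3 // 4       # three quarters go to training
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-ins for image files: 92,000 integer IDs.
image_ids = range(92_000)
train_ids, test_ids = split_three_to_one(image_ids)
print(len(train_ids), len(test_ids))  # 69000 23000
```

With exactly 92,000 items, the split reproduces the 69,000 / 23,000 counts quoted in the abstract.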

Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model (감마톤 특징 추출 음향 모델을 이용한 음성 인식 성능 향상)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.7 / pp.209-214 / 2013
  • To improve the recognition performance of speech recognition systems, methods that model human auditory perception have been incorporated into such systems. In noisy environments, the speech signal is separated from the noise and the desired speech signal is selected. However, several factors still limit the practical performance of speech recognition systems: when the recognition environment changes because of noise, speech detection becomes inaccurate and the trained model no longer matches the input. To improve speech recognition, this paper proposes feature extraction using gammatone filters and a learning model based on an acoustic model. In the proposed method, feature extraction based on auditory scene analysis, which reflects human auditory perception, is applied in the process of training the recognition models. In a performance evaluation in noisy environments, removing noise from signals at -10 dB and -5 dB yielded SNR improvements of 3.12 dB and 2.04 dB, respectively.
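
As a rough illustration of gammatone-based feature extraction, the sketch below builds the textbook fourth-order gammatone impulse response, using the Glasberg and Moore ERB approximation for bandwidth, and computes one log-energy value per band for a single frame. The sample rate, frame length, center frequencies, and the helper names `erb_hz`, `gammatone_ir`, and `band_log_energy` are assumptions; the abstract does not specify the authors' filterbank configuration.

```python
import math

def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)

def gammatone_ir(center_hz, sample_rate=16000, n_samples=400, order=4):
    """Sampled t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t), scaled to unit energy."""
    b = 1.019 * erb_hz(center_hz)
    ir = []
    for k in range(n_samples):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * center_hz * t))
    norm = math.sqrt(sum(v * v for v in ir))
    return [v / norm for v in ir]

def band_log_energy(frame, ir):
    """Filter one frame by direct convolution and return its log energy."""
    energy = 0.0
    for i in range(len(frame)):
        acc = 0.0
        for j in range(min(i + 1, len(ir))):
            acc += ir[j] * frame[i - j]
        energy += acc * acc
    return math.log(energy + 1e-12)  # small floor avoids log(0)

# Toy input (assumption): one 25 ms frame of a 1 kHz tone sampled at 16 kHz.
tone = [math.sin(2.0 * math.pi * 1000.0 * k / 16000) for k in range(400)]
features = [band_log_energy(tone, gammatone_ir(fc)) for fc in (500.0, 1000.0, 4000.0)]
```

For the 1 kHz tone, the band centered at 1 kHz carries the most energy of the three, which is the frequency selectivity that such a feature vector is meant to capture.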

Pose-invariant Face Recognition using a Cylindrical Model and Stereo Camera (원통 모델과 스테레오 카메라를 이용한 포즈 변화에 강인한 얼굴인식)

  • 노진우;홍정화;고한석
    • Journal of KIISE:Software and Applications / v.31 no.7 / pp.929-938 / 2004
  • This paper proposes a pose-invariant face recognition method using a cylindrical model and a stereo camera. The paper is divided into two parts: the single-input-image case and the stereo-input-image case. In the single-image case, we normalize the face's yaw pose using the cylindrical model; in the stereo case, we normalize the face's pitch pose using the cylindrical model with a pitch angle previously estimated from the stereo geometry. In addition, because two images acquired at the same time are available, overall recognition performance can be increased by decision-level fusion. In representative experiments, the yaw pose transform increased the recognition rate from 61.43% to 94.76%, and the proposed method performed as well as a more complicated 3D face model. Using the stereo camera system increased the recognition rate by a further 5.24% for upward face poses, and decision-level fusion added another 3.34%.
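
Decision-level fusion of the two stereo views can be sketched as follows. The abstract does not state which fusion rule the authors used, so the weighted sum rule, the function name `fuse_decisions`, and the toy match scores are all assumptions chosen for illustration.

```python
def fuse_decisions(scores_left, scores_right, weight_left=0.5):
    """Weighted sum-rule fusion of per-identity match scores from two views."""
    fused = {}
    for identity in scores_left:
        fused[identity] = (weight_left * scores_left[identity]
                           + (1.0 - weight_left) * scores_right[identity])
    best = max(fused, key=fused.get)  # identity with the highest fused score
    return best, fused

# Hypothetical match scores from the left and right camera images.
left = {"A": 0.55, "B": 0.60, "C": 0.10}
right = {"A": 0.80, "B": 0.40, "C": 0.20}
identity, fused = fuse_decisions(left, right)
print(identity)  # A
```

In this toy case the left view alone would pick "B", but combining both views selects "A", which shows how fusion can correct a single-view error.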

Development of Facial Emotion Recognition System Based on Optimization of HMM Structure by using Harmony Search Algorithm (Harmony Search 알고리즘 기반 HMM 구조 최적화에 의한 얼굴 정서 인식 시스템 개발)

  • Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.3 / pp.395-400 / 2011
  • In this paper, we propose a study of facial emotion recognition that considers the dynamic variation of emotional state in facial image sequences. The proposed system consists of two main steps: facial-image-based emotional feature extraction and emotional state classification/recognition. First, we propose a method for extracting and analyzing the emotional feature region using a combination of the Active Shape Model (ASM) and Facial Action Units (FAUs). Next, we propose an emotional state classification and recognition method based on the Hidden Markov Model (HMM), a type of dynamic Bayesian network. We also adopt a heuristic optimization procedure based on the Harmony Search (HS) algorithm in the parameter learning of the HMM in order to classify the emotional state more accurately. Using these methods, we construct an emotion recognition system based on variations in the dynamic facial image sequence and attempt to improve recognition performance.
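
HMM-based classification of a feature sequence can be sketched with the standard forward algorithm: each emotion has its own HMM, and the sequence is assigned to the model with the highest likelihood. The two toy models, their parameters, and the discrete observation symbols are hypothetical; the Harmony Search parameter learning mentioned in the abstract is not shown here.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM via the forward algorithm."""
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical two-state, two-symbol models, one per emotion class.
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "happy": ([1.0, 0.0], trans, [[0.9, 0.1], [0.2, 0.8]]),
    "neutral": ([1.0, 0.0], trans, [[0.5, 0.5], [0.5, 0.5]]),
}

sequence = [0, 1, 1]  # quantized feature symbols (assumed for illustration)
scores = {name: forward_likelihood(sequence, *params) for name, params in models.items()}
best = max(scores, key=scores.get)
print(best)  # happy
```

For longer sequences, a practical implementation would work with scaled or log probabilities to avoid numerical underflow.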

Efficient context dependent process modeling using state tying and decision tree-based method (상태 공유와 결정트리 방법을 이용한 효율적인 문맥 종속 프로세스 모델링)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.3 / pp.369-377 / 2010
  • In vocabulary recognition systems based on Hidden Markov Models (HMMs), models that were unseen during training lead to a low recognition rate. When the recognition vocabulary is modified or extended, the models must be rebuilt by collecting a database and repeating the training sequence, which incurs additional cost and time. This study proposes an efficient context-dependent process modeling method using decision tree-based state tying. The proposed method reduces the need to rebuild models and provides robust and accurate context-dependent acoustic modeling. It also reduces the number of models and handles models unseen in training by substituting a similar context-dependent phoneme model. In the system evaluation, the vocabulary-dependent recognition rate was 98.01% and the vocabulary-independent recognition rate was 97.38%.
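
The idea of handling unseen context-dependent models through a decision tree can be sketched as follows. In real systems the tree questions are learned from training data; here a tiny hand-written tree, the phone classes, and the state labels are all hypothetical and serve only to show that an unseen triphone still reaches a tied state.

```python
VOWELS = {"a", "e", "i", "o", "u"}
NASALS = {"m", "n"}

def tied_state(left, center, right):
    """Walk a tiny hand-written question tree to a tied-state label."""
    if left in VOWELS:          # question 1: is the left context a vowel?
        if right in NASALS:     # question 2: is the right context a nasal?
            return f"{center}_st1"
        return f"{center}_st2"
    return f"{center}_st3"

# Triphone a-k+m is assumed to have been seen in training; o-k+n was not.
seen = tied_state("a", "k", "m")
unseen = tied_state("o", "k", "n")
print(seen, unseen)  # k_st1 k_st1
```

Because both triphones answer the tree questions the same way, the unseen one shares the parameters of the tied state trained on the seen one, so no new model has to be trained.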

A STUDY ON THE IMPLEMENTATION OF ARTIFICIAL NEURAL NET MODELS WITH FEATURE SET INPUT FOR RECOGNITION OF KOREAN PLOSIVE CONSONANTS (한국어 파열음 인식을 위한 피쳐 셉 입력 인공 신경망 모델에 관한 연구)

  • Kim, Ki-Seok;Kim, In-Bum;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference / 1990.07a / pp.535-538 / 1990
  • The main problem in speech recognition is the enormous variability in acoustic signals due to complex but predictable contextual effects. For plosive consonants in particular, it is very difficult to find invariant cues because of various contextual effects, yet humans use these contextual effects as helpful information in plosive consonant recognition. In this paper we experimented with three artificial neural net models for the recognition of plosive consonants. Model I used a "Multi-layer Perceptron", Model II used a variation of the "Self-organizing Feature Map Model", and Model III used the "Interactive and Competitive Model" to examine contextual effects. The recognition experiment was performed on 9 Korean plosive consonants. We used VCV speech chains for the experiment on contextual effects. The speech chains consist of the Korean plosive consonants /g, d, b, K, T, P, k, t, p/ (/ㄱ, ㄷ, ㅂ, ㄲ, ㄸ, ㅃ, ㅋ, ㅌ, ㅍ/) and eight Korean monophthongs. The inputs to the neural net models were several temporal cues (duration of the silence, transition, and VOT) together with the extent of the VC formant transitions, the presence of voicing energy during closure, burst intensity, presence of aspiration, amount of low-frequency energy present at voicing onset, and CV formant transition extent, all taken from the acoustic signals. Model I showed a recognition rate of about 55-67%, Model II about 60%, and Model III about 67%.

A Study on the Application of Object Detection Method in Construction Site through Real Case Analysis (사례분석을 통한 객체검출 기술의 건설현장 적용 방안에 관한 연구)

  • Lee, Kiseok;Kang, Sungwon;Shin, Yoonseok
    • Journal of the Society of Disaster Information / v.18 no.2 / pp.269-279 / 2022
  • Purpose: The purpose of this study is to develop a deep learning-based personal protective equipment (PPE) detection model for disaster prevention at construction sites, to apply it to actual construction sites, and to analyze the results. Method: A dataset from the real environment was constructed and the developed PPE detection model was applied to it. The PPE detection model consists mainly of a worker detection model and a PPE classification model. The worker detection model uses a deep learning-based algorithm, trained on a dataset obtained from the actual field, to detect workers; the PPE classification model applies the trained PPE detection algorithm to the worker regions extracted by the worker detection model. To verify the proposed model, experimental results were derived from data obtained from three construction sites. Results: Applying the PPE recognition model to construction sites reveals problems of mis-recognition and non-recognition. Conclusions: The analysis produced outcomes for applying object recognition technology to construction sites, and representative cases of worker recognition and non-recognition and of PPE mis-recognition point to the need for follow-up research.

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network (심층신경망을 이용한 짧은 발화 음성인식에서 극점 필터링 기반의 특징 정규화 적용)

  • Han, Jaemin;Kim, Min Sik;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.64-68 / 2020
  • In a conventional speech recognition system using a Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), the cepstral feature normalization method based on pole filtering was effective in improving the recognition of short utterances in noisy environments. In this paper, the usefulness of this method for a state-of-the-art speech recognition system using a Deep Neural Network (DNN) is examined. Experimental results on the AURORA 2 DB show that cepstral mean and variance normalization based on pole filtering improves the recognition performance for very short utterances compared with normalization without pole filtering, especially when there is a large mismatch between the training and test conditions.
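
Plain per-utterance cepstral mean and variance normalization (CMVN), the baseline that the pole-filtering variant builds on, can be sketched as below. This sketch does not implement pole filtering; the function name `cmvn` and the toy frames are assumptions.

```python
import math

def cmvn(frames):
    """Normalize each cepstral dimension to zero mean and unit variance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        # Fall back to 1.0 when a dimension is constant (zero variance).
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
        for d in range(dims)
    ]
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]

# Toy utterance (assumption): two frames of two cepstral coefficients.
normalized = cmvn([[1.0, 10.0], [3.0, 10.0]])
print(normalized)  # [[-1.0, 0.0], [1.0, 0.0]]
```

With very short utterances the mean and variance are estimated from only a few frames, which makes the statistics unreliable; that is the weakness the abstract's method targets.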

HMM-based Speech Recognition using DMS Model and Double Spectral Feature (DMS 모델과 이중 스펙트럼 특징을 이용한 HMM에 의한 음성 인식)

  • Ann, Tae-Ock
    • Journal of the Korea Academia-Industrial cooperation Society / v.7 no.4 / pp.649-655 / 2006
  • This paper proposes an HMM-based recognition method for speaker-independent speech recognition that uses a DMSVQ (Dynamic Multi-Section Vector Quantization) codebook built with the DMS model together with double spectral features. The LPC cepstrum parameter is used as an instantaneous spectral feature and the regression coefficient of the LPC cepstrum is used as a dynamic spectral feature. These two spectral features are each quantized with their own VQ codebook. The HMM using the DMS model is modeled by taking the instantaneous and dynamic spectral features as input. For comparison, recognition experiments with various conventional recognition methods were carried out under the same data and conditions. The experimental results show that the proposed method is superior to the conventional recognition methods.
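
Quantizing the two feature streams with separate VQ codebooks can be sketched with a nearest-codeword search. The codebooks and feature vectors below are toy values, and the sketch shows ordinary vector quantization only, not the multi-section DMSVQ codebook construction described in the abstract.

```python
def quantize(vector, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def dist(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))

# Hypothetical codebooks: one for instantaneous and one for dynamic features.
static_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
dynamic_codebook = [[-0.5, 0.0], [0.5, 0.0]]

# One frame yields a pair of discrete symbols, one from each codebook.
symbols = (quantize([0.9, 0.1], static_codebook),
           quantize([0.3, 0.1], dynamic_codebook))
print(symbols)  # (1, 1)
```

The resulting symbol pair is what a discrete HMM would take as its observation for that frame.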

Korean Speech Recognition using DHMM (DHMM을 이용한 한국어 음성 인식)

  • Ann, T.O.;Lee, K.S.;Yoo, H.K.;Lee, H.J.;Cho, H.J.;Byun, Y.G.;Kim, S.H.
    • The Journal of the Acoustical Society of Korea / v.10 no.1 / pp.52-60 / 1991
  • This paper describes a study on isolated word recognition using a DHMM (Dynamic Hidden Markov Model), which takes the dynamic features of the spectrum as a parameter. It discusses a speech recognition experiment based on an HMM that can evaluate not only instantaneous spectral features but also dynamic spectral features. LPC cepstrum parameters are used as a static feature and the regression coefficient of the LPC cepstrum is used as a dynamic feature. These two features are each quantized with their own VQ codebook. The DHMM is modeled by taking the static vector and the dynamic vector as input. Across the experiments, recognition using the DHMM achieved a recognition rate of 92.7%, whereas the conventional HMM achieved 88.8%, showing that the DHMM is a useful model.
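
The dynamic feature, a regression coefficient computed over neighboring cepstral frames, can be sketched with the commonly used delta formula $d_t = \sum_{k=1}^{K} k\,(c_{t+k} - c_{t-k}) \,/\, (2 \sum_{k=1}^{K} k^2)$. The window size, the edge handling by frame replication, and the function name `delta` are assumptions; the abstract does not give the exact regression window the authors used.

```python
def delta(frames, window=2):
    """Regression (delta) coefficients over +/- window frames, edges replicated."""
    n = len(frames)
    dims = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dims):
            acc = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]   # clamp at last frame
                behind = frames[max(t - k, 0)][d]      # clamp at first frame
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

# Toy one-dimensional cepstral track rising by 1.0 per frame.
track = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(track)
print(deltas[2])  # [1.0]
```

In the middle of the ramp the delta equals the slope of 1.0 per frame, while the clamped edges give smaller values.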
