• Title/Summary/Keyword: Perceptron (퍼셉트론)

383 search results

Performance comparison of wake-up-word detection on mobile devices using various convolutional neural networks (다양한 합성곱 신경망 방식을 이용한 모바일 기기를 위한 시작 단어 검출의 성능 비교)

  • Kim, Sanghong;Lee, Bowon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.454-460
    • /
    • 2020
  • Artificial intelligence assistants that provide speech recognition achieve high accuracy through cloud-based voice recognition. In cloud-based speech recognition, Wake-Up-Word (WUW) detection plays an important role in activating devices on standby. In this paper, we compare the performance of Convolutional Neural Network (CNN)-based WUW detection models for mobile devices on Google's Speech Commands dataset, using spectrogram and mel-frequency cepstral coefficient features as inputs. The CNN models used in this paper are a multi-layer perceptron, a general convolutional neural network, VGG16, VGG19, ResNet50, ResNet101, ResNet152, and MobileNet. We also propose a network that reduces the model size to 1/25 of MobileNet while maintaining its performance.
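A minimal sketch of the kind of pipeline the abstract describes, assuming librosa for MFCC extraction and PyTorch for the model; the depthwise-separable block below only illustrates the MobileNet-style size reduction, not the authors' actual compressed network, and the file name is hypothetical.

```python
# Hypothetical keyword-spotting sketch: MFCC front end + a small
# MobileNet-style (depthwise-separable) CNN. Not the paper's exact model.
import librosa
import torch
import torch.nn as nn

def mfcc_features(wav_path, sr=16000, n_mfcc=40):
    """Load a 1-second command clip and return a (1, n_mfcc, T) tensor."""
    y, _ = librosa.load(wav_path, sr=sr, duration=1.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.tensor(mfcc, dtype=torch.float32).unsqueeze(0)

class DepthwiseSeparable(nn.Module):
    """Depthwise conv followed by a 1x1 pointwise conv (MobileNet building block)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)
        self.act = nn.ReLU()
    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class TinyKWS(nn.Module):
    """Small CNN mapping an MFCC 'image' to wake-up-word / command scores."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            DepthwiseSeparable(16, 32),
            DepthwiseSeparable(32, 64),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_classes)
    def forward(self, x):          # x: (batch, 1, n_mfcc, T)
        return self.head(self.body(x).flatten(1))

# Usage with a hypothetical clip:
# scores = TinyKWS()(mfcc_features("yes_0001.wav").unsqueeze(0))
```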

Development of a Recognition System of Smile Facial Expression for Smile Treatment Training (웃음 치료 훈련을 위한 웃음 표정 인식 시스템 개발)

  • Li, Yu-Jie;Kang, Sun-Kyung;Kim, Young-Un;Jung, Sung-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.4
    • /
    • pp.47-55
    • /
    • 2010
  • In this paper, we proposed a recognition system of smile facial expression for smile treatment training. The proposed system detects face candidate regions by using Haar-like features from camera images. After that, it verifies whether the detected face candidate region is a face or non-face by using SVM (Support Vector Machine) classification. For the detected face image, it applies illumination normalization based on histogram matching in order to minimize the effect of illumination changes. In the facial expression recognition step, it computes a facial feature vector by using PCA (Principal Component Analysis) and recognizes the smile expression by using a multilayer perceptron artificial neural network. The proposed system lets the user train the smile expression by recognizing the user's smile expression in real time and displaying the amount of smile expression. Experimental results show that the proposed system improves the correct recognition rate by using face region verification based on SVM and illumination normalization based on histogram matching.
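A rough sketch of the described pipeline (Haar-like face detection, then PCA features into an MLP), assuming OpenCV and scikit-learn; the SVM face/non-face verification and histogram-matching normalization steps are omitted, the cascade file is the stock OpenCV one rather than the authors', and the training data and labels are assumptions.

```python
# Hypothetical smile-recognition sketch: Haar-like face detection, then
# PCA feature vectors fed into a multilayer perceptron classifier.
import cv2
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(gray_frame, size=(64, 64)):
    """Return the first detected face region, resized, or None."""
    faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return cv2.resize(gray_frame[y:y + h, x:x + w], size)

# X_train: flattened face crops, y_train: 1 = smile, 0 = neutral (assumed labels)
model = make_pipeline(
    PCA(n_components=50),                       # eigenface-style features
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500),
)
# model.fit(X_train, y_train)
# smile_prob = model.predict_proba(face.reshape(1, -1))[:, 1]
```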

Modified Error Back Propagation Algorithm using the Approximating of the Hidden Nodes in Multi-Layer Perceptron (다층퍼셉트론의 은닉노드 근사화를 이용한 개선된 오류역전파 학습)

  • Kwak, Young-Tae;Lee, Young-Gik;Kwon, Oh-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.9
    • /
    • pp.603-611
    • /
    • 2001
  • This paper proposes a novel fast layer-by-layer algorithm that has better generalization capability. In the proposed algorithm, the weights of the hidden layer are updated by the target vector of the hidden layer obtained by the least squares method. The proposed algorithm improves the learning speed, which is otherwise slowed by the small magnitude of the gradient vector in the hidden layer. The algorithm was tested on a handwritten digit recognition problem. The learning speed of the proposed algorithm was faster than those of the error back propagation algorithm and the modified error function algorithm, and similar to those of Ooyen's method and the layer-by-layer algorithm. Moreover, the simulation results showed that the proposed algorithm had the best generalization capability among them regardless of the number of hidden nodes. The proposed algorithm combines the learning speed of the layer-by-layer algorithm with the generalization capability of the error back propagation algorithm and the modified error function algorithm.
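The paper's contribution is computing target vectors for the hidden layer by least squares; the sketch below only illustrates the generic least-squares idea by solving the output-layer weights of a two-layer perceptron in closed form with numpy, and does not reproduce the authors' hidden-layer target derivation. All data are random placeholders.

```python
# Hypothetical illustration of a layer-by-layer / least-squares update:
# fix the hidden layer, then solve the output weights in one lstsq step
# instead of many gradient steps. Not the authors' full algorithm.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))              # 200 samples, 16 inputs
T = np.eye(10)[rng.integers(0, 10, 200)]    # one-hot targets, 10 classes

W1 = rng.normal(scale=0.1, size=(16, 32))   # hidden weights (kept fixed here)
H = np.tanh(X @ W1)                         # hidden activations

# Least-squares solution for the output weights: minimize ||H @ W2 - T||^2
W2, *_ = np.linalg.lstsq(H, T, rcond=None)

Y = H @ W2
print("training MSE:", np.mean((Y - T) ** 2))
```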


Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation
    • /
    • v.20 no.2
    • /
    • pp.77-85
    • /
    • 2011
  • It is believed that anomalies or diseases of human organs can be identified by analyzing proteome patterns. This paper proposes a new classification technique for the identification of cancer disease using proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the new classification method, three different classifiers, namely the support vector machine (SVM), multi-layer perceptron (MLP), and k-nearest neighbor (k-NN), are extended by a multi-boosting method into an array of subclassifiers, and the results of the subclassifiers are merged by an ensemble method. A genetic algorithm was applied to obtain the optimal feature set in each subclassifier. We applied our method to an empirical data set from cancer research, and it showed better accuracy and more stable performance than a single classifier.
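A toy sketch of the overall scheme, assuming scikit-learn: a crude mutation-only search stands in for the genetic algorithm, and a soft-voting ensemble of SVM, MLP, and k-NN stands in for the multi-boosted subclassifier array described in the abstract.

```python
# Hypothetical sketch: feature selection by a crude evolutionary search,
# then an ensemble of SVM / MLP / k-NN subclassifiers (soft voting).
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def make_ensemble():
    """Soft-voting ensemble of the three base classifiers named in the paper."""
    return VotingClassifier(
        estimators=[("svm", SVC(probability=True)),
                    ("mlp", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)),
                    ("knn", KNeighborsClassifier(n_neighbors=5))],
        voting="soft")

def select_features(X, y, generations=20, seed=0):
    """Crude mutation-only search standing in for the GA feature selection."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape[1]) < 0.5
    if not mask.any():
        mask[0] = True
    best = cross_val_score(make_ensemble(), X[:, mask], y, cv=3).mean()
    for _ in range(generations):
        cand = mask.copy()
        cand[rng.integers(X.shape[1])] ^= True      # flip one feature bit
        if not cand.any():
            continue
        score = cross_val_score(make_ensemble(), X[:, cand], y, cv=3).mean()
        if score > best:
            mask, best = cand, score
    return mask, best
```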

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.3
    • /
    • pp.67-77
    • /
    • 2004
  • Recent research has focused on the fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network based model of robust speech recognition that integrates audio, visual, and contextual information. The Bimodal Neural Network (BMNN) is a multi-layer perceptron of 4 layers, each of which performs a certain level of abstraction of the input features. In BMNN, the third layer combines the audio and visual features of speech to compensate for the loss of audio information caused by noise. In order to improve the accuracy of speech recognition in noisy environments, we also propose a post-processing step based on contextual information, namely the sequential patterns of words spoken by a user. Our experimental results show that our model outperforms any single-modality model. In particular, when we use the contextual information, we obtain over 90% recognition accuracy even in noisy environments, which is a significant improvement over the state of the art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition, particularly in noisy environments.
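A structural sketch in PyTorch of a 4-layer bimodal perceptron in the spirit of BMNN: separate audio and visual branches whose outputs are fused at the third layer. Input sizes and layer widths are placeholders, and the contextual post-processing step is not shown.

```python
# Hypothetical BMNN-style bimodal MLP: audio and visual features are
# abstracted separately and fused at the third layer. Sizes are assumed.
import torch
import torch.nn as nn

class BimodalMLP(nn.Module):
    def __init__(self, audio_dim=39, visual_dim=20, n_words=10):
        super().__init__()
        self.audio_branch = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU(),
                                          nn.Linear(64, 32), nn.ReLU())
        self.visual_branch = nn.Sequential(nn.Linear(visual_dim, 64), nn.ReLU(),
                                           nn.Linear(64, 32), nn.ReLU())
        # third layer: fuse both modalities so lip features can compensate
        # for audio information lost to noise
        self.fusion = nn.Sequential(nn.Linear(64, 64), nn.ReLU())
        self.out = nn.Linear(64, n_words)

    def forward(self, audio, visual):
        z = torch.cat([self.audio_branch(audio), self.visual_branch(visual)], dim=1)
        return self.out(self.fusion(z))

# logits = BimodalMLP()(torch.randn(8, 39), torch.randn(8, 20))
```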

Gaze Detection by Computing Facial Rotation and Translation (얼굴의 회전 및 이동 분석에 의한 응시 위치 파악)

  • Lee, Jeong-Jun;Park, Kang-Ryoung;Kim, Jai-Hie
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.39 no.5
    • /
    • pp.535-543
    • /
    • 2002
  • In this paper, we propose a new gaze detection method using 2-D facial images captured by a camera on top of the monitor. We consider only the facial rotation and translation, not the eye movements. The proposed method computes the gaze point caused by the facial rotation and the amount of facial translation respectively, and by combining these two the final gaze point on the monitor screen is obtained. We detect the gaze point caused by the facial rotation using a neural network (a multi-layered perceptron) whose inputs are the 2-D geometric changes of the facial feature points, and estimate the amount of facial translation with image processing algorithms in real time. Experimental results show that the gaze detection accuracy between the computed positions and the real ones is about 2.11 inches in RMS error when the distance between the user and a 19-inch monitor is about 50 to 70 cm. The processing time is about 0.7 seconds with a Pentium PC (233 MHz) and 320×240 pixel images.
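A minimal sketch of the mapping described here, assuming scikit-learn: an MLP regressor taking 2-D displacements of facial feature points and outputting a gaze position on the screen. The feature layout and the synthetic calibration data are placeholders, not the paper's.

```python
# Hypothetical gaze-point regression: 2-D geometric changes of facial
# feature points -> (x, y) position on the monitor, via an MLP.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# e.g. displacements of 7 facial feature points -> 14 input values (assumed)
X = rng.normal(size=(500, 14))
# synthetic screen coordinates in inches, stand-ins for calibration data
Y = X[:, :2] @ rng.normal(size=(2, 2)) + rng.normal(scale=0.1, size=(500, 2))

gaze_net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000)
gaze_net.fit(X, Y)

rmse = np.sqrt(np.mean((gaze_net.predict(X) - Y) ** 2))
print("training RMSE:", rmse)
```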

Performance Comparison for Radar Target Classification of Monostatic RCS and Bistatic RCS (모노스태틱 RCS와 바이스태틱 RCS의 표적 구분 성능 분석)

  • Lee, Sung-Jun;Choi, In-Sik
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.21 no.12
    • /
    • pp.1460-1466
    • /
    • 2010
  • In this paper, we analyzed the performance of radar target classification using the monostatic and bistatic radar cross section (RCS) for four different wire targets. The short-time Fourier transform (STFT) and continuous wavelet transform (CWT) were used for feature extraction from the monostatic RCS and the bistatic RCS of each target, and a multi-layered perceptron (MLP) neural network was used as the classifier. Results show that CWT yields better performance than STFT for both the monostatic RCS and the bistatic RCS. When STFT was used, the performance of the bistatic RCS was slightly better than that of the monostatic RCS; however, when CWT was used, the performance of the monostatic RCS was slightly better than that of the bistatic RCS. Consequently, the bistatic RCS proves to be a good candidate for radar target classification in combination with the monostatic RCS.
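A rough sketch of the feature-extraction step, assuming scipy for the STFT, PyWavelets for the CWT, and scikit-learn's MLP as the classifier; the RCS records here are random stand-ins, and the wavelet and window choices are assumptions rather than the paper's settings.

```python
# Hypothetical time-frequency feature extraction from an RCS time series,
# followed by an MLP classifier. The RCS signals are random placeholders.
import numpy as np
import pywt
from scipy.signal import stft
from sklearn.neural_network import MLPClassifier

def stft_features(rcs, fs=1.0):
    """Magnitude of the short-time Fourier transform, flattened."""
    _, _, Z = stft(rcs, fs=fs, nperseg=32)
    return np.abs(Z).ravel()

def cwt_features(rcs, scales=np.arange(1, 17)):
    """Magnitude of a continuous wavelet transform (Morlet), flattened."""
    coeffs, _ = pywt.cwt(rcs, scales, "morl")
    return np.abs(coeffs).ravel()

rng = np.random.default_rng(0)
signals = rng.normal(size=(40, 256))            # 40 fake RCS records
labels = rng.integers(0, 4, 40)                 # 4 wire targets

X = np.stack([cwt_features(s) for s in signals])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```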

Dynamic Gesture Recognition for the Remote Camera Robot Control (원격 카메라 로봇 제어를 위한 동적 제스처 인식)

  • Lee Ju-Won;Lee Byung-Ro
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.8 no.7
    • /
    • pp.1480-1487
    • /
    • 2004
  • This study proposes a novel gesture recognition method for remote camera robot control. To recognize dynamic gestures, the preprocessing step is image segmentation. Conventional methods for effective object segmentation need a great deal of color information about the object (hand) image, and in the recognition step they need many features for each object. To address these problems, this study proposes a novel method for recognizing dynamic hand gestures: the MMS (Max-Min Search) method to segment the object image, the MSM (Mean Space Mapping) and COG (Center of Gravity) methods to extract image features, and an MLPNN (Multi Layer Perceptron Neural Network) recognition structure to recognize the dynamic gestures. In the experimental results, the recognition rate of the proposed method was more than 90 %, which shows that it can serve as an HCI (Human Computer Interface) device for remote robot control.
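An illustrative sketch only, assuming numpy and scikit-learn: a center-of-gravity (COG) feature computed from a binary hand mask, with a sequence of per-frame features classified by an MLP. The paper's MMS segmentation and MSM mapping are not reproduced, and the training data are assumed.

```python
# Hypothetical sketch of the COG feature and an MLP gesture classifier.
# MMS segmentation and MSM mapping from the paper are not reproduced.
import numpy as np
from sklearn.neural_network import MLPClassifier

def center_of_gravity(mask):
    """COG (row, col) of a binary hand mask, normalized to [0, 1]."""
    rows, cols = np.nonzero(mask)
    h, w = mask.shape
    return np.array([rows.mean() / h, cols.mean() / w])

def gesture_vector(mask_sequence):
    """Concatenate per-frame COGs into one feature vector for the MLPNN."""
    return np.concatenate([center_of_gravity(m) for m in mask_sequence])

# X: one row per gesture (T frames * 2 values), y: gesture labels (assumed data)
# clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, y)
# command = clf.predict(gesture_vector(new_masks).reshape(1, -1))
```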

Realization of home appliance classification system using deep learning (딥러닝을 이용한 가전제품 분류 시스템 구현)

  • Son, Chang-Woo;Lee, Sang-Bae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.21 no.9
    • /
    • pp.1718-1724
    • /
    • 2017
  • Recently, smart plugs for real-time monitoring of household appliances based on the IoT (Internet of Things) have come into wide use. Through these, consumers can save energy by monitoring real-time energy consumption at all times and can reduce power consumption through alarm functions based on user settings. In this paper, we measure the alternating current from a wall power outlet for real-time monitoring. The current pattern of each household appliance was classified, and deep learning was used to determine which appliance is operating. We used a cross-validation method and a bootstrap verification method in order to evaluate the classification performance according to the type of appliance. Also, it is confirmed that the cost function and the learning success rate are consistent between the training data and the test data.
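A simplified sketch of the idea, assuming numpy and scikit-learn: harmonic magnitudes of the measured current (via FFT) as features, a small neural network as the classifier, and k-fold cross-validation as one of the verification methods mentioned. The deep model, the bootstrap procedure, and the current data themselves are not reproduced.

```python
# Hypothetical appliance classification from AC current waveforms:
# FFT-based harmonic features + an MLP, scored with cross-validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def harmonic_features(current, fs=2000, f0=60, n_harmonics=8):
    """Magnitudes of the first few harmonics of the mains frequency."""
    spectrum = np.abs(np.fft.rfft(current))
    freqs = np.fft.rfftfreq(len(current), d=1.0 / fs)
    return np.array([spectrum[np.argmin(np.abs(freqs - k * f0))]
                     for k in range(1, n_harmonics + 1)])

rng = np.random.default_rng(0)
waveforms = rng.normal(size=(120, 2000))        # fake 1-second current clips
labels = rng.integers(0, 5, 120)                # 5 appliance types (assumed)

X = np.stack([harmonic_features(w) for w in waveforms])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000)
print("cross-validated accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```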

Feed-forward Learning Algorithm by Generalized Clustering Network (Generalized Clustering Network를 이용한 전방향 학습 알고리즘)

  • Min, Jun-Yeong;Jo, Hyeong-Gi
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.5
    • /
    • pp.619-625
    • /
    • 1995
  • This paper constructs a composite feed-forward learning algorithm that replaces backpropagation learning. The algorithm first organizes the pattern vectors into clusters with the Generalized Learning Vector Quantization (GLVQ) clustering algorithm (Nikhil R. Pal et al., 1993); second, it regroups the pattern vectors belonging to different clusters; and last, it recognizes the regrouped pattern vectors with a single-layer perceptron. Because this is a feed-forward learning algorithm, it takes less time than the backpropagation algorithm and the recognition rate is increased. We use 250 ASCII code bit patterns normalized to 16×8. In the experiments, when the 250 patterns are divided into 10 clusters, the average number of iterations per cluster is 94.7 and the recognition rate is 100%.
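A loose sketch of the cluster-then-classify idea, with scikit-learn's KMeans standing in for GLVQ (an assumption, since GLVQ is not available in scikit-learn) and a single-layer Perceptron trained within each cluster; the bit patterns and labels are random placeholders.

```python
# Hypothetical sketch: cluster the pattern vectors first (KMeans stands in
# for GLVQ here), then train a single-layer perceptron within each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X = (rng.random((250, 128)) > 0.5).astype(float)   # 250 fake 16x8 bit patterns
y = rng.integers(0, 25, 250)                       # fake character labels

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

perceptrons = {}
for c in range(10):
    idx = km.labels_ == c
    if len(np.unique(y[idx])) > 1:                 # a perceptron needs >= 2 classes
        perceptrons[c] = Perceptron(max_iter=1000).fit(X[idx], y[idx])

# At recognition time, route a pattern through the perceptron of its cluster:
# c = km.predict(pattern.reshape(1, -1))[0]
# label = perceptrons[c].predict(pattern.reshape(1, -1))
```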
