Integrated Search | Korea Science

Applying feature normalization based on pole filtering to short-utterance speech recognition using deep neural network

한재민;김민식;김형순
- 한국음향학회지 (The Journal of the Acoustical Society of Korea), Vol. 39, No. 1, pp. 64-68, 2020
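In conventional speech recognition systems based on the Gaussian Mixture Model-Hidden Markov Model (GMM-HMM), cepstral feature normalization based on pole filtering was effective in improving recognition performance for short utterances in noisy environments. This paper examines whether the method remains useful in state-of-the-art speech recognition systems based on a Deep Neural Network (DNN). Experimental results on the AURORA 2 DB show that pole-filtering-based cepstral mean and variance normalization improves recognition performance for very short utterances compared with normalization without pole filtering, especially when the mismatch between training and test conditions is large.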
https://doi.org/10.7776/ASK.2020.39.1.064
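
The cepstral mean and variance normalization named in the abstract above can be illustrated with a minimal sketch. It shows only plain per-utterance normalization; the pole-filtering step the paper studies is not reproduced here, because the abstract does not describe it. The function name `cmvn`, the `eps` guard, and the synthetic 13-coefficient features are assumptions for illustration.

```python
import numpy as np

def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coeffs), one cepstral
    vector per frame. Each coefficient is shifted to zero mean and
    scaled to unit variance over the frames of the utterance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # eps (an assumed guard) avoids division by zero for constant coefficients.
    return (features - mean) / (std + eps)

# Synthetic stand-in for a short utterance: 50 frames, 13 coefficients.
rng = np.random.default_rng(0)
utterance = rng.normal(loc=3.0, scale=2.0, size=(50, 13))
normalized = cmvn(utterance)
```

With few frames, the mean and variance are estimated from very little data, which may be one reason short utterances are the harder case for this kind of normalization.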

Recognition of Unconstrained Handwritten Numerals using Chaotic Neural Network

조재홍;성정원
- 대한전자공학회 conference proceedings: Proceedings of the 1998 대한전자공학회 Fall General Conference, pp. 1301-1304, 1998
Several neural networks have been successfully used to classify complex patterns such as handwritten numerals or words. This paper describes the discrimination of totally unconstrained handwritten numerals using the proposed chaotic neural network (CNN) to improve the recognition rate. The recognition system consists of a preprocessing stage, which extracts features using the Kirsch mask, and a classification stage, which recognizes numerals using the CNN. To evaluate the performance of the proposed network, we performed recognition experiments on the unconstrained handwritten numeral database of Concordia University, Canada. Experimental results show that the CNN-based recognizer achieves a higher recognition rate than other neural network-based methods reported on the same database.
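
As an illustration of the Kirsch mask mentioned in the abstract above, the sketch below computes the standard eight-direction Kirsch compass response on a small image. It is not the paper's feature extractor, whose details the abstract does not give; the function names, the edge padding, and the toy step-edge image are assumptions.

```python
import numpy as np

# Border positions of a 3x3 kernel in clockwise order, starting top-left.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def kirsch_kernels():
    """Build the eight Kirsch compass kernels by rotating the border ring."""
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for shift in range(8):
        kernel = np.zeros((3, 3), dtype=np.int64)
        for i, (r, c) in enumerate(RING):
            kernel[r, c] = base[(i - shift) % 8]
        kernels.append(kernel)
    return kernels

def kirsch_edges(image):
    """Return the maximum response over the eight directions at each pixel."""
    img = np.asarray(image, dtype=np.int64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # assumed border handling
    responses = []
    for kernel in kirsch_kernels():
        resp = np.zeros((h, w), dtype=np.int64)
        for dr in range(3):
            for dc in range(3):
                resp += kernel[dr, dc] * padded[dr:dr + h, dc:dc + w]
        responses.append(resp)
    return np.max(np.stack(responses), axis=0)

# Toy image: dark left half, bright right half (a vertical step edge).
step = np.zeros((6, 6), dtype=np.int64)
step[:, 3:] = 1
edges = kirsch_edges(step)
```

The response is zero in flat regions because each kernel's coefficients sum to zero, and it peaks next to the step edge.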

A Study on Korean Allophone Recognition Using Hierarchical Time-Delay Neural Network

김수일;임해창
- 전자공학회논문지B, Vol. 32B, No. 1, pp. 171-179, 1995
In many continuous speech recognition systems, the phoneme is used as the basic recognition unit. However, coarticulation among neighboring phonemes makes it difficult to recognize phonemes consistently. This paper proposes the allophone as an alternative recognition unit. We have classified each phoneme into three allophone groups according to the location of the phoneme within a syllable. As the recognition algorithm, a time-delay neural network (TDNN) has been designed. To recognize all Korean allophones, TDNNs are constructed in a modular fashion according to acoustic-phonetic features (e.g. voiced/unvoiced, the location of the phoneme within a word). Each TDNN is trained independently, and the TDNNs are then integrated hierarchically into a whole speech recognition system. In this study, we experimented on Korean plosives with a phoneme-based recognition system and an allophone-based recognition system. Experimental results show that allophone-based recognition is much less affected by coarticulation.
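
For readers unfamiliar with the time-delay neural network named in the abstract above, here is a minimal sketch of one time-delay layer: each output frame depends on a short window of consecutive input frames, with the same weights applied at every time position. The layer sizes, the `tanh` activation, and the function name are illustrative assumptions, not the architecture used in the paper.

```python
import numpy as np

def tdnn_layer(frames: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """One time-delay layer.

    frames:  (T, D) sequence of D-dimensional feature frames.
    weights: (context, D, H) one weight matrix per delay in the window.
    bias:    (H,) bias shared across time.
    Returns an array of shape (T - context + 1, H).
    """
    context = weights.shape[0]
    out_len = frames.shape[0] - context + 1
    out = np.zeros((out_len, weights.shape[2]))
    for delay in range(context):
        # The same weights are reused at every time step (weight sharing over time).
        out += frames[delay:delay + out_len] @ weights[delay]
    return np.tanh(out + bias)  # tanh is an assumed activation

rng = np.random.default_rng(0)
frames = rng.normal(size=(15, 16))           # 15 frames of 16 features (assumed sizes)
weights = rng.normal(size=(3, 16, 8)) * 0.1  # window of 3 frames, 8 hidden units
bias = np.zeros(8)
hidden = tdnn_layer(frames, weights, bias)
```

Stacking such layers widens the temporal context seen by later layers, which is one way a network can account for the influence of neighboring sounds.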

Connected Korean Digit Recognition Using Neural Networks and Lexical Analysis

이종석;이상욱
- 전자공학회논문지B, Vol. 30B, No. 12, pp. 21-30, 1993
In this paper, we propose a connected Korean digit recognition system employing neural networks and lexical constraints of the Korean digits. In the proposed system, each frame of the digit string is first labelled by phoneme classification neural networks, which are trained with reference phoneme segments extracted from an isolated digit based on position information. The frame labels are then combined to construct phoneme segments, and these segments are combined to form a digit candidate using the digit combination rules. The digit candidate is accepted if it satisfies the digit decision condition; otherwise, it is further recognized by the digit decision neural network in the next step. In our approach, the neural networks are trained with 10 isolated digits uttered by 5 male speakers. To investigate the performance of the proposed system, an intensive computer simulation is performed on 30 connected digit strings uttered by 5 male speakers. The simulation results indicate that the proposed system achieves a 95.6% digit recognition rate and an 82% digit string recognition rate.

License Plate Recognition System Using Artificial Neural Networks

Turkyilmaz, Ibrahim;Kacan, Kirami
- ETRI Journal, Vol. 39, No. 2, pp. 163-172, 2017
A high performance license plate recognition system (LPRS) is proposed in this work. The proposed LPRS is composed of the following three main stages: (i) plate region determination, (ii) character segmentation, and (iii) character recognition. During the plate region determination stage, the image is enhanced by image processing algorithms to increase system performance. The rectangular license plate region is obtained using edge-based image processing methods on the binarized image. With the help of skew correction, the plate region is prepared for the character segmentation stage. Characters are separated from each other using vertical projections on the plate region. Segmented characters are prepared for the character recognition stage by a thinning process. At the character recognition stage, a three-layer feedforward artificial neural network using a backpropagation learning algorithm is constructed and the characters are determined.
https://doi.org/10.4218/etrij.17.0115.0766
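
The character segmentation step described in the abstract above, separating characters with vertical projections, can be sketched as follows. This is an assumed minimal version operating on an already binarized and deskewed plate region; the function name, the `min_width` parameter, and the toy plate are hypothetical and not taken from the paper.

```python
import numpy as np

def segment_by_vertical_projection(binary_plate, min_width: int = 1):
    """Split a binarized plate into character column ranges.

    binary_plate: 2D array with 1 for character pixels and 0 for background.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    profile = np.asarray(binary_plate).sum(axis=0)  # character pixels per column
    segments = []
    start = None
    for col, count in enumerate(profile):
        if count > 0 and start is None:
            start = col                      # a character run begins
        elif count == 0 and start is not None:
            if col - start >= min_width:
                segments.append((start, col))
            start = None                     # an empty column ends the run
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments

# Toy plate with three "characters" separated by empty columns.
plate = np.zeros((5, 12), dtype=int)
plate[1:4, 1:3] = 1
plate[0:5, 5:8] = 1
plate[2:4, 10:12] = 1
segments = segment_by_vertical_projection(plate)
print(segments)  # [(1, 3), (5, 8), (10, 12)]
```

Empty columns in the projection profile mark the gaps between characters, so touching or broken characters would need additional handling beyond this sketch.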

Effective Hand Gesture Recognition by Key Frame Selection and 3D Neural Network

Hoang, Nguyen Ngoc;Lee, Guee-Sang;Kim, Soo-Hyung;Yang, Hyung-Jeong
- 스마트미디어저널 (Smart Media Journal), Vol. 9, No. 1, pp. 23-29, 2020
This paper presents an approach for dynamic hand gesture recognition using an algorithm based on a 3D Convolutional Neural Network (3D_CNN), which is later extended to 3D Residual Networks (3D_ResNet), together with neural network-based key frame selection. Typically, a 3D deep neural network classifies gestures from image frames randomly sampled from video data. In this work, to improve classification performance, we use key frames that represent the overall video as the input to the classification network. The key frames are extracted by SegNet instead of the conventional clustering algorithms for video summarization (VSUMM), which require heavy computation. By using a deep neural network, key frame selection can be performed in a real-time system. Experiments are conducted using 3D convolutional kernels such as 3D_CNN, Inflated 3D_CNN (I3D) and 3D_ResNet for gesture classification. Our algorithm achieved a classification accuracy of up to 97.8% on the Cambridge gesture dataset. The experimental results show that the proposed approach is efficient and outperforms existing methods.
https://doi.org/10.30693/SMJ.2020.9.1.23

Performance Improvement of Object Recognition System in Broadcast Media Using Hierarchical CNN

권명규;양효식
- 디지털융복합연구 (Journal of Digital Convergence), Vol. 15, No. 3, pp. 201-209, 2017
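This paper presents an object recognition system for smartphones that uses a hierarchical Convolutional Neural Network (CNN). In the overall configuration, the smartphone is connected to a server, the server performs object recognition with the CNN, and the collected data are matched so that detailed information about the object is delivered to the smartphone. The hierarchical CNN is also compared with a flat (non-hierarchical) CNN. The hierarchical CNN reaches an accuracy of 88% and the flat CNN 73%, a performance improvement of 15 percentage points. On this basis, the paper shows the potential for expanding the T-Commerce market by linking smartphones with broadcast media. In addition, Information Retrieval and AR/VR services can be provided while broadcast video is being watched.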
https://doi.org/10.14400/JDC.2017.15.3.201

Human Gait Recognition Based on Spatio-Temporal Deep Convolutional Neural Network for Identification

Zhang, Ning;Park, Jin-ho;Lee, Eung-Joo
- 한국멀티미디어학회논문지 (Journal of Korea Multimedia Society), Vol. 23, No. 8, pp. 927-939, 2020
Gait recognition can identify people from a long distance, which is very important for improving the intelligence of monitoring systems. Among many human features, gait features have the advantages of being remotely available, robust, and secure. Traditional gait feature extraction, influenced by developments in behavior recognition, relies on manual feature design, which cannot meet the needs of fine-grained gait recognition. The emergence of deep convolutional neural networks has freed researchers from complex feature engineering, since usable features can be learned automatically from data, and such networks have been widely adopted. In this paper, we conduct feature metric learning in three-dimensional space by combining the three-dimensional convolution features of the gait sequence with a Siamese structure. This method captures spatial and temporal information from the continuous periodic gait sequence and further improves the accuracy and practicality of gait recognition.
https://doi.org/10.9717/kmms.2020.23.8.927

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

Haskey, S.J.;Datta, S.
- 대한음성학회 conference proceedings: Proceedings of the 대한음성학회 October 1996 Conference, pp. 464-467, 1996
This paper aims to exploit inter/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. The design is a subcluster neural network built from One-Class-in-One-Network (OCON) feed-forward subnets, similar to those proposed by Kung (2) and Jou (1), joined by a common front-end layer; the idea is to adapt only the neurons within that common front-end layer. The adaptation can therefore concentrate primarily on the speaker's vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme in the vowel sub-class, for speakers MDABO and MWBTO, improves the recognition of the remaining vowel sub-class phonemes from the same speaker.

A SoC Based on a Neural Network for Embedded Smart Applications

이봉규
- 전기학회논문지 (The Transactions of the Korean Institute of Electrical Engineers), Vol. 58, No. 10, pp. 2059-2063, 2009
This paper presents a programmable System-on-a-Chip (SoC) for various embedded smart applications that need neural network computations. The system is fully implemented on a prototyping platform based on a Field Programmable Gate Array (FPGA). The SoC consists of an embedded processor core and a reconfigurable hardware accelerator for neural computations. The performance of the SoC is evaluated using a real image processing application, an optical character recognition (OCR) system.

379 search results (processing time: 0.028 s)
