Performance of the Phoneme Segmenter in Speech Recognition System

Lee, Gwang-seok;

한국정보통신학회:학술대회논문집 (Proceedings of the Korean Institute of Information and Commucation Sciences Conference)

한국정보통신학회 (The Korea Institute of Information and Commucation Engineering)

음성인식 시스템에서의 음소분할기의 성능

Performance of the Phoneme Segmenter in Speech Recognition System

이광석 (진주산업대학교)

Lee, Gwang-seok (Jinju National University)

발행 : 2009.10.29

PDF

PDF 다운로드

⟨ 이전 논문 다음 논문 ⟩

초록

본 연구는 자연음성의 인식을 위하여 신경회로망을 기초로 한 음소 분할기에 대하여 기술하였다. 자연음성의 인식을 위한 음소 분할기의 입력으로는 16차 멜 스케일의 FFT, 정규화된 프레임 에너지, 0~3[KHz] 주파수 대역 및 그 이상의 대역에서의 에너지 비를 사용하였다. 모든 특징들은 두개의 연속적인 10[msec] 프레임의 차이며, 본 연구에 사용한 음소분할기는 하나의 72입력을 가지는 은닉층 퍼셉트론, 20은닉노드 및 하나의 출력노드로 구성하여 사용하였다. 자연음성에 대한 음소분할의 정확도는 7.8%삽입을 가지는 78%를 얻을 수 있었다.

This research describes a neural network-based phoneme segmenter for recognizing spontaneous speech. The input of the phoneme segmenter for spontaneous speech is 16th order mel-scaled FFT, normalized frame energy, ratio of energy among 0~3[KHz] band and more than 3[KHz] band. All the features are differences of two consecutive 10 [msec] frame. The main body of the segmenter is single-hidden layer MLP(Multi-Layer Perceptron) with 72 inputs, 20 hidden nodes, and one output node. The segmentation accuracy is 78% with 7.8% insertion.

한국정보통신학회:학술대회논문집 (Proceedings of the Korean Institute of Information and Commucation Sciences Conference)

음성인식 시스템에서의 음소분할기의 성능

Performance of the Phoneme Segmenter in Speech Recognition System

초록

키워드

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

자세히 찾기

이미지 검색 (β)