• Title/Abstract/Keywords: Continuous Speech

315 search results (processing time: 0.025 s)

A Study on Recognition of Korean Postpositions and Suffixes in Continuous Speech

  • 송민석;이기영
    • 음성과학 / Vol. 6 / pp.181-195 / 1999
  • This study proposes a method for recognizing postpositions and suffixes in spoken Korean using prosodic information. We first detect grammatical boundaries automatically from the prosodic information of the accentual phrase, and then recognize grammatical function words by tracking backward from those boundaries. The experiment uses 300 spoken sentences in standard Korean from 10 male and 5 female speakers, containing 1080 accentual phrases and 11 postpositions and suffixes. The recognition rate of postpositions is reported for two cases: 97.5% when only correctly detected boundaries are included, and 74.8% when all detected boundaries are included.


Telephone Digit Speech Recognition using Discriminant Learning

  • 한문성;최완수;권현직
    • 대한전자공학회논문지TE / Vol. 37, No. 3 / pp.16-20 / 2000
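  • Most speech recognition systems use the HMM method, which is based on a probabilistic model. For isolated Korean telephone digit recognition, the HMM method achieves a high recognition rate if sufficient training data is available. For continuous Korean telephone digit recognition, however, the HMM method has limitations for digits that are pronounced similarly. This paper proposes a discriminant learning method to overcome these limitations of the HMM method in continuous Korean telephone digit recognition. Experimental results show that the proposed discriminant learning method achieves a high recognition rate for similarly pronounced telephone digits.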


Landmark-Guided Segmental Speech Decoding for Continuous Mandarin Speech Recognition

  • Chao, Hao;Song, Cheng
    • Journal of Information Processing Systems / Vol. 12, No. 3 / pp.410-421 / 2016
  • In this paper, we propose a framework that incorporates landmarks into a segment-based Mandarin speech recognition system. Landmarks provide boundary information and phonetic class information, which are used to direct the decoding process. To validate the method, two kinds of landmarks that can be reliably detected are used to direct the decoding process of a segment model (SM) based Mandarin LVCSR (large vocabulary continuous speech recognition) system. The experimental results show that about 30% of the decoding time can be saved without an obvious decrease in recognition accuracy, which demonstrates the potential of the method.

Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model

  • 김광호;이동현;임민규;김지환
    • 말소리와 음성과학 / Vol. 7, No. 4 / pp.3-8 / 2015
  • In this paper, we investigate an input dimension reduction method that uses continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors are generated with Google's Word2Vec from a large training corpus so as to satisfy the distributional hypothesis. The 1-of-$|V|$ coded discrete word vectors are replaced with their corresponding continuous word vectors. In our implementation, the input dimension was reduced from 20,000 to 600 when a tri-gram language model is used with a vocabulary of 20,000 words. The total training time was reduced from 30 days to 14 days for the Wall Street Journal training corpus (corpus length: 37M words).
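
The replacement of 1-of-|V| input vectors by continuous word vectors described in this abstract can be illustrated with a minimal sketch. The vocabulary size, vector dimension, context word IDs, and the randomly filled embedding table below are illustrative assumptions only; in the paper the vectors come from Word2Vec, and the sizes here are not the authors' configuration.

```python
import random

def one_hot_input(context_ids, vocab_size):
    """Concatenate one 1-of-|V| vector per context word."""
    x = []
    for word_id in context_ids:
        vec = [0.0] * vocab_size
        vec[word_id] = 1.0
        x.extend(vec)
    return x

def continuous_input(context_ids, embeddings):
    """Concatenate the continuous word vector of each context word."""
    x = []
    for word_id in context_ids:
        x.extend(embeddings[word_id])
    return x

# Hypothetical sizes, chosen small so the example runs instantly.
vocab_size = 1000
embed_dim = 30

# Stand-in for a trained embedding table (random values, NOT Word2Vec output).
random.seed(0)
embeddings = [[random.gauss(0.0, 1.0) for _ in range(embed_dim)]
              for _ in range(vocab_size)]

# Two preceding words, as a tri-gram model would use (IDs are made up).
context = [12, 345]

x_discrete = one_hot_input(context, vocab_size)
x_continuous = continuous_input(context, embeddings)
print(len(x_discrete), len(x_continuous))
```

The network input shrinks from one dimension per vocabulary entry to one dimension per embedding component for each context word, which is the source of the reduction in input size reported above.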

A Study of CHMM Reducing Computational Load Using VQ with Multiple Streams

  • 방영규;정익주
    • 산업기술연구 / Vol. 26, No. B / pp.233-242 / 2006
  • Continuous, discrete, and semi-continuous HMM systems are used for speech recognition. Discrete systems have the advantage of low run-time computation, but vector quantization reduces accuracy, which can lead to poor performance. Continuous systems achieve good accuracy but require so much computation that they are sometimes impractical. Semi-continuous systems combine the advantages of continuous and discrete systems, but they also require much computation. In this paper, we propose a method that reduces the computation of continuous systems. The proposed method has the same computational load as discrete systems but can give better recognition accuracy.


Discriminative Training of Stochastic Segment Model Based on HMM Segmentation for Continuous Speech Recognition

  • Chung, Yong-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 4E / pp.21-27 / 1996
  • In this paper, we propose a discriminative training algorithm for the stochastic segment model (SSM) in continuous speech recognition. As the SSM is usually trained by maximum likelihood estimation (MLE), a discriminative training algorithm is required to improve the recognition performance. Since the SSM does not assume the conditional independence of the observation sequence, as is done in hidden Markov models (HMMs), the search space for decoding an unknown input utterance is increased considerably. To reduce the computational complexity and the size of the search space in an iterative training algorithm for discriminative SSMs, a hybrid architecture of SSMs and HMMs is used, in which the segmentation is obtained with HMMs. Given the segment boundaries, the parameters of the SSM are discriminatively trained by the minimum classification error criterion based on a generalized probabilistic descent (GPD) method. With the discriminative training of the SSM, the word error rate is reduced by 17% compared with the MLE-trained SSM in speaker-independent continuous speech recognition.

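
The minimum classification error criterion mentioned in this abstract is commonly written as a smoothed misclassification measure passed through a sigmoid. The sketch below shows that generic form for a list of per-class scores. It is not the authors' SSM implementation: the function names, the example scores, and the smoothing parameters `eta` and `gamma` are assumptions made for illustration.

```python
import math

def misclassification_measure(scores, correct, eta=1.0):
    """d = -g_correct + (1/eta) * log(mean over rivals of exp(eta * g_j))."""
    rivals = [s for j, s in enumerate(scores) if j != correct]
    top = max(rivals)  # subtract the maximum for numerical stability
    mean_exp = sum(math.exp(eta * (s - top)) for s in rivals) / len(rivals)
    return -scores[correct] + top + math.log(mean_exp) / eta

def mce_loss(scores, correct, eta=1.0, gamma=1.0):
    """Sigmoid of the misclassification measure: near 0 when the correct
    class wins clearly, near 1 when a rival wins clearly."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d))

# Hypothetical log-likelihood scores for three competing hypotheses.
scores = [-10.0, -12.0, -15.0]
loss_when_correct_is_best = mce_loss(scores, correct=0)
loss_when_correct_is_worst = mce_loss(scores, correct=2)
print(loss_when_correct_is_best, loss_when_correct_is_worst)
```

Because the loss is differentiable in the scores, a GPD-style procedure can lower it by gradient steps on the model parameters, in contrast to MLE, which only raises the score of the correct class.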

CHMM Modeling using LMS Algorithm for Continuous Speech Recognition Improvement

  • 안찬식;오상엽
    • 디지털융복합연구 / Vol. 10, No. 11 / pp.377-382 / 2012
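  • This paper proposes a method for constructing a CHMM, a continuous speech recognition model that is robust to echo noise, using an echo-cancelling mean-prediction LMS algorithm. The CHMM is constructed with the echo-noise-cancelling mean-prediction LMS algorithm in order to adapt to changing echo noise and to improve continuous speech recognition performance. Continuous recognition performance was evaluated for the CHMM constructed with the proposed algorithm. In the experiments, the SNR of the speech obtained after removing the changing environmental noise improved by 1.93 dB on average, and the continuous speech recognition rate improved by 2.1%.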
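
For readers unfamiliar with LMS, the following is a minimal sketch of the standard LMS adaptive filter used as an echo canceller. It is the textbook algorithm, not the mean-prediction variant proposed in the paper, and the echo path, tap count, step size, and synthetic signals are all assumptions chosen for illustration.

```python
import random

def lms_filter(reference, observed, num_taps=4, step_size=0.05):
    """Adapt FIR weights so the filtered reference tracks the observed echo.

    Returns the final weights and the per-sample residual error.
    """
    weights = [0.0] * num_taps
    errors = []
    for n in range(len(observed)):
        # Most recent reference samples, newest first, zero-padded at the start.
        window = [reference[n - k] if n - k >= 0 else 0.0
                  for k in range(num_taps)]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = observed[n] - estimate
        # LMS update: move the weights along the instantaneous gradient.
        weights = [w + step_size * error * x for w, x in zip(weights, window)]
        errors.append(error)
    return weights, errors

# Synthetic, noise-free demonstration with a hypothetical two-tap echo path.
random.seed(0)
echo_path = [0.5, -0.3]
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
observed = [sum(h * reference[n - k]
                for k, h in enumerate(echo_path) if n - k >= 0)
            for n in range(len(reference))]

weights, errors = lms_filter(reference, observed)
print([round(w, 3) for w in weights])
```

In this noise-free setting the learned weights approach the echo path, so the residual error shrinks toward zero. With real speech and changing noise the step size trades adaptation speed against stability.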

Vocabulary Coverage Improvement for Embedded Continuous Speech Recognition Using Part-of-Speech Tagged Corpus

  • 임민규;김광호;김지환
    • 대한음성학회지:말소리 / No. 67 / pp.181-193 / 2008
  • In this paper, we propose a vocabulary coverage improvement method for embedded continuous speech recognition (CSR) using a part-of-speech (POS) tagged corpus. We investigate the 152 POS tags defined in the Lancaster-Oslo-Bergen (LOB) corpus and the word-POS tag pairs. We derive a new vocabulary through word addition. Words paired with some POS tags have to be included in vocabularies of any size, whereas the inclusion of words paired with other POS tags varies with the target size of the vocabulary. The 152 POS tags are categorized according to whether word addition depends on the size of the vocabulary. Using expert knowledge, we first classify the POS tags and then apply different word addition strategies based on the POS tags paired with the words. The performance of the proposed method is measured in terms of coverage and is compared with that of vocabularies of the same size (5,000 words) derived from frequency lists. The coverage of the proposed method is 95.18% on the test short message service (SMS) text corpus, while the conventional vocabularies cover only 93.19% and 91.82% of the words that appear in the same SMS text corpus.

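
The coverage figures in this abstract compare vocabularies by the share of test-corpus words they contain. A minimal sketch of that measurement for a frequency-list vocabulary follows. The toy sentences and the helper names `frequency_vocabulary` and `coverage` are hypothetical, and the POS-based word addition proposed in the paper is not reproduced here.

```python
from collections import Counter

def frequency_vocabulary(training_tokens, size):
    """Baseline vocabulary: the `size` most frequent training words."""
    return {word for word, _ in Counter(training_tokens).most_common(size)}

def coverage(vocabulary, test_tokens):
    """Fraction of test tokens that are found in the vocabulary."""
    covered = sum(1 for token in test_tokens if token in vocabulary)
    return covered / len(test_tokens)

# Toy corpora standing in for the training text and the SMS test text.
training_tokens = "the cat sat on the mat the cat".split()
test_tokens = "the dog sat on the cat".split()

vocabulary = frequency_vocabulary(training_tokens, size=2)
test_coverage = coverage(vocabulary, test_tokens)
print(sorted(vocabulary), test_coverage)
```

Coverage is counted over running words here, so frequent words weigh more than rare ones. Whether the paper counts tokens or distinct word types is not stated in the abstract, so the token-based definition is an assumption.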

Common Speech Database Collection for Telecommunications

  • 김상훈;박문환;김현숙
    • 대한음성학회:학술대회논문집 / 대한음성학회 May 2003 conference proceedings / pp.23-26 / 2003
  • This paper presents the collection of a common speech database for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The design of the common speech database takes into account various communication environments and the distribution of speakers by sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. It also covers various speaking styles, such as read speech, dialogue speech, and semi-spontaneous speech. The common speech databases help the Korean speech industry avoid duplicating resources. They are expected to encourage the domestic speech industry and stimulate the domestic speech technology market.


Design of Linguistic Contents of Speech Corpora for Speech Recognition and Synthesis for Common Use

  • 김연화;김형주;김봉완;이용주
    • 대한음성학회지:말소리 / No. 43 / pp.89-99 / 2002
  • Recently, research on improving large vocabulary continuous speech recognition and speech synthesis has been carried out intensively as the field of speech information technology progresses rapidly. In speech recognition, the development of stochastic methods such as HMMs requires large amounts of speech data for training. In speech synthesis, recent practice shows that better-quality synthesis can be produced by selecting and concatenating variable-size units from a large amount of speech data. In this paper, we design and discuss the linguistic contents of speech corpora for speech recognition and synthesis that are intended for common use.
