• Title/Summary/Keyword: Speech Corpus


Some considerations on SiTEC segmental and prosodic labeling convention for Korean (음성 코퍼스 구축을 위한 SiTEC 분절음.운율 레이블링 기준의 검토 및 제안)

  • Lee Sook-Hyang;Shin Jiyoung;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI / no.46 / pp.127-143 / 2003
  • This paper presents the segmental labeling conventions proposed by SiTEC (Speech Information Technology Engineering Center) in 2002 and proposes directions for revising them into a simpler version. The paper also reviews one of the prosodic labeling conventions for Korean, the K-ToBI convention (ver. 3.1), and proposes a couple of modifications and suggestions.

Prosodic characteristics of French language in conversational discourse (프랑스어의 대화 담화에 나타난 운율 연구)

  • Ko, Young-Lim;Yoon, Ae-Sun
    • Speech Sciences / v.8 no.2 / pp.165-180 / 2001
  • In this paper, the prosodic characteristics of French are analysed using a corpus of radio interviews. Intonation patterns are interpreted in terms of a rising pattern, a focal rising pattern, and a falling pattern. Accentual prominence is classified into two types: rhythmic accent and focal accent. Focal accent helps explain cohesion within an utterance or between two utterances. Pauses, as a prosodic variable of discourse, are described by their form of realization (filled pause, silent pause, hesitation, etc.), their distribution, and their function in the utterance.

Automatic Correction of Word-spacing Errors using by Syllable Bigram (음절 bigram를 이용한 띄어쓰기 오류의 자동 교정)

  • Kang, Seung-Shik
    • Speech Sciences / v.8 no.2 / pp.83-90 / 2001
  • We propose a probabilistic approach that applies syllable bigrams to the word-spacing problem. Syllable bigrams are extracted from a large corpus of 12 million words and their frequencies are calculated. Based on the syllable bigrams, we performed three experiments: (1) automatic word-spacing, (2) detection and correction of word-spacing errors for a spelling checker, and (3) automatic insertion of a space at the end of a line in a character recognition system. Experimental results show accuracies of 97.7%, 82.1%, and 90.5%, respectively.
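
The syllable-bigram idea in this abstract can be pictured with a small sketch: for each adjacent syllable pair in a correctly spaced corpus, count how often a space falls between the two syllables, then insert a space wherever that relative frequency exceeds a threshold. This is only an illustration under assumptions, not the paper's actual model; the function names, the 0.5 threshold, and the three-sentence toy corpus are hypothetical.

```python
from collections import Counter


def train_spacing_model(sentences):
    """Count how often each adjacent syllable pair is split by a space."""
    pair_total = Counter()   # occurrences of the pair (a, b), spaced or not
    pair_spaced = Counter()  # occurrences where a space separates a and b
    for sentence in sentences:
        chars = []    # syllables (characters) with spaces removed
        breaks = []   # breaks[i] is True if a space follows chars[i]
        for word in sentence.split():
            for i, ch in enumerate(word):
                chars.append(ch)
                breaks.append(i == len(word) - 1)
        for i in range(len(chars) - 1):
            pair = (chars[i], chars[i + 1])
            pair_total[pair] += 1
            if breaks[i]:
                pair_spaced[pair] += 1
    return pair_total, pair_spaced


def insert_spaces(text, model, threshold=0.5):
    """Re-space text by thresholding the space probability of each bigram."""
    pair_total, pair_spaced = model
    chars = [ch for ch in text if not ch.isspace()]
    out = []
    for i, ch in enumerate(chars):
        out.append(ch)
        if i < len(chars) - 1:
            pair = (ch, chars[i + 1])
            total = pair_total[pair]
            # Unseen bigrams get probability 0, so no space is inserted.
            prob = pair_spaced[pair] / total if total else 0.0
            if prob > threshold:
                out.append(" ")
    return "".join(out)


# Hypothetical toy corpus; the paper uses a corpus of 12 million words.
corpus = ["나는 학교에 간다", "나는 집에 간다", "너는 학교에 간다"]
model = train_spacing_model(corpus)
print(insert_spaces("나는학교에간다", model))
```

A real system would need smoothing for unseen bigrams and a threshold tuned on held-out text; the fixed threshold here is only for illustration.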

Decision-Tree-Based Markov Model for Phrase Break Prediction

  • Kim, Sang-Hun;Oh, Seung-Shin
    • ETRI Journal / v.29 no.4 / pp.527-529 / 2007
  • In this paper, a decision-tree-based Markov model for phrase break prediction is proposed. The model combines a decision tree's ability to classify over non-homogeneous features with Markov-process-based modeling of the temporal break sequence. For this experiment, a text corpus tagged with parts of speech and three break strength levels is prepared and evaluated. The complex feature set, textual conditions, and prior knowledge are utilized, and chunking rules are applied to the search results. The proposed model shows an error reduction rate of about 11.6% compared to the conventional classification model.

Design of the Linguistic Contents of Speech Corpus for Speech Recognition and Synthesis (인식 및 합성용 음성 코퍼스의 발성 목록 설계)

  • 김형주;김봉완;이용주
    • Proceedings of the Korea Multimedia Society Conference / 2002.05c / pp.330-335 / 2002
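  • With the recent development of speech information technology, which uses speech as a means of dialogue between computers and humans, research is under way to advance large-vocabulary continuous speech recognition and unlimited-vocabulary speech synthesis. In speech recognition, the development of statistical methods typified by HMMs means that large amounts of speech data are needed to train systems. In speech synthesis, good synthesis quality has recently been obtained by selecting speech segments of arbitrary length from a large speech database and concatenating them. In this paper, we design the utterance list of a speech database intended for shared use in speech recognition and synthesis, and discuss the results of the design.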

Language Model Adaptation for Broadcast News Recognition (방송 뉴스 인식을 위한 언어 모델 적응)

  • Kim Hyun Suk;Jeon Hyung Bae;Kim Sanghun;Choi Joon Ki;Yun Seung
    • MALSORI / no.51 / pp.99-115 / 2004
  • In this paper, we propose LM adaptation for broadcast news recognition. We collect recent articles from the internet in real time, build a small recent LM from them, and then interpolate the recent LM with an existing LM built from a large existing broadcast news corpus. Because the collected recent corpus contains articles both related and unrelated to the test set, we performed interpolation experiments to find which type of recent articles works best. The best result was obtained when the adapted LM combined the existing LM with a recent LM built from articles selected as similar to the test set by the Tf-Idf method: pseudo-morpheme-based recognition performance showed a 17.2% improvement in ERR, and the number of OOV words was reduced from 70 to 27.
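
The interpolation step described in this abstract can be sketched as a linear mixture, P(w) = λ·P_recent(w) + (1 − λ)·P_existing(w), with λ chosen on held-out text. The sketch below is a minimal illustration under assumptions: it uses add-one-smoothed unigram models rather than the paper's LMs, and the toy corpora, function names, and λ grid are hypothetical.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def interpolate(recent_lm, existing_lm, lam):
    """Linear interpolation: lam * P_recent(w) + (1 - lam) * P_existing(w)."""
    return {w: lam * recent_lm[w] + (1 - lam) * existing_lm[w]
            for w in existing_lm}


def perplexity(lm, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(lm[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy data standing in for the large and the recent news corpora.
existing = "the market fell the market rose the president spoke".split()
recent = "the typhoon hit the coast the typhoon moved north".split()
heldout = "the typhoon hit the market".split()
vocab = set(existing) | set(recent) | set(heldout)

existing_lm = unigram_lm(existing, vocab)
recent_lm = unigram_lm(recent, vocab)

# Pick the interpolation weight that minimizes held-out perplexity.
grid = [i / 10 for i in range(11)]
best_lam = min(
    grid,
    key=lambda lam: perplexity(interpolate(recent_lm, existing_lm, lam), heldout),
)
adapted_lm = interpolate(recent_lm, existing_lm, best_lam)
print(best_lam, perplexity(adapted_lm, heldout))
```

Because λ = 0 is part of the grid, the adapted model can never score worse on the held-out text than the existing LM alone, which makes the grid search a safe default.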

Creation and Assessment of Korean Speech and Noise DB in Car Environments (자동차 환경에서의 노이즈 DB 및 한국어 음성 DB 구축)

  • Lee Kwang-Hyun;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI / no.48 / pp.141-153 / 2003
  • Research into robust recognition in noisy environments, especially in car environments, is being actively carried out in the speech community. In this paper we report on three types of corpora that SiTEC (Speech Information TEchnology & industry promotion Center) has created for research into speech recognition in car noise environments. The first consists of recordings of 900 native Korean speakers, distributed according to gender, age, and region, who uttered application words in car environments. The second is a collection of mixed noise recorded in 3 car types by model under various noise patterns: with the engine on or off, at different driving speeds, and in different road conditions with the windows open or closed. The third consists of recordings of simulated speech produced by a HATS (Head and Torso Simulator) in car environments with internal and external noise factors added. All three types of recordings were made through synchronized 8-channel microphones fixed in the car. The creation and applications of these corpora are reported in detail.

Decision of the Korean Speech Act using Feature Selection Method (자질 선택 기법을 이용한 한국어 화행 결정)

  • 김경선;서정연
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.278-284 / 2003
  • A speech act is the speaker's intention expressed through an utterance. It is important for understanding natural language dialogues and generating responses. This paper proposes a two-stage method that improves the performance of Korean speech act decision. The first stage selects features from the part-of-speech (POS) tagging results of the sentence and from the context given by previous speech acts. To select features, we use the $\chi^2$ statistic (CHI), which has shown high performance in text categorization. The second stage determines the speech act from the selected features with a neural network. The proposed method shows that automatic speech act decision is possible using only POS results; it performs well by using the more informative features and runs faster because the number of features is reduced. We tested the system on a Korean dialogue corpus transcribed from recordings made in real settings; the corpus consists of 10,285 utterances and 17 speech acts. We trained on 8,349 utterances and tested on 1,936 utterances, obtaining the correct speech act for 1,709 utterances (88.3%). This accuracy is about 8% higher than without feature selection.
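
The first stage in this abstract, chi-square feature selection, can be sketched with the 2x2 contingency form commonly used in text categorization: χ²(f, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)). The code below is a minimal sketch under assumptions; the feature names, labels, and the choice to rank each feature by its maximum score over classes are hypothetical and are not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    a: utterances with the feature and in the class
    b: utterances with the feature, not in the class
    c: utterances without the feature, in the class
    d: utterances without the feature, not in the class
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(samples, top_k):
    """Rank features by their maximum chi-square score over all classes."""
    labels = {label for _, label in samples}
    features = {f for feats, _ in samples for f in feats}
    scores = {}
    for f in features:
        best = 0.0
        for label in labels:
            a = sum(1 for feats, y in samples if f in feats and y == label)
            b = sum(1 for feats, y in samples if f in feats and y != label)
            c = sum(1 for feats, y in samples if f not in feats and y == label)
            d = sum(1 for feats, y in samples if f not in feats and y != label)
            best = max(best, chi_square(a, b, c, d))
        scores[f] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:top_k], scores


# Hypothetical POS-style features and speech-act labels, for illustration only.
samples = [
    ({"wh_pronoun", "verb"}, "ask"),
    ({"wh_pronoun", "noun"}, "ask"),
    ({"verb", "noun"}, "inform"),
    ({"noun"}, "inform"),
]
selected, scores = select_features(samples, top_k=1)
print(selected, scores)
```

Only the selected features would then be passed to the second-stage classifier, which is what reduces the input dimension.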

A Study on the Design and the Construction of a Korean Speech DB for Common Use (공동이용을 위한 음성DB의 설계 및 구축에 관한 연구)

  • Kim, Bong-Wan;Kim, Jong-Jin;Kim, Sun-Tae;Lee, Yong-Ju
    • The Journal of the Acoustical Society of Korea / v.16 no.4 / pp.35-41 / 1997
  • A speech database is an indispensable part of speech research. It is needed in speech research and development and for evaluating the performance of various speech-processing systems. For a speech database to serve common use, its utterance list must cover all possible phonetic events in a minimal number of words and be independent of any task. To meet these requirements, this paper extracts a PBW set from a large text corpus. The paper describes the speech database constructed with the PBW set as its utterance list, along with its properties.

AM-FM Decomposition and Estimation of Instantaneous Frequency and Instantaneous Amplitude of Speech Signals for Natural Human-robot Interaction (자연스런 인간-로봇 상호작용을 위한 음성 신호의 AM-FM 성분 분해 및 순간 주파수와 순간 진폭의 추정에 관한 연구)

  • Lee, He-Young
    • Speech Sciences / v.12 no.4 / pp.53-70 / 2005
  • Vowels in speech signals are multicomponent signals composed of AM-FM components whose instantaneous frequency and instantaneous amplitude are time-varying. Changes in emotional state cause variation in the instantaneous frequencies and instantaneous amplitudes of the AM-FM components. It is therefore important to estimate these instantaneous frequencies and amplitudes accurately in order to extract key information representing emotional states and changes in speech signals. In this paper, first, a method for decomposing speech signals into AM-FM components is addressed. Second, the fundamental frequency of a vowel sound is estimated by a simple method based on the spectrogram. The estimate of the fundamental frequency is used for decomposing speech signals into AM-FM components. Third, an estimation method is suggested for separating the instantaneous frequencies and instantaneous amplitudes of the decomposed AM-FM components, based on the Hilbert transform and the demodulation property of the extended Fourier transform. The estimates of the instantaneous frequencies and instantaneous amplitudes can be used for modifying the spectral distribution and for smoothly connecting two words in corpus-based speech synthesis systems.
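
For a single AM-FM component, the Hilbert-transform part of this estimation can be sketched with the standard analytic-signal construction: zero the negative-frequency DFT bins, double the positive ones, and read amplitude and frequency off the resulting complex signal. The sketch below is an assumption-laden illustration, not the paper's method; it omits the decomposition step and the extended Fourier transform, and the function names and test tone are hypothetical.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via the DFT: zero negative frequencies, double positive ones."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                # DC and Nyquist bins are kept unchanged
        if k < (n + 1) // 2:
            spectrum[k] *= 2        # positive frequencies
        else:
            spectrum[k] = 0         # negative frequencies
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def instantaneous_amplitude_frequency(x, sample_rate):
    """Instantaneous amplitude (per sample) and frequency in Hz (per sample pair)."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    # Phase advance between consecutive samples, converted to Hz.
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Hypothetical test tone: amplitude 0.5, frequency 8 Hz, sampled at 64 Hz.
sample_rate = 64
tone = [0.5 * math.cos(2 * math.pi * 8 * t / sample_rate) for t in range(64)]
amp, freq = instantaneous_amplitude_frequency(tone, sample_rate)
print(amp[0], freq[0])
```

The direct DFT here is O(N²) and is only meant for short illustrative signals; real speech frames would call for an FFT.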
