• 제목/요약/키워드: Speech Corpus

검색결과 300건 처리시간 0.022초

Praat과 R로 분석한 한국인 대화 음성 말뭉치의 fundamental frequency(f0)값 분포 (The fundamental frequency (f0) distribution of Korean speakers in a dialogue corpus using Praat and R)

  • 양병곤
    • 말소리와 음성과학
    • /
    • 제15권3호
    • /
    • pp.17-25
    • /
    • 2023
  • 이 논문은 국립국어원에서 배포한 한국인 대화 음성 말뭉치에서 화자의 성대의 진동을 나타내는 fundamental frequency(f0)값을 측정해서 한국인이 일상 대화를 할 때 f0값의 기초적인 통계자료를 살펴보고, 나이와 f0값의 분포는 어떤 관계를 보이는지를 조사했다. 연구자료 수집과 분석은 Praat과 R을 이용했고, 개인별 억양구마다 상자도를 구하고 사분위값을 활용하여 극단값을 제거하는 방법으로 최종 f0값 자료를 구했다. 그 결과 전체 한국인들의 f0값의 평균값은 185 Hz이고 중앙값은 187 Hz로 나왔다. 자료의 분포모양을 나타내는 왜도는 0.11의 정적분포를 보였고, 첨도는 -0.09로 정상분포에 거의 가까운 모양을 보였다. 일상대화의 피치값의 변화범위로는 238 Hz로 나타났다. 남녀 간의 f0값의 차이는 남성의 중앙값 114 Hz의 거의 두 배에 해당하는 199 Hz가 여성의 중앙값으로 나타났고 t검증결과 유의미한 차이를 보였다. 분포모양을 나타내는 왜도는 남성이 1.24이었고, 여성은 그것의 반에 해당하는 0.58이었다. 첨도는 남녀집단 각각 5.21과 3.88로 나타나 남성의 값이 34% 정도 더 뾰족한 모양을 보였다. 연령대별로는 남녀집단을 합하여 볼 때, 나이가 들수록 f0값이 서서히 내려가는 경향을 보였다. 연령대별 f0중앙값과 나이 간의 회귀분석을 실행한 결과 기울기가 남성집단에서는 0.15, 여성집단에서는 -0.586으로 서로 반대되는 경향을 기록했다. 결론적으로, 대규모 참여자가 녹음한 대화 음성에서 한국인의 집단별 연령별 다양한 f0분포를 규명할 수 있지만, 나이와 f0관계는 더 정밀한 자료수집이 필요함을 알 수 있었다.

자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리 (Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation)

  • 박혜영;김형순
    • 대한음성학회지:말소리
    • /
    • 제43호
    • /
    • pp.137-150
    • /
    • 2002
  • Speech segmentation at phoneme level is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of automatic speech segmentation system based on Hidden Markov Model (HMM). We compare monophone and triphone models, and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to make correction of frequent boundary location errors between silence and speech sounds. Experimental results show that our system provides 71.3% and 84.2% correct boundary locations given tolerance of 10 ms and 20 ms, respectively.

  • PDF

HMM 기반의 한국어 음성합성에서 음색변환에 관한 연구 (A Study on the Voice Conversion with HMM-based Korean Speech Synthesis)

  • 김일환;배건성
    • 대한음성학회지:말소리
    • /
    • 제68권
    • /
    • pp.65-74
    • /
    • 2008
  • A statistical parametric speech synthesis system based on the hidden Markov models (HMMs) has grown in popularity over the last few years, because it needs less memory and low computation complexity and is suitable for the embedded system in comparison with a corpus-based unit concatenation text-to-speech (TTS) system. It also has the advantage that voice characteristics of the synthetic speech can be modified easily by transforming HMM parameters appropriately. In this paper, we present experimental results of voice characteristics conversion using the HMM-based Korean speech synthesis system. The results have shown that conversion of voice characteristics could be achieved using a few sentences uttered by a target speaker. Synthetic speech generated from adapted models with only ten sentences was very close to that from the speaker dependent models trained using 646 sentences.

  • PDF

잡음 환경에서의 음성 감정 인식을 위한 특징 벡터 처리 (Feature Vector Processing for Speech Emotion Recognition in Noisy Environments)

  • 박정식;오영환
    • 말소리와 음성과학
    • /
    • 제2권1호
    • /
    • pp.77-85
    • /
    • 2010
  • This paper proposes an efficient feature vector processing technique to guard the Speech Emotion Recognition (SER) system against a variety of noises. In the proposed approach, emotional feature vectors are extracted from speech processed by comb filtering. Then, these extracts are used in a robust model construction based on feature vector classification. We modify conventional comb filtering by using speech presence probability to minimize drawbacks due to incorrect pitch estimation under background noise conditions. The modified comb filtering can correctly enhance the harmonics, which is an important factor used in SER. Feature vector classification technique categorizes feature vectors into either discriminative vectors or non-discriminative vectors based on a log-likelihood criterion. This method can successfully select the discriminative vectors while preserving correct emotional characteristics. Thus, robust emotion models can be constructed by only using such discriminative vectors. On SER experiment using an emotional speech corpus contaminated by various noises, our approach exhibited superior performance to the baseline system.

  • PDF

한국어 음성합성기의 운율 예측을 위한 의사결정트리 모델에 관한 연구 (A Study of Decision Tree Modeling for Predicting the Prosody of Corpus-based Korean Text-To-Speech Synthesis)

  • 강선미;권오일
    • 음성과학
    • /
    • 제14권2호
    • /
    • pp.91-103
    • /
    • 2007
  • The purpose of this paper is to develop a model enabling to predict the prosody of Korean text-to-speech synthesis using the CART and SKES algorithms. CART prefers a prediction variable in many instances. Therefore, a partition method by F-Test was applied to CART which had reduced the number of instances by grouping phonemes. Furthermore, the quality of the text-to-speech synthesis was evaluated after applying the SKES algorithm to the same data size. For the evaluation, MOS tests were performed on 30 men and women in their twenties. Results showed that the synthesized speech was improved in a more clear and natural manner by applying the SKES algorithm.

  • PDF

한영 병렬 코퍼스 구축을 위한 하이브리드 기반 문장 자동 정렬 방법 (A Hybrid Sentence Alignment Method for Building a Korean-English Parallel Corpus)

  • 박정열;차정원
    • 대한음성학회지:말소리
    • /
    • 제68권
    • /
    • pp.95-114
    • /
    • 2008
  • The recent growing popularity of statistical methods in machine translation requires much more large parallel corpora. A Korean-English parallel corpus, however, is not yet enoughly available, little research on this subject is being conducted. In this paper we present a hybrid method of aligning sentences for Korean-English parallel corpora. We use bilingual news wire web pages, reading comprehension materials for English learners, computer-related technical documents and help files of localized software for building a Korean-English parallel corpus. Our hybrid method combines sentence-length based and word-correspondence based methods. We show the results of experimentation and evaluate them. Alignment results from using a full translation model are very encouraging, especially when we apply alignment results to an SMT system: 0.66% for BLEU score and 9.94% for NIST score improvement compared to the previous method.

  • PDF

한국어 품사 부착 말뭉치의 오류 검출 및 수정 (Detecting and correcting errors in Korean POS-tagged corpora)

  • 최명길;서형원;권홍석;김재훈
    • Journal of Advanced Marine Engineering and Technology
    • /
    • 제37권2호
    • /
    • pp.227-235
    • /
    • 2013
  • 품사 부착 말뭉치의 품질은 품사 부착기를 개발하는데 있어서 매우 중요한 역할을 수행한다. 그러나 세종 말뭉치를 비롯하여 한국에서 구축된 많은 품사 부착 말뭉치들은 여전히 다양한 형태의 오류를 포함하고 있다. 이런 오류들을 살펴보면 품사 부착 오류는 물론이고 철자 오류, 문자의 삽입 및 삭제 등 매우 다양하다. 본 논문에서는 오류 패턴을 이용하여 품사 부착 오류를 검출하고 이를 효과적으로 수정하는 도구를 개발한다. 제안된 방법과 도구를 이용해서 오류를 수정할 경우 평균 9배 이상 빠르게 오류를 수정할 수 있어서 이 방법이 매우 효과적인 방법임을 확인할 수 있었다.

PROSODY IN SPEECH TECHNOLOGY - National project and some of our related works -

  • Hirose Keikichi
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 2002년도 하계학술발표대회 논문집 제21권 1호
    • /
    • pp.15-18
    • /
    • 2002
  • Prosodic features of speech are known to play an important role in the transmission of linguistic information in human conversation. Their roles in the transmission of para- and non- linguistic information are even much more. In spite of their importance in human conversation, from engineering viewpoint, research focuses are mainly placed on segmental features, and not so much on prosodic features. With the aim of promoting research works on prosody, a research project 'Prosody and Speech Processing' is now going on. A rough sketch of the project is first given in the paper. Then, the paper introduces several prosody-related research works, which are going on in our laboratory. They include, corpus-based fundamental frequency contour generation, speech rate control for dialogue-like speech synthesis, analysis of prosodic features of emotional speech, reply speech generation in spoken dialogue systems, and language modeling with prosodic boundaries.

  • PDF

대화체 억양구말 형태소의 경계성조 연구 (Boundary Tones of Intonational Phrase-Final Morphemes in Dialogues)

  • 한선희
    • 음성과학
    • /
    • 제7권4호
    • /
    • pp.219-234
    • /
    • 2000
  • The study of boundary tones in connected speech or dialogues is one of the most underdeveloped areas of Korean prosody. This. paper concerns the boundary tones of intonational phrase-final morphemes which are shown in the speech corpus of dialogues. Results of phonetic analysis show that different kinds of boundary tones are realized, depending on the positions of the intonational phrase-final morphemes in the sentences.. This study has also shown that boundary tone patterning is somewhat related to the sentence structure, and for better speech recognition and speech synthesis, it presents a simple model of boundary tones based on the fundamental frequency contour. The results of this study will contribute to our understanding of the prosodic pattern of Korean connected speech or dialogues.

  • PDF

Active Learning과 군집화를 이용한 고정키어구 추출 (Keyphrase Extraction Using Active Learning and Clustering)

  • 이현우;차정원
    • 대한음성학회지:말소리
    • /
    • 제66호
    • /
    • pp.87-103
    • /
    • 2008
  • We describe a new active learning method in conditional random fields (CRFs) framework for keyphrase extraction. To save elaboration in annotation, we use diversity and representative measure. We select high diversity training candidates by sentence confidence value. We also select high representative candidates by clustering the part-of-speech patterns of contexts. In the experiments using dialog corpus, our method achieves 86.80% and saves 88% training corpus compared with those of supervised method. From the results of experiment, we can see that the proposed method shows improved performance over the previous methods. Additionally, the proposed method can be applied to other applications easily since its implementation is independent on applications.

  • PDF