• Title/Summary/Keyword: Korean standard speech corpus


Developing a Korean Standard Speech DB (한국인 표준 음성 DB 구축)

  • Shin, Jiyoung;Jang, Hyejin;Kang, Younmin;Kim, Kyung-Wha
    • Phonetics and Speech Sciences / v.7 no.1 / pp.139-150 / 2015
  • The purpose of this study is to develop a speech corpus for standard Korean speech. For the samples to viably represent the state of spoken Korean, demographic factors were considered so as to ensure a balanced spread of age, gender, and dialect. Nine regional dialects were categorized, and five age groups were established, covering speakers from their 20s to their 60s. A speech-sample collection protocol was developed for this study in which each speaker performs five tasks: two reading tasks, two semi-spontaneous speech tasks, and one spontaneous speech task. This configuration accommodates the gathering of rich and well-balanced speech samples across various speech types and is expected to improve the utility of the corpus developed in this study. Samples from 639 individuals were collected using the protocol; speech samples were also collected from other sources, for a combined total of 1,012 individuals. The data accumulated in this database will be used to develop a speaker identification system and may also be applied to, but not limited to, phonetic studies, sociolinguistics, and language pathology. We plan to supplement the large-scale speech corpus next year, in terms of both research methodology and content, to better answer the needs of diverse fields.

Corpus-based evaluation of French text normalization (코퍼스 기반 프랑스어 텍스트 정규화 평가)

  • Kim, Sunhee
    • Phonetics and Speech Sciences / v.10 no.3 / pp.31-39 / 2018
  • This paper aims to present a taxonomy of non-standard words (NSWs) for developing a French text normalization system and to propose a corpus-based method for evaluating that system. The proposed taxonomy of French NSWs consists of 13 categories, including 2 letter-based and 9 number-based categories. To evaluate the text normalization system, a representative test set containing NSWs from various text domains, such as news, literature, non-fiction, social-networking services (SNSs), and transcriptions, is constructed, and an evaluation equation is proposed that reflects the distribution of NSW categories in the target domain to which the system is applied. The error rate on the test set is 1.64%, while the error rate over the whole corpus, reflecting the NSW distribution in the corpus, is 2.08%. The results show that the literature and SNS domains have higher error rates than the test set as a whole.
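
The evaluation equation mentioned above weights per-category error rates by the NSW category distribution of the target domain. Below is a minimal sketch of how such a weighted error rate could be computed; the category names and counts are invented for illustration and do not come from the paper.

```python
# Minimal sketch of a category-weighted error rate for text normalization,
# assuming per-category error counts from a test set and an NSW category
# distribution estimated from the target corpus (all numbers are illustrative).

def weighted_error_rate(errors, totals, corpus_distribution):
    """Combine per-category error rates using the corpus NSW distribution.

    errors[c]              -- wrongly normalized tokens of category c in the test set
    totals[c]              -- all tokens of category c in the test set
    corpus_distribution[c] -- relative frequency of category c in the target corpus
    """
    rate = 0.0
    for category, weight in corpus_distribution.items():
        if totals.get(category, 0) == 0:
            continue
        rate += weight * errors.get(category, 0) / totals[category]
    return rate


if __name__ == "__main__":
    # Hypothetical figures for three NSW categories.
    errors = {"cardinal": 4, "abbreviation": 7, "url": 1}
    totals = {"cardinal": 400, "abbreviation": 250, "url": 50}
    corpus_distribution = {"cardinal": 0.6, "abbreviation": 0.3, "url": 0.1}
    print(f"weighted error rate: {weighted_error_rate(errors, totals, corpus_distribution):.4f}")
```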

Monophthong Analysis on a Large-scale Speech Corpus of Read-Style Korean (한국어 대용량발화말뭉치의 단모음분석)

  • Yoon, Tae-Jin;Kang, Yoonjung
    • Phonetics and Speech Sciences / v.6 no.3 / pp.139-145 / 2014
  • The paper describes methods for conducting vowel analysis on a large-scale corpus with the aid of forced alignment and an optimal formant ceiling method. The 'Read Style Corpus of Standard Korean' is used to build the forced-alignment system, and a subset of the corpus is used for processing and extracting features for vowel analysis based on the optimal formant ceiling. The results of the vowel analysis are reliable and comparable to results obtained using traditional analytical methods. The findings indicate that the adopted methods can be extended to more fine-grained analysis without time-consuming manual labeling and without loss of accuracy or reliability.
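
The optimal formant ceiling approach referenced above typically measures each vowel at several candidate ceilings and keeps the ceiling that yields the most stable formant values. The sketch below illustrates that selection logic only, per speaker and vowel category; `measure_f1_f2` is a hypothetical stand-in for a real formant tracker, and the ceiling range is an assumption.

```python
# Sketch of the optimal-formant-ceiling idea: for each vowel category (per
# speaker), try a range of formant ceilings and keep the ceiling that minimizes
# the variance of the measured F1/F2 values across tokens.

import random
import statistics

def measure_f1_f2(token, ceiling_hz):
    """Hypothetical formant measurement for one vowel token at a given ceiling."""
    random.seed(hash((token, ceiling_hz)) & 0xFFFF)  # deterministic stub values
    return 500 + random.gauss(0, 30), 1500 + random.gauss(0, 80)

def optimal_ceiling(tokens, ceilings=range(4500, 6501, 500)):
    """Return the ceiling whose F1/F2 measurements vary least across tokens."""
    best_ceiling, best_score = None, float("inf")
    for ceiling in ceilings:
        f1s, f2s = zip(*(measure_f1_f2(t, ceiling) for t in tokens))
        score = statistics.variance(f1s) + statistics.variance(f2s)
        if score < best_score:
            best_ceiling, best_score = ceiling, score
    return best_ceiling

if __name__ == "__main__":
    tokens = [f"speaker01_a_{i}" for i in range(10)]  # token IDs are illustrative
    print("selected ceiling:", optimal_ceiling(tokens), "Hz")
```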

Developing a Korean standard speech DB (II) (한국인 표준 음성 DB 구축(II))

  • Shin, Jiyoung;Kim, KyungWha
    • Phonetics and Speech Sciences / v.9 no.2 / pp.9-22 / 2017
  • The purpose of this paper is to report the whole process of developing the Korean Standard Speech Database (KSS DB). The project was supported by an SPO (Supreme Prosecutors' Office) research grant for three years, from 2014 to 2016. KSS DB is designed to provide speech data for acoustic-phonetic and phonological studies and for speaker recognition systems. For the samples to represent spoken Korean, sociolinguistic factors such as region (9 regional dialects), age (5 age groups over 20), and gender (male and female) were considered. The goal of the project was to collect speech from over 3,000 male and female speakers across the nine regional dialects and five age groups, employing direct and indirect methods. Speech samples from 3,191 speakers (2,829 collected directly and 362 collected indirectly) were gathered and databased. KSS DB is designed to collect read and spontaneous speech samples from each speaker through five speech tasks: three (pseudo-)spontaneous tasks (producing prolonged simple vowels, 28 blanked sentences, and spontaneous talk) and two read-speech tasks (reading 55 phonetically and phonologically rich sentences and reading three short passages). For each speech task, KSS DB includes a 16-bit, 44.1 kHz speech waveform file and an orthographic transcription file.
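
As a rough illustration of the database design described above (five tasks per speaker, each stored as a 16-bit, 44.1 kHz waveform plus an orthographic transcription), here is a minimal sketch of one speaker session as a data structure. Field names and file-naming conventions are assumptions, not the actual KSS DB schema.

```python
# Minimal sketch of one KSS DB speaker session; names and paths are illustrative.

from dataclasses import dataclass, field

TASKS = [
    "prolonged_vowels",    # (pseudo-)spontaneous
    "blanked_sentences",   # 28 blanked sentences
    "spontaneous_talk",
    "rich_sentences",      # 55 phonetically/phonologically rich sentences
    "short_passages",      # three short passages
]

@dataclass
class TaskRecording:
    task: str
    wav_path: str          # 16-bit PCM, 44.1 kHz
    transcript_path: str   # orthographic transcription

@dataclass
class SpeakerSession:
    speaker_id: str
    dialect: str           # one of nine regional dialects
    age_group: str         # one of five age groups (20s-60s)
    gender: str
    recordings: list = field(default_factory=list)

    def add_task(self, task: str) -> None:
        self.recordings.append(TaskRecording(
            task=task,
            wav_path=f"{self.speaker_id}_{task}.wav",
            transcript_path=f"{self.speaker_id}_{task}.txt",
        ))

if __name__ == "__main__":
    session = SpeakerSession("F20_SEOUL_0001", "Seoul", "20s", "female")
    for task in TASKS:
        session.add_task(task)
    print(len(session.recordings), "recordings registered for", session.speaker_id)
```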

An Intonation Study of Predicate Endings in Current Korean - From the final endings 「-a/e, -tʃijo」 and 「-p/simnida」 - (현대 서울말 평서문에 나타나는 억양 연구 - 어말어미 "-아/어, -지요" 와 "-ㅂ/습니다" 를 중심으로 -)

  • Yu, Ki-Won
    • Proceedings of the KSPS conference / 2005.04a / pp.3-7 / 2005
  • This research aims to find prototypes and characteristics of the intonation of the final endings 「-a/e, -tʃijo」 and 「-p/simnida」 in modern Korean declarative sentences by constructing a spoken corpus based on current radio broadcasts. The results of the study are as follows: (1) A balanced spoken corpus and a standard for determining rhythmic boundaries are needed for an intonation model for speech synthesis. (2) Korean intonation units show a split word tone consisting of a pre-nuclear tone and a nuclear tone, which allows a more detailed tonal description. (3) Separate intonation models for male and female speakers were built using t-tests in SPSS. (4) The standard intonation model is divided into an '-ajo' type and a '-nida' type.


Emergency dispatching based on automatic speech recognition (음성인식 기반 응급상황관제)

  • Lee, Kyuwhan;Chung, Jio;Shin, Daejin;Chung, Minhwa;Kang, Kyunghee;Jang, Yunhee;Jang, Kyungho
    • Phonetics and Speech Sciences / v.8 no.2 / pp.31-39 / 2016
  • In emergency dispatching at the 119 Command & Dispatch Center, inconsistencies between the 'standard emergency aid system' and the 'dispatch protocol', both of which are mandatory to follow, cause inefficiency in the dispatcher's work. If an emergency dispatch system uses automatic speech recognition (ASR) to process the dispatcher's protocol speech during case registration, it can instantly extract and provide the information required by the 'standard emergency aid system', making the rescue command more efficient. For this purpose, we developed a Korean large-vocabulary continuous speech recognition system with a 400,000-word vocabulary for the emergency dispatch system. The 400,000 words cover vocabulary from news, SNS, blogs, and emergency rescue domains. The acoustic model is trained on 1,300 hours of telephone (8 kHz) speech, and the language model on a 13 GB text corpus. From the transcribed corpus of 6,600 real telephone calls, call logs with an emergency rescue command class and an identified major symptom are extracted in connection with the rescue activity log and the National Emergency Department Information System (NEDIS). ASR is applied to the dispatcher's repetition utterances about the patient information, and the patient information is extracted based on the Levenshtein distance between the ASR result and the template information. Experimental results show a word error rate of 9.15% for speech recognition and an emergency response detection rate of 95.8% for the emergency dispatch system.
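
The extraction step described above matches the dispatcher's repetition utterance, as recognized by ASR, against template information using Levenshtein distance. The sketch below shows that matching idea with a standard edit-distance implementation; the symptom labels are invented examples, not the actual dispatch protocol.

```python
# Compare an ASR hypothesis against candidate template entries with Levenshtein
# (edit) distance and pick the closest one. Template values are illustrative.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]

def closest_template_entry(asr_text: str, template_entries):
    """Return the template entry with the smallest edit distance to the ASR text."""
    return min(template_entries, key=lambda entry: levenshtein(asr_text, entry))

if __name__ == "__main__":
    # Hypothetical symptom labels; the ASR hypothesis contains a recognition error.
    symptoms = ["chest pain", "difficulty breathing", "loss of consciousness"]
    asr_hypothesis = "chest bane"
    print(closest_template_entry(asr_hypothesis, symptoms))  # -> "chest pain"
```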

XML Based Meta-data Specification for Industrial Speech Databases (산업용 음성 DB를 위한 XML 기반 메타데이터)

  • Joo Young-Hee;Hong Ki-Hyung
    • MALSORI / v.55 / pp.77-91 / 2005
  • In this paper, we propose an XML-based meta-data specification for industrial speech databases. Building speech databases is very time-consuming and expensive. Recently, with government support, a huge amount of speech corpus data has been collected into speech databases. However, the formats and meta-data of these databases differ depending on the institution that constructed them. To improve the reusability and portability of speech databases, a standard representation scheme should be adopted by all construction institutions. ETRI proposed an XML-based annotation scheme [5] for speech databases, but that scheme has an overly simple and flat modeling structure and may cause duplicated information. To overcome these disadvantages, we first define the speech database more formally and then identify the objects appearing in speech databases. We then design a data model for speech databases in an object-oriented way and, based on this data model, develop the meta-data specification for industrial speech databases.

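To make the idea of an XML-based meta-data record concrete, here is a small sketch built with Python's standard library. The element and attribute names are illustrative assumptions and do not reproduce the specification proposed in the paper.

```python
# Build a toy XML meta-data record for one speech database entry.

import xml.etree.ElementTree as ET

def build_record(speaker_id, dialect, age_group, gender, wav_file, transcript):
    record = ET.Element("recording", id=speaker_id)
    speaker = ET.SubElement(record, "speaker")
    ET.SubElement(speaker, "dialect").text = dialect
    ET.SubElement(speaker, "ageGroup").text = age_group
    ET.SubElement(speaker, "gender").text = gender
    signal = ET.SubElement(record, "signal", encoding="PCM16", sampleRate="44100")
    signal.text = wav_file
    ET.SubElement(record, "orthography").text = transcript
    return record

if __name__ == "__main__":
    rec = build_record("SPK0001", "Gyeongsang", "30s", "male",
                       "SPK0001_passage1.wav", "SPK0001_passage1.txt")
    print(ET.tostring(rec, encoding="unicode"))
```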

Standardization for Annotation Information Description of Speech Database (음성 DB 부가 정보 기술방안 표준화를 위한 제안)

  • Kim Sanghun;Lee Youngjik;Hahn Minsoo
    • MALSORI / no.47 / pp.109-120 / 2003
  • This paper presents the speech database standardization activities at ETRI. Recently, with government support, ETRI and SiTEC have been gathering a large speech corpus for domestic speech-related companies. First, because knowledge of speech database specifications has not been shared, the distributed speech databases have different formats, so a common format is needed as soon as possible; ETRI and SiTEC are trying to find a better representation format for speech databases. Second, we introduce a new method for describing the annotation information of a speech database: as a structured description method, an XML-based description will be applied to represent the metadata of the speech database. It will be continuously revised through the speech technology standardization forum during this year.


Efficient Part-of-Speech Set for Knowledge-based Word Sense Disambiguation of Korean Nouns (한국어 명사의 지식기반 의미중의성 해소를 위한 효과적인 품사집합)

  • Kwak, Chul-Heon;Seo, Young-Hoon;Lee, Chung-Hee
    • The Journal of the Korea Contents Association / v.16 no.4 / pp.418-425 / 2016
  • This paper presents part-of-speech sets that are highly efficient for knowledge-based word sense disambiguation of Korean nouns. A test set of 174,000 sentences, whose senses are based on the Standard Korean Dictionary, is extracted from the Sejong semantically tagged corpus. We disambiguate selected nouns in the test set using the glosses and examples in the Standard Korean Dictionary. A set of 15 parts of speech that gives the best performance over the whole test set and a set of 17 parts of speech that gives the best average accuracy for the selected nouns are identified. These part-of-speech sets yield 12% higher performance than the full set of 45 parts of speech.
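
Knowledge-based disambiguation with dictionary glosses and examples is commonly implemented as a Lesk-style overlap measure over a POS-filtered context. The sketch below shows that general idea only; the toy English "dictionary" and POS tags are invented, whereas the paper uses the Standard Korean Dictionary and the POS-tagged Sejong corpus.

```python
# Simplified Lesk-style sketch: score each dictionary sense by the overlap
# between its gloss/examples and the context words, where the context is
# restricted to a chosen part-of-speech set.

def disambiguate(target, context_tokens, senses, allowed_pos):
    """Return the sense whose gloss shares the most words with the filtered context.

    context_tokens -- list of (word, pos) pairs around the target noun
    senses         -- mapping: sense id -> gloss/example text
    allowed_pos    -- the part-of-speech set used to filter the context
    """
    context = {w for w, pos in context_tokens if pos in allowed_pos and w != target}
    def overlap(gloss):
        return len(context & set(gloss.split()))
    return max(senses, key=lambda sid: overlap(senses[sid]))

if __name__ == "__main__":
    senses = {
        "bank_1": "financial institution that accepts deposits and lends money",
        "bank_2": "sloping land beside a river or lake",
    }
    context = [("deposit", "NOUN"), ("money", "NOUN"), ("withdraw", "VERB"),
               ("yesterday", "ADVERB")]
    allowed_pos = {"NOUN", "VERB"}   # illustrative stand-in for the 15-POS set
    print(disambiguate("bank", context, senses, allowed_pos))  # -> "bank_1"
```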

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

  • Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2000.04a / pp.366-369 / 2000
  • This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom, and the neutral state. MLB (Maximum-Likelihood Bayes), NN (Nearest Neighbor), and HMM (Hidden Markov Model) algorithms are used as the pattern-matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition rate was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3%, and 89.1%, respectively, for speaker-dependent, context-independent classification.

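As a concrete illustration of the NN classifier described above, the sketch below reduces an utterance to pitch and energy statistics and assigns the label of the nearest training vector. All feature values and training examples are invented.

```python
# Nearest-neighbor sketch over (pitch mean, pitch stdev, energy mean, energy stdev).

import math
import statistics

def features(pitch_track, energy_track):
    """(pitch mean, pitch stdev, energy mean, energy stdev) for one utterance."""
    return (statistics.mean(pitch_track), statistics.stdev(pitch_track),
            statistics.mean(energy_track), statistics.stdev(energy_track))

def nearest_neighbor(sample, training_set):
    """Return the label of the training vector closest to `sample` (Euclidean)."""
    label, _ = min(training_set, key=lambda pair: math.dist(sample, pair[1]))
    return label

if __name__ == "__main__":
    training_set = [
        ("anger",   (240.0, 55.0, 0.80, 0.25)),
        ("sadness", (180.0, 20.0, 0.40, 0.10)),
        ("neutral", (200.0, 30.0, 0.55, 0.15)),
    ]
    pitch = [235, 250, 228, 244, 260]      # Hz, illustrative
    energy = [0.7, 0.9, 0.85, 0.6, 0.8]    # normalized, illustrative
    print(nearest_neighbor(features(pitch, energy), training_set))  # likely "anger"
```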