• Title/Abstract/Keywords: speech

Developing a Korean standard speech DB (II)

  • 신지영; 김경화
    • Phonetics and Speech Sciences / Vol. 9, No. 2 / pp. 9-22 / 2017
  • The purpose of this paper is to report the whole process of developing the Korean Standard Speech Database (KSS DB). The project was supported by a research grant from the SPO (Supreme Prosecutors' Office) for three years, from 2014 to 2016. KSS DB is designed to provide speech data for acoustic-phonetic and phonological studies and for speaker recognition systems. For the samples to represent spoken Korean, sociolinguistic factors such as region (9 regional dialects), age (5 age groups over 20) and gender (male and female) were considered. The goal of the project was to collect speech from over 3,000 male and female speakers of nine regional dialects and five age groups, using both direct and indirect methods. Speech samples from 3,191 speakers (2,829 collected by the direct method and 362 by the indirect method) were collected and compiled into the database. KSS DB was designed to collect read and spontaneous speech from each speaker through 5 speech tasks: three (pseudo-)spontaneous speech tasks (producing prolonged simple vowels, completing 28 blanked sentences, and spontaneous talk) and two read speech tasks (reading 55 phonetically and phonologically rich sentences and reading three short passages). For each speech task, KSS DB includes a 16-bit, 44.1 kHz speech waveform file and an orthographic transcription file.
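
The abstract states that each task is stored as a 16-bit, 44.1 kHz waveform file. The following is a minimal sketch, using Python's standard `wave` module, of how one might check a file against that format; the function name and returned fields are illustrative only, and the channel count is reported but not checked because the abstract does not specify it.

```python
import wave

# Format stated in the abstract: 16-bit samples at 44.1 kHz.
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
EXPECTED_RATE_HZ = 44_100


def check_kss_format(path: str) -> dict:
    """Read a WAV header and report whether it is 16-bit / 44.1 kHz.

    The channel count is reported but not checked, because the
    abstract does not state it.
    """
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "ok": width == EXPECTED_SAMPLE_WIDTH_BYTES and rate == EXPECTED_RATE_HZ,
        "bits": width * 8,
        "rate_hz": rate,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
    }
```

At this format, each channel takes 2 bytes × 44,100 samples = 88,200 bytes per second, or about 5.3 MB per minute of mono audio.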

A User friendly Remote Speech Input Unit in Spontaneous Speech Translation System

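  • 이광석; 김흥준; 송진국; 추연규
    • Korea Institute of Information and Communication Engineering: Conference Proceedings / Korea Institute of Maritime Information and Communication Sciences 2008 Spring Conference A / pp. 784-788 / 2008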
  • In this research, we propose a remote speech input unit, a new user-friendly method of speech input for speech recognition systems. We define user friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our module adopts two algorithms: automatic speech detection, and speech enhancement based on microphone-array beamforming. In the evaluation of speech detection, about 97 percent of the detected positions fall within 200 ms of the manually labeled positions in noisy environments with an SNR of 25 dB. Microphone-array speech enhancement using the delay-and-sum beamforming algorithm achieves a maximum SNR gain of about 6 dB over a single microphone and reduces the speech recognition error rate by more than 12%.
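
Delay-and-sum beamforming time-aligns the microphone signals toward the talker and averages them, so the target speech adds coherently while spatially uncorrelated noise partly cancels; with M microphones and uncorrelated noise, the idealized SNR gain is 10·log10(M) dB. The sketch below assumes the steering delays are already known as whole, non-negative sample counts; real systems estimate them and often use fractional delays. It is a generic illustration, not the paper's implementation, whose array geometry and delay estimation are not described here.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Delay-and-sum beamformer with integer, non-negative sample delays.

    channels[m][n] is sample n of microphone m. delays[m] is how many
    samples later the target speech arrives at mic m than at the earliest
    mic; advancing each channel by its delay aligns the speech, and the
    aligned channels are averaged.
    """
    if len(channels) != len(delays) or not channels:
        raise ValueError("need one delay per channel and at least one channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    m = len(channels)
    # Keep only output samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / m
            for n in range(max(length, 0))]
```

For example, if microphone 1 receives the same signal one sample later than microphone 0, passing delays `[0, 1]` realigns the two channels before averaging.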


Design of Linguistic Contents of Speech Corpora for Speech Recognition and Synthesis for Common Use

  • 김연화; 김형주; 김봉완; 이용주
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 43 / pp. 89-99 / 2002
  • Recently, research on improving large-vocabulary continuous speech recognition and speech synthesis has been carried out intensively as the field of speech information technology progresses rapidly. In speech recognition, stochastic methods such as HMMs require large amounts of speech data for training; in speech synthesis, recent practice shows that better quality can be obtained by selecting and concatenating variable-size units from a large speech database. In this paper, we design and discuss the linguistic contents of speech corpora to be shared in common for speech recognition and synthesis.
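
A common way to choose recording prompts with broad phonetic coverage is a greedy set-cover heuristic: repeatedly pick the sentence that adds the most not-yet-covered units. The abstract does not say that the authors used this method; the sketch below is an assumed, generic illustration, and the unit inventory (phones, diphones, and so on) is left to the caller.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[str]], max_sentences: int) -> List[str]:
    """Greedily pick sentences that add the most not-yet-covered units.

    `candidates` maps each sentence to the set of phonetic units it
    contains (the choice of unit is an assumption). Stops when no
    remaining sentence adds a new unit or the budget is reached.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Iterate in sorted order so ties resolve deterministically.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen
```

Greedy selection does not guarantee the smallest covering set, but it is simple and scales to large candidate pools.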


The Lombard effect on the speech of children with intellectual disability

  • 이현주; 이지윤; 김유경
    • Phonetics and Speech Sciences / Vol. 8, No. 4 / pp. 115-122 / 2016
  • This study investigates the acoustic-phonetic features and speech intelligibility of Lombard speech in children with intellectual disability by examining speech produced under three noise conditions: no noise, 55 dB, and 65 dB. Eight children with intellectual disability read sentences and played speaking games, and their speech was analyzed in terms of intensity, pitch, the vowel space of /a/, /i/, and /u/, VAI(3), articulation rate, and speech intelligibility. The results showed, first, that intensity and pitch increased as the noise level increased; second, that VAI(3) increased as the noise level increased; third, that articulation rate decreased as noise intensity increased; and finally, that speech intelligibility increased as noise intensity increased. Lombard speech thus changed the VAI(3), vowel space, articulation rate, and speech intelligibility of the children with intellectual disability. This study suggests that Lombard speech will be clinically useful for persons who have intellectual disability and difficulties with self-control.
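
The abstract does not give formulas. The sketch below assumes the widely used definition of the Vowel Articulation Index from the corner vowels, VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a), and computes the /a/-/i/-/u/ vowel space as the triangle area in the F1-F2 plane using the shoelace formula. Whether the paper's VAI(3) matches this exact form is an assumption.

```python
from typing import Dict, Tuple

Formants = Dict[str, Tuple[float, float]]  # vowel -> (F1 Hz, F2 Hz)


def vai3(f: Formants) -> float:
    """Vowel Articulation Index from the corner vowels /a/, /i/, /u/.

    VAI = (F2i + F1a) / (F1i + F1u + F2u + F2a); larger values indicate
    a more expanded (less centralized) vowel space.
    """
    f1a, f2a = f["a"]
    f1i, f2i = f["i"]
    f1u, f2u = f["u"]
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)


def triangle_vowel_space(f: Formants) -> float:
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = f["a"], f["i"], f["u"]
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2
```

With illustrative formants /a/ = (800, 1300), /i/ = (300, 2300), and /u/ = (300, 800), VAI = 3100 / 2700 ≈ 1.15 and the triangle area is 375,000 Hz².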

Robust Speech Detection Using the AURORA Front-End Noise Reduction Algorithm under Telephone Channel Environments

  • 서영주; 지미경; 김회린
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 48 / pp. 155-173 / 2003
  • This paper proposes a noise reduction-based speech detection method for telephone channel environments. We adopt the AURORA front-end noise reduction algorithm, based on a two-stage mel-warped Wiener filter, as a preprocessor for a frequency-domain speech detector. The speech detector uses useful band energies derived from a mel filter bank as its feature parameters. The preprocessor first removes noise components from the incoming noisy speech signal, and the speech detector in the next stage detects speech regions in the noise-reduced signal. Experimental results show that the proposed noise reduction-based speech detection method is very effective in improving the performance not only of the speech detector but also of the subsequent speech recognizer.
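
The pipeline above (denoise first, then detect speech from band energies) can be illustrated with a much simpler stand-in: a single-stage, linear-frequency Wiener gain followed by a band-energy threshold test. This sketch is not the AURORA two-stage mel-warped Wiener filter. The crude a priori SNR estimate, the gain floor, and the 6 dB detection threshold are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def wiener_gain(noisy_power: Sequence[float], noise_power: Sequence[float],
                floor: float = 0.05) -> List[float]:
    """Per-bin Wiener gain H = xi / (1 + xi).

    The a priori SNR xi is estimated crudely as max(P_noisy / P_noise - 1, 0)
    (power subtraction); `floor` limits suppression to reduce musical noise.
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if pn <= 0:
            gains.append(1.0)  # no noise estimate: pass the bin unchanged
            continue
        xi = max(py / pn - 1.0, 0.0)
        gains.append(max(xi / (1.0 + xi), floor))
    return gains


def is_speech_frame(band_energies: Sequence[float], noise_energies: Sequence[float],
                    threshold_db: float = 6.0) -> bool:
    """Flag a frame as speech if its total band energy exceeds the noise
    estimate by more than `threshold_db` decibels (threshold is an assumption)."""
    e = sum(band_energies)
    n = sum(noise_energies)
    if n <= 0:
        return e > 0
    return 10.0 * math.log10(max(e, 1e-12) / n) > threshold_db
```

In a full system, the gains would be applied to the noisy spectrum frame by frame, and the detector would run on band energies of the filtered output with a noise estimate updated during non-speech frames.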


Disfluencies and Speech Rates of Standard Korean Speakers in Story-telling and Reading Contexts

  • Shim, Hong-Im; Chon, Hee-Cheong; Ko, Do-Heung
    • Speech Sciences / Vol. 12, No. 1 / pp. 45-51 / 2005
  • The purpose of this study is to compare the disfluencies and speech rates (overall speech rate and articulation rate) of normal adult speakers of Standard Korean across two different speech tasks (story-telling and text reading). Participants were 100 Korean adult speakers. The results are summarized as follows. First, the most frequent type of disfluency in the story-telling task was 'interjection', whereas that in the text-reading task was 'revision'. Second, overall speech rates (syllables per second and syllables per minute) differed significantly between the speech tasks. Third, articulation rates (syllables per second and syllables per minute) also differed significantly between the speech tasks.
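
The two rate measures differ only in their denominator: overall speech rate divides the syllable count by the total sample time, while articulation rate divides it by speaking time with pauses removed. A minimal sketch, assuming pauses are given as (start, end) times and that only pauses of at least 250 ms are excluded; the abstract does not state the study's pause criterion.

```python
from typing import Sequence, Tuple


def speech_rates(n_syllables: int, total_s: float,
                 pauses: Sequence[Tuple[float, float]],
                 min_pause_s: float = 0.25) -> dict:
    """Overall speech rate counts all time; articulation rate excludes pauses.

    `pauses` are (start, end) times in seconds; only pauses of at least
    `min_pause_s` are excluded (the 250 ms cutoff is an assumption).
    """
    pause_time = sum(e - s for s, e in pauses if e - s >= min_pause_s)
    speaking_time = total_s - pause_time
    overall_sps = n_syllables / total_s
    articulation_sps = n_syllables / speaking_time
    return {
        "overall_sps": overall_sps,
        "overall_spm": overall_sps * 60,
        "articulation_sps": articulation_sps,
        "articulation_spm": articulation_sps * 60,
    }
```

For 100 syllables over 25 s with 5 s of qualifying pauses, the overall rate is 4 syllables/s (240/min) and the articulation rate is 5 syllables/s (300/min).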


MPEG-4 TTS (Text-to-Speech)

  • 한민수
    • The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the IEEK 1999 Summer Conference / pp. 699-707 / 1999
  • It cannot be disputed that speech is the most natural interface between humans and machines. Realizing acceptable speech interfaces requires highly advanced speech recognizers and synthesizers. Text-to-Speech (TTS) technology has attracted a great deal of interest among speech engineers because of its benefits, namely its possible applications in talking computers, spoken emergency alarm systems, speech output devices for the speech-impaired, and so on. Many researchers have therefore made significant progress in speech synthesis techniques for their own languages, and as a result, the quality of currently available speech synthesizers is believed to be acceptable to ordinary users. This is partly why the MPEG group decided to include TTS technology as one of the MPEG-4 functionalities. ETRI has made major contributions to the current MPEG-4 TTS, namely: 1) use of the original prosody for synthesized speech output, 2) trick-mode functions for general users that do not break the prosody of the synthesized speech, 3) interoperability with Facial Animation (FA) tools, and 4) dubbing of moving or animated pictures using lip-shape pattern information.


Successful and rapid response of speech bulb reduction program combined with speech therapy in velopharyngeal dysfunction: a case report

  • Shin, Yu-Jeong; Ko, Seung-O
    • Maxillofacial Plastic and Reconstructive Surgery / Vol. 37 / pp. 22.1-22.4 / 2015
  • Velopharyngeal dysfunction in cleft palate patients after primary palate repair may result in nasal air emission, hypernasality, articulation disorders, and poor speech intelligibility. Among conservative treatment methods, a speech aid prosthesis combined with speech therapy is widely used. However, because treatment often takes more than a year and its outcome is hard to predict, some clinicians prefer surgical intervention. The purpose of this report was therefore to draw attention to the effectiveness of speech aid prostheses by presenting a successfully treated case. In this clinical report, a speech bulb reduction program with intensive speech therapy was applied to a patient with velopharyngeal dysfunction, and the condition was resolved within 5 months, an unusually short period for speech aid therapy. The advantages of pre-operative speech aid therapy are also discussed.

Pause Characteristics in the Slowed Speech of Treated Adult Stutterers

  • 전희숙
    • Speech Sciences / Vol. 15, No. 4 / pp. 189-197 / 2008
  • When a behavior-modification strategy is applied in speech therapy, fluency is first induced by intentionally slowing the speech rate in the early stage, and the speech rate then increases as fluency is acquired. The purpose of this study was therefore to investigate the pause characteristics of intentionally slowed speech in treated stutterers. Ten developmental stutterers with well-established fluency participated. From each participant, we collected 200-syllable samples of greatly slowed and slightly slowed speech in a reading task. Pause characteristics were measured as the total frequency of pauses, the total duration of pauses, the average pause duration, and the proportion of pauses. The findings were as follows: both the total duration and the total frequency of pauses were higher in greatly slowed speech than in slightly slowed speech. However, neither the average pause duration nor the proportion of pauses was significantly higher in greatly slowed speech than in slightly slowed speech.
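
The four pause measures named above can be computed from a list of labeled pause intervals. A minimal sketch, assuming pauses are given as (start, end) times in seconds and that the proportion of pauses means total pause time divided by the duration of the whole sample; the abstract does not define the proportion, so that definition is an assumption.

```python
from typing import Sequence, Tuple


def pause_measures(pauses: Sequence[Tuple[float, float]], sample_duration_s: float) -> dict:
    """Summarize pauses given as (start, end) times in seconds.

    The proportion is taken here as total pause time divided by the total
    duration of the sample; this definition is an assumption.
    """
    durations = [end - start for start, end in pauses]
    total = sum(durations)
    count = len(durations)
    return {
        "frequency": count,
        "total_duration_s": total,
        "mean_duration_s": total / count if count else 0.0,
        "proportion": total / sample_duration_s,
    }
```

Because the mean is the total divided by the count, more pauses of similar length raise the total duration and frequency without raising the average, which is consistent with the pattern the abstract reports.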


Designing a Speech DB for Korean Pronunciation Education

  • 정명숙
    • Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 47 / pp. 51-72 / 2003
  • The purpose of this paper is to design a speech database for Korean pronunciation education. To this end, I investigated the types of speech errors made by learners of Korean, created recording texts that cover all of these error types, and described how to collect the speech data and how to tag it. Because the speech DB is intended for teaching Korean pronunciation to foreigners, it naturally includes both learners' speech and native Korean speakers' speech. The DB should therefore contain speaker information and phonetic information, namely the phonetic values of segments and the intonation of sentences. Sentence intonation varies with the sentence type, the structure of prosodic units, the length of each prosodic unit, and so on; for this reason, the speech DB must include tags for this information.
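
One way to picture the tagging scheme described above is as a small record structure: speaker information, per-segment target and realized phonetic values, and sentence-level prosodic information. The schema below is hypothetical; every class and field name is an illustrative assumption rather than the paper's actual tag set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Speaker:
    speaker_id: str
    native_korean: bool                   # learner vs. native reference speaker
    first_language: Optional[str] = None  # hypothetical field for learners


@dataclass
class Segment:
    target: str    # expected (canonical) phone
    realized: str  # phonetic value actually produced

    @property
    def is_error(self) -> bool:
        return self.target != self.realized


@dataclass
class ProsodicUnit:
    text: str
    n_syllables: int    # length of the prosodic unit
    boundary_tone: str  # label set is an assumption


@dataclass
class Utterance:
    speaker: Speaker
    sentence_type: str  # e.g. "declarative", "interrogative"
    segments: List[Segment] = field(default_factory=list)
    prosodic_units: List[ProsodicUnit] = field(default_factory=list)

    def segmental_errors(self) -> List[Segment]:
        """Segments whose realized value differs from the target."""
        return [s for s in self.segments if s.is_error]
```

Keeping the target and realized values side by side lets the same records serve both native reference recordings (where they normally match) and learner recordings (where mismatches mark pronunciation errors).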
