• Title/Summary/Keyword: reference speaker

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

  • Lee, Ki-Seung;Kim, Jeong-Su
    • The Journal of the Acoustical Society of Korea, v.22 no.2, pp.135-144, 2003
  • A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The main issue addressed is the refinement of initial phone boundary estimates provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) was used as a phone boundary detector. To improve segmentation performance, a technique is proposed that trains MLPs individually according to phonetic transition. The optimum partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from hand-labelled positions. With single-speaker stimuli, the experimental results showed that more than 95% of all phone boundaries deviate from the reference position by less than 20 ms, and that refining the boundaries reduces the root mean square error by about 25%.
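
The two figures quoted in the abstract above (the share of boundaries within 20 ms of the hand-labelled reference, and the root mean square error) can be computed as in the following minimal sketch. The function name `boundary_metrics` and all boundary times are hypothetical, made up for illustration; they are not taken from the paper.

```python
import math

def boundary_metrics(auto_ms, ref_ms, tol_ms=20.0):
    """Return (fraction of boundaries within tol_ms, RMSE in ms)."""
    if len(auto_ms) != len(ref_ms) or not ref_ms:
        raise ValueError("need two equal-length, non-empty boundary lists")
    # Signed deviation of each automatic boundary from its reference.
    devs = [a - r for a, r in zip(auto_ms, ref_ms)]
    within = sum(abs(d) < tol_ms for d in devs) / len(devs)
    rmse = math.sqrt(sum(d * d for d in devs) / len(devs))
    return within, rmse

# Made-up boundary times in milliseconds (illustration only).
ref = [100.0, 250.0, 400.0, 620.0]      # hand-labelled reference
hmm = [112.0, 240.0, 425.0, 615.0]      # initial alignment (assumed values)
refined = [104.0, 246.0, 409.0, 618.0]  # after refinement (assumed values)

within_hmm, rmse_hmm = boundary_metrics(hmm, ref)
within_ref, rmse_ref = boundary_metrics(refined, ref)
# Relative RMSE reduction obtained by the refinement step.
rmse_reduction = 1.0 - rmse_ref / rmse_hmm
```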

A Study of Normal Nasalance and Velopharyngeal Port Activity in the Speech of Korean Adults (정상 성인의 비음도와 비인강 활성도에 관한 연구)

  • Leem Dae-Ho;Shin Hyo-Keun;Baek Jin-A.;Kim Hyun-Gi;Kwon Min-Su
    • Korean Journal of Cleft Lip And Palate, v.7 no.2, pp.123-132, 2004
  • The purpose of this study was to obtain normative nasalance scores for adult speakers of Korean. Additional objectives were to determine whether speaker sex played a role in differences in nasalance score and whether the nasalance score was significantly correlated with the nasalance slope score. The subjects were 75 healthy young Korean adults with normal oral and velopharyngeal resource and function. They had no history of speech problems, were judged to have normal speech and resonance at the time of testing, and had no upper respiratory tract infections or allergies at the time of testing. The Nasometer II 6400 was used to obtain nasalance scores and nasalance slope scores for /a/, /i/, /e/, /o/, /u/, /ja/, /je/, /wi/, /p'ap'i/ and /sasi/. The nasalance and nasalance slope data were analyzed statistically. The mean nasalance score of the female speakers was significantly higher than that of the male speakers for /a/, /i/, /wi/, /p'ap'i/ and /sasi/ (p < 0.10). The mean nasalance score was highest for /i/ and lowest for /o/. In this study, we could not find a relationship between the nasalance score and the closing slope score. However, there was a negative correlation between the mean nasalance score and the opening slope score at ie/ and /;ai, and a positive correlation at /sasi/. These normative nasalance scores for normal young adults speaking Korean provide important reference information for Korean cleft palate teams. In future studies of velopharyngeal activity with the Nasometer, the opening slope score may serve as an important parameter.

The f0 distribution of Korean speakers in a spontaneous speech corpus

  • Yang, Byunggon
    • Phonetics and Speech Sciences, v.13 no.3, pp.31-37, 2021
  • The fundamental frequency, or f0, is an important acoustic measure in the prosody of human speech. The current study examined the f0 distribution of a corpus of spontaneous speech in order to provide normative data for Korean speakers. The corpus consists of 40 speakers talking freely about their daily activities and their personal views. Praat scripts were created to collect f0 values, and a majority of obvious errors were corrected manually by watching and listening to the f0 contour on a narrow-band spectrogram. Statistical analyses of the f0 distribution were conducted using R. The results showed that the f0 values of all the Korean speakers were right-skewed, with a pointy distribution. The speakers produced spontaneous speech within a frequency range of 274 Hz (from 65 Hz to 339 Hz), excluding statistical outliers. The mode of the total f0 data was 102 Hz. The female f0 range, with a bimodal distribution, appeared wider than that of the male group. Regression analyses based on age and f0 values yielded negligible R-squared values. As the mode of an individual speaker could be predicted from the median, either the median or the mode could serve as a good reference for the individual f0 range. Finally, an analysis of the continuous f0 points of intonational phrases revealed that the initial and final segments of the phrases yielded several f0 measurement errors. From these results, we conclude that an examination of a spontaneous speech corpus can provide linguists with useful measures for generalizing the acoustic properties of f0 variability in a language, by individual or by group. Further studies on the use of statistical measures to secure reliable f0 values for individual speakers would be desirable.
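
The summary measures named in the abstract above (range, median, mode, and the direction of skew) can be illustrated with a short sketch. The study itself used Praat and R; the Python below is only an illustration, the helper `f0_summary` is hypothetical, the mode is taken over values rounded to 1 Hz (an assumption), and the sample values are made up.

```python
import statistics

def f0_summary(f0_hz):
    """Summarize f0 values in Hz: range, median, mode, and skewness."""
    n = len(f0_hz)
    mean = statistics.fmean(f0_hz)
    sd = statistics.pstdev(f0_hz)
    # Population skewness: positive when the distribution is right-skewed.
    skew = sum(((x - mean) / sd) ** 3 for x in f0_hz) / n
    return {
        "min": min(f0_hz),
        "max": max(f0_hz),
        "range": max(f0_hz) - min(f0_hz),
        "median": statistics.median(f0_hz),
        # Mode over 1 Hz bins (assumed binning, not the paper's).
        "mode": statistics.mode([round(x) for x in f0_hz]),
        "skewness": skew,
    }

# Made-up f0 samples for one speaker (illustration only).
sample_f0 = [98, 102, 102, 105, 110, 102, 120, 150, 210]
summary = f0_summary(sample_f0)
```

In this toy sample the median and the mode lie close together while a few high values stretch the upper tail, which is the pattern the abstract describes as right-skewed.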

Analysis of Podcast User Behaviors and Classification of Users (팟캐스트 콘텐츠 이용자 행태분석 및 유형 파악)

  • Kang, Minjeong
    • The Journal of the Korea Contents Association, v.22 no.3, pp.94-104, 2022
  • As the audio content market grows due to the spread of the AI speaker market and the influence of connected cars, the demand for podcast services is increasing. This study therefore identified the behaviors of podcast users and classified the user types. The background study reviewed podcast usage motives and user types, which informed the questionnaire. The survey identified preferred audio content according to the situation, and the in-depth interviews derived user types and insights by identifying audio service usage behavior. In the survey results, there was little difference in preferred content between single listening and multitasking, but the difference in preferred content according to time period was statistically significant. The in-depth interviews yielded three user types: users who listen alone for the purpose of study, users who quickly find and listen to useful information while on the go, and users who multitask while listening to light, comfortable content. The results of this study are expected to be an important reference for designing an audio content platform that improves user experience.

Non-aspectual Uses of the English Progressive

  • Lee, Seung-Ah
    • Journal of English Language & Literature, v.57 no.6, pp.1067-1088, 2011
  • While there is a high degree of convergence in linguistics in the treatment of the progressive as an aspect, the English progressive is unusually wide in its range of uses. This paper highlights the distinction between aspectual and non-aspectual progressives. The primary function of the progressive is to present a situation as ongoing, and this strictly aspectual use of the progressive is referred to as the 'aspectual progressive'. On the other hand, the uses of the English progressive that are not, in a strict sense, aspectual are called 'non-aspectual progressives'. There are at least three basic uses of non-aspectual progressives. The first is the so-called progressive futurate (e.g., John is leaving tomorrow). In English, the present progressive can be used to express future time reference. This use of the progressive is regarded as a non-aspectual one, on the grounds that its meaning cannot be accounted for in terms of ongoingness. The second use is the habitual progressive (e.g., She's smoking a lot these days). Given that the habitual is an aspect, it is natural that the habitual progressive is not an aspectual progressive because one cannot view a situation in two different ways. In addition, ongoingness is not a defining property of the habitual progressive but is only a contingent or subsidiary property. The real essence of the habitual progressive is habituality. The third use of non-aspectual progressives is the experiential or interpretative progressive (e.g., You're imagining things), whose main characteristic is the subjectivity of the speaker's interpretation. The experiential or interpretative progressive does not serve a primarily aspectual function because the meaning of ongoingness has nothing to do with the content of the utterance.

Sound System Design and Characteristic Analysis based on Power Line Communication (전력선통신 기반 음향 시스템 설계 및 특성 분석)

  • Kim, Kwan-Kyu;Yeom, Keong-Tae;Kim, Kwan-Woong;Kim, Yong-Kab
    • The Journal of the Korea Contents Association, v.8 no.6, pp.1-7, 2008
  • This paper addresses the problems of existing sound systems: difficult system organization, additional installation cost, and a poor fit with the interior. To solve them, we designed and studied a new sound system based on power line communication (PLC). A transmitter and a receiver were designed using the PLC chip INT5500CS. The sound system consisted of a CD player, whose sound signals are sent from the transmitter, and a speaker connected to the receiver. To analyze the characteristics of this system, a USBPre external sound card and Smaart Live 5, a PC-based sound measurement program, were added. In our experiment, the measured signal level was 2 to 3 dB lower than the reference signal, the latency was 16.69 ms, and coherence was poor in the high-frequency band. On the other hand, measurements of pink noise, a 1 kHz frequency, phase, and magnitude showed that the system transmits and receives over 90% of signals in good condition. In view of the results achieved so far, the system designed by our team has excellent performance and resolves the defects of existing audio signal transmission systems.

A Study on a Model Parameter Compensation Method for Noise-Robust Speech Recognition (잡음환경에서의 음성인식을 위한 모델 파라미터 변환 방식에 관한 연구)

  • Chang, Yuk-Hyeun;Chung, Yong-Joo;Park, Sung-Hyun;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea, v.16 no.5, pp.112-121, 1997
  • In this paper, we study a model parameter compensation method for noise-robust speech recognition. Model parameters are compensated on a sentence-by-sentence basis, and no other information is used. Parallel model combination (PMC), a well-known model parameter compensation algorithm, is implemented and used as a reference for performance comparison. We also propose a modified PMC method that tunes the model parameters with an association factor controlling the average variability of the Gaussian mixtures and the variability of a single Gaussian mixture per state, for more robust modeling. We obtain a re-estimation solution for the environmental variables based on the expectation-maximization (EM) algorithm in the cepstral domain. To evaluate the model compensation methods, we perform experiments on speaker-independent isolated word recognition. The noise sources are white Gaussian noise and driving-car noise. To obtain corrupted speech, we added noise to clean speech at various signal-to-noise ratios (SNRs). The noise mean and variance are modeled from 3 frames of noise data. In the experiments, the vector Taylor series (VTS) approach outperforms the other methods. The zero-order VTS approach is similar to the modified PMC method in that it adapts only the mean vector, but its recognition rate is higher than that of the PMC and modified PMC methods based on the log-normal approximation.
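
The abstract above mentions adding noise to clean speech at various SNRs without giving the procedure. One common way to do this, shown here as a minimal sketch and not as the authors' method, is to scale the noise so that the ratio of average signal power to average noise power matches the target. The names `mix_at_snr` and `power`, and the synthetic signals, are assumptions for illustration.

```python
import math
import random

def power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)

def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mixture has the given SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # Solve 10*log10(P_clean / (gain^2 * P_noise)) = snr_db for gain.
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]

# Synthetic stand-ins: a 440 Hz tone sampled at 8 kHz and white Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

noisy = mix_at_snr(clean, noise, snr_db=10.0)

# Check the achieved SNR from the noise component actually added.
added = [y - c for y, c in zip(noisy, clean)]
achieved_snr_db = 10 * math.log10(power(clean) / power(added))
```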