• Title/Abstract/Keywords: acoustic features

Search results: 322 items

한국인의 영어피치악센트 발음에 관한 연구 (An Acoustic Study of the Pronunciation of English Pitch Accents Uttered by Korean Speakers)

  • 구희산
    • 음성과학 (Speech Sciences) / Vol. 10, No. 2 / pp.223-236 / 2003
  • The purpose of this experimental study is to investigate the characteristics of English pitch accents uttered by Korean speakers. Six English sentences were each uttered five times by fifteen male undergraduate and graduate students from three groups: Seoul, Yongnam, and Honam dialect speakers. We compared the subjects' data with the data of a native speaker of English as a model pronunciation of English pitch accents. Acoustic features (F0, duration, amplitude) were measured from sound spectrograms made with PC Works. Results showed that (1) the acoustic features of English pitch accents are F0 and duration for the native speaker and the Korean speakers alike, (2) Seoul dialect speakers uttered English pitch accents more similarly to the English native speaker than the other dialect speakers did, and (3) Korean speakers generally have difficulty pronouncing L* accents, and appear to have more problems with L* accents than with H* accents.

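The three measures this study extracts (F0, duration, amplitude) can be approximated with standard open-source tooling. A minimal sketch, assuming a single-utterance WAV file and using librosa in place of the PC Works software the authors used:

```python
# Sketch: the three acoustic measures used in the study (F0, duration,
# amplitude) extracted from one utterance. "utterance.wav" is a
# placeholder input; librosa stands in for PC Works, so the values are
# comparable only in kind.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)

# F0 contour via the pYIN tracker, restricted to a typical speech range.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)

duration = len(y) / sr                 # utterance duration in seconds
rms = librosa.feature.rms(y=y)[0]      # frame-level amplitude (RMS)

print(f"duration: {duration:.3f} s")
print(f"mean F0 over voiced frames: {np.nanmean(f0):.1f} Hz")
print(f"peak RMS amplitude: {rms.max():.4f}")
```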

한국어 의문사 의문문과 예-아니오 의문문의 의미 구별에 관여하는 음향 자질 (Acoustic Features Determining the Comprehension of Wh and Yes-no Questions in Standard Korean)

  • 민광준
    • 음성과학 (Speech Sciences) / Vol. 4, No. 1 / pp.35-46 / 1998
  • In this paper, production and perception data were examined to discover which acoustic features are used in distinguishing wh-questions and yes/no-questions. The production data show that the two question types are distinguished by different accentual phrasing, by pitch ranges in wh-phrases, and by initial lenis stop voicing of the first syllable in verb phrases. The perception data, obtained with synthetic intonation, show that the two question types are distinguished by the width of the pitch range between the first and the second syllable in wh-phrases. Initial lenis stop voicing of the first syllable in verb phrases also has a strong effect on the perceptual discrimination of the two question types.

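The perceptual cue the paper isolates, the width of the pitch range between the first and second syllables of the wh-phrase, reduces to a single interval once syllable-level F0 peaks are known. A minimal sketch; the F0 values and the semitone scale are illustrative assumptions, not the paper's data:

```python
# Sketch: pitch range between the first and second syllables of a
# wh-phrase, the cue the perception experiments point to. The peak F0
# values below are hypothetical; in practice they would come from a
# pitch tracker aligned with syllable boundaries.
import math

def pitch_range_semitones(f0_syllable1: float, f0_syllable2: float) -> float:
    """Pitch interval between two syllables, in semitones."""
    return 12.0 * math.log2(f0_syllable2 / f0_syllable1)

# Hypothetical peak F0 (Hz) of the first two syllables.
wide = pitch_range_semitones(180.0, 260.0)    # wide range, wh-question reading
narrow = pitch_range_semitones(180.0, 200.0)  # narrow range, yes/no reading

print(f"wh-question reading: {wide:+.1f} semitones")
print(f"yes/no reading:      {narrow:+.1f} semitones")
```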

RNN을 이용한 Expressive Talking Head from Speech의 합성 (Synthesis of Expressive Talking Heads from Speech with Recurrent Neural Network)

  • 사쿠라이 류헤이;심바 타이키;야마조에 히로타케;이주호
    • 로봇학회논문지 (The Journal of Korea Robotics Society) / Vol. 13, No. 1 / pp.16-25 / 2018
  • A talking head (TH) is an utterance face animation generated from text and voice input. In this paper, we propose a method for generating a TH with facial expression and intonation from speech input alone. The problem of generating a TH from speech can be regarded as a regression problem from the acoustic feature sequence to the facial code sequence, a low-dimensional vector representation that can efficiently encode and decode a face image. This regression was modeled by a bidirectional RNN and trained on the SAVEE database of frontal utterance face animations. The proposed method is able to generate a TH with facial expression and intonation by using acoustic features such as MFCCs, the dynamic elements of MFCCs, energy, and F0. According to the experiments, a configuration with BLSTM layers as the first and second layers of the bidirectional RNN predicted the face codes best. For evaluation, a questionnaire survey was conducted with 62 persons who watched TH animations generated by the proposed method and by a previous method. As a result, 77% of the respondents answered that the TH generated by the proposed method matched the speech well.
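
The regression described here, from frame-level acoustic features (MFCCs, their dynamics, energy, F0) to a face-code sequence, maps naturally onto stacked BLSTM layers. A minimal PyTorch sketch; the feature and face-code dimensionalities are illustrative guesses, not values from the paper:

```python
# Sketch: bidirectional-RNN regression from an acoustic feature
# sequence to a face-code sequence, following the paper's setup of
# BLSTM layers as the first and second layers. All dimensions are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

class TalkingHeadRegressor(nn.Module):
    def __init__(self, n_features=28, n_face_codes=30, hidden=128):
        super().__init__()
        # Two stacked BLSTM layers, the configuration the paper reports
        # predicting face codes best.
        self.blstm = nn.LSTM(n_features, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_face_codes)

    def forward(self, x):               # x: (batch, frames, n_features)
        h, _ = self.blstm(x)            # h: (batch, frames, 2*hidden)
        return self.out(h)              # (batch, frames, n_face_codes)

# One batch of 100 frames of MFCC + dynamics + energy + F0 features.
model = TalkingHeadRegressor()
acoustic = torch.randn(4, 100, 28)
face_codes = model(acoustic)
print(face_codes.shape)                 # torch.Size([4, 100, 30])
```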

서울 방언과 대구 방언 파열음의 음향 특징 (Acoustic characteristics of Stops in Seoul and Daegu dialects)

  • 조민하;신지영
    • 대한음성학회 Conference Proceedings / 2004 Spring Conference / pp.139-142 / 2004
  • This study examines the acoustic characteristics of Korean stops in two dialects, Seoul and Daegu. Twenty speakers of the two dialects were asked to read 15 words containing word-initial stops of different places of articulation and phonation types. The stops in the two dialects show two main acoustic differences. First, the two dialects differ in the features that distinguish the phonation types. Second, lenis stops in the Daegu dialect show characteristics of fortis stops.


헥사플루오르프로펜 플라즈마박막을 이용한 표면탄성파발진기 습도센서 (Surface Acoustic Wave Oscillator Humidity Sensor Using Hexafluoropropene Plasma Thin Film)

  • 박남천;서은덕
    • 한국전기전자재료학회 Conference Proceedings / 1992 Spring Conference / pp.144-146 / 1992
  • A surface acoustic wave (SAW) oscillator offers many attractive features for application to vapor sensors. The perturbation of SAW velocity by a hexafluoropropene plasma polymer thin film has been studied for relative humidity sensing. Adsorption of moisture produces rapid changes in the properties of the film, resulting in a change in the velocity of the surface acoustic waves and, hence, in the frequency of the SAW oscillator. The devices used in our experiments were 55 MHz SAW oscillators fabricated on a LiNbO3 substrate.

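The sensing principle, a film-induced perturbation of the SAW velocity that shifts the oscillator frequency, follows the first-order relation Δf ≈ f0 · (Δv/v). A minimal sketch; the 55 MHz operating frequency is from the paper, while the fractional velocity change is a hypothetical value:

```python
# Sketch: first-order estimate of the SAW oscillator frequency shift
# caused by a humidity-induced change in surface wave velocity, using
# delta_f ~= f0 * (delta_v / v). The 55 MHz frequency comes from the
# paper; the fractional velocity change is hypothetical.

F0 = 55e6            # oscillator frequency (Hz), from the paper
DV_OVER_V = -2e-5    # hypothetical fractional velocity change on adsorption

delta_f = F0 * DV_OVER_V
print(f"frequency shift: {delta_f:.0f} Hz")   # about a 1100 Hz downward shift
```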

소리체제에서 음향 자질[noise]: 한국어와 기타 언어들에서의 한 예증 (An acoustic feature [noise] in the sound pattern of Korean and other languages)

  • 이석재
    • 음성과학 (Speech Sciences) / Vol. 6 / pp.103-117 / 1999
  • This paper suggests that the onset-coda asymmetry found in languages like Korean and others should be dealt with in terms of one acoustic feature rather than other articulatory features, claiming that the acoustic feature involved here is [noise], i.e., 'aperiodic waveform energy'. Whether a coda ends in [noise] or not determines the structural well-formedness of the languages in question, regardless of the intensity, the frequency, and the time duration of the [noise]. Fricatives, affricates, aspirated stops, tense stops, and released stops are all disallowed in the coda position due to the acoustic feature [noise] they would commonly end with if they were posited in the coda. The proposal implies that the three seemingly separate prohibitions of consonants in the coda position -- i) no fricatives/affricates, ii) no aspirated/tense stops, and iii) no released stops -- are directly correlated with each other. Incorporation of the one acoustic feature [noise] into the feature theory enables us to see that the aspects of onset-coda asymmetry are derived from one single source: a ban on [noise] in the coda.

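The paper's claim, that three separate coda prohibitions collapse into a single ban on [noise], can be stated as one membership test over segment classes. A minimal sketch; the segment classes are those the paper names, but the inventory below is an illustrative simplification of the analysis:

```python
# Sketch: the paper's single coda constraint -- a coda may not end in
# the acoustic feature [noise] (aperiodic waveform energy). The class
# assignments below are an illustrative simplification.

# Segment classes the paper rules out of the coda, all ending in [noise]:
NOISY = {
    "fricative", "affricate",
    "aspirated stop", "tense stop", "released stop",
}

def coda_is_well_formed(segment_class: str) -> bool:
    """A coda is well-formed only if it does not end in [noise]."""
    return segment_class not in NOISY

for seg in ["unreleased stop", "nasal", "fricative", "aspirated stop"]:
    print(f"{seg:16s} coda: {'ok' if coda_is_well_formed(seg) else 'banned'}")
```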

따라 말하기 과제에서의 음향적 처리와 음운적 처리 (Acoustic and phonological processes in the repetition tasks)

  • 유세진;이경민
    • 한국인지과학회 Conference Proceedings / 2010 Spring Conference / pp.42-47 / 2010
  • Speech shares acoustic features with other sound-based processing, which makes it difficult to distinguish phonological processes from acoustic processes in speech processing. In this study, we examined the difference between acoustic and phonological processes during repetition tasks. By contrasting various stimuli of different lengths, we localized the neural correlates of acoustic processing within the bilateral superior temporal gyrus, which is consistent with previous studies. The activation patterns overlapped widely between words and pseudowords, i.e., they were content-free. In contrast, phonological processing showed left-lateralized activation in the middle temporal gyrus, located in anterior temporal areas. This implies that phonological processing is content-specific, as shown in our previous study, and at the same time more language-specific. Thus, we suggest that phonological processing is distinguished from acoustic processing in that it is always accompanied by obligatory access to available phonological codes, which can be an entry of the mental lexicon.


Acoustic correlates of prosodic prominence in conversational speech of American English, as perceived by ordinary listeners

  • Mo, Yoon-Sook
    • 말소리와 음성과학 (Phonetics and Speech Sciences) / Vol. 3, No. 3 / pp.19-26 / 2011
  • Previous laboratory studies have shown that prosodic structures are encoded in modulations of the phonetic patterns of speech, including suprasegmental as well as segmental features. Drawing on prosodically annotated large-scale speech data from the Buckeye corpus of conversational American English, the current study first evaluated the reliability of prosody annotation by a large number of ordinary listeners and then examined whether and how prosodic prominence influences the phonetic realization of multiple acoustic parameters in everyday conversational speech. The results showed that all of the acoustic parameters measured, including pitch, loudness, duration, and spectral balance, are increased when heard as prominent. These findings suggest that prosodic prominence enhances the phonetic characteristics of the acoustic parameters. The results also showed that the degree of phonetic enhancement varies depending on the type of acoustic parameter. With respect to formant structure, the findings more consistently support the Sonority Expansion Hypothesis than the Hyperarticulation Hypothesis, showing that lexically stressed vowels are hyperarticulated only when hyperarticulation does not interfere with sonority expansion. Taken together, the present study showed that prosodic prominence modulates the phonetic realization of the acoustic parameters in the direction of phonetic strengthening in everyday conversational speech, and that ordinary listeners are attentive to such phonetic variation associated with prosody in speech perception. However, the present study also showed that in everyday conversational speech there is no single dominant acoustic measure signaling prosodic prominence; listeners must attend to small acoustic variation or integrate acoustic information from multiple acoustic parameters in prosody perception.

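The four parameter families examined (pitch, loudness, duration, spectral balance) can each be read off a word-aligned signal with standard tooling. A minimal sketch assuming word boundaries are already known, as in the corpus alignments; librosa is an assumed substitute for the study's own measurement pipeline, and spectral centroid stands in as one possible proxy for spectral balance:

```python
# Sketch: per-word values for the four acoustic parameter families the
# study relates to perceived prominence: pitch, loudness, duration, and
# spectral balance. Word boundaries are assumed given (as in corpus
# alignments); RMS proxies loudness and spectral centroid proxies
# spectral balance, each one of several reasonable choices.
import librosa
import numpy as np

def word_measures(y, sr, start_s, end_s):
    w = y[int(start_s * sr):int(end_s * sr)]
    f0, _, _ = librosa.pyin(w, fmin=60, fmax=400, sr=sr)
    return {
        "duration_s": end_s - start_s,
        "mean_f0_hz": float(np.nanmean(f0)),
        "rms_loudness": float(librosa.feature.rms(y=w).mean()),
        "spectral_centroid_hz": float(
            librosa.feature.spectral_centroid(y=w, sr=sr).mean()),
    }

y, sr = librosa.load("utterance.wav", sr=None)   # placeholder input file
print(word_measures(y, sr, start_s=0.42, end_s=0.71))  # hypothetical times
```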

Class-Based Histogram Equalization for Robust Speech Recognition

  • Suh, Young-Joo;Kim, Hoi-Rin
    • ETRI Journal / Vol. 28, No. 4 / pp.502-505 / 2006
  • A new class-based histogram equalization method is proposed for robust speech recognition. The proposed method aims not only at compensating for the acoustic mismatch between training and test environments, but also at reducing the discrepancy between the phonetic distributions of the training and test speech data. The algorithm utilizes multiple class-specific reference and test cumulative distribution functions, classifies the noisy test features into their corresponding classes, and equalizes the features by using their corresponding class-specific reference and test distributions. Experiments on the Aurora 2 database proved the effectiveness of the proposed method, reducing relative errors by 18.74%, 17.52%, and 23.45% over the conventional histogram equalization method and by 59.43%, 66.00%, and 50.50% over mel-cepstral-based features for test sets A, B, and C, respectively.

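The core operation, mapping a test feature through its class-specific test CDF and then through the inverse of the matching reference CDF, is standard quantile matching. A minimal NumPy sketch of that per-class step; the class assignment the paper performs first is assumed given, and the distributions below are synthetic:

```python
# Sketch: the per-class equalization step of class-based histogram
# equalization -- map a test feature value through the empirical test
# CDF of its class, then through the inverse reference CDF of the same
# class. Class assignment (which the paper performs first) is assumed.
import numpy as np

def equalize(x, test_samples, ref_samples):
    """Quantile-match scalar feature x from the test distribution of its
    class to the reference distribution of the same class."""
    # Empirical test CDF value of x within its class.
    p = np.searchsorted(np.sort(test_samples), x) / len(test_samples)
    # Inverse reference CDF (empirical quantile) at the same probability.
    return np.quantile(ref_samples, np.clip(p, 0.0, 1.0))

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, 5000)      # clean-condition class distribution
test = rng.normal(1.5, 2.0, 5000)     # noisy-condition class distribution
print(equalize(4.0, test, ref))       # noisy feature mapped toward clean space
```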

Acoustic Event Detection in Multichannel Audio Using Gated Recurrent Neural Networks with High-Resolution Spectral Features

  • Kim, Hyoung-Gook;Kim, Jin Young
    • ETRI Journal / Vol. 39, No. 6 / pp.832-840 / 2017
  • Recently, deep recurrent neural networks have achieved great success in various machine learning tasks, and have also been applied for sound event detection. The detection of temporally overlapping sound events in realistic environments is much more challenging than in monophonic detection problems. In this paper, we present an approach to improve the accuracy of polyphonic sound event detection in multichannel audio based on gated recurrent neural networks in combination with auditory spectral features. In the proposed method, human hearing perception-based spatial and spectral-domain noise-reduced harmonic features are extracted from multichannel audio and used as high-resolution spectral inputs to train gated recurrent neural networks. This provides a fast and stable convergence rate compared to long short-term memory recurrent neural networks. Our evaluation reveals that the proposed method outperforms the conventional approaches.
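
The detector described, gated recurrent layers over spectral features with frame-level outputs for temporally overlapping events, can be sketched as a multi-label sequence model. A minimal PyTorch sketch; layer sizes, the feature dimensionality, and the number of event classes are illustrative, not the paper's:

```python
# Sketch: gated-recurrent-network polyphonic sound event detector with
# frame-level multi-label outputs (sigmoid, not softmax) so that
# temporally overlapping events can be active at once. All sizes are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn

class PolyphonicSED(nn.Module):
    def __init__(self, n_spectral_feats=80, n_events=10, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_spectral_feats, hidden, num_layers=2,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, n_events)

    def forward(self, x):                    # x: (batch, frames, feats)
        h, _ = self.gru(x)
        return torch.sigmoid(self.head(h))   # per-frame event probabilities

model = PolyphonicSED()
feats = torch.randn(2, 200, 80)              # 200 frames of spectral features
probs = model(feats)
events_active = probs > 0.5                  # threshold for event activity
print(probs.shape, int(events_active.sum()))
```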