• Title/Summary/Keyword: speech source

Korean speakers' perception and production of English word-final voiceless stop release (한국어 화자의 영어 어말 폐쇄음 파열의 인지와 발음 연구)

  • Lee Borim;Lee Sook-hyang;Park Cheon-Bae;Kang Seok-keun
    • MALSORI
    • /
    • no.38
    • /
    • pp.41-70
    • /
    • 1999
  • Research on perception has, in recent years, become increasingly popular as a means of accounting for cross-linguistic sound patterns (Ohala, 1992; Hemming, 1995; Jun, 1995; Steriade, 1997 among others). In loanword phonology, Silverman (1990, 1992) argues that words from a source language are scanned at the perceptual level and that the features perceived by a speaker are stored in the input to be processed according to the constraints of his or her native phonology. The purpose of this paper is to test the validity of Silverman's proposal by examining the correlation between perception and production in Korean learners of English. We focused specifically on the perception and production of stop release, contrasting English loanwords with English words learned through education to see whether there were any significant differences. The results showed no substantive correlation between the Korean speakers' perception of the loanwords as pronounced by English speakers and their own production of those words. In the case of English words, however, the Korean speakers' production was closely related to their perception, although some inter-speaker variation was observed. With Optimality Theory (Prince & Smolensky, 1993) as the theoretical framework, it was shown that the theory is a useful means of implementing a phonetics-phonology interface and relating perceptual processes to speech production. Specifically, under the assumption that loanwords with the [t]~[tʰ] alternation (e.g., 'cut') are originally borrowed into Korean as two different input forms, all the alternations could be straightforwardly accounted for in terms of a unified ranking of constraints.

An efficient transcoding algorithm for AMR and G.723.1 speech coders and performance evaluation (AMR과 G.723.1 음성부호화기를 위한 효율적인 상호부호화 알고리듬 및 성능평가)

  • 최진규;윤성완;강홍구;윤대희
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.4
    • /
    • pp.121-130
    • /
    • 2004
  • In applications requiring interoperability between different networks, such as VoIP and wireless communication systems, two speech codecs must work together in a cascaded connection, or tandem. Tandem operation has several problems, such as long delay, high complexity, and quality degradation, because the complete encoding/decoding process is performed twice. Transcoding is one of the best solutions to these problems. The transcoding algorithm varies with the structure of the source and target coders. In this paper, a transcoding algorithm is proposed that includes LSP conversion, pitch estimation, and a new perceptual weighting filter to reduce complexity and improve quality. These algorithms are applied to the AMR and G.723.1 pair. By employing the proposed algorithms in the transcoder, complexity is reduced by about 20%-58% and quality is improved compared to tandem.

Enhancement of a language model using two separate corpora of distinct characteristics

  • Cho, Sehyeong;Chung, Tae-Sun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.3
    • /
    • pp.357-362
    • /
    • 2004
  • Language models are essential in predicting the next word in a spoken sentence, thereby enhancing speech recognition accuracy, among other things. However, spoken language domains are numerous, and developers therefore suffer from a lack of sufficiently large corpora. This paper proposes a method of combining two n-gram language models, one constructed from a very small corpus of the domain of interest and the other from a large but less suitable corpus, resulting in a significantly enhanced language model. The method is based on the observation that a small corpus from the right domain has high-quality n-grams but suffers from a serious sparseness problem, while a large corpus from a different domain has more n-gram statistics but is incorrectly biased. In our approach, the two sets of n-gram statistics are combined by extending the idea of Katz's backoff; the method is therefore called dual-source backoff. We ran experiments with 3-gram language models constructed from newspaper corpora of several million to tens of millions of words, together with models from smaller broadcast news corpora. The target domain was broadcast news. We obtained a significant improvement (30%) by incorporating a small corpus about one thirtieth the size of the newspaper corpus.
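The idea of falling back from a small in-domain model to a large out-of-domain one can be illustrated with a minimal sketch. This is not the paper's dual-source backoff: it uses bigrams rather than 3-grams, skips Katz's discounting, and returns unnormalized scores rather than probabilities. The function names and the weight `alpha` are hypothetical.

```python
from collections import Counter


def bigram_counts(tokens):
    """Return (unigram counts, bigram counts) for a list of tokens."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def dual_source_score(prev, word, in_domain, out_domain, alpha=0.4):
    """Score `word` following `prev` using two count sources.

    Order of preference (an illustrative assumption, not the paper's scheme):
    1. the in-domain bigram, if it was observed;
    2. the out-of-domain bigram, down-weighted by `alpha`;
    3. the in-domain unigram, down-weighted by `alpha` squared.
    The result is an unnormalized score, not a probability.
    """
    in_uni, in_bi = in_domain
    out_uni, out_bi = out_domain
    if in_bi[(prev, word)] > 0:
        return in_bi[(prev, word)] / in_uni[prev]
    if out_bi[(prev, word)] > 0:
        return alpha * out_bi[(prev, word)] / out_uni[prev]
    total = sum(in_uni.values())
    return alpha * alpha * in_uni[word] / total


# Toy corpora standing in for broadcast news (small) and newspaper text (large).
small = bigram_counts("the anchor reads the news".split())
large = bigram_counts("the paper prints the news and the paper sells".split())
print(dual_source_score("the", "news", small, large))
print(dual_source_score("the", "paper", small, large))
```

The first call is answered by the in-domain counts, while the second falls through to the larger corpus because the small corpus never saw that bigram.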

The Changes in the Closed Quotient of Trained Singers and Untrained Controls Under Varying Intensity at a Constant Vocal Pitch (음도 고정 시 강도 변화에 따른 일반인과 성악인 발성의 성대접촉률 변화 특성의 비교)

  • Kim, Han-Su;Jeon, Yong-Sun;Chung, Sung-Min;Cho, Kun-Kyung;Park, Eun-Hee
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.16 no.1
    • /
    • pp.28-32
    • /
    • 2005
  • Background and Objectives: The two most important factors in voice production are respiratory function, which is the power source of the voice, and glottic closure, which transforms the air flow into sound signals. The purpose of this study was to investigate the differences between trained singers and untrained controls under varying intensity at a constant vocal pitch by simultaneously using the airway interruption method and electroglottography (EGG). Materials and Methods: Under two different intensity conditions at a constant vocal pitch (/G/), 20 trained singers (10 male, 10 female) were studied. Mean flow rate (MFR), subglottic pressure (Psub), and intensity were measured in an aerodynamic test using the Phonatory function analyzer. Closed quotient (CQ), jitter, and shimmer were also investigated by electroglottography using Lx speech studio. These data were compared with those of normal controls. Results: MFR and Psub increased in the high-intensity condition in all subject groups, but the increase was not statistically significant. A statistically significant increase in CQ was observed in male trained singers in the high-intensity condition (untrained male: 51.31 ± 3.70%, trained male: 55.52 ± 6.07%, p=.039). Shimmer percent, one of the phonatory stability parameters, also decreased significantly in all subject groups (p<.001). Conclusion: The trained singers' phonation was more efficient than that of the untrained controls. The results suggest that trained singers can increase loudness with little change in mean flow rate or subglottic pressure but with a greater increase in glottic closed quotient.

A Study on the Audio Compensation System (음향 보상 시스템에 관한 연구)

  • Jeoung, Byung-Chul;Won, Chung-Sang
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.6
    • /
    • pp.509-517
    • /
    • 2013
  • In this paper, we investigate a method for building a good acoustic-speech system using digital signal processing with a dynamic microphone as the transducer. A good acoustic-speech system should convert the original sound input into an electric signal without distortion. The frequency response of the microphone is measured, and adjustment factors are obtained for each frequency band by comparing the measured data with the standard frequency response of the microphone. The final sound levels are obtained by applying the adjustment factors derived from the frequency responses of the microphone and speaker, using digital signal processing, so that they match the original sound levels. We then minimize the changes in frequency response and level caused by variation in the distance from the source to the microphone, using frequency responses measured at different distances.
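The per-band comparison described in this abstract can be sketched as follows. This is an illustrative assumption about how such adjustment factors might be computed (reference level minus measured level, in dB); the band centres, values, and function names are hypothetical and not taken from the paper.

```python
def adjustment_factors_db(measured_db, reference_db):
    """Per-band gain in dB that moves the measured response onto the reference."""
    return {band: reference_db[band] - level for band, level in measured_db.items()}


def apply_factors(levels_db, factors_db):
    """Add the per-band gain to a set of band levels (all values in dB)."""
    return {band: level + factors_db[band] for band, level in levels_db.items()}


# Hypothetical measurement: band centre frequency (Hz) -> level relative to 1 kHz (dB).
measured = {125: -6.0, 1000: 0.0, 8000: -3.5}
reference = {125: 0.0, 1000: 0.0, 8000: 0.0}

factors = adjustment_factors_db(measured, reference)
print(factors)
print(apply_factors(measured, factors))
```

Applying the factors to the measured levels reproduces the reference response, which is the matching condition the abstract describes.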

Part-Of-Speech Tagging using multiple sources of statistical data (이종의 통계정보를 이용한 품사 부착 기법)

  • Cho, Seh-Yeong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.501-506
    • /
    • 2008
  • Statistical POS tagging is prone to error because of the inherent limitations of statistical data, especially when the data come from a single source. It is therefore widely agreed that the possibility of further enhancement lies in exploiting various knowledge sources. However, these data sources are bound to be inconsistent with each other. This paper shows the possibility of applying a maximum entropy model to Korean POS tagging. We use n-gram data and trigger pair data as the knowledge sources, and show how the perplexity measure varies when the two knowledge sources are combined using the maximum entropy method. The experiment used a trigram model that produced 94.9% accuracy with a Hidden Markov Model, and accuracy increased to 95.6% when the model was combined with trigger pair data using the maximum entropy method. This shows the possibility of further enhancement when various knowledge sources are developed and combined using the ME method.
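For readers unfamiliar with the perplexity measure mentioned above, the following minimal sketch computes it from the probabilities a model assigns to each item in a test sequence. It shows only the standard definition; how the paper applies it to its tagging models is not stated in the abstract, and the input values are made up.

```python
import math


def perplexity(probabilities):
    """Perplexity of a model over a sequence, given the probability it
    assigned to each observed item: 2 ** (-(1/N) * sum(log2 p_i))."""
    if not probabilities:
        raise ValueError("need at least one probability")
    log_sum = sum(math.log2(p) for p in probabilities)
    return 2 ** (-log_sum / len(probabilities))


# A model that assigns probability 0.25 to every item is as uncertain
# as a uniform choice among four alternatives.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model is less surprised by the test data, which is why it is a natural way to compare a single-source model with a combined one.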

Adaptation Mode Controller for Adaptive Microphone Array System (마이크로폰 어레이를 위한 적응 모드 컨트롤러)

  • Jung Yang-Won;Kang Hong-Goo;Lee Chungyong;Hwang Youngsoo;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.11C
    • /
    • pp.1573-1580
    • /
    • 2004
  • In this paper, an adaptation mode controller for an adaptive microphone array system is proposed for high-quality speech acquisition in real environments. To ensure proper adaptation of the adaptive array algorithm, the proposed adaptation mode controller uses not only temporal information but also spatial information. The controller consists of two processing stages: an initialization stage and a running stage. A sound source localization technique is adopted in the initialization stage, and a signal correlation characteristic is used in the running stage. For the adaptive array algorithm, a generalized sidelobe canceller with an adaptive blocking matrix is used. The proposed adaptation mode controller can be used even when the adaptive blocking matrix is not adapted, and is much more stable than the power ratio method. The proposed algorithm is evaluated in a real environment, and simulation results show a 13 dB SINR improvement with the speaker sitting 2 m from the array.

Some problems in teaching Swedish pronunciation, with a focus on vowels (스웨덴어 발음 교육상의 몇 가지 문제점 - 모음을 중심으로 -)

  • Byeon Gwang-Su
    • MALSORI
    • /
    • no.4
    • /
    • pp.20-30
    • /
    • 1982
  • The aim of this paper is to analyse the difficulties Korean learners encounter in pronouncing Swedish vowels and to seek ways of correcting the likely errors. In the course of the analysis, the Swedish and Korean vowels in question are compared in order to describe the differences and similarities between the two systems. This contrastive description is largely based on the students' articulatory speech level and the writer's auditory judgement. The following points are discussed: 1) Vowel length as a distinctive feature in Swedish compared with that of Korean. 2) Special attention is paid to the Swedish vowel [w:], which is characterized by its peculiar type of lip rounding. 3) The six pairs of Swedish vowels that are phonologically contrastive but difficult for Koreans to distinguish from one another: [y:] ~ [w:], [i:] ~ [y:], [e:] ~ [ø:], [w:] ~ [u:], [w:] ~ [θ], [θ] ~ [u]. 4) The r-colored vowel that arises with postvocalic /r/, which is very common in American English, is not allowed in Swedish sound sequences. The r-colored vowel of the American English pattern has to be broken up and replaced by bi-segmental vowel-consonant sequences. Koreans accustomed to American pronunciation are warned in this respect. For a more distinct articulation of the postvocalic /r/, the trill [r] is preferred to the fricative [z]. 5) The front vowels [e, ɛ, ø] become more open variants [æ, æ:] before /r/ or supradentals. The results of the analysis show that difficulties in pronouncing the target language (Swedish) are mostly due to interference from the learner's source language (Korean). However, the learner sometimes also experiences interference from another foreign language with which he or she is already familiar, when that language appears more similar to the target language than his or her mother tongue does. Hence this foreign language (American English, in this case) functions as a second language for Koreans learning Swedish.

A survey on noise generation and conversation interruption in cafes (카페 공간의 소음과 대화 방해에 대한 설문조사)

  • Jeong, Jeong-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.6
    • /
    • pp.660-670
    • /
    • 2021
  • As cafes are used by a variety of people for a variety of purposes, noise from surrounding people and background music makes it difficult to hear conversations with one's companions. In addition, improvements are needed in the noise and sound environment inside cafes, including how easily the conversations of nearby users can be heard. A total of 212 adult men and women responded to a questionnaire on cafe acoustics and noise conditions. About two-thirds of the respondents said that they did not prefer noisy cafes and that cafe noise had a negative effect. The major source of noise in cafes is the sound of surrounding people, and more than 40 % of the respondents said that they could not hear conversations with their companions well because of the sounds of those around them, or that they were concerned about their own conversations being overheard by those around them. The survey on cafe sound and noise showed that improvements are needed to secure both the voice privacy of cafe users and voice intelligibility.

Investigation of acoustic performances of the creative convergence classrooms in elementary schools (초등학교 창의융합교실의 음향성능 조사)

  • A-Hyeon Jo;Chan-Hoon Haan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.4
    • /
    • pp.285-297
    • /
    • 2023
  • The present study aims to investigate the acoustic performance of the creative convergence classrooms in Korea, which were introduced through the school space innovation project and are used by elementary school students under the age of 9. To this end, the acoustic performance of three creative convergence classrooms was measured. The measured acoustic parameters were background noise level, Reverberation Time (RT), D50, Speech Transmission Index (STI), and Inter-Aural Cross Correlation (IACC). Acoustic parameters including Transmission Loss (TL) and standardized level difference (DnT) were also measured to analyze the sound insulation performance of the walls. In addition, the noise level was measured under different opening conditions of the classroom doors and windows. The background noise level averaged 28.0 dB(A) to 32.8 dB(A) when the air conditioner was not operating, and the RT did not exceed 0.6 s. IACC differed according to the desk layout, and IACC values were high on the center line and at the seats near the sound source. In particular, higher IACC was measured at the seats on the center line directly facing the source. Regarding the noise level in the classroom under different opening conditions, the standards were exceeded when all windows, or the windows and doors facing the corridor, were opened.
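Of the parameters listed above, D50 has a compact definition: the ratio of the sound energy arriving in the first 50 ms of a room impulse response to the total energy. The sketch below illustrates that definition only. It assumes the impulse response is already trimmed to start at the arrival of the direct sound, and the function name and test signal are hypothetical rather than taken from the study.

```python
def d50(impulse_response, sample_rate_hz):
    """Definition (D50): energy in the first 50 ms divided by total energy.

    Assumes `impulse_response` starts at the direct-sound arrival.
    Returns a value between 0 and 1.
    """
    split = int(round(0.050 * sample_rate_hz))  # number of samples in 50 ms
    energy = [x * x for x in impulse_response]
    total = sum(energy)
    if total == 0:
        raise ValueError("impulse response has no energy")
    return sum(energy[:split]) / total


# Synthetic response at 1 kHz sampling: equal energy before and after 50 ms.
response = [1.0] * 50 + [1.0] * 50
print(d50(response, 1000))
```

A higher D50 indicates that more of the energy arrives early, which generally corresponds to clearer speech in a classroom.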