• Title/Abstract/Keywords: synchronous speech

Search results: 23 items (processing time: 0.021 s)

On a study on PSOLA coding technique based on the measurement of formant similarity

  • 나덕수;이희원;김규홍;배명진
    • The Institute of Electronics Engineers of Korea: Conference Proceedings / Proceedings of the 1998 IEEK Summer Conference / pp.607-610 / 1998
  • The major objectives of speech coding include a high compression ratio for transmission over band-limited channels, high synthesized speech quality in terms of intelligibility and naturalness, and fast processing speed. In general, speech coding methods are classified into three categories: waveform coding, source coding, and hybrid coding. In this paper, we propose a new waveform coding method using the PSOLA (pitch-synchronous overlap-add) technique. First, we fix one basic waveform per pitch period and measure the formant similarity between the basic waveform and its neighboring waveforms. Second, if the similarity satisfies the threshold values, we compress the neighboring waveform per pitch period and then store or transmit it. At a compression rate of about 45%, we obtained a mean opinion score (MOS) of about 4.

Error Analysis of the Exponential RLS Algorithms Applied to Speech Signal Processing

  • Yoo, Kyung-Yul
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 3E / pp.78-85 / 1996
  • The set of admissible time variations in the input signal can be separated into two categories: slow parameter changes and large parameter changes that occur infrequently. A common approach to tracking slowly time-varying parameters is the exponential recursive least-squares (RLS) algorithm. A variety of studies have analyzed the error of the exponential RLS algorithm for slowly time-varying parameters. This paper focuses on the error analysis of exponential RLS algorithms for input data with abrupt property changes. The voiced speech signal is chosen as the principal application. To analyze the error performance of the exponential RLS algorithm, its deterministic properties are first analyzed for the case of abrupt parameter changes. The analysis indicates that an impulsive input (or error variance) synchronous with the abrupt change of the parameter vectors actually enhances the convergence of the exponential RLS algorithm. The analysis has also been verified through simulations on synthetic speech signals.

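For readers unfamiliar with the algorithm named in the abstract above, the following is a minimal sketch of the standard exponentially weighted RLS recursion. It is not the paper's own analysis or code. The forgetting factor `lam`, the initialization constant `delta`, the function names, and the two-tap synthetic signal with one abrupt coefficient change at sample 200 are all assumptions chosen for illustration.

```python
import random


def rls_exponential(xs, ds, order, lam=0.95, delta=100.0):
    """Exponentially weighted RLS. Returns final weights and a-priori errors."""
    w = [0.0] * order
    # P tracks the inverse of the exponentially weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    errors = []
    for x, d in zip(xs, ds):
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        k = [v / denom for v in Px]                     # gain vector
        e = d - sum(w[i] * x[i] for i in range(order))  # a-priori error
        w = [w[i] + k[i] * e for i in range(order)]
        # P <- (P - k x^T P) / lam; x^T P equals Px^T because P stays symmetric.
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(order)]
             for i in range(order)]
        errors.append(e)
    return w, errors


def demo_abrupt_change(n=400, change_at=200, seed=0):
    """Hypothetical test signal: the true weights jump once at `change_at`."""
    rng = random.Random(seed)
    before, after = [1.0, -0.5], [0.2, 0.8]
    xs, ds = [], []
    for t in range(n):
        x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        true_w = before if t < change_at else after
        xs.append(x)
        ds.append(sum(a * b for a, b in zip(true_w, x)))
    return rls_exponential(xs, ds, order=2)


if __name__ == "__main__":
    w, errors = demo_abrupt_change()
    print("final weights:", [round(v, 3) for v in w])
    print("largest error just after the change:",
          max(abs(e) for e in errors[200:210]))
```

In this noiseless setup the final estimate should land close to the post-change coefficients, because every sample from before the change is discounted by `lam` at each later step.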

A Study on Pitch Perception of Normal Korean

  • 정옥란;김형순;김영태;서장수
    • Speech Sciences / Vol. 1 / pp.315-323 / 1997
  • This study attempts to determine the fundamental frequency levels of male and female voices that Koreans perceive as normal. Seventy-three college students majoring in speech pathology participated on a voluntary basis. The subjects listened to a male voice with fundamental frequencies of 60, 80, 100, 120, 140, 160, 180, and 200 Hz, and a female voice with fundamental frequencies of 140, 160, 180, 200, 220, 240, 260, and 280 Hz. The PSOLA (Pitch Synchronous Overlap and Add) method and a harmonic modeling method for speech signals were used to change the pitch in 20 Hz steps. The voices were presented in random order to prevent listener bias. The results were as follows. First, 46.6% judged the male voice at 120 Hz as normal, 19.2% judged 140 Hz as normal, and another 19.2% judged 160 Hz as normal. Second, 50.7% perceived the female voice at 220 Hz as normal, while 32.9% and 30.1% responded to 200 Hz and 240 Hz, respectively. Problems and recommendations for future investigation are discussed.

A Review of Contemporary Teleaudiology: Literature Review, Technology, and Considerations for Practicing

  • Kim, Jinsook;Jeon, Seungik;Kim, Dokyun;Shin, Yerim
    • Journal of Audiology & Otology / Vol. 25, No. 1 / pp.1-7 / 2021
  • The scope of teleaudiology has recently drawn attention along with telehealth due to coronavirus disease (COVID-19). As the notion has been around for more than 20 years, since 1999, it is necessary to understand it accurately and prepare for its successful implementation. Therefore, this review covers the literature on screening and diagnostic audiometry, cochlear implants and hearing aids, and aural rehabilitation; telecommunications technology in several fields of teleaudiology; and considerations for practice. Although internet-based audiological services overall showed benefits in terms of outcome and accessibility, uncertainties about cost-effectiveness and the optimal level of support, and a need for further studies of many aspects of teleaudiology, have arisen. In terms of technology, store-and-forward (asynchronous/hybrid) and real-time (synchronous) methods were introduced, with one applied and nine registered patents for teleaudiology inventions recorded in the United States from 2004 to 2020. In addition, 10 checklist items were suggested for planning teleaudiology practice, based on prior experience in hosting a teleaudiology program. In conclusion, it is hoped that this review sheds light on recognizing and improving existing teleaudiology services and helps overcome the challenges faced in the pandemic era and the contact-free world to come.

Inter-speaker and intra-speaker variability on sound change in contemporary Korean

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences / Vol. 9, No. 3 / pp.25-32 / 2017
  • Besides their effect on the f0 contour of the following vowel, Korean stops are undergoing a sound change in which a partial or complete consonantal merger in voice onset time (VOT) is taking place between aspirated and lax stops. Many previous studies on sound change have focused mainly on group-normative effects, that is, effects that are representative of the population as a whole. Few systematic quantitative studies of change in adult individuals have been carried out. The current study examines whether the sound change holds for individual speakers. It focuses on inter-speaker and intra-speaker variability in sound change in contemporary Korean. Speech data were collected from thirteen Seoul Korean speakers studying abroad in America. To minimize possible effects on speech production, socio-phonetic factors such as age, gender, dialect, speech rate, and L2 exposure period were controlled when recruiting participants. The results showed that, for nine out of thirteen speakers, the consonantal merger is taking place between the aspirated and lax stops in terms of VOT. There were also intra-speaker variations in the merger in three respects: First, is the consonantal (VOT) merger between the two stops in progress or not? Second, are VOTs for aspirated stops getting shorter or not (i.e., the aspirated-shortening process)? Third, are VOTs for lax stops getting longer or not (i.e., the lax-lengthening process)? The remarkable inter-speaker and intra-speaker variability indicates a synchronous speech sound change of the stop system in contemporary Korean. Some speakers are early adopters or active propagators of sound change whereas others are not. Further study is necessary to see whether the inter-speaker differences exceed intra-speaker differences in sound change.

A Korean Flight Reservation System Using Continuous Speech Recognition

  • Choi, Jong-Ryong;Kim, Bum-Koog;Chung, Hyun-Yeol;Nakagawa, Seiichi
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 3E / pp.60-65 / 1996
  • This paper describes a Korean continuous speech recognition system for flight reservation. It adopts a frame-synchronous One-Pass DP search algorithm driven by the syntactic constraints of a context-free grammar (CFG). For recognition, 48 phoneme-like units (PLUs) were defined and used as the basic units for acoustic modeling of Korean. The modeling used an HMM technique in which each model has 4 states, 3 continuous output probability distributions, and 3 discrete duration distributions. Language modeling by CFG was also applied to the flight reservation task domain, which consisted of 346 words and 422 rewriting rules. In the tests, a sentence recognition rate of 62.6% was obtained after speaker adaptation.

A 4 kbps PSI-VSELP Speech Coding Algorithm

  • 최용수;강홍구;박상욱;윤대희
    • The Journal of the Acoustical Society of Korea / Vol. 15, No. 6 / pp.59-65 / 1996
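  • This paper proposes a 4 kbps PSI-VSELP (Pitch Synchronous Innovation-Vector Sum Excited Linear Prediction) speech coder whose speech quality is comparable to that of the existing 4.8 kbps VSELP. Because the "half-rate" is around 4 kbps and varies somewhat by region, the bit rate must be reduced before an existing half-rate coder can be used. To minimize the performance degradation caused by this bit-rate reduction, it is desirable to determine the bit allocation by considering how each transmitted parameter affects coder performance. Following this approach, we first study the bit-rate reduction and then determine the 4 kbps bit allocation for the proposed coder. To improve the speech quality of the VSELP coder, the basis vectors, which have the greatest effect on performance, are optimized through an iterative closed-loop training procedure, and the PSI technique is introduced into the VSELP coder. To evaluate the proposed coder, experiments were conducted in an environment free of background noise and channel errors. The results show that the proposed 4 kbps PSI-VSELP had lower objective quality than the 4.8 kbps VSELP but higher subjective quality.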

Coordinative movement of articulators in bilabial stop /p/

  • Son, Minjung
    • Phonetics and Speech Sciences / Vol. 10, No. 4 / pp.77-89 / 2018
  • Speech articulators are coordinated for the purpose of segmental constriction in terms of a task. In particular, vertical jaw movements repeatedly contribute to consonantal as well as vocalic constriction. The current study explores vertical jaw movements in conjunction with bilabial constriction in bilabial stop /p/ in the context /a/-to-/a/. Revisiting kinematic data of /p/ collected using the electromagnetic midsagittal articulometer (EMMA) method from seven (four female and three male) speakers of Seoul Korean, we examined maximum vertical jaw position, its relative timing with respect to the upper and lower lips, and lip aperture minima. The results for those dependent variables are recapitulated in terms of linguistic (different word boundaries) and paralinguistic (different speech rates) factors as follows. Firstly, maximum jaw height was lower in the across-word boundary condition (across-word < within-word), but it did not differ as a function of speech rate (comfortable = fast). Secondly, more reduction in the lip aperture (LA) gesture occurred at the fast rate, while word-boundary effects were absent. Thirdly, jaw raising was still in progress after the lips' positional extrema were achieved in the within-word condition, while the former was completed before the latter in the across-word condition. Lastly, relative temporal lags between the jaw and the lips (UL and LL) were more synchronous at the fast rate than at the comfortable rate. When these results are considered together, it is possible to posit that speakers are not tolerant of lenition to the extent that it is potentially realized as a labial approximant in either word-boundary condition, while jaw height still manifested a lower jaw position in the across-word boundary condition. Early termination of vertical jaw maxima before vertical lower lip maxima in the across-word condition may be partly responsible for the spatial reduction of jaw raising movements. This may come about as a consequence of an excessive number of factors (e.g., upper lip height (UH), lower lip height (LH), jaw angle (JA)) for the representation of a vector with two degrees of freedom (x, y) engaged in a gesture-based task (e.g., lip aperture (LA)). In the task-dynamic application toolkit, the jaw angle parameter can be assigned numerical values for greater weight in the across-word boundary condition, which in turn gives rise to a lower jaw position. Speech rate-dependent spatial reduction in lip aperture may be resolved by manipulating the activation time of an active tract variable at the gestural score level.

Automatic Speaker Identification by Sustained Vowel Phonation

  • 배건성
    • The Journal of the Acoustical Society of Korea / Vol. 11, No. 1 / pp.35-41 / 1992
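  • We propose and test a speaker identification method that builds, from sustained vowel phonation, a vector quantization codebook representing the characteristics of each speaker and uses it to identify the speaker. The feature vectors are linear prediction coefficients obtained for each pitch period of the vowel /i/, and experiments show that a codebook size of 4 is appropriate. In the identification experiments, the recognition rate was 99.4% when the training data were used, and 89.4% for speech signals containing 50 pitch periods that were not used in training.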

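The abstract above describes identifying speakers with per-speaker vector quantization codebooks of size 4. The sketch below illustrates that general idea only. Plain k-means stands in for codebook training, since the paper's training procedure is not given in this listing, and random synthetic vectors stand in for the per-pitch-period linear prediction coefficients. All function names, speaker labels, and parameter values are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size=4, iters=20, seed=0):
    """Plain k-means as a stand-in for codebook training (an assumption)."""
    rng = random.Random(seed)
    codebook = rng.sample(vectors, size)
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            cells[nearest].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword when a cell is empty
                codebook[i] = [sum(c) / len(cell) for c in zip(*cell)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(vectors, codebooks):
    """Pick the speaker whose codebook quantizes the input with least distortion."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))


def synth_features(center, n, rng, spread=0.3):
    """Synthetic stand-in for per-pitch-period feature vectors."""
    return [[c + rng.gauss(0.0, spread) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    centers = {"speaker_a": [0.0] * 10, "speaker_b": [1.0] * 10}
    codebooks = {name: train_codebook(synth_features(c, 200, rng))
                 for name, c in centers.items()}
    probe = synth_features(centers["speaker_b"], 50, rng)
    print("identified:", identify(probe, codebooks))
```

The decision rule is minimum average quantization distortion over the probe vectors; the 50-vector probe mirrors the 50 pitch periods mentioned in the abstract, but the features themselves are artificial.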