• Title/Summary/Keyword: Speech rate

Search Result 1,246, Processing Time 0.022 seconds

Recognition of Noise Quantity by Neural Network using Linear Predictive Coefficient (선형예측계수를 사용한 신경회로망에 의한 잡음량의 인식)

  • Choi, Jae-Seung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.379-382 / 2008
  • To reduce noise in conversation in a noisy environment, a signal processing system must adapt its processing to the noise quantity in order to improve performance. Therefore, this paper presents a method for recognizing noise quantity from linear predictive coefficients using a three-layer neural network, which is trained on three kinds of speech degraded by various background noises. In experiments on the Aurora2 database, the average recognition rates were 97.6% or higher for the various noises.
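
The linear predictive coefficients used as network inputs in the abstract above can be computed per frame with the autocorrelation method and the Levinson-Durbin recursion. The following is a minimal sketch under assumptions: the function names, the default order of 10, and the framing are illustrative and are not taken from the paper, which does not state its analysis settings here.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (coeffs, error), where coeffs[k - 1] multiplies x[n - k] in the
    predictor x_hat[n] = sum_k coeffs[k - 1] * x[n - k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (for example all-zero) frame: stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction error shrinks with each order
    return a[1:], err


def lpc_features(frame, order=10):
    """Hypothetical helper: LPC vector of one frame (order is an assumption)."""
    coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
    return coeffs
```

A vector returned by `lpc_features` would then be fed to a classifier such as the three-layer network the abstract describes; the network itself is not sketched here.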

Fast short length running FIR structure in discrete wavelet adaptive algorithm

  • Lee, Chae-Wook
    • Journal of the Institute of Convergence Signal Processing / v.13 no.1 / pp.19-25 / 2012
  • An adaptive system is a well-known method for removing noise from noise-corrupted speech. In this paper, we implement a least mean square (LMS) algorithm based on a wavelet adaptive algorithm. It achieves a faster convergence rate than the time-domain algorithm because of the eigenvalue distribution width. This paper also provides the basic tool required for an FIR algorithm that reduces the arithmetic complexity. We consider a new fast short-length running FIR structure in the discrete wavelet adaptive algorithm, and we compare the arithmetic complexities of the FIR algorithm and the short-length fast running FIR algorithm (SFIR) with that of the proposed fast short-length running FIR algorithm (FSFIR).
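
For orientation, the baseline that the abstract above compares against can be sketched as a plain time-domain LMS filter. This is not the wavelet-domain or fast running FIR structure proposed in the paper; the function names, the step size, and the toy two-tap system are assumptions chosen only to show the weight update.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Time-domain LMS: adapt weights w so that w * x tracks the desired d.

    x: input samples, d: desired samples, mu: step size (assumed value).
    Returns the final weights and the per-sample error sequence.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        e = dn - y                                  # estimation error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]  # LMS update
        errors.append(e)
    return w, errors


def fir_filter(x, h):
    """Hypothetical 'unknown system' used to generate the desired signal."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]
```

In a noise-free identification setup, `lms_identify(x, fir_filter(x, [0.5, -0.3]), 2, 0.05)` with white input drives the weights toward the taps of the hypothetical system. Transform-domain variants aim to speed up this convergence when the input is strongly correlated.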

Experimental Phonetic Study of the Syllable Duration of Korean with Respect to the Positional Effect

  • Lee Hyunbok;Seong Cheol-jae
    • MALSORI / no.31_32 / pp.195-205 / 1996
  • The aim of this paper is to describe the prosodic structure of Korean in relation to how syllable duration varies with position. The study analyzes and describes the correlation between syllable lengthening and position at the beginning and end of an utterance. Using the syllable [na] at the final and initial positions of a prosodic phrase in the Korean version of 'The North Wind and the Sun', it was found that the ratio of phrase-final to phrase-initial syllable lengthening was approximately 1.8:1 for the four subjects who took part in the test. For the nonsense data, the ratio was approximately 1.6:1 for two of the three subjects. These results might indicate that Korean tends to have a high rate of final lengthening. We can therefore tentatively classify it as a stress-timed language. Still, further studies are needed before we can be certain about the classification of languages along this dichotomy.

Ellis-van Creveld syndrome in an Indian child: a case report

  • Veena, K.M.;Jagadishchandra, H.;Rao, Prasanna Kumar;Chatra, Laxmikanth
    • Imaging Science in Dentistry / v.41 no.4 / pp.167-170 / 2011
  • Ellis-van Creveld syndrome is a rare congenital genetic disorder with autosomal recessive inheritance. It affects the Amish population of Pennsylvania in the USA, with a prevalence of 1 in 5,000 live births. In the non-Amish population, the birth prevalence is 7 in 1,000,000. The syndrome is characterized by bilateral postaxial polydactyly of the hands, chondrodysplasia of the long bones resulting in acromesomelic dwarfism, ectodermal dysplasia affecting the nails and teeth, and congenital heart malformation. Reports of this syndrome in dentistry are very rare. The present case focuses on the striking and constant oral findings in these patients, which are the main diagnostic features of the syndrome. Since the oral manifestations affect the esthetics, speech, and jaw growth of the child, dentists have an important role to play in the proper management of such cases.

Isolated-Word Recognition Using Adaptively Partitioned Multisection Codebooks (음성적응(音聲適應) 구간분할(區間分割) 멀티섹션 코드북을 이용(利用)한 고립단어인식(孤立單語認識))

  • Ha, Kyeong-Min;Jo, Jeong-Ho;Hong, Jae-Kuen;Kim, Soo-Joong
    • Proceedings of the KIEE Conference / 1988.07a / pp.10-13 / 1988
  • An isolated-word recognition method using adaptively partitioned multisection codebooks is proposed. Each training utterance was divided into several sections according to its pattern, which was extracted by a labeling technique. For each pattern, reference codebooks were generated by clustering the training vectors of the same section. In the recognition procedure, the input speech was divided into sections by the same method used in codebook generation and was recognized as the reference word whose codebooks yielded the smallest average distortion. The proposed method was tested on 100 Korean words and attained a recognition rate of about 96 percent.
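
The decision rule in the abstract above (choose the word whose section codebooks give the smallest average distortion) can be sketched as follows. This sketch assumes the input has already been split into sections and that each word model is a list of per-section codebooks; the adaptive partitioning and the clustering step from the paper are not reproduced, and squared Euclidean distance is an assumed distortion measure.

```python
def nearest_distortion(vector, codebook):
    """Squared Euclidean distance from vector to its closest codeword."""
    return min(sum((a - b) ** 2 for a, b in zip(vector, code))
               for code in codebook)


def average_distortion(sections, section_codebooks):
    """Mean distortion when each section is quantized with its own codebook."""
    total, count = 0.0, 0
    for vectors, codebook in zip(sections, section_codebooks):
        for vector in vectors:
            total += nearest_distortion(vector, codebook)
            count += 1
    return total / count


def recognize(sections, word_models):
    """Return the word whose codebooks give the smallest average distortion.

    word_models maps a word label to its list of per-section codebooks
    (a hypothetical layout chosen for this sketch).
    """
    return min(word_models,
               key=lambda word: average_distortion(sections, word_models[word]))
```

Because each section has its own codebook, two words that contain the same sounds in a different order produce different distortions, which a single codebook per word cannot distinguish.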

A Comparative Study of Recognition Rate According to the Variance of Speech Bandwidth (대역폭 변화에 따른 음성 인식률 비교연구)

  • Sohn, Il-Hyun;Doh, Sam-Joo;Koo, Myoung-Wan
    • Annual Conference on Human and Language Technology / 1992.10a / pp.193-199 / 1992
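  • This paper compares recognition rates under varying speech bandwidth for Korean speech consisting of 123 words. For the comparison experiments, we used a speaker-independent isolated-word recognition system based on hidden Markov models and 131 phoneme-like Korean subword units. The experiments used two speech databases with bandwidths of 0 - 4.5 kHz and 0.3 - 3.3 kHz, respectively. With two iterations of corrective training and with state transition duration information used during training, the highest recognition rates were 98.8% and 98.2% for the 0 - 4.5 kHz and 0.3 - 3.3 kHz bandwidths, respectively. This shows that the speech recognition rate does not degrade significantly even at telephone bandwidth.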

A Low Rate VQ Speech Coding Algorithm with Variable Transmission Frame Length (가변 전송 Frame 길이를 갖는 저 전송속도 VQ 음성부호화 알고리즘에 대한 연구)

  • 좌정우;이성로;이황수
    • The Journal of the Acoustical Society of Korea / v.12 no.1E / pp.32-38 / 1993
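  • In this paper, we propose a low-rate speech coder and demonstrate its performance and flexibility through computer simulation. The proposed coding scheme varies the transmission frame length according to the stationarity of the input speech signal and encodes the representative feature vector of each transmission frame by vector quantization. The feature vector sequence consists of PARCOR coefficients obtained sample by sample from the input speech signal with the prewindowed RLS lattice algorithm. The input speech signal is divided into subsegments, and representative PARCOR coefficients are obtained for each subsegment. Transmission frames are then determined by merging subsegments according to their similarity, measured with the likelihood ratio distortion measure. The computer simulation results show that the proposed VTEL speech coding scheme can greatly reduce the overall transmission rate while maintaining good speech quality.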

Endpoint Detection of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 끝점검출)

  • 석종원;배건성
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.57-64 / 1999
  • In this paper, we investigate a robust endpoint detection algorithm for noisy environments. A new feature parameter based on the discrete wavelet transform is proposed for word boundary detection of isolated utterances. The feature is defined as the sum of the standard deviations of the wavelet coefficients in the third coarse scale and the weighted first detailed scale. We then develop a new, robust endpoint detection algorithm using this wavelet-domain feature. For performance evaluation, we measured the detection accuracy and the average recognition error rate attributable to endpoint detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions.
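
One way to read the feature described above is: decompose each frame with a discrete wavelet transform, then add the standard deviation of the level-3 approximation coefficients to a weighted standard deviation of the level-1 detail coefficients. The sketch below follows that reading under assumptions: the Haar wavelet, the weight of 0.5, and the function names are illustrative choices, since the abstract names neither the wavelet nor the weight.

```python
import math
import statistics


def haar_step(signal):
    """One level of the Haar DWT: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def endpoint_feature(frame, detail_weight=0.5):
    """Std of level-3 approximation plus weighted std of level-1 detail.

    detail_weight is an assumed value, not taken from the paper.
    """
    if len(frame) < 8:
        raise ValueError("frame must hold at least 8 samples for 3 levels")
    approx, detail1 = haar_step(frame)
    for _ in range(2):  # two more levels give the level-3 approximation
        approx, _unused = haar_step(approx)
    return statistics.pstdev(approx) + detail_weight * statistics.pstdev(detail1)
```

A frame of silence gives a feature of zero, while a frame with slowly varying energy raises the coarse-scale term. An endpoint detector would compare this value against a threshold estimated from leading noise-only frames.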

Design of Voice Activity Detection Algorithm for Variable Rate Speech Coders (가변전송률 음성부호화기 적용을 위한 음성활성도 측정 알고리즘 설계)

  • 김재원
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.9A / pp.1451-1458 / 2001
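  • The ultimate goal of voice service, the most frequently used service in digital mobile communication systems, is to provide good speech quality and high spectral efficiency. Speech can be represented as a repetition of short, intermittent bursts of speech energy separated by silent intervals. During an actual voice call, active speech occupies about 40% of the time, and the remaining 60% is silence or time spent listening to the other party. Using these silent intervals efficiently yields a spectral gain for the system. In this paper, we design a voice activity detection scheme that operates robustly even in widely varying background noise, as found in digital mobile communication systems, and that is applicable to speech coders with a 10 ms frame size. The designed algorithm uses speech energy, spectral distribution, zero-crossing rate, and a peakiness measure of the LPC residual signal.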
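
Two of the four features listed in the abstract above, frame energy and zero-crossing rate, are simple enough to sketch directly. The decision rule and both thresholds below are hypothetical and serve only to show how such features might be combined; the paper's actual rule, which also uses the spectral distribution and the LPC residual peakiness, is not given in the abstract.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_active(frame, energy_threshold=0.01, zcr_threshold=0.5):
    """Toy decision: enough energy and not dominated by sign flips.

    Both thresholds are assumed values for illustration only.
    """
    return (frame_energy(frame) > energy_threshold
            and zero_crossing_rate(frame) < zcr_threshold)
```

At an assumed sampling rate of 8 kHz, a 10 ms frame holds 80 samples, so `is_active` would be called once per 80-sample block.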

Text-Independent Speaker Verification Using Variational Gaussian Mixture Model

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal / v.33 no.6 / pp.914-923 / 2011
  • This paper concerns robust and reliable speaker model training for text-independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text-independent speaker verification, the amount of speech data may differ between speakers, yet the modeling approach should perform equally well for all of them. In addition, the modeling technique should be as robust as possible to unseen data. A traditional approach to GMM training is the expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, a variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two databases, KING and TFarsdat. The experiments show that the proposed approach improves performance on the TFarsdat and KING databases by 0.56% and 4.81%, respectively. The experiments also show that the variationally optimized GMM is more robust against noise, and that the verification error rate in noisy environments decreases by 1.52% on the TFarsdat dataset.