• Title/Summary/Keyword: Speech spectrum


A Study on Word Recognition Using Neural-Fuzzy Pattern Matching (뉴럴-퍼지패턴매칭에 의한 단어인식에 관한 연구)

  • 이기영;최갑석
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.29B no.11
    • /
    • pp.130-137
    • /
    • 1992
  • This paper presents a word recognition method based on neural-fuzzy pattern matching, designed to build proper speech patterns from spectrum sequences and to improve the recognition rate. In this method, frequency variation is reduced by generating binary spectrum patterns through an associative memory implemented with a neural network, and time variation is reduced by measuring similarity with fuzzy pattern matching. Because the method uses binary spectrum patterns and logic-algebraic operations to measure similarity (a hedged sketch follows this entry), its memory and computation requirements are far lower than those of DTW with a conventional distortion measure. To validate the recognition performance, word recognition experiments were carried out on 28 DDD city names and compared against DTW and plain fuzzy pattern matching. The results show that the proposed method outperforms both in recognition performance.

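The similarity measure sketched below illustrates the kind of logic-algebraic matching of binary spectrum patterns described above; it is a minimal sketch, assuming a Jaccard-style AND/OR frame score and naive frame-by-frame alignment rather than the paper's exact fuzzy membership and matching rule.

```python
import numpy as np

def binary_pattern_similarity(ref: np.ndarray, test: np.ndarray) -> float:
    """ref, test: (frames, bands) arrays of 0/1 binary spectrum patterns."""
    n = min(len(ref), len(test))                    # naive time alignment (assumption)
    ref, test = ref[:n].astype(bool), test[:n].astype(bool)
    inter = np.logical_and(ref, test).sum(axis=1)   # AND count per frame
    union = np.logical_or(ref, test).sum(axis=1)    # OR count per frame
    frame_sim = np.where(union > 0, inter / np.maximum(union, 1), 1.0)
    return float(frame_sim.mean())

# Example: two 4-frame, 8-band binary spectrum patterns
rng = np.random.default_rng(0)
a = (rng.random((4, 8)) > 0.5).astype(int)
b = (rng.random((4, 8)) > 0.5).astype(int)
print(binary_pattern_similarity(a, b))
```

Because each frame score needs only Boolean operations and counts, the memory and computation stay well below those of a DTW distortion measure, which is the comparison the abstract makes.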

A Study on Reduction of Computation Time through Adjustment the Frequency Interval Information in the G.723.1 Vocoder (G.723.1 보코더에서 주파수 간격 정보조절을 통한 계산량 감소에 관한 연구)

  • 민소연;김영규;배명진
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.405-408
    • /
    • 2002
  • The LSP (Line Spectrum Pairs) parameter is used for speech analysis in vocoders and recognizers because of its constant spectral sensitivity, low spectral distortion, and easy linear interpolation. However, transforming LPC (Linear Predictive Coding) coefficients into LSPs is complex and time-consuming. Among conventional methods, the real root method is considerably simpler than the others, but it still suffers from a non-deterministic computation time because the root search proceeds sequentially through the frequency range. We suggest a method of reducing the LSP transformation time by exploiting voice characteristics: the search order and the search interval are varied according to the distribution of the LSP parameters (a sketch of the underlying real-root search follows this entry). Compared with the conventional real root method, the proposed method reduces the LSP transformation time by about 46.5%, and the total computation time of the G.723.1 vocoder by about 5%.

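The sketch below shows the real-root idea that this and the following entry build on: the LPC polynomial is split into the symmetric P(z) and antisymmetric Q(z) polynomials, whose unit-circle roots are located by a sequential sign-change scan over a frequency grid. The uniform coarse grid plus bisection refinement used here is only a stand-in for the papers' non-uniform, distribution-driven search intervals.

```python
import numpy as np

def lpc_to_lsf(a, grid_points=512, refine=20):
    """a: LPC coefficients [1, a1, ..., ap]; returns the p LSFs in (0, pi)."""
    a = np.asarray(a, dtype=float)
    p = len(a) - 1
    pa = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a[::-1]])  # P(z), symmetric
    qa = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a[::-1]])  # Q(z), antisymmetric

    def g(c, w, imag):
        # real-valued function whose zeros are the unit-circle roots of c
        k = np.arange(len(c))
        val = np.exp(1j * w * (len(c) - 1) / 2) * np.sum(c * np.exp(-1j * k * w))
        return val.imag if imag else val.real

    lsf = []
    for c, imag in ((pa, False), (qa, True)):
        grid = np.linspace(1e-4, np.pi - 1e-4, grid_points)   # sequential scan grid
        vals = np.array([g(c, w, imag) for w in grid])
        for i in np.where(np.sign(vals[:-1]) * np.sign(vals[1:]) < 0)[0]:
            lo, hi = grid[i], grid[i + 1]
            for _ in range(refine):                            # bisection refinement
                mid = 0.5 * (lo + hi)
                if g(c, lo, imag) * g(c, mid, imag) <= 0:
                    hi = mid
                else:
                    lo = mid
            lsf.append(0.5 * (lo + hi))
    return np.sort(np.array(lsf))[:p]

# Example: 2nd-order LPC filter with a resonance near 0.3*pi rad
print(lpc_to_lsf([1.0, -1.058, 0.81]))   # two LSFs bracketing the resonance
```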

A Reduction Method of Computational Complexity through Adjustment the Non-Uniform Interval in the Vocoder (음성 부호화기에서 불균등 간격조절을 통한 계산량 단축법)

  • Jun, Woo-Jin
    • Proceedings of the KAIS Fall Conference
    • /
    • 2010.05a
    • /
    • pp.277-280
    • /
    • 2010
  • The LSP (Line Spectrum Pairs) parameter is used for speech analysis in vocoders and recognizers because of its constant spectral sensitivity, low spectral distortion, and easy linear interpolation. However, transforming LPC (Linear Predictive Coding) coefficients into LSPs is complex and time-consuming. Among conventional methods, the real root method is considerably simpler than the others, but it still suffers from a non-deterministic computation time because the root search proceeds sequentially through the frequency range. We suggest a method of reducing the LSP transformation time by exploiting voice characteristics.


A Study on Stable Motion Control of Humanoid Robot with 24 Joints Based on Voice Command

  • Lee, Woo-Song;Kim, Min-Seong;Bae, Ho-Young;Jung, Yang-Keun;Jung, Young-Hwa;Shin, Gi-Soo;Park, In-Man;Han, Sung-Hyun
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.21 no.1
    • /
    • pp.17-27
    • /
    • 2018
  • We propose a new approach to controlling biped robot motion based on iterative learning of voice commands, aimed at smart factory applications. Real-time processing of the speech signal is very important for high-speed and precise automatic voice recognition. Recently, voice recognition has been used for intelligent robot control, artificial life, wireless communication, and IoT applications. In order to extract valuable information from the speech signal, make decisions, and obtain results, the data needs to be manipulated and analyzed. The basic method used for extracting features from the voice signal is to compute the Mel-frequency cepstral coefficients (MFCCs): coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The reliability of voice commands for controlling the biped robot's motion is illustrated by computer simulation and by experiments on a biped walking robot with 24 joints.
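The MFCC pipeline the abstract describes (framed power spectrum, mel-scale filterbank, logarithm, cosine transform) can be sketched as follows; the frame size, hop, and 26-band filterbank are illustrative choices, not values taken from the paper.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m):  return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    # frame the signal and window each frame
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    frames = frames * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2            # short-term power spectrum

    # triangular filterbank spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    log_mel = np.log(power @ fbank.T + 1e-10)                  # log mel-band energies
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]   # cosine transform

# Example: MFCCs of one second of a synthetic 200 Hz tone sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
print(mfcc(np.sin(2 * np.pi * 200 * t), sr).shape)   # -> (n_frames, 13)
```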

The Technique of Spectrum Flattening by Algorithm for Minimized Harmonics Variance Value (Harmonic 분산값 최소화 알고리즘에 의한 주파수 영역 평탄화 기법)

  • Min, So-Yeon;Kim, Young-Kyu
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.11 no.9
    • /
    • pp.3558-3562
    • /
    • 2010
  • Exact fundamental frequency (pitch) extraction is important in speech signal processing, but extracting the pitch from a speech signal is difficult because of formant effects and transitional amplitude. In this paper, the pitch is therefore detected after flattening the spectrum in the frequency domain with the proposed algorithm, which minimizes the variance of the harmonic amplitudes. Experimental results showed that the proposed method clearly outperforms the conventional LPC- and cepstrum-based methods.
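A hedged sketch of the idea: after dividing the magnitude spectrum by a smoothed envelope (to suppress formant influence), the pitch candidate whose harmonic amplitudes have the smallest normalized variance is selected. Both the moving-average flattening and this selection rule are one reading of the abstract, not the paper's exact algorithm.

```python
import numpy as np

def flatten_spectrum(mag, smooth_bins=20):
    """Divide the magnitude spectrum by a moving-average envelope."""
    env = np.convolve(mag, np.ones(smooth_bins) / smooth_bins, mode='same')
    return mag / (env + 1e-10)

def pitch_by_harmonic_variance(frame, sr, f0_min=60.0, f0_max=400.0):
    mag = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    flat = flatten_spectrum(mag)
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    best_f0, best_score = None, np.inf
    for f0 in np.arange(f0_min, f0_max, 1.0):
        harmonics = np.arange(f0, sr / 2, f0)[:10]          # first 10 harmonic positions
        amps = np.interp(harmonics, freqs, flat)            # flattened harmonic amplitudes
        score = np.var(amps) / (np.mean(amps) ** 2 + 1e-10) # normalized variance
        if score < best_score:
            best_f0, best_score = f0, score
    return best_f0

# Example: a frame with a 200 Hz fundamental and 12 harmonics, sampled at 16 kHz
sr = 16000
t = np.arange(1024) / sr
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 13))
print(pitch_by_harmonic_variance(frame, sr))   # estimate near 200 Hz
```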

Segmentation of continuous Korean Speech Based on Boundaries of Voiced and Unvoiced Sounds (유성음과 무성음의 경계를 이용한 연속 음성의 세그먼테이션)

  • Yu, Gang-Ju;Sin, Uk-Geun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.7
    • /
    • pp.2246-2253
    • /
    • 2000
  • In this paper, we show that the performance of blind segmentation of phoneme boundaries can be enhanced by using knowledge of Korean syllabic structure and of the regions of voiced/unvoiced sounds. The proposed method consists of three processes: extracting candidate phoneme boundaries, detecting the boundaries between voiced and unvoiced sounds, and selecting the final phoneme boundaries. The candidate phoneme boundaries are extracted by a clustering method based on the similarity between two adjacent clusters; the similarity measure employed in this process is the ratio of the probability densities of the adjacent clusters. To detect the boundaries of voiced/unvoiced sounds, we first compute the power density spectrum of the speech signal in the 0∼400 Hz frequency band; the points where the variation of this power density spectrum exceeds a threshold are then chosen as voiced/unvoiced boundaries (a minimal sketch of this step follows this entry). The final phoneme boundaries consist of all candidate phoneme boundaries in voiced regions and a limited number of candidate boundaries in unvoiced regions. The experimental results showed about a 40% decrease in insertion rate compared to the baseline blind segmentation method.

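The voiced/unvoiced boundary step can be sketched as below: short-time power in the 0∼400 Hz band is tracked frame by frame, and a boundary is marked wherever the change between consecutive frames exceeds a threshold. The frame length, hop, and threshold value are illustrative assumptions.

```python
import numpy as np

def vuv_boundaries(signal, sr, frame_len=400, hop=160, threshold_db=10.0):
    """Return times (s) where the 0-400 Hz band power changes sharply."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    freqs = np.fft.rfftfreq(frame_len, 1.0 / sr)
    low_band = freqs <= 400.0                                  # 0-400 Hz bins
    band_db = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * np.hamming(frame_len)
        spec = np.abs(np.fft.rfft(frame)) ** 2                 # power density spectrum
        band_db[i] = 10.0 * np.log10(spec[low_band].sum() + 1e-12)
    change = np.abs(np.diff(band_db))                          # frame-to-frame variation
    return np.where(change > threshold_db)[0] * hop / sr

# Example: 0.5 s of a voiced-like tone followed by 0.5 s of unvoiced-like noise, 16 kHz
sr = 16000
t = np.arange(sr // 2) / sr
voiced = np.sin(2 * np.pi * 150 * t)
unvoiced = 0.05 * np.random.default_rng(0).standard_normal(sr // 2)
print(vuv_boundaries(np.concatenate([voiced, unvoiced]), sr))   # boundary near 0.5 s
```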

Laryngeal height and voice characteristics in children with autism spectrum disorders (자폐스펙트럼장애 아동의 후두 높이 및 음성 특성)

  • Lee, Jung-Hun;Kim, Go-Woon;Kim, Seong-Tae
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.91-101
    • /
    • 2021
  • The purpose of this study was to investigate laryngeal characteristics in children with autism spectrum disorders (ASD). A total of 50 children participated: eight children aged 2 to 4 years diagnosed with ASD and 42 normal controls of the same ages. X-ray images of the midsagittal plane of the cervical spine and larynx were recorded for all children, and the laryngeal positions of the ASD and control groups were compared. In addition, prolonged-vowel samples were collected from the children and analyzed for acoustic parameters. The X-rays showed that, in the normal group, the height of the hyoid bone was lowest at 3 years of age and ascended at 4 years of age, while the distance from the external acoustic meatus to the hyoid bone was longest at age 4; 4-year-olds undergoing explosive language development showed elevation and anteriorization of the larynx. In contrast, the hyoid height of the ASD group was lower than that of the control group at all ages, and there was no difference in hyoid position between the ages. Acoustic evaluation showed that PFR, vFo, and vAm were significantly higher in the ASD group than in the controls. The low laryngeal height of ASD children may be associated with delayed language development, and PFR, vFo, and vAm appear to be voice markers that distinguish ASD children from normal children.

Frame Reliability Weighting for Robust Speech Recognition (프레임 신뢰도 가중에 의한 강인한 음성인식)

  • 조훈영;김락용;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.3
    • /
    • pp.323-329
    • /
    • 2002
  • This paper proposes a frame reliability weighting method to compensate for time-selective noise that occurs at random positions and contaminates only parts of the speech signal. Speech frames have different degrees of reliability, and the reliability is proportional to the SNR (signal-to-noise ratio). While it is feasible to estimate the frame SNR from noise information in non-speech intervals under stationary noise, it is difficult to obtain the noise spectrum for a time-selective noise. Therefore, we use statistical models of clean speech to estimate the frame reliability. The proposed MFR (model-based frame reliability) approximates frame SNR values using filterbank energy vectors obtained by inverse transformation of the input MFCC (mel-frequency cepstral coefficient) vectors and of the mean vectors of a reference model. Experiments on various burst noises revealed that the proposed method represents the frame reliability effectively, and recognition performance was improved by using the MFR values as weighting factors in the likelihood calculation step.
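A hedged sketch of the MFR idea: MFCC vectors are mapped back to filterbank energies with an inverse DCT, compared with the reference (clean) model's mean in the same domain to get a crude per-frame SNR estimate, and that estimate is turned into a weight for the likelihood sum. The noise-energy estimate and the sigmoid SNR-to-weight mapping are assumptions, not the paper's exact formula.

```python
import numpy as np
from scipy.fftpack import idct

def mfcc_to_logfbank(mfcc, n_mels=26):
    """Inverse DCT of (truncated) MFCC vectors back to log filterbank energies."""
    padded = np.zeros((mfcc.shape[0], n_mels))
    padded[:, :mfcc.shape[1]] = mfcc
    return idct(padded, type=2, axis=1, norm='ortho')

def frame_reliability(obs_mfcc, ref_mean_mfcc, n_mels=26):
    obs_fb = np.exp(mfcc_to_logfbank(obs_mfcc, n_mels))                # observed energies
    ref_fb = np.exp(mfcc_to_logfbank(ref_mean_mfcc[None, :], n_mels))  # clean reference
    noise = np.maximum(obs_fb - ref_fb, 1e-6)        # crude noise-energy estimate (assumption)
    snr_db = 10.0 * np.log10(ref_fb.sum(axis=1) / noise.sum(axis=1))
    return 1.0 / (1.0 + np.exp(-0.2 * snr_db))       # map SNR to a (0, 1) frame weight

def weighted_log_likelihood(frame_loglik, weights):
    """Weight each frame's log-likelihood before summing over the utterance."""
    return float(np.sum(weights * frame_loglik))

# Example with random stand-in values: 50 frames of 13-dimensional MFCCs
rng = np.random.default_rng(1)
obs = rng.standard_normal((50, 13))
ref = rng.standard_normal(13)                        # reference model mean vector
w = frame_reliability(obs, ref)
print(weighted_log_likelihood(rng.standard_normal(50), w))
```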

An Efficient Transcoding Algorithm For G.723.1 and EVRC Speech Coders (G.723.1 음성부호화기와 EVRC 음성부호화기의 상호 부호화 알고리듬)

  • 김경태;정성교;윤성완;박영철;윤대희;최용수;강태익
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.5C
    • /
    • pp.548-554
    • /
    • 2003
  • Interoperability is one of the most important factors for successful integration of speech networks. To enable communication between endpoints employing different speech coders, the decoder and encoder of each endpoint coder must be placed in tandem. However, tandem coding often causes problems such as poor speech quality, high computational load, and additional transmission delay. In this paper, we propose an efficient transcoding algorithm that provides interoperability between networks employing the ITU-T G.723.1 [1] and TIA IS-127 EVRC [2] speech coders. The proposed transcoding algorithm is composed of four parts: LSP conversion, open-loop pitch conversion, fast adaptive codebook search, and fast fixed codebook search. Subjective and objective quality evaluations confirmed that the speech quality produced by the proposed transcoding algorithm is equivalent to, or better than, that of tandem coding, while incurring a shorter processing delay and lower computational complexity; this was verified by an implementation on a TMS320C62x DSP.
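Of the four parts, the LSP conversion can be illustrated with a minimal parameter-domain sketch: decoded LSP vectors are re-sampled at the target coder's frame instants (G.723.1 uses 30 ms frames, EVRC 20 ms) instead of fully decoding and re-running LPC analysis. The linear interpolation used here is an illustrative assumption, not the paper's conversion rule.

```python
import numpy as np

def convert_lsp(src_lsp, src_frame_ms, dst_frame_ms, duration_ms):
    """src_lsp: (n_src_frames, order) LSP vectors; returns LSPs at the target frame rate."""
    src_t = (np.arange(len(src_lsp)) + 0.5) * src_frame_ms             # source frame centres
    dst_t = (np.arange(int(duration_ms / dst_frame_ms)) + 0.5) * dst_frame_ms
    dst_lsp = np.empty((len(dst_t), src_lsp.shape[1]))
    for k in range(src_lsp.shape[1]):                                  # interpolate each LSP track
        dst_lsp[:, k] = np.interp(dst_t, src_t, src_lsp[:, k])
    return np.sort(dst_lsp, axis=1)            # preserve the ordering property of LSPs

# Example: 60 ms of 10th-order LSPs from 30 ms frames, re-sampled onto 20 ms frames
rng = np.random.default_rng(2)
src = np.sort(rng.uniform(0.05, 3.1, size=(2, 10)), axis=1)
print(convert_lsp(src, 30.0, 20.0, 60.0).shape)    # -> (3, 10)
```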

Artificial speech bandwidth extension technique based on opus codec using deep belief network (심층 신뢰 신경망을 이용한 오푸스 코덱 기반 인공 음성 대역 확장 기술)

  • Choi, Yoonsang;Li, Yaxing;Kang, Sangwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.1
    • /
    • pp.70-77
    • /
    • 2017
  • Bandwidth extension is a technique for improving speech quality, intelligibility, and naturalness by extending 300~3,400 Hz narrowband speech to 50~7,000 Hz wideband speech. In this paper, an Artificial Bandwidth Extension (ABE) module embedded in the Opus audio decoder is designed to use information already available for the narrowband speech, reducing the computational complexity of the LPC (Linear Predictive Coding) and LSF (Line Spectral Frequencies) analysis as well as the algorithmic delay of the ABE module. We propose a spectral envelope extension method using a DBN (Deep Belief Network), one of the deep learning techniques; the proposed scheme produces a better extended spectrum than the traditional codebook-mapping method.
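A hedged sketch of the spectral envelope extension step, with a small MLP regressor standing in for the deep belief network used in the paper: the model learns a mapping from narrowband envelope features to the missing highband envelope features. The feature dimensions and the random training pairs are placeholders for real parallel narrowband/wideband analysis data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

NB_DIM, HB_DIM = 10, 8        # narrowband / highband envelope feature sizes (placeholders)

# Placeholder training pairs; in practice these would come from parallel
# narrowband and wideband speech analysed frame by frame.
rng = np.random.default_rng(3)
X_train = rng.standard_normal((2000, NB_DIM))
y_train = np.tanh(X_train @ rng.standard_normal((NB_DIM, HB_DIM)))

# Small MLP standing in for the DBN-based envelope mapping
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# At decode time: estimate the highband envelope from the narrowband one
nb_frame = rng.standard_normal((1, NB_DIM))
hb_estimate = model.predict(nb_frame)
print(hb_estimate.shape)      # -> (1, 8)
```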