• Title/Abstract/Keywords: Speech Processing

961 search results (processing time: 0.024 s)

Qualitative Classification of Voice Quality of Normal Speech and Derivation of its Correlation with Speech Features

  • 김정민;권철홍
    • 말소리와 음성과학 / Vol. 6, No. 1 / pp. 71-76 / 2014
  • In this paper, the voice quality of normal speech is qualitatively classified into five components: breathy, creaky, rough, nasal, and thin/thick voice. To determine whether a subjective measure of voice correlates with an objective one, each voice is perceptually rated on a three-point (1/2/3) scale by speech processing specialists and acoustically analyzed with speech analysis tools such as Praat, MDVP, and VoiceSauce. The speech parameters include features related to the speech source and the vocal tract filter. The statistical analysis uses a two-independent-samples non-parametric test. The results show a significant correlation between the speech feature parameters and the components of voice quality.

Crossword Game Using Speech Technology

  • 유일수;김동주;홍광석
    • 정보처리학회논문지B / Vol. 10B, No. 2 / pp. 213-218 / 2003
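  • This paper implements a speech-operated crossword game. The puzzle layout used in the game is generated by the Cross Array Algorithm (CAA) proposed in the paper. CAA uses domain-specific dictionaries to generate a new random crossword array automatically each time. To support array generation by CAA, dictionaries for seven domains were built. The game is designed to be operated by speech as well as by mouse or keyboard. The speech interface uses speech recognition and synthesis technology and gives users more convenient functionality. CAA was evaluated by measuring the computation time needed to generate a crossword array and the word generation rate of the array. The computation time was about 10 ms for every window size, and the word generation rate was about 50%. In the speech recognition experiments, the recognition rates were 98.5%, 97.6%, and 96.2% for window sizes of $7{\times}7$, $9{\times}9$, and $11{\times}11$, respectively.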

A Study on Extracting Valid Speech Sounds by the Discrete Wavelet Transform

  • 김진옥;황대준;백한욱;정진현
    • 정보처리학회논문지B / Vol. 9B, No. 2 / pp. 231-236 / 2002
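  • When valid unvoiced sounds are mixed with system noise, extracting them is difficult. In this paper, the validity of each block is decided by a merging rule applied to frequencies and their positions, based on signal analysis with the discrete wavelet transform, so valid unvoiced sounds can be extracted even in very noisy environments. The merging algorithm is highly effective for extracting valid speech because it can determine its processing parameters from the speech alone and is independent of system noise. Experimental results show improved valid-speech extraction and, in particular, strong robustness to high-frequency noise; the approach also allows system tuning that is easy to implement.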
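
As a rough illustration of the kind of analysis this abstract refers to, the sketch below computes one level of a discrete wavelet transform in plain Python. The Haar wavelet, the function names `haar_dwt` and `haar_idwt`, and the sample frame are assumptions made for illustration; the abstract does not state which wavelet or which merging rule the authors used.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Low-pass branch: scaled sums of neighbouring samples.
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    # High-pass branch: scaled differences of neighbouring samples.
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * scale)
        out.append((a - d) * scale)
    return out

# Hypothetical 8-sample frame, chosen only to show flat and changing regions.
frame = [4.0, 4.0, 2.0, 0.0, -1.0, 1.0, 3.0, 3.0]
approx, detail = haar_dwt(frame)
restored = haar_idwt(approx, detail)
```

The detail coefficients are zero where neighbouring samples are equal and nonzero where the signal changes quickly, which is why wavelet coefficients are a natural basis for block-wise decisions about high-frequency content.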

Implementation of HMM-Based Speech Recognizer Using TMS320C6711 DSP

  • Bae Hyojoon;Jung Sungyun;Son Jongmok;Kwon Hongseok;Kim Siho;Bae Keunsung
    • 대한전자공학회: Conference Proceedings / 대한전자공학회 2004 ICEIC The International Conference on Electronics Informations and Communications / pp. 391-394 / 2004
  • This paper focuses on the DSP implementation of an HMM-based speech recognizer that handles a vocabulary of several hundred words and is speaker independent. First, we develop an HMM-based speech recognition system on the PC that operates frame by frame, running feature extraction and Viterbi decoding in parallel to keep the processing delay as small as possible. Techniques such as linear discriminant analysis, state-based Gaussian selection, and a phonetic tied mixture model are employed to reduce the computational burden and memory size. The system is then optimized and compiled for the TMS320C6711 DSP for real-time operation. The implemented system uses 486 kbytes of memory for data and acoustic models and 24.5 kbytes for program code. A maximum required time of 29.2 ms to process a 32 ms frame of speech validates real-time operation of the implemented system.
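
The real-time claim in this abstract comes down to simple arithmetic, sketched below. The function name `real_time_factor` is hypothetical, and the calculation assumes that a new 32 ms frame arrives every 32 ms (no frame overlap), which the abstract does not state explicitly.

```python
def real_time_factor(processing_ms, frame_ms):
    """Ratio of worst-case processing time to the duration of audio processed."""
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return processing_ms / frame_ms

# Figures quoted in the abstract: 29.2 ms worst case per 32 ms frame.
rtf = real_time_factor(29.2, 32.0)
headroom_ms = 32.0 - 29.2

# A factor below 1.0 means each frame is finished before the next one is due.
runs_in_real_time = rtf < 1.0
```

Under that assumption the factor is 0.9125, leaving about 2.8 ms of headroom per frame.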

Image Data Compression Using Laplacian Pyramid Processing and Vector Quantization

  • 박광훈;차일환;윤대희
    • 대한전기학회: Conference Proceedings / 대한전기학회 1987 Electrical and Electronic Engineering Conference Proceedings (II) / pp. 1347-1351 / 1987
  • This thesis studies Laplacian pyramid vector quantization, which keeps the compression algorithm simple and remains stable across various kinds of image data. To this end, images are divided into two groups according to their statistical characteristics. At 0.860 bits/pixel and 0.360 bits/pixel respectively, Laplacian pyramid vector quantization is compared with existing spatial-domain vector quantization and transform coding under the same conditions, in both objective and subjective terms. Laplacian pyramid vector quantization is much more stable against the statistical characteristics of images than the existing vector quantization and transform coding.

Fractal Dimension Method for Connected-digit Recognition

  • 김태식
    • 음성과학 / Vol. 10, No. 2 / pp. 45-55 / 2003
  • A strange attractor can be used as a representation method for signal processing. Fractal dimension is a well-known method for extracting features from an attractor. Although the method provides powerful capabilities for speech processing, it has a drawback that must be resolved first. Normally, the raw signal must be long enough for the fractal dimension method to be applied. In connected-digit recognition, however, syllable-based or semi-syllable-based processing is normally used, and in that case it is unclear whether there is enough data to extract the characteristics of the attractor. This paper discusses the relationship between the size of the signal data and the computed fractal dimension, and also discusses an efficient way to apply the method to connected-digit recognition.

Speech Processing System Using a Noise Reduction Neural Network Based on FFT Spectrums

  • Choi, Jae-Seung
    • Journal of information and communication convergence engineering / Vol. 10, No. 2 / pp. 162-167 / 2012
  • This paper proposes a speech processing system based on a model of the human auditory system and a noise reduction neural network that uses fast Fourier transform (FFT) amplitude and phase spectrums to reduce noise in background-noise environments. The proposed system reduces noise with the neural network based on FFT amplitude and phase spectrums, then performs auditory processing frame by frame after detecting voiced and transitional sections for each frame. Its results are compared with those of a conventional spectral subtraction method and a minimum mean-square error log-spectral amplitude estimator at different noise levels. The effectiveness of the proposed system is confirmed experimentally by measuring the signal-to-noise ratio (SNR). In this experiment, the maximal improvement in output SNR with the proposed method is approximately 11.5 dB for car noise and 11.0 dB for street noise, relative to a conventional spectral subtraction method.
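
The SNR comparison in this abstract can be made concrete with a small sketch. The abstract does not say exactly how the authors computed SNR, so the definition below (clean-signal power over residual-error power, in dB), the function name `snr_db`, and the synthetic signals are all assumptions for illustration.

```python
import math

def snr_db(clean, processed):
    """SNR in dB, treating (clean - processed) as the residual noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - p) ** 2 for c, p in zip(clean, processed))
    if noise_power == 0:
        raise ValueError("processed signal equals the clean signal; SNR is unbounded")
    return 10.0 * math.log10(signal_power / noise_power)

# Synthetic example: a sine "speech" signal with an alternating-sign disturbance.
clean = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
noisy = [c + 0.5 * (-1) ** n for n, c in enumerate(clean)]      # before enhancement
enhanced = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]   # after a hypothetical enhancer

# Improvement in output SNR, the quantity the abstract reports in dB.
improvement = snr_db(clean, enhanced) - snr_db(clean, noisy)
```

Because the residual amplitude drops by a factor of five in this toy case, the improvement is 10·log10(25), roughly 14 dB.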

Performance Evaluation of Environmental Noise Reduction Techniques for Hearing Aids

  • 박선준;도원;신승우;윤대희;김동욱;박영철
    • 대한의용생체공학회: Conference Proceedings / 대한의용생체공학회 1997 Fall Conference / pp. 83-86 / 1997
  • To provide an improved aided listening environment for hearing-impaired listeners, background noise reduction techniques are investigated as a front end for conventional hearing aids, and their effects are tested subjectively. Several speech enhancement schemes were implemented, and preference tests for normal listeners were performed to select the best possible scheme for hearing-impaired listeners. The results indicate that SDT scores without speech enhancement drop more sharply as SNR decreases than those with the speech enhancement techniques. SDT scores obtained for hearing-impaired listeners with hearing aids showed large variability. However, all impaired listeners preferred noise-suppressed sounds to unsuppressed ones.

Research on Noise Reduction Algorithm Based on Combination of LMS Filter and Spectral Subtraction

  • Cao, Danyang;Chen, Zhixin;Gao, Xue
    • Journal of Information Processing Systems / Vol. 15, No. 4 / pp. 748-764 / 2019
  • To address the filtering delay of the least mean square (LMS) adaptive filter noise reduction algorithm and the musical noise of the spectral subtraction algorithm in speech signal processing, we combine the two algorithms and propose a novel noise reduction method whose performance is on par with or better than state-of-the-art methods. We first use the LMS algorithm to reduce the average intensity of the noise, and then apply spectral subtraction to reduce the remaining noise. Experiments show that applying spectral subtraction after the LMS adaptive filter overcomes the shortcomings of the two individual algorithms. The novel method also increases the signal-to-noise ratio of the original speech data and improves the final noise reduction performance.
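
The first stage described in this abstract, LMS adaptive filtering, can be sketched as follows. This is a generic adaptive noise canceller, not the authors' implementation: the use of a separate reference noise input, the function name `lms_cancel`, the tap count, the step size `mu`, and the synthetic signals are all assumptions, and the spectral subtraction stage that the paper applies afterwards is not shown.

```python
import math
import random

def lms_cancel(primary, reference, num_taps=2, mu=0.01):
    """Adaptive noise cancellation with an LMS filter.

    The filter predicts the noise in `primary` from `reference`;
    the prediction error is returned as the cleaned signal.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate
        # LMS update: step along the negative gradient of the squared error.
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
n_samples = 4000
speech = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
reference = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
# Assumed noise path: the reference delayed by one sample and halved.
noise = [0.0] + [0.5 * r for r in reference[:-1]]
primary = [s + v for s, v in zip(speech, noise)]

cleaned, weights = lms_cancel(primary, reference)

# Compare errors over the second half, after the filter has had time to adapt.
tail = slice(n_samples // 2, None)
mse_before = mse(primary[tail], speech[tail])
mse_after = mse(cleaned[tail], speech[tail])
```

The residual noise left after such a filter converges is the component a second stage such as spectral subtraction would then target.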

Hands-free Speech Recognition based on Echo Canceller and MAP Estimation

  • Sung-ill Kim;Wee-jae Shin
    • 융합신호처리학회논문지 / Vol. 4, No. 3 / pp. 15-20 / 2003
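  • In applications such as teleconferencing and telecommunication systems that use hands-free microphones, the speech signal is easily distorted not only by ambient noise but also by echo caused by coupling between the microphone and the loudspeaker. In addition, environmental noise, including channel distortion and additive noise, is assumed to affect the original input speech signal. This paper introduces a new approach that uses an echo canceller and maximum a posteriori (MAP) estimation to improve the recognition rate for such hands-free speech. The proposed system is shown to be effective for hands-free speech recognition in ambient noise environments that include echo. Experimental results also show that the system combining the echo canceller with MAP environment adaptation adapts well to echo and noise environments.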
