Search | Korea Science

Constructing a Noise-Robust Speech Recognition System using Acoustic and Visual Information (청각 및 시가 정보를 이용한 강인한 음성 인식 시스템의 구현)

Lee, Jong-Seok;Park, Cheol-Hoon
- Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.719-725 / 2007
In this paper, we present an audio-visual speech recognition system for noise-robust human-computer interaction. Unlike usual speech recognition systems, our system utilizes the visual signal containing speakers' lip movements along with the acoustic signal to obtain robust speech recognition performance against environmental noise. The procedures of acoustic speech processing, visual speech processing, and audio-visual integration are described in detail. Experimental results demonstrate the constructed system significantly enhances the recognition performance in noisy circumstances compared to acoustic-only recognition by using the complementary nature of the two signals.
https://doi.org/10.5302/J.ICROS.2007.13.8.719

On detecting the transition segments of speech signal by energy approximation degree of the synchronized pitch (피치 동기된 에너지 유사도에 의한 음성신호의 전이구간 검출)

김종득;박형빈;김대호;배명진
- Proceedings of the IEEK Conference / 1998.06a / pp.603-606 / 1998
In large-vocabulary and continuous speech recognition systems that use the phoneme as the recognition unit, segmentation is a necessary processing step. This paper proposes a new method based on a normalized AMDF (average magnitude difference function). The suggested parameter represents the degree of sharpness at the valley point. The method can detect the boundary between steady-state and transient regions in continuous speech without prior information about the speech signal.
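The abstract does not define the proposed sharpness parameter, so it is not reproduced here. As background only, the following is a minimal sketch of a plain AMDF, a peak-normalized variant, and a search for the deepest valley. The function names, the peak normalization, and the `min_lag` parameter are illustrative assumptions, not the authors' definitions.

```python
import math

def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(frame)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the frame length")
    values = []
    for lag in range(max_lag + 1):
        diffs = [abs(frame[i] - frame[i + lag]) for i in range(n - lag)]
        values.append(sum(diffs) / len(diffs))
    return values

def normalized_amdf(frame, max_lag):
    """AMDF divided by its largest value (an assumed normalization)."""
    d = amdf(frame, max_lag)
    peak = max(d)
    return [v / peak for v in d] if peak > 0 else d

def deepest_valley(d, min_lag):
    """Lag of the smallest AMDF value at or beyond min_lag."""
    return min(range(min_lag, len(d)), key=lambda k: d[k])
```

For a periodic frame the deepest valley falls at the period in samples; how sharp that valley is, and how the sharpness changes between frames, is the kind of cue the abstract describes.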

A Study on a New Pre-emphasis Method Using the Short-Term Energy Difference of Speech Signal (음성 신호의 다구간 에너지 차를 이용한 새로운 프리엠퍼시스 방법에 관한 연구)

Kim, Dong-Jun;Kim, Ju-Lee
- The Transactions of the Korean Institute of Electrical Engineers D / v.50 no.12 / pp.590-596 / 2001
Pre-emphasis is an essential step in speech signal processing. The two widely used methods are the typical method, which uses a fixed value near unity, and the optimal method, which uses the autocorrelation ratio of the signal. This study proposes a new pre-emphasis method using the short-term energy difference of the speech signal, which can effectively compensate for the glottal source and lip radiation characteristics. Using the proposed pre-emphasis, speech analysis such as spectrum estimation and formant detection is performed, and the results are compared with those of the two conventional pre-emphasis methods. Speech analysis of five single vowels showed that the proposed method enhanced the spectral shapes, gave nearly constant formant frequencies, and avoided the overlapping of two adjacent formants. Comparison with FFT spectra verified these results and showed the accuracy of the proposed method. The computational complexity of the proposed method was reduced to about 50% of that of the optimal method.
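The two conventional methods named in the abstract can be sketched as a first-order filter y[n] = x[n] - a*x[n-1]. In the sketch below, the fixed method uses a constant a (0.95 is an assumed, commonly used value), and the adaptive method uses the ratio R(1)/R(0) of the frame's autocorrelation, which is the usual reading of "autocorrelation ratio". The proposed energy-difference method is not specified in the abstract and is not shown. All function names are illustrative.

```python
def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))

def pre_emphasize(x, coeff):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    if not x:
        return []
    # The first sample has no predecessor and is passed through unchanged.
    return [x[0]] + [x[n] - coeff * x[n - 1] for n in range(1, len(x))]

def fixed_pre_emphasis(x, coeff=0.95):
    """Conventional method with a fixed coefficient near unity."""
    return pre_emphasize(x, coeff)

def adaptive_pre_emphasis(x):
    """Coefficient taken as R(1)/R(0) of the frame (assumed definition)."""
    r0 = autocorr(x, 0)
    coeff = autocorr(x, 1) / r0 if r0 > 0 else 0.0
    return pre_emphasize(x, coeff), coeff
```

The adaptive variant needs two autocorrelation sums per frame, which is the cost the abstract compares the proposed method against.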

A Research on Speech Processing and Coding Strategy for Cochlear Implants (청각 장애인을 위한 음성 신호의 자극패턴 추출에 관한 연구)

Chae, D.;Byun, J.;Choi, D.;Baeck, S.;Park, S.
- Proceedings of the KOSOMBE Conference / v.1993 no.11 / pp.175-179 / 1993
A speech processing and coding strategy for cochlear implants was studied in order to create a speech signal processing system that extracts stimulus parameters including formants, pitch, and amplitude information. This study presents a method that extracts characteristic information from the speech signal and adapts it for patients with hearing impairment.

An Analysis Method of Strange Attractor for the Feature Extraction (음성 특징 추출을 위한 스트레인지 어트랙터의 분석 방법)

Kim, Tae-Sik
- Speech Sciences / v.9 no.2 / pp.147-155 / 2002
In speech processing, raw signals have usually been represented in a 2D format. Such representations limit how well characteristics can be extracted from the signal, and in general not much information can be detected from a 2D signal. The strange attractor from chaos theory provides a 3D representation. In recognition problems, the signal representation is very important because good features can be detected from a good representation. This paper discusses a new feature extraction method that extracts features from a cycle of the strange attractor. A neural network is used to check whether the method extracts suitable features. The results are promising and can be applied to some areas of signal processing.
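The abstract does not say how the 3D representation is built. One common way to reconstruct an attractor from a scalar signal is time-delay embedding, where each point is (x[i], x[i+d], x[i+2d]). The sketch below assumes that construction; the function name and the default `delay` value are illustrative and not taken from the paper.

```python
def delay_embed(signal, dim=3, delay=5):
    """Map a 1D signal to points (x[i], x[i+delay], ..., x[i+(dim-1)*delay])."""
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        return []
    return [
        tuple(signal[i + k * delay] for k in range(dim))
        for i in range(n_points)
    ]
```

For example, `delay_embed(list(range(10)), dim=3, delay=2)` yields six points, starting at `(0, 2, 4)`. Features could then be measured on one cycle of the resulting trajectory.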

A Study of Energy Parameter without Windowing Influence in Speech Signal (윈도우의 영향이 제거된 에너지 파라미터에 관한 연구)

조태수;신동성;배명진
- Proceedings of the IEEK Conference / 2001.06d / pp.277-280 / 2001
Preprocessing is a very important step in speech signal processing. It influences, among other things, the compression rate in speech coding and the recognition rate in speech recognition. In this paper, we propose a method that minimizes the influence of the window by using the pitch period and start points. The proposed method is applicable to voiced-speech detection and word labeling.

The Pitch Detection Using Variable LPF (Variable LPF에 의한 피치검출)

백금란
- Proceedings of the Acoustical Society of Korea Conference / 1993.06a / pp.88-92 / 1993
In speech signal processing, it is necessary to detect the pitch accurately. The pitch extraction algorithms proposed so far have difficulty detecting pitch across a wide range of speech signals. We therefore propose a new algorithm that uses G-peak extraction. The method finds the MZI (maximum zero-crossing interval) in each frame and convolves it with the speech signal; this is equivalent to passing the speech signal through a variable LPF. As a result, we obtain the pitch with improved accuracy and at high speed.
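One possible reading of the MZI step is sketched below: measure the longest gap between successive zero crossings in a frame, then smooth the frame with a rectangular (moving-average) window of that length, so that the low-pass cutoff varies from frame to frame. The rectangular window, the fallback for frames with fewer than two crossings, and all function names are assumptions; the abstract does not specify them, and G-peak extraction is not shown.

```python
def zero_crossings(frame):
    """Indices i where the sign changes between frame[i-1] and frame[i]."""
    return [
        i for i in range(1, len(frame))
        if (frame[i - 1] < 0) != (frame[i] < 0)
    ]

def max_zero_crossing_interval(frame):
    """Longest distance in samples between successive zero crossings."""
    zc = zero_crossings(frame)
    if len(zc) < 2:
        return len(frame)  # assumed fallback when no interval can be measured
    return max(b - a for a, b in zip(zc, zc[1:]))

def moving_average(frame, width):
    """Causal rectangular smoothing; shorter averages at the frame start."""
    out = []
    acc = 0.0
    for i, v in enumerate(frame):
        acc += v
        if i >= width:
            acc -= frame[i - width]
        out.append(acc / min(i + 1, width))
    return out

def variable_lpf(frame):
    """Smooth a frame with a window as long as its own MZI."""
    return moving_average(frame, max_zero_crossing_interval(frame))
```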

Performance Improvement of Speech Recognition Based on Independent Component Analysis (독립성분분석법을 이용한 음성인식기의 성능향상)

김창근;한학용;허강인
- Proceedings of the Korea Institute of Convergence Signal Processing / 2001.06a / pp.285-288 / 2001
In this paper, we propose a new speech feature extraction method using ICA (Independent Component Analysis), which minimizes the dependency and correlation among speech signals in order to separate each component in the speech signal. ICA removes redundancy in the data after finding the axis direction with the greatest variance in the input dimension. Training and recognition experiments using HMMs verified that the ICA features improve speech recognition performance compared with conventional mel-cepstrum features. ICA also coped with the decline in recognition performance caused by environmental noise.

A Study on TSIUVC Approximate-Synthesis Method using Least Mean Square (최소 자승법을 이용한 TSIUVC 근사합성법에 관한 연구)

Lee, See-Woo
- The KIPS Transactions:PartB / v.9B no.2 / pp.223-230 / 2002
In a speech coding system that uses voiced and unvoiced excitation sources, the speech waveform is distorted when voiced and unvoiced consonants coexist in a frame. This paper presents a new method of TSIUVC (Transition Segment Including Unvoiced Consonant) approximate synthesis using Least Mean Square. The TSIUVC extraction is based on the zero-crossing rate and an IPP (Individual Pitch Pulses) extraction algorithm that uses the residual signal of the FIR-STREAK digital filter. As a result, the method obtains a high-quality approximate-synthesis waveform. Importantly, the frequency signals in a maximum error signal can be rendered as a low-distortion approximate-synthesis waveform. The method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, as well as to speech analysis and speech synthesis.
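The zero-crossing-rate cue mentioned in the abstract can be illustrated with a short sketch: frames with a high zero-crossing rate are flagged as candidates for containing an unvoiced consonant. The frame size, hop, threshold, and function names are hypothetical, and the IPP extraction and FIR-STREAK filter stages of the actual method are not shown.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
    )
    return crossings / (len(frame) - 1)

def frame_signal(x, size, hop):
    """Split x into full frames of `size` samples taken every `hop` samples."""
    return [x[s:s + size] for s in range(0, len(x) - size + 1, hop)]

def high_zcr_frames(x, size, hop, threshold):
    """Indices of frames whose zero-crossing rate exceeds the threshold."""
    return [
        i for i, f in enumerate(frame_signal(x, size, hop))
        if zero_crossing_rate(f) > threshold
    ]
```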
https://doi.org/10.3745/KIPSTB.2002.9B.2.223

A Study on Speech Signal Processing of TSIUVC using Least Mean Square (LMS를 이용한 TSIUVC의 음성신호처리에 관한 연구)

Lee, See-Woo
- Journal of the Korea Academia-Industrial cooperation Society / v.7 no.6 / pp.1175-1179 / 2006
In a speech coding system that uses voiced and unvoiced excitation sources, the speech waveform is distorted when voiced and unvoiced consonants exist in the same frame. In this paper, I propose a new method of TSIUVC (Transition Segment Including Unvoiced Consonant) approximate synthesis using Least Mean Square. As a result, the Least Mean Square method obtained a high-quality approximate-synthesis waveform. Importantly, the frequency signals in a maximum error signal can be rendered as a low-distortion approximate-synthesis waveform. The method can be applied to a new Voiced/Silence/TSIUVC speech coding scheme, as well as to speech analysis and synthesis.
