Search | Korea Science

A study on real-time implementation of speech recognition and speech control system using dSPACE board (dSPACE 보드를 이용한 음성인식 명령처리시스템 실시간 구현에 관한 연구)

김재웅;정원용
- Proceedings of the Korea Institute of Convergence Signal Processing / 2000.12a / pp.173-176 / 2000
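Speech is the most convenient means humans have of conveying control commands, and control through speech can offer people considerable convenience. In this paper, a simple speech-recognition command-processing system was built in Matlab using a multi-layer perceptron (MLP). Aiming at speech-based control, the study targeted a speaker-dependent, isolated-word recognizer. Short-time energy and the zero-crossing rate (ZCR) were used to detect the start and end points of speech, and 12th-order LPC cepstral coefficients were used as the recognizer's feature parameters. The neural network was given three layers, with 12, 12, and 2 neurons respectively, so that its outputs are activated for the "start" and "stop" commands. The network was first trained on reference speech patterns; the converted C program was then downloaded to a dSPACE real-time processing board running under the Matlab environment. Spoken input was recognized, and a DC motor connected to the output of the board's D/A converter was started and stopped accordingly. The real-time implementation confirmed that control through spoken commands, such as remote control, is feasible.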
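The endpoint-detection step described above (short-time energy plus zero-crossing rate) can be sketched as follows. This is a minimal illustration of a common energy-and-ZCR scheme, not the authors' Matlab code: the frame sizes, the thresholds (`energy_factor`, `low_factor`, `zcr_threshold`, `max_extend`) and the assumption that the first few frames contain only background noise are all hypothetical choices.

```python
def split_frames(x, frame_len, hop):
    """Split a signal into overlapping frames (the last partial frame is dropped)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    return sum(v * v for v in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    changes = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return changes / (len(frame) - 1)


def detect_endpoints(x, frame_len=256, hop=128, noise_frames=5,
                     energy_factor=4.0, low_factor=2.0,
                     zcr_threshold=0.25, max_extend=10):
    """Return (start_sample, end_sample) of the detected utterance, or None.

    Hypothetical thresholds; assumes the first `noise_frames` frames
    contain only background noise.
    """
    frames = split_frames(x, frame_len, hop)
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    noise_e = sum(energy[:noise_frames]) / noise_frames
    # Core of the utterance: frames clearly louder than the noise floor.
    active = [i for i, e in enumerate(energy) if e > energy_factor * noise_e]
    if not active:
        return None
    start, end = active[0], active[-1]

    def weak_fricative(i):
        # Low-energy but high-ZCR frames (e.g. /s/) next to the core are kept.
        return zcr[i] > zcr_threshold and energy[i] > low_factor * noise_e

    steps = 0
    while start > 0 and steps < max_extend and weak_fricative(start - 1):
        start -= 1
        steps += 1
    steps = 0
    while end < len(frames) - 1 and steps < max_extend and weak_fricative(end + 1):
        end += 1
        steps += 1
    return start * hop, end * hop + frame_len
```

Frames whose energy clears the main threshold form the core of the utterance; the ZCR test then extends the boundaries to take in weak, noise-like sounds such as fricatives, which have low energy but many zero crossings.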

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

Hong, Jungpyo;Park, Sangjun;Jeong, Sangbae;Hahn, Minsoo
- Phonetics and Speech Sciences / v.5 no.1 / pp.11-16 / 2013
This paper proposes a robust feature extraction method for accurate voice activity detection (VAD). VAD is one of the principal modules in speech signal processing systems such as speech codecs, speech enhancement, and speech recognition. In noisy environments, nonstationary noise can drastically reduce VAD accuracy, because features fluctuate during noise-only intervals and raise the false-alarm rate. To improve VAD performance, this paper proposes harmonic-weighted energy. This feature focuses on voiced speech intervals and uses weighted harmonic-to-noise ratios to determine the amount of harmonicity in the frame energy. For performance evaluation, receiver operating characteristic (ROC) curves and the equal error rate (EER) are measured.
https://doi.org/10.13064/KSSS.2013.5.1.011
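As a rough sketch of the idea of harmonic weighting, the code below scales each frame's energy by a harmonicity score so that voiced frames keep their energy while noise-like frames are attenuated. The abstract does not give the exact weighting, so this is an assumption: the score here is the peak normalized autocorrelation over an assumed pitch range of 60 to 400 Hz, which is monotonically related to the harmonic-to-noise ratio, rather than the paper's weighted HNR.

```python
import math


def frame_energy(frame):
    return sum(v * v for v in frame)


def harmonicity(frame, min_lag, max_lag):
    """Peak normalized autocorrelation over candidate pitch lags, clamped to [0, 1]."""
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, num / den)
    return min(best, 1.0)


def harmonic_weighted_energy(frame, fs=8000, f0_min=60.0, f0_max=400.0):
    """Frame energy scaled by harmonicity (hypothetical weighting rule):
    periodic (voiced) frames keep their energy, noise-like frames shrink."""
    w = harmonicity(frame, int(fs / f0_max), int(fs / f0_min))
    return w * frame_energy(frame)
```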

Design and Implementation of Speech-Training System for Voice Disorders (발성장애아동을 위한 발성훈련시스템 설계 및 구현)

정은순;김봉완;양옥렬;이용주
- Journal of Internet Computing and Services / v.2 no.1 / pp.97-106 / 2001
In this paper, we design and implement a complement-based speech-training system for children with voice disorders. The system provides three levels of training: preliminary training, training in speech apprehension, and training in speech enhancement. To analyze the speech of children with voice disorders, we extracted features such as loudness, amplitude, and pitch using digital signal processing techniques. The system converts the extracted features into a graphical display that gives the user visual feedback on their speech.
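The feature-extraction step for visual feedback might look like the sketch below, which computes two of the listed features per frame: an RMS level in dB (a simple stand-in for loudness, which is really a perceptual quantity) and the peak absolute amplitude. The frame sizes and the full-scale reference of 1.0 are assumptions; pitch is omitted here.

```python
import math


def frame_levels(x, frame_len=256, hop=128):
    """Per-frame (RMS level in dB re full scale 1.0, peak absolute amplitude)."""
    levels = []
    for i in range(0, len(x) - frame_len + 1, hop):
        frame = x[i:i + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / frame_len)
        peak = max(abs(v) for v in frame)
        # Floor the RMS so silent frames map to a finite level (-200 dB).
        levels.append((20 * math.log10(max(rms, 1e-10)), peak))
    return levels
```

Each (level, peak) pair could then drive a bar or meter on screen, one update per frame.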

Speech Feature Extraction Based on the Human Hearing Model

Chung, Kwang-Woo;Kim, Paul;Hong, Kwang-Seok
- Proceedings of the KSPS conference / 1996.10a / pp.435-447 / 1996
In this paper, we propose a method that extracts speech features using a model of human hearing implemented with signal processing techniques. The proposed method consists of the following steps: normalization of each short-time speech block by its maximum value; multi-resolution analysis using the discrete wavelet transform, followed by re-synthesis using the inverse discrete wavelet transform; differentiation of the re-synthesized signals; and full-wave rectification and integration. To verify the performance of the proposed feature in speech recognition, Korean digit recognition experiments were carried out using both DTW and VQ-HMM. With DTW, the recognition rates were 99.79% for the speaker-dependent task and 90.33% for the speaker-independent task; with VQ-HMM, they were 96.5% and 81.5%, respectively. These results indicate that the proposed feature has potential as a simple and efficient feature for recognition tasks.
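The processing chain listed in the abstract can be sketched as below. The Haar wavelet, three decomposition levels, a first difference for differentiation, and a sum for integration are all assumptions made for illustration; the abstract does not specify them. Each band is re-synthesized on its own, so the sketch returns one feature value per band.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)


def haar_analysis(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    a = [(x[2 * i] + x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * SQRT_HALF for i in range(len(x) // 2)]
    return a, d


def haar_synthesis(a, d):
    """Inverse of haar_analysis."""
    x = []
    for ai, di in zip(a, d):
        x.extend(((ai + di) * SQRT_HALF, (ai - di) * SQRT_HALF))
    return x


def band_signals(x, levels):
    """Multi-resolution analysis, then re-synthesis of each band on its own.

    Returns levels + 1 time-domain signals (finest detail first, coarsest
    approximation last) that sum back to x. len(x) must be divisible by 2**levels.
    """
    details, approx = [], x
    for _ in range(levels):
        approx, d = haar_analysis(approx)
        details.append(d)
    bands = []
    for j in range(levels + 1):
        cur = approx if j == levels else [0.0] * len(approx)
        for lvl in range(levels - 1, -1, -1):
            d = details[lvl] if lvl == j else [0.0] * len(details[lvl])
            cur = haar_synthesis(cur, d)
        bands.append(cur)
    return bands


def hearing_features(block, levels=3):
    """Normalize by peak -> DWT/IDWT bands -> differentiate -> rectify -> integrate."""
    peak = max(abs(v) for v in block) or 1.0
    x = [v / peak for v in block]
    features = []
    for band in band_signals(x, levels):
        diff = [band[i] - band[i - 1] for i in range(1, len(band))]  # first difference
        features.append(sum(abs(v) for v in diff))  # full-wave rectify, then sum
    return features
```

Because of the peak normalization in the first step, the features are unchanged when the whole block is scaled by a constant gain.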

Enhancement of Speech Using the Adaptive Signal Processing (적응신호처리를 이용한 음질 개선)

Shin, Yoon-Ki
- Speech Sciences / v.9 no.4 / pp.275-287 / 2002
In spoken man-machine communication in noisy environments, speech quality may degrade too severely for the machine to recognize it correctly. In particular, when the corrupting noise occupies the same band as the speech, conventional fixed filters cannot remove the noise effectively. To solve this problem, adaptive noise cancellers (ANCs), which are based on adaptive filters, have recently come into frequent use. Adaptive recursive filters can perform better than adaptive nonrecursive filters because of their added poles, but their stability may be severely threatened. In this paper, an ANC system employing an adaptive recursive filter is proposed to enhance speech corrupted by noise, and the stability of the adaptive recursive filter is guaranteed by employing an adaptive compensator.
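For orientation, the basic ANC structure is sketched below. Note that this swaps in the simpler adaptive nonrecursive (FIR) filter trained with the standard LMS rule, not the adaptive recursive filter with an adaptive compensator that the paper proposes; the tap count and step size are arbitrary illustrative values.

```python
def lms_noise_canceller(primary, reference, taps=4, mu=0.01):
    """Adaptive noise canceller with an FIR (nonrecursive) LMS filter.

    primary:   speech + noise picked up by the main microphone
    reference: noise-only signal correlated with the noise in `primary`
    Returns the error signal e[n], which serves as the enhanced speech estimate.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))             # noise estimate
        e = d - y                                              # speech estimate
        w = [wi + 2 * mu * e * xi for wi, xi in zip(w, buf)]   # LMS weight update
        out.append(e)
    return out
```

The filter learns to predict the noise in the primary channel from the reference channel; because the speech is uncorrelated with the reference, what remains in the error signal is the speech estimate.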

Robust Speech Segmentation Method in Noise Environment for Speech Recognizer (음성인식기 구현을 위한 잡음에 강인한 음성구간 검출기법)

김창근;박정원;권호민;허강인
- Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.18-24 / 2003
Two of the most important tasks in implementing a real-time speech recognizer are designing a reliable VAD (voice activity detection) and a suitable speech feature vector. However, because reliable VAD is difficult to achieve in the presence of ambient noise, a suitable feature vector may not be obtained. To solve this problem, we use not only the commonly used short-time power spectrum but also two additional parameters: a spectral-density comparison measure that is robust to noise, and a linear discriminant function based on linear regression. VAD is then performed using a combination of these parameters, each weighted appropriately for different levels of ambient noise. Recognition experiments using DTW (dynamic time warping) confirm that the proposed parameters are robust in the presence of ambient noise.
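The DTW matching used in the recognition experiment can be sketched as follows. This is the textbook unconstrained form with a Euclidean frame distance; the step pattern, the absence of slope or window constraints, and the hypothetical `recognize` helper (nearest template wins) are assumptions, not details from the paper.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Euclidean local distance; each cell may be reached by a diagonal step
    (match) or by advancing only one of the two sequences.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],  # match
                                     cost[i - 1][j],      # a advances
                                     cost[i][j - 1])      # b advances
    return cost[n][m]


def recognize(test, templates):
    """Return the label of the reference template closest to `test` under DTW."""
    return min(templates, key=lambda label: dtw_distance(test, templates[label]))
```

Because the warping path can repeat frames of either sequence, a slower or faster utterance of the same word still aligns with low cost against its template.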

Home Network Speech Interface Using VoiceXML (VoiceXML을 이용한 홈 네트워크 음성 인터페이스)

Roh, Yong-Wan;Kim, Dong-Gyu;Shin, Jeong-Hoon;Chung, Kwang-Woo;Hong, Kwang-Seok
- Journal of the Institute of Convergence Signal Processing / v.6 no.3 / pp.127-133 / 2005
In this paper, we propose a speech interface for home network systems using VoiceXML. Existing home networks use Bluetooth, IrDA, wireless LAN, and HomeRF, but these cannot be used over long distances, such as from outdoors, or are difficult to use. The proposed VoiceXML speech interface supports home network services over long distances better than other interface technologies. It lets users control the home server from wired and wireless phones, and it notifies users of problems by calling them directly through the VoiceXML server. This speech interface can be applied to home networks and supports practical remote meter-reading and remote-control services. On this basis, we evaluate the efficiency of the proposed method.

Preliminary study of Korean Electro-palatography (EPG) for Articulation Treatment of Persons with Communication Disorders (의사소통장애인의 조음치료를 위한 한국형 전자구개도의 구현)

Woo, Seong Tak;Park, Young Bin;Oh, Da Hee;Ha, Ji-wan
- Journal of Sensor Science and Technology / v.28 no.5 / pp.299-304 / 2019
Recently, advances in rehabilitation medical technology have increased interest in speech therapy equipment. In particular, research on articulation therapy for communication disorders is being actively conducted. Existing methods for diagnosing and treating speech disorders, such as traditional tactile perception tests and methods based on the empirical judgment of speech therapists, have many limitations. Moreover, the position and tension of the tongue are key factors in articulation-related speech disorders, and they are especially important for distinguishing Korean lax, fortis, and aspirated consonants. In this study, we propose a Korean electropalatography (EPG) system that makes it easy to measure and monitor tongue position and tension during articulation treatment and diagnosis. In the proposed EPG system, the sensor was fabricated using AgCl electrodes and biocompatible silicone, and the measured signal was analyzed with a purpose-built bio-signal processing module and monitoring program. The bio-signal was measured by inserting the sensor against the palate of participants in an experimental control group. The results confirmed that the system could be applied to clinical treatment in speech therapy.
https://doi.org/10.5369/JSST.2019.28.5.299

Pitch Detection Using Variable LPF

Hong KEUM
- Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.963-970 / 1994
In speech signal processing, it is very important to detect the pitch accurately. The pitch-extraction algorithms proposed so far are not accurate enough to detect fine pitch in speech signals. We therefore propose a new algorithm that takes advantage of G-peak extraction. The method finds the MZCI (maximum zero-crossing interval), which is defined in terms of the cut-off bandwidth rate of the LPF (low-pass filter), and uses it to detect the pitch period of voiced signals. The algorithm performs robustly, with a gross error rate of 3.63% even at 0 dB SNR; for clean speech, the gross error rate is only 0.18%. It also carries out the whole procedure quickly.
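To show why low-pass filtering helps zero-crossing-based pitch estimation, here is a simplified sketch: a fixed one-pole low-pass filter suppresses high harmonics that would otherwise add extra zero crossings, and the pitch period is taken as the median interval between positive-going crossings. This is not the paper's variable-LPF, MZCI or G-peak algorithm; the filter, its coefficient `alpha` and the median rule are assumptions.

```python
import statistics


def one_pole_lowpass(x, alpha):
    """y[n] = y[n-1] + alpha * (x[n] - y[n-1]); smaller alpha = lower cutoff."""
    y, prev = [], 0.0
    for v in x:
        prev += alpha * (v - prev)
        y.append(prev)
    return y


def zero_crossing_pitch(x, fs, alpha=0.1):
    """Estimate F0 in Hz from the median interval between positive-going zero
    crossings of the low-pass-filtered signal; None if too few crossings."""
    y = one_pole_lowpass(x, alpha)
    crossings = [n for n in range(1, len(y)) if y[n - 1] < 0.0 <= y[n]]
    if len(crossings) < 2:
        return None
    intervals = [b - a for a, b in zip(crossings, crossings[1:])]
    return fs / statistics.median(intervals)
```

With `alpha=0.1` at 8 kHz the cutoff is roughly 130 Hz, so a strong harmonic at 1 kHz is attenuated enough that each pitch period yields a single positive-going crossing. A fixed cutoff like this fails for voices whose F0 lies well above it, which is the kind of problem a variable LPF is meant to address.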

A Study on Connected Digits Recognition Using the K-L Expansion (K-L 전개를 이용한 연속 숫자음 인식에 관한 연구)

김주곤;오세진;황철준;김범국;정현열
- Journal of the Institute of Convergence Signal Processing / v.2 no.3 / pp.24-31 / 2001
The K-L (Karhunen-Loeve) expansion compresses the dimensionality of features and thus reduces the computational cost of the recognition process. It is also well known in statistical pattern recognition as a way to extract features without much loss of information. In this paper, we propose a method that effectively applies the K-L expansion to speech feature parameters in order to improve the recognition accuracy of a Korean speech recognition system. The recognition performance of the new feature parameters obtained by the proposed method (K-L coefficients) is compared with that of conventional mel-cepstrum and regression coefficients in speaker-independent connected-digit recognition experiments. The experimental results showed that the K-L coefficients combined with their regression coefficients achieved higher average recognition rates than conventional mel-cepstrum combined with its regression coefficients.
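The K-L expansion amounts to projecting feature vectors onto the leading eigenvectors of their covariance matrix, which is equivalent to principal component analysis. A minimal NumPy sketch follows; which speech features are transformed and how many coefficients `k` are kept are not stated in the abstract, so both are left as inputs.

```python
import numpy as np


def kl_basis(features, k):
    """Karhunen-Loeve (principal component) basis from training features.

    features: (n_frames, dim) array. Returns (mean, basis), where basis is
    (dim, k) and holds the k covariance eigenvectors with the largest eigenvalues.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric matrix, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]


def kl_transform(features, mean, basis):
    """Project features onto the K-L basis, giving k decorrelated coefficients."""
    return (features - mean) @ basis
```

The basis is estimated once from training data and then applied to every test frame; the resulting coefficients are decorrelated and ordered by variance, which is what allows the dimension to be reduced with little loss of information.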
