Search | Korea Science

A Spoken Korean-Digits Recognition System Based on Linear Prediction Spectra (선형예측에 의한 숫자음성 자동인식)

;安居院猛
- Journal of the Korean Institute of Telematics and Electronics / v.17 no.3 / pp.12-19 / 1980
A speech recognition system for separately pronounced Korean digits is described. The system consists of four stages: parameter extraction, segmentation by voiced-unvoiced analysis, formant tracking, and pattern matching. Each digit utterance is segmented into an unvoiced segment, a voiced segment, or both, using zero-crossing rate (ZCR) and energy measurements. A relatively simple formant tracking scheme is then applied to the raw formant data extracted from linear prediction spectra to estimate the first three formant frequencies. Finally, pattern matching is performed with a dynamic programming method. In a recognition experiment on 150 digit utterances spoken by three male speakers, a recognition rate of 94% was obtained.
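The segmentation and pattern-matching stages described in this abstract can be illustrated with a minimal Python sketch. It assumes NumPy, labels frames as silence, unvoiced, or voiced from zero-crossing rate and short-time energy, and compares feature sequences with classic dynamic time warping (DTW), one common form of dynamic-programming pattern matching. The frame sizes, thresholds, and function names (`frame_signal`, `label_frames`, `dtw_distance`, `recognize`) are hypothetical choices for illustration, not the paper's, and the formant-tracking stage is not reproduced.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames; the ragged tail is dropped."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def zcr_and_energy(frames):
    """Per-frame zero-crossing rate (crossings per sample) and short-time energy."""
    signs = np.sign(frames)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)
    energy = np.sum(frames ** 2, axis=1)
    return zcr, energy

def label_frames(frames, zcr_thresh=0.25, silence_ratio=0.01):
    """Label frames 'S' (silence), 'U' (unvoiced) or 'V' (voiced).

    Assumption: both thresholds are illustrative, not values from the paper.
    """
    zcr, energy = zcr_and_energy(frames)
    floor = silence_ratio * energy.max()
    labels = []
    for z, e in zip(zcr, energy):
        if e < floor:
            labels.append("S")   # too quiet to be speech
        elif z >= zcr_thresh:
            labels.append("U")   # noise-like frame with many zero crossings
        else:
            labels.append("V")   # energetic frame with few zero crossings
    return labels

def dtw_distance(a, b):
    """DTW distance between two feature sequences (one row per frame),
    using Euclidean local cost and the symmetric (1,0)/(0,1)/(1,1) steps."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_seq, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(test_seq, templates[label]))
```

In use, each stored digit template would be a sequence of per-frame features (for example, the first three formant frequencies of each frame), and `recognize` returns the label of the nearest template. DTW lets an utterance spoken faster or slower than the template still align with it.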

The Study on Asymmetry between Acoustics and Perception of the Temporal Cues of English Plosives (영어파열음 시구간신호의 음향과 지각 비대칭성 연구)

Kang Seok-Han
- MALSORI / v.55 / pp.15-31 / 2005
This study tests the hypothesis that the voiced-voiceless distinction is influenced by the relationship between acoustics and perception. Production and perception tests were conducted with temporal cues in different environments (CV, VCV, VC). The results showed that acoustic cues that differ significantly between voiceless and voiced plosives do not behave the same way in perception. The results also showed an asymmetry between acoustics and perception.

An Explicit Voiced Speech Classification by using the Fluctuation of Maximum Magnitudes (최대진폭의 Fluctuation에 의한 유성음구간 Explicit 검출)

배명진
- Proceedings of the Acoustical Society of Korea Conference / 1987.11a / pp.86-88 / 1987
Accurate detection of voiced segments in speech signals is important for robust pitch extraction. This paper describes an explicit algorithm for detecting voiced segments in speech signals. The algorithm is based on the fluctuation properties of the maximum magnitudes in each frame of the speech signal. The detector's performance is evaluated by comparison with a manual classification of 150 recorded digit utterances.

Recognize vowel using self organizing map

Jang, Sung-Hwan;Lee, Ja-Yong;Kang, Hoon
- Institute of Control, Robotics and Systems: Conference Proceedings / 2001.10a / pp.115.4-115 / 2001
This paper deals with recognizing ten Korean voiced vowels using a Self-Organizing Map (SOM), which is an effective classifier. The SOM output layer is two-dimensional. The input vectors consist of frequency values that characterize the voiced vowels and are obtained with a short-time frequency transform. A final neural network is attached to the SOM output layer.
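A self-organizing map with a two-dimensional output layer, as in this abstract, can be sketched in a few lines. This is a minimal sketch assuming NumPy and low-dimensional frequency features (for example, the first two formant frequencies in kHz). The map size and the learning-rate and neighbourhood schedules are assumptions. The majority-vote node labelling in the hypothetical helpers `label_nodes` and `classify` is a simple stand-in for the final neural network attached to the output layer, not the authors' network.

```python
import numpy as np

def best_matching_unit(weights, x):
    """(row, col) of the map node whose weight vector is closest to x."""
    d = np.sum((weights - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, seed=0):
    """Train a 2-D self-organizing map on `data` (n_samples x dim).

    Assumption: linearly decaying learning rate and Gaussian neighbourhood
    width are illustrative choices, not the paper's schedule.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    sigma0 = max(rows, cols) / 2
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)
        sigma = sigma0 * (1 - frac) + 0.5
        x = data[rng.integers(len(data))]                 # random training vector
        bmu = np.array(best_matching_unit(weights, x))
        d2 = np.sum((grid - bmu) ** 2, axis=-1)           # grid distance to the winner
        h = np.exp(-d2 / (2 * sigma ** 2))[..., None]     # neighbourhood function
        weights += lr * h * (x - weights)                 # pull neighbours toward x
    return weights

def label_nodes(weights, data, labels):
    """Give each node the majority label of the training vectors it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), []).append(y)
    return {node: max(set(ys), key=ys.count) for node, ys in votes.items()}

def classify(weights, node_labels, x):
    """Label of the labelled node closest to x."""
    nodes = list(node_labels)
    d = [np.sum((weights[n] - x) ** 2) for n in nodes]
    return node_labels[nodes[int(np.argmin(d))]]
```

Nodes that never win for any training vector stay unlabelled, which is why `classify` searches only the labelled nodes.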

Improving The Excitation Signal for Low-rate CELP Speech Coding (저전송속도 CELP 부호화기에서 여기신호의 개선)

권철홍
- Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.136-141 / 1998
To enhance the performance of a CELP coder at low bit rates, the CELP excitation needs to have a peaky pulse characteristic. In this paper we introduce an excitation signal with this peaky pulse characteristic, obtained using a two-tap pitch predictor. Through the predictor, samples of the signal receive different gains according to their amplitudes. In voiced sounds the signal has the desired peaky pulse characteristic, and its periodicity is well reproduced. In particular, peaky pulses at voiced onsets and plosive bursts are clearly reconstructed.
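A two-tap pitch predictor estimates each sample from two consecutive samples one pitch lag back, which lets it approximate a period that falls between integer lags. The sketch below assumes NumPy and an open-loop least-squares fit on the signal itself. It shows only the predictor; the paper's amplitude-dependent gain scheme for the excitation is not reproduced, and `fit_two_tap` and `predict_two_tap` are hypothetical names.

```python
import numpy as np

def fit_two_tap(x, start, length, lag):
    """Least-squares taps (b0, b1) so that, for n in [start, start + length),
    x[n] ~= b0 * x[n - lag] + b1 * x[n - lag - 1]."""
    x = np.asarray(x, dtype=float)
    if start - lag - 1 < 0:
        raise ValueError("not enough signal history for this lag")
    n = np.arange(start, start + length)
    A = np.column_stack([x[n - lag], x[n - lag - 1]])  # the two delayed copies
    taps, *_ = np.linalg.lstsq(A, x[n], rcond=None)
    return taps

def predict_two_tap(x, start, length, lag, taps):
    """Pitch-predicted segment b0 * x[n - lag] + b1 * x[n - lag - 1]."""
    x = np.asarray(x, dtype=float)
    n = np.arange(start, start + length)
    return taps[0] * x[n - lag] + taps[1] * x[n - lag - 1]
```

For a signal whose period is exactly `lag` samples, the fit returns taps close to (1, 0). When the period lies between `lag` and `lag + 1`, the weight is typically shared between the two taps.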

Improved Excitation Modeling for Low-Rate CELP Speech Coding

Kwon, Chul-Hong
- The Journal of the Acoustical Society of Korea / v.18 no.2E / pp.24-30 / 1999
In this paper, we propose a weighting dependent mixed source model (WD-MSM) coder, an improved version of a CELP-based mixed source model (C-MSM) coder. The coder classifies speech segments into three types: voiced, unvoiced, and mixed. The excitation for a voiced frame is an adaptive source, and the excitation for an unvoiced frame is a stochastic source; a mixed frame uses a modified mixed source. A different weighting function is applied to each of the three classes. Simulation results show that the proposed coder at 4 kbit/s yields very good performance both subjectively and objectively.
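The class-dependent excitation described in this abstract can be sketched as follows. This is a hedged illustration assuming NumPy: voiced frames reuse past excitation at the pitch lag (an adaptive-codebook style vector), unvoiced frames use Gaussian noise, and mixed frames use a fixed weighted sum of the two. The fixed `mix` weight, the omission of codebook search and gains, and the names `adaptive_vector` and `excitation` are assumptions; the abstract does not describe the modified mixed source or the weighting functions.

```python
import numpy as np

def adaptive_vector(past_exc, lag, length):
    """Adaptive-codebook style vector: the last `lag` samples of past
    excitation, repeated periodically to fill `length` samples."""
    past_exc = np.asarray(past_exc, dtype=float)
    if not 0 < lag <= len(past_exc):
        raise ValueError("lag must be between 1 and len(past_exc)")
    segment = past_exc[-lag:]
    reps = -(-length // lag)                      # ceiling division
    return np.tile(segment, reps)[:length]

def excitation(frame_class, past_exc, lag, length, rng, mix=0.5):
    """Select an excitation by frame class.

    'voiced'   -> adaptive (periodic) source
    'unvoiced' -> stochastic (Gaussian) source
    'mixed'    -> mix * adaptive + (1 - mix) * stochastic
    Assumption: the fixed `mix` weight is a placeholder for the coder's
    modified mixed source; codebook search and gains are omitted.
    """
    if frame_class == "voiced":
        return adaptive_vector(past_exc, lag, length)
    if frame_class == "unvoiced":
        return rng.standard_normal(length)
    if frame_class == "mixed":
        noise = rng.standard_normal(length)
        return mix * adaptive_vector(past_exc, lag, length) + (1 - mix) * noise
    raise ValueError(f"unknown frame class: {frame_class!r}")
```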

Speech Transition Detection and approximate-synthesis Method for Speech Signal Compression and Recovery (음성신호 압축 및 복원을 위한 음성 천이구간 검출과 근사합성 방식)

Lee, Kwang-Seok;Kim, Bong-Gi;Kang, Seong-Soo;Kim, Hyun-Deok
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.763-767 / 2008
In speech coding systems that use voiced and unvoiced excitation sources, speech quality is degraded when voiced and unvoiced consonants coexist within a single frame. We therefore propose a method for searching for and extracting transition segments (TS) that contain unvoiced consonants, so that voiced and unvoiced consonants no longer coexist within a frame. This research presents a new approximate-synthesis method for TS that uses Least Mean Square and frequency band division. As a result, the method obtains high-quality approximate-synthesis waveforms within TS by using frequency information below 0.547 kHz and above 2.813 kHz. Importantly, the approximate-synthesis waveform within TS keeps even the maximum error signal low, yielding low distortion. The method can be applied to a new Voiced/Silence/TS speech coding scheme as well as to speech analysis and speech synthesis.
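The abstract names two ingredients, Least Mean Square adaptation and frequency band division, and each can be sketched briefly. The code below assumes NumPy and shows a textbook LMS adaptive FIR filter and an FFT-mask band split at the 0.547 kHz and 2.813 kHz edges quoted in the abstract. The abstract does not say how the authors combine these to synthesize transition segments, so `lms_filter` and `band_split` are hypothetical helpers rather than the paper's method.

```python
import numpy as np

def lms_filter(x, d, n_taps=8, mu=0.01):
    """Textbook LMS adaptive FIR filter.

    At each sample: y[n] = w . x_vec, e[n] = d[n] - y[n], w += mu * e[n] * x_vec,
    where x_vec = [x[n], x[n-1], ..., x[n-n_taps+1]]. Returns (y, e, w).
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)                 # most recent input sample first
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y[n] = w @ buf
        e[n] = d[n] - y[n]
        w = w + mu * e[n] * buf            # stochastic-gradient weight update
    return y, e, w

def band_split(x, fs, low_edge=547.0, high_edge=2813.0):
    """Assumption: a simple FFT mask returning the band below `low_edge` Hz
    and the band above `high_edge` Hz; the middle band is discarded."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = np.fft.irfft(np.where(f < low_edge, X, 0), n=len(x))
    high = np.fft.irfft(np.where(f > high_edge, X, 0), n=len(x))
    return low, high
```

The step size `mu` trades convergence speed against stability; for unit-variance input, values well below 2 / `n_taps` keep the update stable.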

L1-L2 Transfer in VOT and f0 Production by Korean English Learners: L1 Sound Change and L2 Stop Production

Kim, Mi-Ryoung
- Phonetics and Speech Sciences / v.4 no.3 / pp.31-41 / 2012
Recent studies have shown that the Korean stop system is undergoing a sound change involving two acoustic parameters, voice onset time (VOT) and fundamental frequency (f0). Because of a VOT merger in one consonantal opposition and an interaction between onset consonants and f0, the relative importance of the two parameters has been changing in Korean, where f0 is the primary cue and VOT a secondary cue for distinguishing lax from aspirated stops in both production and perception. In English, by contrast, VOT is the primary cue and f0 a secondary cue for contrasting voiced and voiceless stops. This study examines how Korean learners of English use these two L1 acoustic parameters when producing L2 English stops, and whether the sound change in L1 affects L2 speech production. Data were collected from six adult Korean learners of English. The results show that the learners use not only VOT but also f0 to contrast L2 voiced and voiceless stops. However, unlike VOT, which varied among speakers, the effect of onset consonants on f0 in L2 English was steady and robust in magnitude, indicating that f0 also plays an important role in signaling the [voice] contrast in L2 English. The results suggest that the important role of f0 in contrasting lax and aspirated stops in L1 Korean is transferred to the contrast between voiced and voiceless stops in L2 English. They also imply that, for Korean learners of English, f0 rather than VOT will serve as the more important perceptual cue for contrasting voiced and voiceless stops in L2 English.
https://doi.org/10.13064/KSSS.2012.4.3.031

Speech Signal Compression and Recovery Using Transition Detection and Approximate-Synthesis (천이구간 추출 및 근사합성에 의한 음성신호 압축과 복원)

Lee, Kwang-Seok;Lee, Byeong-Ro
- Journal of the Korea Institute of Information and Communication Engineering / v.13 no.2 / pp.413-418 / 2009
In speech coding systems that use voiced and unvoiced excitation sources, speech quality is degraded when voiced and unvoiced consonants coexist within a single frame. We therefore propose a method for searching for and extracting transition segments (TS) that contain unvoiced consonants, so that voiced and unvoiced consonants no longer coexist within a frame. This research presents a new approximate-synthesis method for TS that uses Least Mean Square and frequency band division. As a result, the method obtains high-quality approximate-synthesis waveforms within TS by using frequency information below 0.547 kHz and above 2.813 kHz. Importantly, the approximate-synthesis waveform within TS keeps even the maximum error signal low, yielding low distortion. The method can be applied to a new Voiced/Silence/TS speech coding scheme as well as to speech analysis and speech synthesis.
https://doi.org/10.6109/JKIICE.2009.13.2.413

An Efficient Pitch Estimation for IMBE (Improved Multi-band Excitation) Speech Coder (개량형 다중대역 여기 (IMBE: Improved Multi-band Excitation) 음성 부호기의 피치 예측 개선)

Na, Hoon;Jeong, Dae-Gwon
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.34-41 / 2001
In an IMBE (Improved Multi-band Excitation) speech coder, initial pitch estimation accounts for most of the coder's total computation time because of its complex cost function and exhaustive search over candidate pitches. Because initial pitch estimation uses future frames, it also introduces an unavoidable time delay, which makes a real-time coder difficult to implement. Furthermore, unvoiced frames undergo the same pitch estimation as voiced frames even though it is unnecessary for them. In this paper, each frame is first classified as voiced or unvoiced using the Dyadic Wavelet Transform (DyWT), and initial pitch estimation is then performed only for voiced frames. Different pitch estimation procedures are thus applied to voiced and unvoiced frames, reducing the time delay at both the transmitter and the receiver. Simulation results show that the relative complexity of initial pitch estimation is reduced by 23%, and the processing time falls to 1/10 to 1/11 of that of the IMBE coder, while speech quality is almost maintained.
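The idea of running initial pitch estimation only on voiced frames can be sketched as below, assuming NumPy. Several substitutions are made plainly: a two-level decimated Haar wavelet decomposition stands in for the paper's DyWT voicing decision, the 0.7 low-band energy threshold is arbitrary, and plain autocorrelation replaces the IMBE coder's pitch cost function. Only the gating structure, which skips the pitch search entirely for unvoiced frames, reflects the abstract; all function names are hypothetical.

```python
import numpy as np

def haar_split(frame):
    """One level of an orthonormal Haar wavelet decomposition: (approximation, detail)."""
    f = np.asarray(frame, dtype=float)
    f = f[: len(f) // 2 * 2]                      # Haar pairs need an even length
    approx = (f[0::2] + f[1::2]) / np.sqrt(2)
    detail = (f[0::2] - f[1::2]) / np.sqrt(2)
    return approx, detail

def is_voiced(frame, levels=2, ratio=0.7):
    """Assumption: call a frame voiced when more than `ratio` of its energy stays
    in the coarsest Haar approximation band (a stand-in for the DyWT decision)."""
    a = np.asarray(frame, dtype=float)
    total = np.sum(a ** 2) + 1e-12
    for _level in range(levels):
        a, _detail = haar_split(a)
    return np.sum(a ** 2) / total > ratio

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch in Hz from the highest autocorrelation value in the lag range
    fs/fmax .. fs/fmin (plain autocorrelation, not the IMBE cost function)."""
    f = np.asarray(frame, dtype=float)
    f = f - f.mean()
    r = np.correlate(f, f, mode="full")[len(f) - 1:]   # lags 0 .. len(f)-1
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(f) - 1)
    lag = lo + int(np.argmax(r[lo:hi + 1]))
    return fs / lag

def estimate_pitch_track(x, fs, frame_len=512):
    """Per-frame pitch in Hz, or None for frames judged unvoiced;
    the pitch search is skipped entirely for those frames."""
    x = np.asarray(x, dtype=float)
    track = []
    for k in range(len(x) // frame_len):
        frame = x[k * frame_len:(k + 1) * frame_len]
        track.append(autocorr_pitch(frame, fs) if is_voiced(frame) else None)
    return track
```

The saving comes from the cheap voicing test: the cost of the pitch search is paid only for frames that pass it.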
