Search | Korea Science

On Wavelet Transform Based Feature Extraction for Speech Recognition Application

Kim, Jae-Gil
- The Journal of the Acoustical Society of Korea / v.17 no.2E / pp.31-37 / 1998
This paper proposes a feature extraction method using the wavelet transform for speech recognition. Speech recognition systems generally carry out the recognition task based on speech features that are usually obtained via time-frequency representations such as the Short-Time Fourier Transform (STFT) and Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing highly complex speech characteristics. They map the speech features with the same frequency resolution at all frequencies. The wavelet transform overcomes some of these limitations. It captures the signal with fine time resolution at high frequencies and fine frequency resolution at low frequencies, which may present a significant advantage when analyzing highly localized speech events. Based on this motivation, this paper investigates the effectiveness of the wavelet transform for feature extraction aimed at enhancing speech recognition. The proposed method is implemented using the Sampled Continuous Wavelet Transform (SCWT), and its performance is tested on a speaker-independent isolated word recognizer that discerns 50 Korean words. In particular, the effect of the mother wavelet employed and of the number of voices per octave on the performance of the proposed method is investigated. The influence of the size of the mother wavelet on performance is also discussed. Throughout the experiments, the performance of the proposed method is compared with the most prevalent conventional method, MFCC (Mel-frequency Cepstral Coefficients). The experiments show that the recognition performance of the proposed method is better than that of MFCC. However, the improvement is marginal, while the computational load of the proposed method is substantially greater than that of MFCC because of the increase in dimensionality.
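As a rough illustration of the quantities named in this abstract (mother wavelet, voices per octave, sampled continuous wavelet transform), the following is a minimal sketch and not the paper's implementation. The Mexican-hat mother wavelet, the function names, and the choice to truncate each kernel to five scale widths are all assumptions made for illustration; the abstract does not say which wavelets or settings were used.

```python
import numpy as np

def voice_scales(num_octaves, voices_per_octave, smallest_scale=1.0):
    """Geometric scale grid: each octave (doubling of scale) is split into
    `voices_per_octave` steps. Hypothetical helper, not from the paper."""
    k = np.arange(num_octaves * voices_per_octave)
    return smallest_scale * 2.0 ** (k / voices_per_octave)

def mexican_hat(t):
    """Assumed mother wavelet (real and symmetric), chosen only as an example."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def sampled_cwt(signal, scales, wavelet=mexican_hat):
    """Sampled CWT: one row of coefficients per scale.

    Because the assumed wavelet is real and symmetric, convolution equals
    correlation here; an asymmetric wavelet would need a time-reversed kernel.
    """
    signal = np.asarray(signal, dtype=float)
    max_half = (len(signal) - 1) // 2  # keep every kernel no longer than the signal
    rows = []
    for a in scales:
        half = min(int(np.ceil(5 * a)), max_half)
        t = np.arange(-half, half + 1) / a        # dilated time axis
        kernel = wavelet(t) / np.sqrt(a)          # 1/sqrt(a) energy normalization
        rows.append(np.convolve(signal, kernel, mode="same"))
    return np.array(rows)

# Usage: 3 octaves with 4 voices each gives 12 scales, so each time sample
# yields a 12-dimensional coefficient vector.
scales = voice_scales(num_octaves=3, voices_per_octave=4)
coeffs = sampled_cwt(np.sin(0.2 * np.arange(256)), scales)
```

Increasing the number of voices per octave refines the scale grid and raises the number of coefficients per time sample, which is one way the dimensionality increase mentioned in the abstract can arise.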

Interaction Intent Analysis of Multiple Persons using Nonverbal Behavior Features (인간의 비언어적 행동 특징을 이용한 다중 사용자의 상호작용 의도 분석)

Yun, Sang-Seok;Kim, Munsang;Choi, Mun-Taek;Song, Jae-Bok
- Journal of Institute of Control, Robotics and Systems / v.19 no.8 / pp.738-744 / 2013
According to cognitive science research, the interaction intent of humans can be estimated through an analysis of their representative behaviors. This paper applies this approach and proposes a novel methodology for reliable analysis of human intention. To identify the intention, 8 behavioral features are extracted from 4 characteristics of human-human interaction, and we outline a set of core components of human nonverbal behavior. These nonverbal behaviors are associated with various recognition modules built on multimodal sensors: localizing the sound source of the speaker in the audition part; recognizing the frontal face and facial expression in the vision part; and estimating human trajectories, body pose and leaning, and hand gestures in the spatial part. As a post-processing step, temporal confidential reasoning is utilized to improve the recognition performance, and an integrated human model is utilized to quantitatively classify the intention from multi-dimensional cues by applying weight factors. Thus, interactive robots can make informed engagement decisions to interact effectively with multiple persons. Experimental results show that the proposed scheme works successfully between human users and a robot in human-robot interaction.
https://doi.org/10.5302/J.ICROS.2013.13.1893

An Experimental Study on Flame Propagation along Non-premixed Vortex Tube (비예혼합 선형 와환에서의 화염 전파 특성에 관한 실험적 연구)

Yang, Seung-Yeon;Roh, Yoon-Jong;Chung, Suk-Ho
- Proceedings of the KSME Conference / 2001.06d / pp.864-870 / 2001
Flame propagation along a vortex tube was experimentally investigated. The vortex tube was generated by ejecting propane from a nozzle through a single-stroke motion of a speaker, and ignition was induced by a single-pulse laser. Non-reactive flow fields were visualized using a shadow technique. From these images, the vortex ring size and translational velocity were measured in order to determine the ignition time and position. Flame structure and flame speed were measured using a high-speed CCD camera. The flame accelerated during the initial stage of flame kernel growth and reached a nearly constant speed during the steady propagation period. Near the completion of propagation, the flame decelerated and was then extinguished. Flame speed along the non-premixed vortex tube was found to be linearly proportional to circulation, similar to flame propagation along a premixed vortex ring. The ignition position minimally affects the propagation characteristics. These results imply that the flame propagates along the maximum-speed locus, which is expected to lie along the stoichiometric contour, and they also support the existence of tribrachial flames.

A Research on Characteristics of Semi-active Muffler Using Difference of Transmission Paths (전달경로의 차이를 이용한 차량용반능동형 머플러의 특성에 관한 연구)

이종민;김경목;손동구;이장현;황요하
- Journal of KSNVE / v.11 no.3 / pp.401-409 / 2001
Passive mufflers installed on every car have the inherent problem of lowering engine power and fuel efficiency because of backpressure, a byproduct of their complex internal structure. Recent improvements, such as installing a valve to change the exhaust gas path depending on power requirement and rpm, have only marginally improved performance. A tremendous amount of recent research on active exhaust noise control has failed to reach commercialization for numerous physical and economic reasons. In this paper, a unique semi-active muffler using the difference of transmission paths is presented. In this system the exhaust pipe is divided into two and joined again downstream. Exhaust noise is reduced by destructive interference when the two divided noise components rejoin with a transmission path difference equal to half the wavelength of the main noise frequency. One of the divided paths has a sliding mechanism to change its length, so that the path length difference is adjusted to follow rpm changes. The proposed system has minimal backpressure and does not need a secondary sound source such as a speaker, so it can overcome many problems of the failed active noise control methods. We have verified the proposed system's superior performance by simulation and by comparison experiments with passive mufflers.
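The half-wavelength condition in this abstract can be illustrated with a small calculation. This is only a sketch under assumptions that are not stated in the abstract: a four-stroke engine whose main noise component sits at the firing frequency, and a sound speed of 343 m/s (room-temperature air; hot exhaust gas would have a higher sound speed). The function name and parameters are hypothetical.

```python
def half_wavelength_path_difference(rpm, cylinders=4, sound_speed=343.0):
    """Path length difference (in metres) that puts the two branches half a
    wavelength apart at the assumed main noise frequency."""
    # Assumption: four-stroke engine, so each cylinder fires once every two revolutions.
    firing_freq_hz = rpm / 60.0 * cylinders / 2.0
    wavelength_m = sound_speed / firing_freq_hz
    return wavelength_m / 2.0

# Usage: the required difference shrinks as rpm rises, which is why one branch
# would need an adjustable length.
for rpm in (1500, 3000, 6000):
    print(rpm, round(half_wavelength_path_difference(rpm), 4))
```

Under these assumptions the required path difference is inversely proportional to rpm, consistent with the abstract's description of a sliding mechanism that follows rpm changes.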

A Study on Duration Length and Place of Feature Extraction for Phoneme Recognition (음소 인식을 위한 특징 추출의 위치와 지속 시간 길이에 관한 연구)

Kim, Bum-Koog;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.32-39 / 1994
As basic research toward realizing a Korean speech recognition system, phoneme recognition was carried out to find 1) the best place that represents each phoneme's characteristics, and 2) a reasonable duration for obtaining the best recognition rates. For the recognition experiments, multi-speaker-dependent recognition with the Bayesian decision rule was adopted, using 21st-order cepstral coefficients as the feature parameter. It turned out that the best places of feature extraction for the highest recognition rates were 10~50 ms in vowels, 40~100 ms in fricatives and affricates, 10~50 ms in nasals and liquids, and 10~50 ms in plosives. A duration of about 70 ms was sufficient for the recognition of all 35 phonemes.

A Noise-Robust Adaptive NLMS Algorithm with Variable Convergence Factor for Acoustic Echo Cancellation (음향 반향 제어를 위한 가변수렴인자를 갖는 잡음에 강건한 적응 NLMS 알고리즘)

박장식;손경식
- Journal of Korea Multimedia Society / v.2 no.1 / pp.99-108 / 1999
In this paper, a new robust adaptive algorithm is proposed to improve the performance of acoustic echo cancellation (AEC) without adding computational burden. The proposed adaptive algorithm is based on the NLMS algorithm, and its step size is varied with the reference input signal power and the desired signal power. The step size is normalized by the sum of the powers of the reference input signal and the desired signal. When the near-end speaker's speech and noise enter the microphone, the step size becomes small and the misalignment of the coefficients is reduced. The convergence speed is comparable to that of the NLMS algorithm in AEC applications because the echo signals are attenuated by about 10∼20 dBSPL. The characteristics of this algorithm are also analyzed and compared with those of conventional algorithms.
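The normalization described in this abstract can be sketched as follows. This is a minimal illustration of one reading of the abstract, not the authors' algorithm: the window over which the two powers are summed (the last `num_taps` samples), the base step size `mu`, the regularizer `eps`, and the function name are all assumptions.

```python
import numpy as np

def nlms_sum_normalized(x, d, num_taps=8, mu=0.5, eps=1e-8):
    """Adaptive FIR filter whose update is divided by the sum of the
    reference-input power and the desired-signal power (assumed to be
    measured over the most recent `num_taps` samples)."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(num_taps)       # filter coefficients (echo path estimate)
    e = np.zeros(len(x))         # error signal (echo-cancelled output)
    x_buf = np.zeros(num_taps)   # most recent reference samples, newest first
    d_buf = np.zeros(num_taps)   # most recent desired (microphone) samples
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        d_buf = np.roll(d_buf, 1)
        d_buf[0] = d[n]
        e[n] = d[n] - w @ x_buf
        # Plain NLMS would divide by x_buf @ x_buf only; adding the desired
        # signal power shrinks the step when near-end speech or noise is present.
        norm = x_buf @ x_buf + d_buf @ d_buf + eps
        w += mu * e[n] * x_buf / norm
    return w, e

# Usage: identify a short synthetic echo path from white-noise input.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h)[:len(x)]
w, e = nlms_sum_normalized(x, d, num_taps=4)
```

When the microphone signal contains only a weak echo, the extra term in the denominator is small and the update stays close to plain NLMS; when near-end speech raises the desired-signal power, the effective step size drops.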

Sound Detection Characteristics Using Fabry-Perot Fiber Optic Sensor which Simply Supported in Structure (양단이 지지된 Fabry-Perot 광섬유센서의 음압 감지 특성 연구)

이종길;이진우;이준호
- The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.585-591 / 2003
In this paper, a fiber optic sensor using a Fabry-Perot interferometer, which has the benefits of miniaturization and light weight, was used. The sensor head is 1 cm in length, the total length of the fiber is 9.5 chi, and the sensor is simply supported at both ends. To analyze the acoustic characteristics, a non-directional speaker was used as the sound source. Sound was applied in the lateral direction, and the two detected signals were compared with each other. Below 1 kHz the fiber optic sensor is more sensitive than the microphone, but at 2 kHz it is less sensitive than the microphone. This characteristic varies with the supporting system of the fiber optic sensor. It was confirmed that the Fabry-Perot interferometric sensor detected the acoustic signal effectively. This kind of sensor can be applied to the structural health monitoring of intelligent structures.

A Study on Dynamic Characteristics of Gas Centered Swirl Coaxial Injector with Acoustic Excitation by Varying Momentum Flux Ratio (운동량 플럭스 비의 변화에 따른 기체 중심 스월 동축형 분사기의 기체 가진 동특성 연구)

Lee, Jungho;Park, Gujeong;Yoon, Youngbin
- Journal of ILASS-Korea / v.20 no.3 / pp.168-174 / 2015
Combustion instability is a critical problem in developing liquid rocket engines. There have been many efforts to solve this problem. In this study, as part of these efforts, a method of suppressing combustion instability through the injector was sought. If the injector can act as a kind of buffer that suppresses disturbances coming from the supply line, it will help reduce combustion instability. In particular, we target gas propellant oscillation in a gas-centered swirl coaxial injector. The phenomenon is simulated by acoustic excitation with a speaker. The film thickness response at the injector exit was measured using a liquid film electrode, and the response of the spray to the disturbance was observed by high-speed photography. The gas-liquid momentum flux ratio and the frequency of the feed gas oscillation were varied to investigate the effect of these experimental parameters. The trend of the response with these parameters and the cause of weak points were studied to suggest a better injector design for suppressing combustion instability.
https://doi.org/10.15435/JILASSKR.2015.20.3.168

An acoustical analysis of synchronous English speech using automatic intonation contour extraction (영어 동시발화의 자동 억양궤적 추출을 통한 음향 분석)

Yi, So Pae
- Phonetics and Speech Sciences / v.7 no.1 / pp.97-105 / 2015
This research mainly focuses on the intonational characteristics of synchronous English speech. Intonation contours were extracted from 1,848 utterances produced in two different speaking modes (solo vs. synchronous) by 28 (12 women and 16 men) native speakers of English. Synchronous speech is found to be slower than solo speech. Women are found to speak more slowly than men. The effect of speaking mode on speech rate is greater than that of gender. However, there is no interaction between the two factors (speaking modes vs. gender differences) in terms of speech rate. Analysis of pitch point features shows that synchronous speech has smaller Pt (pitch point movement time), Pr (pitch point pitch range), Ps (pitch point slope) and Pd (pitch point distance) than solo speech. There is no interaction between the two factors (speaking modes vs. gender differences) in terms of pitch point features. Analysis of sentence level features reveals that synchronous speech has smaller Sr (sentence level pitch range), Ss (sentence slope), MaxNr (normalized maximum pitch) and MinNr (normalized minimum pitch) but greater Min (minimum pitch) and Sd (sentence duration) than solo speech. It is also shown that the higher the Mid (median pitch), the MaxNr and the MinNr in solo speaking mode, the more they are reduced in synchronous speaking mode. Max, Min and Mid show greater speaker discriminability than other features.
https://doi.org/10.13064/KSSS.2015.7.1.097

The effect of word length on f0 intervals: Evidence from North Kyungsang children

Kim, Jungsun
- Phonetics and Speech Sciences / v.7 no.1 / pp.107-116 / 2015
The present experiment investigated the effect of word length on the length of f0 intervals for North Kyungsang children. In order to find out the lengths of the f0 intervals, the f0 values at the midpoints of vowels in words were measured. F0 estimates were computed as intervals consistent with the logarithmic scale corresponding to the number of syllables in the words. The results indicated that the mean f0 intervals in words of different lengths showed a significant difference for the HH in HH vs. HHL and the LH in LH vs. LLH for North Kyungsang children. Adult speakers from the North Kyungsang region significantly differed only within the HH in HH vs. HHL. Adult speakers made a noticeable contribution in this characteristic from the children. The result of the adult study was presented to confirm whether the children used a North Kyungsang dialect. With respect to individual speaker differences, the North Kyungsang children showed more or less consistent patterns in quantile-quantile plots for the HH vs. HHL, but for the HL vs. LHL and LH vs. LLH, there were more variations than for the HH vs. HHL. The individual speakers' variation was the largest for the HL vs. LHL and the smallest for HH vs. HHL. Considering these results, the effect of word length on f0 intervals tended to show pitch accent-type-specific characteristics in the process of prosodic acquisition.
https://doi.org/10.13064/KSSS.2015.7.1.107
