Search | Korea Science

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
- Journal of the Institute of Convergence Signal Processing
- /
- v.7 no.3
- /
- pp.116-121
- /
- 2006
In this paper, we propose the dimension reduction method of multi-dimension speech feature vector for real-time adaptation procedure in various noisy environments. This method which reduces dimensions non-linearly to map the likelihood of speech feature vector and noise feature vector. The LRT(Likelihood Ratio Test) is used for classifying speech and non-speech. The results of implementation are similar to multi-dimensional speech feature vector. The results of speech recognition implementation of detected speech data are also similar to multi-dimensional(10-order dimensional MFCC(Mel-Frequency Cepstral Coefficient)) speech feature vector.
PDF

Discrimination of Emotional States In Voice and Facial Expression

Kim, Sung-Ill;Yasunari Yoshitomi;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea
- /
- v.21 no.2E
- /
- pp.98-104
- /
- 2002
The present study describes a combination method to recognize the human affective states such as anger, happiness, sadness, or surprise. For this, we extracted emotional features from voice signals and facial expressions, and then trained them to recognize emotional states using hidden Markov model (HMM) and neural network (NN). For voices, we used prosodic parameters such as pitch signals, energy, and their derivatives, which were then trained by HMM for recognition. For facial expressions, on the other hands, we used feature parameters extracted from thermal and visible images, and these feature parameters were then trained by NN for recognition. The recognition rates for the combined parameters obtained from voice and facial expressions showed better performance than any of two isolated sets of parameters. The simulation results were also compared with human questionnaire results.
PDF KSCI

Classification of pathological and normal voice based on dimension reduction of feature vectors (피처벡터 축소방법에 기반한 장애음성 분류)

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- Proceedings of the KSPS conference
- /
- 2007.05a
- /
- pp.123-126
- /
- 2007
This paper suggests a method to improve the performance of the pathological/normal voice classification. The effectiveness of the mel frequency-based filter bank energies using the fisher discriminant ratio (FDR) is analyzed. And mel frequency cepstrum coefficients (MFCCs) and the feature vectors through the linear discriminant analysis (LDA) transformation of the filter bank energies (FBE) are implemented. This paper shows that the FBE LDA-based GMM is more distinct method for the pathological/normal voice classification than the MFCC-based GMM.
PDF

The Study for Advancing the Performance of Speaker Verification Algorithm Using Individual Voice Information (개별 음향 정보를 이용한 화자 확인 알고리즘 성능향상 연구)

Lee, Je-Young;Kang, Sun-Mee
- Speech Sciences
- /
- v.9 no.4
- /
- pp.253-263
- /
- 2002
In this paper, we propose new algorithm of speaker recognition which identifies the speaker using the information obtained by the intensive speech feature analysis such as pitch, intensity, duration, and formant, which are crucial parameters of individual voice, for candidates of high percentage of wrong recognition in the existing speaker recognition algorithm. For testing the power of discrimination of individual parameter, DTW (Dynamic Time Warping) is used. We newly set the range of threshold which affects the power of discrimination in speech verification such that the candidates in the new range of threshold are finally discriminated in the next stage of sound parameter analysis. In the speaker verification test by using voice DB which consists of secret words of 25 males and 25 females of 8 kHz 16 bit, the algorithm we propose shows about 1% of performance improvement to the existing algorithm.
PDF

A Weighted Feature Voting Approach for Robust and Real-Time Voice Activity Detection

Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
- ETRI Journal
- /
- v.33 no.1
- /
- pp.99-109
- /
- 2011
This paper concerns a robust real-time voice activity detection (VAD) approach which is easy to understand and implement. The proposed approach employs several short-term speech/nonspeech discriminating features in a voting paradigm to achieve a reliable performance in different environments. This paper mainly focuses on the performance improvement of a recently proposed approach which uses spectral peak valley difference (SPVD) as a feature for silence detection. The main issue of this paper is to apply a set of features with SPVD to improve the VAD robustness. The proposed approach uses a weighted voting scheme in order to take the discriminative power of the employed feature set into account. The experiments show that the proposed approach is more robust than the baseline approach from different points of view, including channel distortion and threshold selection. The proposed approach is also compared with some other VAD techniques for better confirmation of its achievements. Using the proposed weighted voting approach, the average VAD performance is increased to 89.29% for 5 different noise types and 8 SNR levels. The resulting performance is 13.79% higher than the approach based only on SPVD and even 2.25% higher than the not-weighted voting scheme.
https://doi.org/10.4218/etrij.11.1510.0158 인용 PDF KSCI

Robust Entropy Based Voice Activity Detection Using Parameter Reconstruction in Noisy Environment

Han, Hag-Yong;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
- Journal of information and communication convergence engineering
- /
- v.1 no.4
- /
- pp.205-208
- /
- 2003
Voice activity detection is a important problem in the speech recognition and speech communication. This paper introduces new feature parameter which are reconstructed by spectral entropy of information theory for robust voice activity detection in the noise environment, then analyzes and compares it with energy method of voice activity detection and performance. In experiments, we confirmed that spectral entropy and its reconstructed parameter are superior than the energy method for robust voice activity detection in the various noise environment.
PDF KSCI

An Ultrasonic Wave Encoder and Decoder for Indoor Positioning of Mobile Marketing System

Kim, Young-Mo;Jang, Se-Young;Park, Byeong-Chan;Bang, Kyung-Sik;Kim, Seok-Yoon
- Journal of the Korea Society of Computer and Information
- /
- v.24 no.7
- /
- pp.93-100
- /
- 2019
In this paper, we propose an intelligent marketing service system that can provide custom advertisements and events to both businesses and customers by identifying the location and contents using the ultrasonic signals and feature information in voice signals. We also develop the encoding and decoding algorithm of ultrasonic signals for this system and analyze the performance evaluation results. With the development of the hyper-connected society, the on-line marketing has been activated and is growing in size. Existing store marketing applications have disadvantages that customers have to find out events or promotional materials that the headquarters or stores throughusing the corresponding applications whenever they visit them. To solve these problems, there are attempts to create intelligent marketing tools using GPS technology and voice recognition technology. However, this approach has difficulties in technology development due to accuracy of location and speed of comparison and retrieval of voice recognition technology, and marketing services for customer relation are also much simplified.
https://doi.org/10.9708/jksci.2019.24.07.093 인용 PDF KSCI HTML

Non-Surgical Management for Benign Vocal Fold Lesions (양성 성대 병변의 비수술적 치료)

Lee, Sang Hyuk
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
- /
- v.26 no.2
- /
- pp.97-100
- /
- 2015
Benign vocal fold lesions, such as vocal nodules, polyps and Reinke's edema, usually result from chronic voice overuse. Conservative management such as voice therapy and pharmacotherapy are used as the primary treatment techniques. The main purpose of voice therapy is to identify and reduce voice misuse to achieve the optimal voice. But complete resolution may not be possible in all patients after voice therapy. Furthermore, some patients with voice-related occupations, voice rest and voice therapy are sometimes difficult, which makes it hard to carry out the treatment. When conservative therapy is ineffective, laryngeal microsurgery can be performed under general anesthesia. However, potential complications following laryngeal suspension and violation of the layered structure of the vocal fold during surgery should be considered before surgery. In recent decades, emerging literatures have demonstrated the potential usefulness of vocal fold steroid injection as an alternative treatment option for benign vocal fold lesions. The most advantageous feature of vocal fold steroid injection is the maintenance of regional anti-inflammatory effects while preventing the potential systemic adverse effects of the steroid. Many non-surgical treatment methods can be conducted using different approaches in the office setting. It can be applied as an alternative treatment modality for the management of various benign vocal fold lesions.
PDF

A Study on Formants of Vowels for Speaker Recognition (화자 인식을 위한 모음의 포만트 연구)

Ahn Byoung-seob;Shin Jiyoung;Kang Sunmee
- MALSORI
- /
- no.51
- /
- pp.1-16
- /
- 2004
The aim of this paper is to analyze vowels in voice imitation and disguised voice, and to find the invariable phonetic features of the speaker. In this paper we examined the formants of monophthongs /a, u, i, o, {$\omega},{\;}{\varepsilon},{\;}{\Lambda}$/. The results of the present are as follows : $\circled1$ Speakers change their vocal tract features. $\circled2$ Vowels /a, ${\varepsilon}$, i/ appear to be proper for speaker recognition since they show invariable acoustic feature during voice modulation. $\circled3$ F1 does not change easily compared to higher formants. $\circled4$ F3-F2 appears to be constituent for a speaker identification in vowel /a/ and /$\varepsilon$/, and F4-F2 in vowel /i/. $\circled5$ Resulting of F-ratio, differences of each formants were more useful than individual formant of a vowel to speaker recognition.
PDF

Detection of Pathological Voice Using Linear Discriminant Analysis

Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
- MALSORI
- /
- no.64
- /
- pp.77-88
- /
- 2007
Nowadays, mel-frequency cesptral coefficients (MFCCs) and Gaussian mixture models (GMMs) are used for the pathological voice detection. This paper suggests a method to improve the performance of the pathological/normal voice classification based on the MFCC-based GMM. We analyze the characteristics of the mel frequency-based filterbank energies using the fisher discriminant ratio (FDR). And the feature vectors through the linear discriminant analysis (LDA) transformation of the filterbank energies (FBE) and the MFCCs are implemented. An accuracy is measured by the GMM classifier. This paper shows that the FBE LDA-based GMM is a sufficiently distinct method for the pathological/normal voice classification, with a 96.6% classification performance rate. The proposed method shows better performance than the MFCC-based GMM with noticeable improvement of 54.05% in terms of error reduction.
PDF

Search Result 232, Processing Time 0.022 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)