• Title/Summary/Keyword: Feature vectors

Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition (남성과 여성의 음성 특징 비교 및 성별 음성인식에 의한 인식 성능의 향상)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.5 no.6
    • /
    • pp.568-574
    • /
    • 2010
  • In an effort to improve the speech recognition rate, we compared the performance of speaker-independent and gender-specific speech recognition. For this purpose, 20 male and 20 female speakers each pronounced 300 isolated Korean words, and the utterances were divided into 4 groups: female, male, and two mixed-gender groups. To examine the validity of gender-specific speech recognition, Fourier spectra and MFCC feature vectors averaged separately over male and female speakers were examined. The results showed a distinction between the two genders, which supports the motivation for gender-specific speech recognition. In speech recognition experiments, the error rate for the gender-specific case was less than 50% of that of the speaker-independent case. These results suggest that hierarchical recognition, in which gender is recognized first and speech second, might yield better performance than the current method of speech recognition.

Color Image Segmentation Based on Morphological Operation and a Gaussian Mixture Model (모폴로지 연산과 가우시안 혼합 모형에 기반한 컬러 영상 분할)

  • Lee Myung-Eun;Park Soon-Young;Cho Wan-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.3 s.309
    • /
    • pp.84-91
    • /
    • 2006
  • In this paper, we present a new segmentation algorithm for color images based on mathematical morphology and a Gaussian mixture model (GMM). We use morphological operations to determine the number of components in the mixture model and to detect the mode of each mixture component. Next, we adopt the GMM to represent the probability distribution of color feature vectors and use the deterministic annealing expectation maximization (DAEM) algorithm to estimate the parameters of the GMM that statistically represents the multi-colored objects. Finally, we segment the color image using the posterior probability of each pixel computed from the GMM. The experimental results show that the morphological operations efficiently determine the number of components and the initial mode of each component in the mixture model. They also show that the proposed DAEM provides a globally optimal solution for parameter estimation in the mixture model, and that natural color images are segmented efficiently by the GMM with parameters estimated by the morphological operations and the DAEM algorithm.
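
The final segmentation step described in this abstract (labeling each pixel by its posterior probability under the GMM) can be illustrated with a minimal sketch. It assumes diagonal covariances and already-estimated parameters. The morphological initialization and the DAEM estimation are not shown, and the function names `gaussian_pdf`, `posteriors`, and `segment` are hypothetical, not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at feature vector x (assumed form)."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
    return density

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given color vector x."""
    joint = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

def segment(pixels, weights, means, variances):
    """Assign each pixel to the component with the highest posterior."""
    labels = []
    for x in pixels:
        post = posteriors(x, weights, means, variances)
        labels.append(max(range(len(post)), key=post.__getitem__))
    return labels
```

With colors scaled to [0, 1], calling `segment` on a list of RGB tuples returns one component index per pixel. A pixel very far from every component could underflow to a zero total, so a practical version would probably work in the log domain.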

Emotion Recognition of Korean and Japanese using Facial Images (얼굴영상을 이용한 한국인과 일본인의 감정 인식 비교)

  • Lee, Dae-Jong;Ahn, Ui-Sook;Park, Jang-Hwan;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.2
    • /
    • pp.197-203
    • /
    • 2005
  • In this paper, we propose an emotion recognition method using facial images for the effective design of human interfaces. The facial database consists of six basic human emotions (happiness, sadness, anger, surprise, fear, and dislike), which are known to be common regardless of nation and culture. Emotion recognition on the facial images is performed after applying the discrete wavelet transform. Here, the feature vectors are extracted by PCA and LDA. Experimental results show that happiness, sadness, and anger are recognized better than surprise, fear, and dislike. In particular, the Japanese subjects show lower performance for the dislike emotion. In general, the recognition rates for the Korean subjects are higher than those for the Japanese subjects.

Efficient 3D Model Retrieval using Discriminant Analysis (판별분석을 이용한 효율적인 3차원 모델 검색)

  • Song, Ju-Whan;Choi, Seong-Hee;Gwun, Ou-Bong
    • 전자공학회논문지 IE
    • /
    • v.45 no.2
    • /
    • pp.34-39
    • /
    • 2008
  • This study establishes an efficient system that retrieves 3D models using a statistical technique, the discriminant analysis function. The method searches an index formed from statistics of 128 feature vectors, including their scope, minimum value, average, standard deviation, skewness, and scale. The feature vectors were sampled with Osada's D2 method, and the statistics served as the variables from which the discriminant function values were converted into index values. In the primary retrieval for a query model, the classes in the top 2% were selected by comparing the query with the previously saved index of each class of similar models. The method proved to be an efficient retrieval technique that reduces processing time. It shortened the retrieval time for 3D models by 57% compared with the existing Osada method, and the precision with which similar models were found in first place was 0.362, an improvement of 44.8%.

Enhancement of the k-Means Clustering Speed by Emulation of Birds' Motion in Flock (새떼 이동의 모방에 의한 k-평균 군집 속도의 향상)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.9 no.9
    • /
    • pp.965-970
    • /
    • 2014
  • In an effort to improve the convergence speed of k-means clustering, we introduce the notion of birds' movement in a flock. This motion is characterized by the observation that each bird follows its nearest neighbor. We utilize this feature in the clustering procedure: once the class of a vector is determined, a number of vectors in its vicinity are assigned to the same class. Experiments show that the number of iterations required for termination is significantly lower in the proposed method than in the conventional one. Furthermore, the calculation time per iteration is more than 5% shorter in the proposed method. The quality of the clustering, as determined from the total accumulated distance between each vector and its centroid, was found to be practically the same. In other words, practically the same clustering result can be obtained in a shorter computational time.
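
The assignment rule in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how many nearby vectors inherit a class or how the vicinity is found, so the neighbor count `m` and the brute-force neighbor table are hypothetical choices, as are the function names.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_neighbors(points, m):
    """Precompute, for each point, the indices of its m nearest neighbors."""
    table = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist2(p, points[j]))
        table.append(others[:m])
    return table

def flock_kmeans(points, centroids, m=2, max_iter=100):
    """k-means in which a newly labeled vector passes its label to its neighbors."""
    centroids = list(centroids)          # do not modify the caller's list
    neighbors = nearest_neighbors(points, m)
    labels = [None] * len(points)
    iteration = 0
    for iteration in range(1, max_iter + 1):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            if new_labels[i] is not None:
                continue                  # already labeled by a neighbor
            k = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            new_labels[i] = k
            for j in neighbors[i]:        # neighbors skip the distance computation
                if new_labels[j] is None:
                    new_labels[j] = k
        if new_labels == labels:
            break                         # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):   # standard centroid update
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids, iteration
```

Vectors labeled through a neighbor skip the centroid distance search, which is where a per-iteration saving could come from. The precomputed neighbor table costs O(n²) here and would need a cheaper structure for large data sets.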

A Method for Motion Artifact Compensation of PPG Signal (광혈류량 신호의 움직임 훼손 보상 기법)

  • Kim, Hansol;Lee, Eui Chul
    • Journal of Broadcast Engineering
    • /
    • v.18 no.4
    • /
    • pp.543-549
    • /
    • 2013
  • Motion artifacts in central and autonomic nervous system signals degrade the performance of bio-signal-based human factor analysis. First, we propose a method for defining motion artifact sections by analyzing successive image frames. A motion artifact section is defined when the amount of motion is greater than a predefined threshold. Here, the amount of motion is estimated from the first derivative of the image frames in the temporal domain. Second, we propose another method for defining motion artifact sections by designing a 2D Gaussian probability density function model from feature vectors of one signal cycle, such as length and amplitude. The defined motion artifact sections are interpolated on the basis of a 1D Gaussian function. Applying the method to a photoplethysmography signal, we confirmed that the heartbeat rate calculated from the restored photoplethysmography approached the one from electrocardiography. We also found that the video-based method generated relatively more false acceptances of motion artifact sections, whereas the probability density function based method generated relatively more false rejections.
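
The first, video-based detection method can be sketched roughly as below. The abstract states only that the amount of motion comes from the first temporal derivative of the frames and is compared with a threshold. The mean absolute frame difference used here, the flat grayscale frame layout, and the function names are assumptions made for illustration.

```python
def motion_amounts(frames):
    """Mean absolute pixel difference between each frame and the previous one."""
    amounts = []
    for prev, curr in zip(frames, frames[1:]):
        diff = sum(abs(c - p) for p, c in zip(prev, curr))
        amounts.append(diff / len(curr))
    return amounts

def artifact_sections(frames, threshold):
    """Return (start, end) frame-index ranges, end exclusive, where motion exceeds threshold."""
    sections = []
    start = None
    # amounts[0] describes the change arriving at frame 1, hence start=1
    for i, amount in enumerate(motion_amounts(frames), start=1):
        if amount > threshold:
            if start is None:
                start = i
        elif start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(frames)))
    return sections
```

Each frame is a flat list of grayscale values. The returned index ranges mark the stretches of signal that would then be interpolated.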

Quantization Based Speaker Normalization for DHMM Speech Recognition System (DHMM 음성 인식 시스템을 위한 양자화 기반의 화자 정규화)

  • Shin Ok-keun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.4
    • /
    • pp.299-307
    • /
    • 2003
  • There have been many studies on speaker normalization, which aims to minimize the effect of the speaker's vocal tract length on the recognition performance of speaker-independent speech recognition systems. In this paper, we propose a simple linear-warping speaker normalization method based on a vector quantizer, motivated by the observation that the vector quantizer can be used successfully for speaker verification. For this purpose, we first generate an optimal codebook, which serves as the basis of the speaker normalization, and then extract the warping factor of the unknown speaker by comparing the feature vectors with the codebook. Finally, the extracted warping factor is used to linearly warp the Mel-scale filter bank adopted in the MFCC calculation. To test the performance of the proposed method, a series of recognition experiments was conducted on a discrete HMM with thirteen monosyllabic Korean number utterances. The results showed that the word error rate can be reduced by about 29%, and that the proposed warping factor extraction method is useful because of its simplicity compared with other line-search warping methods.

Vector Quantizer Based Speaker Normalization for Continuous Speech Recognition (연속음성 인식기를 위한 벡터양자화기 기반의 화자정규화)

  • Shin Ok-keun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.8
    • /
    • pp.583-589
    • /
    • 2004
  • We propose a speaker normalization method based on a vector quantizer for continuous speech recognition (CSR) systems, in which no acoustic information is used. The proposed method, an improvement of the previously reported speaker normalization scheme for a simple digit recognizer, builds a canonical codebook by training it iteratively, increasing the codebook size after each iteration from a relatively small initial size. Once the codebook is established, the warp factors of the speakers are estimated by exhaustively comparing the warped versions of each speaker's utterances with the codebook. Two sets of phones are used to estimate the warp factors: one consisting of vowels only, and the other composed of all the phonemes. A piecewise linear warping function corresponding to the estimated warp factor is adopted to warp the power spectrum of the utterance. The warped feature vectors are then extracted and used to train and test the speech recognizer. The effectiveness of the proposed method is investigated in a set of recognition experiments using the TIMIT corpus and the HTK speech recognition toolkit. The experimental results showed a recognition rate improvement comparable to that of the formant-based warping method.
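
The exhaustive warp-factor search described in this abstract might look like the sketch below. The abstract does not specify the comparison criterion, so average squared distance to the nearest codeword is an assumption. The `extract_features` callback stands in for the piecewise linear warping and feature extraction, which are not shown, and all names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(features, codebook):
    """Average squared distance from each feature vector to its nearest codeword."""
    return sum(min(dist2(f, c) for c in codebook) for f in features) / len(features)

def estimate_warp_factor(utterance, codebook, extract_features, candidates):
    """Try every candidate warp factor and keep the one that best fits the codebook.

    extract_features(utterance, alpha) is a caller-supplied (hypothetical) function
    that warps the spectrum by alpha and returns a list of feature vectors.
    """
    return min(
        candidates,
        key=lambda alpha: distortion(extract_features(utterance, alpha), codebook),
    )
```

A caller would pass a small grid of candidate factors, so the cost is one feature extraction and one codebook comparison per candidate.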

A Real-Time Implementation of Isolated Word Recognition System Based on a Hardware-Efficient Viterbi Scorer (효율적인 하드웨어 구조의 Viterbi Scorer를 이용한 실시간 격리단어 인식 시스템의 구현)

  • Cho, Yun-Seok;Kim, Jin-Yul;Oh, Kwang-Sok;Lee, Hwang-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.2E
    • /
    • pp.58-67
    • /
    • 1994
  • Hidden Markov Model (HMM)-based algorithms have been used successfully in many speech recognition systems, especially large-vocabulary systems. Although general-purpose processors can be employed for such systems, they inevitably suffer from the computational complexity and the enormous amount of data. Therefore, real-time speech recognition requires specialized hardware to accelerate the recognition steps. This paper concerns a real-time implementation of an isolated word recognition system based on HMM. The speech recognition system consists of a host computer (PC), a DSP board, and a prototype Viterbi scoring board. The DSP board extracts feature vectors from the speech signal. The Viterbi scoring board has been implemented using three field-programmable gate array chips. It employs a hardware-efficient Viterbi scoring architecture and performs the Viterbi algorithm for HMM-based speech recognition. At a clock rate of 10 MHz, the system can update about 100,000 states within a single frame of 10 ms.
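
The throughput figure at the end of this abstract can be checked with simple arithmetic. The three constants come from the abstract. The conclusion that the board averages about one state update per clock cycle is an inference from those numbers, not a statement from the paper.

```python
CLOCK_HZ = 10_000_000        # 10 MHz clock rate, from the abstract
FRAME_MS = 10                # 10 ms frame length, from the abstract
STATES_PER_FRAME = 100_000   # "about 100,000 states", from the abstract

# Clock cycles available in one frame: 10 MHz * 10 ms
cycles_per_frame = CLOCK_HZ * FRAME_MS // 1000

# Average clock cycles spent per state update
cycles_per_state = cycles_per_frame / STATES_PER_FRAME

print(cycles_per_frame, cycles_per_state)  # prints: 100000 1.0
```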

Musical Genre Classification System based on Multiple-Octave Bands (다중 옥타브 밴드 기반 음악 장르 분류 시스템)

  • Byun, Karam;Kim, Moo Young
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.12
    • /
    • pp.238-244
    • /
    • 2013
  • For musical genre classification, various types of feature vectors are utilized. Mel-frequency cepstral coefficients (MFCC), decorrelated filter bank (DFB), and octave-based spectral contrast (OSC) are widely used as short-term features, and their long-term variations are also utilized. In this paper, OSC features are extracted not only in the single-octave band domain but also in the multiple-octave band domain to capture the correlation between octave bands. As a baseline system, we select the genre classification system that won fourth place in the 2012 Music Information Retrieval Evaluation eXchange (MIREX) contest. By applying the OSC features based on multiple-octave bands, we improve the classification accuracy by 0.40% and 3.15% for the GTZAN and Ballroom databases, respectively.