• Title/Abstract/Keywords: Feature matrix

NMF Based Music Transcription Using Feature Vector Database

  • 신옥근;류다현
    • Journal of Advanced Marine Engineering and Technology / Vol. 36, No. 8 / pp. 1129-1135 / 2012
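  • Transcribing music with NMF by extracting the feature matrix and the weight matrix simultaneously from the target music requires not only knowing the size of the feature matrix (the number of feature vectors) in advance but also going through the difficult process of determining the pitch of each extracted feature vector. This approach has the further drawback that the feature matrix becomes harder to extract accurately as the number of pitches in the music grows. To avoid these drawbacks, this study experiments with preparing a feature matrix database in advance and then applying it to real music. After building the feature matrix database, we transcribe music played on the piano from which the feature matrix was extracted, as well as the same music played on a third piano, and compare the performance. For comparison, we also test the method that extracts the feature matrix and the weight matrix simultaneously. The method using the feature matrix database is confirmed to perform better than the method that extracts the feature and weight matrices simultaneously.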
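
The fixed-dictionary idea can be sketched as follows: the feature matrix W comes from a prepared database and stays fixed, and only the weight (activation) matrix H is estimated for the target music. This is a minimal sketch assuming the Euclidean cost with Lee-Seung multiplicative updates; the abstract does not state the authors' cost function, update rule, or pitch-decision threshold, and the templates, `activations_fixed_basis`, and `piano_roll` are hypothetical.

```python
def matmul(a, b):
    """Plain-Python product of an (m x k) and a (k x n) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def activations_fixed_basis(V, W, iters=500, eps=1e-12):
    """Estimate non-negative activations H with V ~= W H while W stays fixed.

    Multiplicative update for the Euclidean cost, applied to H only
    (assumed rule): H <- H * (W^T V) / (W^T W H), element-wise.
    """
    Wt = transpose(W)
    WtV = matmul(Wt, V)            # constant because W never changes
    WtW = matmul(Wt, W)
    n_pitches, n_frames = len(W[0]), len(V[0])
    H = [[1.0] * n_frames for _ in range(n_pitches)]
    for _ in range(iters):
        WtWH = matmul(WtW, H)
        H = [[H[k][j] * WtV[k][j] / (WtWH[k][j] + eps) for j in range(n_frames)]
             for k in range(n_pitches)]
    return H


def piano_roll(H, threshold):
    """Hypothetical decision rule: a pitch sounds when its activation exceeds threshold."""
    return [[1 if h > threshold else 0 for h in row] for row in H]


# Hypothetical database: one 4-bin spectral template per pitch (columns of W).
W_db = [[1.0, 0.0],
        [0.5, 0.2],
        [0.0, 1.0],
        [0.0, 0.5]]
H_true = [[2.0, 0.0, 1.0],   # pitch 0 sounds in frames 0 and 2
          [0.0, 3.0, 1.0]]   # pitch 1 sounds in frames 1 and 2
V = matmul(W_db, H_true)      # synthetic magnitude spectrogram (4 bins x 3 frames)
H = activations_fixed_basis(V, W_db)
print(piano_roll(H, threshold=0.5))  # [[1, 0, 1], [0, 1, 1]]
```

Because W never changes, no pitch has to be assigned to learned feature vectors afterwards: each column of the database already corresponds to a known pitch.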

Feature Parameter Extraction and Speech Recognition Using Matrix Factorization

  • 이광석;허강인
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 10, No. 7 / pp. 1307-1311 / 2006
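  • This study proposes a new speech parameter that uses matrix factorization to represent parts-based features of the speech spectrum. The proposed parameter is obtained by factorizing a matrix under the condition that all of its elements are non-negative, which effectively reduces high-dimensional data. The dimension-reduced data represent parts-based characteristics of the input data. The outputs of the Mel filter bank, commonly used in speech feature extraction, are fed to the Non-Negative Matrix Factorization (NMF) algorithm, the dimension-reduced data it produces are used as input to a speech recognizer, and the recognition results are compared with those of Mel-frequency cepstral coefficients (MFCC). The recognition results show that the proposed feature parameter outperforms MFCC, which is commonly used to evaluate speech recognizer performance.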
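
The pipeline described here (Mel filter-bank outputs in, lower-dimensional NMF activations out) can be sketched in plain Python. This assumes the Euclidean cost with Lee-Seung multiplicative updates and uses a hypothetical 5-band input; the paper's actual rank, cost function, and iteration count are not given in the abstract.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def sq_error(V, W, H):
    """Squared Frobenius reconstruction error ||V - W H||^2."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def nmf(V, rank, iters=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for the Euclidean cost (assumed variant)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Hypothetical Mel filter-bank energies: 5 bands (rows) x 4 frames (columns).
V = [[1.0, 0.2, 0.9, 0.1],
     [0.8, 0.3, 0.7, 0.2],
     [0.1, 1.0, 0.2, 0.9],
     [0.2, 0.8, 0.1, 1.1],
     [0.5, 0.5, 0.5, 0.5]]
W0, H0 = nmf(V, rank=2, iters=0)    # the random starting point
W, H = nmf(V, rank=2, iters=200)
features = transpose(H)             # one 2-dimensional feature vector per frame
print(sq_error(V, W0, H0), "->", sq_error(V, W, H))
```

Each row of `features` is the reduced vector for one frame; in the setup the abstract describes, vectors like these are what the recognizer receives instead of MFCCs.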

Speaker Adaptation Using ICA-Based Feature Transformation

  • Jung, Ho-Young;Park, Man-Soo;Kim, Hoi-Rin;Hahn, Min-Soo
    • ETRI Journal / Vol. 24, No. 6 / pp. 469-472 / 2002
  • Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on the features fitted to a linear regression-based speaker adaptation. These are obtained by feature transformation based on independent component analysis (ICA), and the feature transformation matrices are estimated from the training data and adaptation data. Since the adaptation data is not sufficient to reliably estimate the ICA-based feature transformation matrix, it is necessary to adjust the ICA-based feature transformation matrix estimated from a new speaker utterance. To cope with this problem, we propose a smoothing method through a linear interpolation between the speaker-independent (SI) feature transformation matrix and the speaker-dependent (SD) feature transformation matrix. From our experiments, we observed that the proposed method is more effective in the mismatched case. In the mismatched case, the adaptation performance is improved because the smoothed feature transformation matrix makes speaker adaptation using noisy speech more robust.
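
The smoothing step amounts to an element-wise linear interpolation between the two transformation matrices. A minimal sketch follows; the interpolation weight `lam` and how it is chosen are not specified in the abstract, and the example matrices are hypothetical.

```python
def smooth_transform(A_si, A_sd, lam):
    """Interpolate element-wise: lam * A_SI + (1 - lam) * A_SD.

    lam = 1 keeps the speaker-independent matrix; lam = 0 keeps the
    speaker-dependent one estimated from the (scarce) adaptation data.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row_si, row_sd)]
            for row_si, row_sd in zip(A_si, A_sd)]


def transform(A, x):
    """Apply a feature transformation matrix to one feature vector: y = A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


A_si = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical SI matrix
A_sd = [[2.0, 0.0], [0.0, 4.0]]   # hypothetical SD matrix for a new speaker
A = smooth_transform(A_si, A_sd, lam=0.5)
print(A, transform(A, [1.0, 1.0]))   # [[1.5, 0.0], [0.0, 2.5]] [1.5, 2.5]
```

A larger `lam` trusts the SI matrix more, which is the safer choice when little adaptation data is available.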

Spatial Speaker Localization for a Humanoid Robot Using TDOA-based Feature Matrix

  • 김진성;김의현;김도익;유범재
    • Journal of Korea Robotics Society / Vol. 3, No. 3 / pp. 237-244 / 2008
  • Nowadays, research on human-robot interaction has been getting increasing attention. In the research field of human-robot interaction, speech signal processing in particular is the source of much interest. In this paper, we report a speaker localization system with six microphones for a humanoid robot called MAHRU from KIST and propose a time delay of arrival (TDOA)-based feature matrix with its algorithm based on the minimum sum of absolute errors (MSAE) for sound source localization. The TDOA-based feature matrix is defined as a simple database matrix calculated from pairs of microphones installed on a humanoid robot. The proposed method, using the TDOA-based feature matrix and its algorithm based on MSAE, effortlessly localizes a sound source without any requirement for calculating approximate nonlinear equations. To verify the solid performance of our speaker localization system for a humanoid robot, we present various experimental results for the speech sources at all directions within 5 m distance and the height divided into three parts.
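
The lookup idea (a precomputed table of TDOA vectors, searched for the entry with the minimum sum of absolute errors) can be sketched as below. This is a simplified sketch under stated assumptions: a hypothetical six-microphone circular array in one horizontal plane, a far-field source, azimuth only on a 5-degree grid, and c = 343 m/s. The paper's MAHRU microphone layout and its three height ranges are not reproduced here.

```python
import math
from itertools import combinations

SPEED_OF_SOUND = 343.0  # m/s, assumed


def mic_positions(n=6, radius=0.1):
    """Hypothetical circular array of n microphones in the horizontal plane."""
    return [(radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n))
            for i in range(n)]


def tdoa_vector(mics, azimuth_deg, c=SPEED_OF_SOUND):
    """Far-field TDOA t_i - t_j for every microphone pair (i, j), i < j."""
    a = math.radians(azimuth_deg)
    u = (math.cos(a), math.sin(a))            # unit vector towards the source
    # A mic further along u meets the plane wavefront earlier: t_m = -(p_m . u) / c.
    arrival = [-(x * u[0] + y * u[1]) / c for x, y in mics]
    return [arrival[i] - arrival[j] for i, j in combinations(range(len(mics)), 2)]


def build_feature_matrix(mics, azimuths):
    """One TDOA row per candidate direction, computed once in advance."""
    return {az: tdoa_vector(mics, az) for az in azimuths}


def localize(measured, feature_matrix):
    """Return the candidate direction with the minimum sum of absolute errors."""
    return min(feature_matrix,
               key=lambda az: sum(abs(m - t) for m, t in zip(measured, feature_matrix[az])))


mics = mic_positions()
db = build_feature_matrix(mics, range(0, 360, 5))
print(localize(tdoa_vector(mics, 40), db))  # 40
```

No nonlinear equations are solved at run time: localization reduces to one pass over the table.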

Query-Based Summarization using Semantic Feature Matrix and Semantic Variable Matrix

  • 박선
    • Journal of Advanced Navigation Technology / Vol. 12, No. 4 / pp. 372-377 / 2008
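  • This paper proposes a new query-based document summarization method that uses a semantic feature matrix and a semantic variable matrix. The proposed method is unsupervised, so it requires no prior training on queries and sentences, and by using semantic features and semantic variables it reflects the sub-topics relevant to the query and summarizes documents accurately. This works because non-negative matrix factorization naturally extracts semantic features that represent the internal structure of a document composed of topics. Experimental results show that the proposed method performs better than the other methods compared.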
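
One way to read the method is sketched below: NMF factorizes a term-by-sentence matrix A into a semantic feature matrix W and a semantic variable matrix H, and sentences are ranked by how strongly they use the semantic features that resemble the query. The scoring rule (cosine similarity between the query and each semantic feature, used to weight H), the toy matrix, and the function names are assumptions for illustration; the abstract does not give the paper's exact formula.

```python
import math
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(A, rank, iters=300, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates (Euclidean cost): A ~= W H, all non-negative."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)] for k in range(rank)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)] for i in range(n)]
    return W, H


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return 0.0 if nu == 0 or nv == 0 else sum(a * b for a, b in zip(u, v)) / (nu * nv)


def rank_sentences(A, query, rank=2):
    """Hypothetical scoring: weight each semantic variable by query/feature similarity."""
    W, H = nmf(A, rank)                                    # W: semantic features, H: semantic variables
    sims = [cosine(query, feat) for feat in transpose(W)]  # query vs. each semantic feature
    scores = [sum(s * H[k][j] for k, s in enumerate(sims)) for j in range(len(A[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Hypothetical term-by-sentence counts: terms 0-2 form one topic, terms 3-5 another.
A = [[1, 2, 0, 0],
     [1, 2, 0, 0],
     [2, 4, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 1, 2],
     [0, 0, 2, 4]]
query = [0, 0, 0, 0, 1, 0]   # the query mentions term 4 only
ranking = rank_sentences(A, query)
print(ranking[:2])
```

No training data links queries to sentences here; everything comes from factorizing the document itself, which is the unsupervised property the abstract emphasizes.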

Audio Fingerprint Retrieval Method Based on Feature Dimension Reduction and Feature Combination

  • Zhang, Qiu-yu;Xu, Fu-jiu;Bai, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 2 / pp. 522-539 / 2021
  • To address the problems of existing audio fingerprint methods when extracting fingerprints from long speech segments, such as excessive fingerprint dimension, poor robustness, and low retrieval accuracy and efficiency, a robust audio fingerprint retrieval method based on feature dimension reduction and feature combination is proposed. First, the Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) of the original speech are extracted, and the MFCC and LPCC feature matrices are combined. Second, an information-entropy-based method reduces the column dimension, and an energy-based method then reduces the row dimension of the resulting matrix. Finally, the audio fingerprint is constructed from the dimension-reduced combined feature matrix. During retrieval, the normalized Hamming distance is used for matching. Experimental results show that the proposed method yields a smaller fingerprint dimension and better robustness for long speech segments, and achieves higher retrieval efficiency while maintaining higher recall and precision.
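
The stages listed above can be sketched as a small pipeline. Every concrete rule here is an assumption: the abstract does not define how entropy or energy is computed, how many rows and columns are kept, or how values become bits, so `column_entropy`, the column-mean binarization, and the toy MFCC/LPCC values are hypothetical. Only the final matching step, normalized Hamming distance, is taken directly from the abstract.

```python
import math


def combine(mfcc, lpcc):
    """Join per-frame MFCC and LPCC vectors into one (frames x features) matrix."""
    return [list(m) + list(l) for m, l in zip(mfcc, lpcc)]


def column_entropy(col, eps=1e-12):
    """Shannon entropy (bits) of a column's normalized absolute values (assumed rule)."""
    total = sum(abs(v) for v in col) + eps
    probs = [abs(v) / total for v in col]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def reduce_columns(F, keep):
    """Keep the `keep` highest-entropy columns, in their original order."""
    cols = list(zip(*F))
    best = sorted(range(len(cols)), key=lambda j: column_entropy(cols[j]), reverse=True)[:keep]
    chosen = sorted(best)
    return [[row[j] for j in chosen] for row in F]


def reduce_rows(F, keep):
    """Keep the `keep` highest-energy rows (sum of squares), in time order."""
    best = sorted(range(len(F)), key=lambda i: sum(v * v for v in F[i]), reverse=True)[:keep]
    return [F[i] for i in sorted(best)]


def binarize(F):
    """Hypothetical bit rule: 1 where a value exceeds its column mean."""
    means = [sum(c) / len(c) for c in zip(*F)]
    return [1 if v > means[j] else 0 for row in F for j, v in enumerate(row)]


def normalized_hamming(a, b):
    """Fraction of differing bits between two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


mfcc = [[1.0, 2.0], [0.5, 0.1], [3.0, 1.0]]   # hypothetical 3 frames x 2 coefficients
lpcc = [[0.2], [0.9], [0.4]]                   # hypothetical 3 frames x 1 coefficient
F = reduce_rows(reduce_columns(combine(mfcc, lpcc), keep=2), keep=2)
fp = binarize(F)
print(fp, normalized_hamming(fp, fp))
```

A query is matched by computing `normalized_hamming` against each stored fingerprint of the same length and taking the smallest distance.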

Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won;Kim, Chang-Keun;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering / Vol. 1, No. 4 / pp. 209-212 / 2003
  • In this paper, we propose a new speech feature parameter obtained through parts-based feature extraction of the speech spectrum using Non-Negative Matrix Factorization (NMF). NMF can effectively reduce the dimension of multi-dimensional data through matrix factorization under non-negativity constraints, and the dimension-reduced data represent parts-based features of the input data. For speech feature extraction, we applied Mel-scaled filter bank outputs as inputs to NMF, then used the outputs of NMF as inputs to the speech recognizer. The recognition experiments confirm that the proposed feature parameter outperforms the commonly used mel frequency cepstral coefficients (MFCC) in recognition performance.
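
The NMF input in this setup is a Mel-scaled filter-bank output. A common way to build such a filter bank is sketched below; it assumes the HTK-style mel formula 2595 log10(1 + f/700) and triangular filters spaced evenly on the mel scale, and the filter count, FFT size, and sample rate are hypothetical, since the abstract does not specify them.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (one common variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters with centres equally spaced on the mel scale.

    Returns an (n_filters x (n_fft // 2 + 1)) weight matrix over FFT bins.
    """
    f_max = sample_rate / 2.0 if f_max is None else f_max
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 mel points give each filter a left edge, centre and right edge.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for left, centre, right in zip(edges, edges[1:], edges[2:]):
        row = []
        for f in bin_hz:
            if left <= f <= centre:
                row.append((f - left) / (centre - left))     # rising slope
            elif centre < f <= right:
                row.append((right - f) / (right - centre))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def filterbank_energies(bank, power_spectrum):
    """One non-negative energy per filter: the kind of vector fed to NMF."""
    return [sum(w * p for w, p in zip(row, power_spectrum)) for row in bank]


bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
power = [1.0] * (512 // 2 + 1)          # hypothetical flat power spectrum
print([round(e, 2) for e in filterbank_energies(bank, power)])
```

Stacking one energy vector per frame as columns gives the non-negative bands-by-frames matrix that NMF factorizes.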

Global Covariance based Principal Component Analysis for Speaker Identification

  • 서창우;임영환
    • Phonetics and Speech Sciences / Vol. 1, No. 1 / pp. 69-73 / 2009
  • This paper proposes an efficient global covariance-based principal component analysis (GCPCA) for speaker identification. Principal component analysis (PCA) is a feature extraction method that reduces the dimension of the feature vectors and the correlation among features by projecting the original feature space onto a small subspace through a transformation. However, finding the eigenvalue and eigenvector matrices from each speaker's full covariance matrix requires a large amount of training data. The proposed method first calculates a global covariance matrix from the training data of all speakers, then finds the eigenvalue matrix and the corresponding eigenvector matrix of that global covariance matrix. Compared with conventional PCA and Gaussian mixture model (GMM) methods, the proposed method shows better performance in speaker identification while requiring less storage and lower complexity.
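
The core of GCPCA, one covariance matrix pooled over all speakers followed by a single eigendecomposition, can be sketched in plain Python. This is a minimal sketch assuming the unbiased covariance estimate and power iteration with deflation as the eigen-solver; the toy two-speaker data are hypothetical, and the paper's downstream speaker models are not shown.

```python
import math


def global_mean_cov(speakers):
    """Pool the training vectors of all speakers into one mean and covariance."""
    data = [x for frames in speakers for x in frames]
    n, d = len(data), len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def top_eigenpairs(C, k, iters=200):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        lam = 0.0
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam == 0.0:
                break
            v = [x / lam for x in w]
        pairs.append((lam, v))
        # Remove the found component so the next power iteration finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def project(x, mean, pairs):
    """Map one feature vector onto the shared (global) eigenvectors."""
    return [sum(vi * (xi - mi) for vi, xi, mi in zip(v, x, mean)) for _, v in pairs]


# Hypothetical 2-D training vectors for two speakers.
speakers = [[(1.0, 1.1), (2.0, 1.9), (3.0, 3.05)],
            [(-1.0, -0.9), (-2.0, -2.1), (0.0, 0.1)]]
mean, cov = global_mean_cov(speakers)
pairs = top_eigenpairs(cov, k=2)
print(pairs[0], project((1.0, 1.0), mean, pairs[:1]))
```

Only one eigendecomposition is stored and shared by every speaker, which is where the storage and complexity savings come from.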

Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization

  • 박정원;김창근;허강인
    • Institute of Electronics Engineers of Korea (IEEK): Conference Proceedings / Proceedings of the IEEK 2003 Signal Processing Society Fall Conference / pp. 49-52 / 2003
  • In this paper, we propose a new speech feature parameter using NMF (Non-Negative Matrix Factorization). NMF can represent multi-dimensional data through effective dimension reduction by matrix factorization under a non-negativity constraint, and the reduced data represent parts-based features of the input data. We verify the usefulness of the NMF algorithm for speech feature extraction by applying a feature parameter obtained with NMF to the Mel-scaled filter bank output. The recognition experiments confirm that the proposed feature parameter outperforms the commonly used MFCC (mel frequency cepstral coefficients) in recognition performance.
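
For reference, NMF here factorizes a non-negative matrix V (Mel filter-bank outputs, bands by frames) as V ≈ WH. The abstract does not state which cost function or update rule was used; the block below shows the widely used Lee-Seung multiplicative updates for the Euclidean cost, as an assumption.

```latex
\begin{aligned}
&\min_{W \ge 0,\ H \ge 0}\ \lVert V - WH \rVert_F^2,
  \quad V \in \mathbb{R}_{\ge 0}^{n \times m},\ W \in \mathbb{R}_{\ge 0}^{n \times r},\ H \in \mathbb{R}_{\ge 0}^{r \times m},\\
&H \leftarrow H \odot \frac{W^{\top} V}{W^{\top} W H},
  \qquad W \leftarrow W \odot \frac{V H^{\top}}{W H H^{\top}}.
\end{aligned}
```

Here the product and the fraction are element-wise, and column j of H (with r < n) serves as the r-dimensional feature vector of frame j.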

Facial Feature Recognition based on ASNMF Method

  • Zhou, Jing;Wang, Tianjiang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 12 / pp. 6028-6042 / 2019
  • Because the Sparse Nonnegative Matrix Factorization (SNMF) method can control the sparsity of the decomposed matrices, it can be used to control the sparsity of facial feature extraction and recognition. To improve the accuracy of SNMF for facial feature recognition, new additive iterative rules based on improved iterative step sizes are proposed, transforming the traditional multiplicative iterative rules of SNMF into additive ones. To further increase the sparsity of the basis matrix produced by the improved SNMF method, a threshold-sparse constraint is adopted to turn the basis matrix into a zero-one matrix, which further improves the accuracy of facial feature recognition. The improved SNMF method based on the additive iterative rules and the threshold-sparse constraint, abbreviated as ASNMF, is applied to the ORL and CK+ facial datasets and achieves recognition rates of 96% and 100%, respectively. The contrast experiments also show that the recognition rate of ASNMF is clearly higher than those of basic NMF, traditional SNMF, convex nonnegative matrix factorization (CNMF), and Deep NMF.
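
The two ingredients named in this abstract, additive (gradient-style) updates and a threshold that turns the basis into a zero-one matrix, can be sketched as follows. This is not the paper's ASNMF: it swaps in plain projected gradient descent with a fixed step size in place of the authors' improved step sizes, omits the sparsity term of SNMF, and uses a hypothetical threshold rule (`tau` times the column maximum) and toy data.

```python
import random


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def half_sq_error(V, W, H):
    """0.5 * ||V - W H||^2, the cost the additive updates descend."""
    WH = matmul(W, H)
    return 0.5 * sum((v - r) ** 2 for rv, rr in zip(V, WH) for v, r in zip(rv, rr))


def additive_nmf(V, rank, step=0.01, iters=300, seed=0):
    """Projected-gradient (additive) NMF with a fixed step size.

    Each update subtracts step * gradient and clips negatives to zero,
    instead of multiplying by Lee-Seung ratio factors.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        WH = matmul(W, H)
        R = [[WH[i][j] - V[i][j] for j in range(m)] for i in range(n)]   # residual
        G = matmul(transpose(W), R)                                       # gradient w.r.t. H
        H = [[max(0.0, H[k][j] - step * G[k][j]) for j in range(m)] for k in range(rank)]
        WH = matmul(W, H)
        R = [[WH[i][j] - V[i][j] for j in range(m)] for i in range(n)]
        G = matmul(R, transpose(H))                                       # gradient w.r.t. W
        W = [[max(0.0, W[i][k] - step * G[i][k]) for k in range(rank)] for i in range(n)]
    return W, H


def threshold_basis(W, tau=0.5):
    """Hypothetical threshold-sparse step: 1 where an entry reaches tau * its column max."""
    col_max = [max(col) for col in zip(*W)]
    return [[1 if cm > 0 and w >= tau * cm else 0 for w, cm in zip(row, col_max)]
            for row in W]


# Hypothetical 4-pixel x 4-image data matrix built from two underlying "parts".
V = [[1.0, 0.0, 1.0, 0.5],
     [1.0, 0.0, 1.0, 0.5],
     [0.0, 1.0, 1.0, 0.5],
     [0.0, 1.0, 1.0, 0.5]]
W0, H0 = additive_nmf(V, rank=2, iters=0)
W, H = additive_nmf(V, rank=2)
print(half_sq_error(V, W0, H0), "->", half_sq_error(V, W, H))
print(threshold_basis(W))
```

The step size here is kept small enough that each block update decreases the cost; choosing that step adaptively is exactly the part the paper improves and this sketch does not attempt.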