• Title/Summary/Keyword: Speech Recognition Technology

Search Result 527, Processing Time 0.028 seconds

Performance Comparison of Automatic Detection of Laryngeal Diseases by Voice (후두질환 음성의 자동 식별 성능 비교)

  • Kang Hyun Min;Kim Soo Mi;Kim Yoo Shin;Kim Hyung Soon;Jo Cheol-Woo;Yang Byunggon;Wang Soo-Geun
    • MALSORI
    • /
    • no.45
    • /
    • pp.35-45
    • /
    • 2003
  • Laryngeal diseases cause significant changes in the quality of speech production. Automatic detection of laryngeal diseases by voice is attractive because of its nonintrusive nature. In this paper, we apply speech recognition techniques to detection of laryngeal cancer, and investigate which feature parameters and classification methods are appropriate for this purpose. Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC) are examined as feature parameters, and parameters reflecting the periodicity of speech and its perturbation are also considered. As for classifier, multilayer perceptron neural networks and Gaussian Mixture Models (GMM) are employed. According to our experiments, higher order LPCC with the periodic information parameters yields the best performance.

  • PDF

Speaker Identification in Small Training Data Environment using MLLR Adaptation Method (MLLR 화자적응 기법을 이용한 적은 학습자료 환경의 화자식별)

  • Kim, Se-hyun;Oh, Yung-Hwan
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.159-162
    • /
    • 2005
  • Identification is the process automatically identify who is speaking on the basis of information obtained from speech waves. In training phase, each speaker models are trained using each speaker's speech data. GMMs (Gaussian Mixture Models), which have been successfully applied to speaker modeling in text-independent speaker identification, are not efficient in insufficient training data environment. This paper proposes speaker modeling method using MLLR (Maximum Likelihood Linear Regression) method which is used for speaker adaptation in speech recognition. We make SD-like model using MLLR adaptation method instead of speaker dependent model (SD). Proposed system outperforms the GMMs in small training data environment.

  • PDF

Speaker Identification Using Greedy Kernel PCA (Greedy Kernel PCA를 이용한 화자식별)

  • Kim, Min-Seok;Yang, Il-Ho;Yu, Ha-Jin
    • MALSORI
    • /
    • no.66
    • /
    • pp.105-116
    • /
    • 2008
  • In this research, we propose a speaker identification system using a kernel method which is expected to model the non-linearity of speech features well. We have been using principal component analysis (PCA) successfully, and extended to kernel PCA, which is used for many pattern recognition tasks such as face recognition. However, we cannot use kernel PCA for speaker identification directly because the storage required for the kernel matrix grows quadratically, and the computational cost grows linearly (computing eigenvector of $l{\times}l$ matrix) with the number of training vectors I. Therefore, we use greedy kernel PCA which can approximate kernel PCA with small representation error. In the experiments, we compare the accuracy of the greedy kernel PCA with the baseline Gaussian mixture models using MFCCs and PCA. As the results with limited enrollment data show, the greedy kernel PCA outperforms conventional methods.

  • PDF

N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors (음소인식 오류에 강인한 N-gram 기반 음성 문서 검색)

  • Lee, Su-Jang;Park, Kyung-Mi;Oh, Yung-Hwan
    • MALSORI
    • /
    • no.67
    • /
    • pp.149-166
    • /
    • 2008
  • In spoken document retrievals (SDR), subword (typically phonemes) indexing term is used to avoid the out-of-vocabulary (OOV) problem. It makes the indexing and retrieval process independent from any vocabulary. It also requires a small corpus to train the acoustic model. However, subword indexing term approach has a major drawback. It shows higher word error rates than the large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose an probabilistic slot detection and n-gram based string matching method for phone based spoken document retrievals to overcome high error rates of phone recognizer. Experimental results have shown 9.25% relative improvement in the mean average precision (mAP) with 1.7 times speed up in comparison with the baseline system.

  • PDF

Design of Robust Speech Recognition System Using Tandem Architecture (탠덤 구조를 이용한 강인한 음성 인식 시스템 설계)

  • Yun, Young-Sun;Lee, Yun-Keun
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.323-326
    • /
    • 2007
  • The various studies of combining neural network and hidden Markov models within a single system are done with expectations that it may potentially combine the advantages of both systems. With the influence of these studies, tandem approach was presented to use neural network as the classifier and hidden Markov models as the decoder. In this paper, we applied the trend information of segmental features to tandem architecture and used posterior probabilities, which are the output of neural network, as inputs of recognition system. The experiments are performed on Aurora2 database to examine the potentiality of the trend feature based tandem architecture. The proposed method shows the better results than the baseline system on very low SNR environments.

  • PDF

Unseen Model Prediction using an Optimal Decision Tree (Optimal Decision Tree를 이용한 Unseen Model 추정방법)

  • Kim Sungtak;Kim Hoi-Rin
    • MALSORI
    • /
    • no.45
    • /
    • pp.117-126
    • /
    • 2003
  • Decision tree-based state tying has been proposed in recent years as the most popular approach for clustering the states of context-dependent hidden Markov model-based speech recognition. The aims of state tying is to reduce the number of free parameters and predict state probability distributions of unseen models. But, when doing state tying, the size of a decision tree is very important for word independent recognition. In this paper, we try to construct optimized decision tree based on the average of feature vectors in state pool and the number of seen modes. We observed that the proposed optimal decision tree is effective in predicting the state probability distribution of unseen models.

  • PDF

Reduction of Dimension of HMM parameters in MLLR Framework for Speaker Adaptation (화자적응시스템을 위한 MLLR 알고리즘 연산량 감소)

  • Kim Ji Un;Jeong Jae Ho
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.123-126
    • /
    • 2003
  • We discuss how to reduce the number of inverse matrix and its dimensions requested in MLLR framework for speaker adaptation. To find a smaller set of variables with less redundancy, we employ PCA(principal component analysis) and ICA(independent component analysis) that would give as good a representation as possible. The amount of additional computation when PCA or ICA is applied is as small as it can be disregarded. The dimension of HMM parameters is reduced to about 1/3 ~ 2/7 dimensions of SI(speaker independent) model parameter with which speech recognition system represents word recognition rate as much as ordinary MLLR framework. If dimension of SI model parameter is n, the amount of computation of inverse matrix in MLLR is proportioned to O($n^4$). So, compared with ordinary MLLR, the amount of total computation requested in speaker adaptation is reduced to about 1/80~1/150.

  • PDF

Synchronization of VOD Content and Captions Using Speech Recognition and Modified Dynamic Programming (음성인식과 변경된 동적계획법을 이용한 VOD 콘텐트와 자막의 동기화)

  • Oh, Juhyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2021.06a
    • /
    • pp.131-134
    • /
    • 2021
  • 지상파 방송에서는 청각장애인을 위해 폐쇄자막(closed caption) 서비스가 제공되고 있지만, 이를 저장하여 VOD 서비스 등에 제공하고자 할 때는 영상과의 비동기화(desynchronization) 문제로 인해 활용할 수 없는 문제가 있다. 본 논문에서는 이를 해결하기 위해 자동 음성인식(automatic speech recognition)과, 자막 동기화 문제에 맞게 변경된 동적계획법(modified dynamic programming)을 이용하는 방법을 제안한다. 문자열 정렬에서 삽입과 삭제 등 간격(gap)의 발생을 제어하는 제약조건과 그에 따른 점수 구조를 적용함으로써 문자열 정렬 성능을 개선한다. 또한 정렬된 폐쇄자막과 음성인식 문자열로부터 시간 동기정보를 복원하고 동기화된 자막을 생성하는 방법을 제안한다. 실제 TV 프로그램과 자막에 적용하여 기존 방법에 비해 성능의 향상이 있음을 확인하였다.

  • PDF

Character Recognition Algorithm in Low-Quality Legacy Contents Based on Alternative End-to-End Learning (대안적 통째학습 기반 저품질 레거시 콘텐츠에서의 문자 인식 알고리즘)

  • Lee, Sung-Jin;Yun, Jun-Seok;Park, Seon-hoo;Yoo, Seok Bong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.11
    • /
    • pp.1486-1494
    • /
    • 2021
  • Character recognition is a technology required in various platforms, such as smart parking and text to speech, and many studies are being conducted to improve its performance through new attempts. However, with low-quality image used for character recognition, a difference in resolution of the training image and test image for character recognition occurs, resulting in poor accuracy. To solve this problem, this paper designed an end-to-end learning neural network that combines image super-resolution and character recognition so that the character recognition model performance is robust against various quality data, and implemented an alternative whole learning algorithm to learn the whole neural network. An alternative end-to-end learning and recognition performance test was conducted using the license plate image among various text images, and the effectiveness of the proposed algorithm was verified with the performance test.

Implementation of Interface to Support Mobile Accessibility Using Speech I/O APIs (음성 입출력 API를 이용한 모바일 접근성 지원 인터페이스 구현)

  • Oh, Seungchur;Yun, Young-Sun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.1
    • /
    • pp.71-80
    • /
    • 2013
  • Due to the increased use of mobile devices, there is a lot of discussion on mobile accessibility. Mobile accessibility means that everyone, who includes the disabled, the elderly people, can easily use the functions of mobile devices. In this paper, we presented and implemented a mobile interface using a speech I/O APIs to improve the accessibility. The proposed interfaces are implemented on Android platforms and they used speech recognition and text-to-speech APIs supported as built-in services. In addition, to facilitate the internet access for visually impaired or blind people, we also implemented the web browsing application (web reader).