Integrated Search | Korea Science

Speech Recognition in Car Noise Environments Using Multiple Models Based on a Hybrid Method of Spectral Subtraction and Residual Noise Masking

Song, Myung-Gyu;Jung, Hoi-In;Shim, Kab-Jong;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea, Vol. 18, No. 3E, pp. 3-8, 1999
In speech recognition for real-world applications, the performance degradation caused by the mismatch between training and testing environments must be overcome. In this paper, to reduce this mismatch, we propose a hybrid method of spectral subtraction and residual noise masking. We also employ a multiple-model approach to improve robustness across various noise environments. In this approach, multiple model sets are built for several noise masking levels, and the model set appropriate for the estimated noise level is selected automatically in the recognition phase. In speaker-independent isolated word recognition experiments in car noise environments, the proposed method using model sets with only two masking levels reduced the average word error rate by 60% in comparison with the spectral subtraction method.
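The combination described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' formulation: the abstract does not give the exact subtraction rule, the masking rule, or the selection criterion, so the function names, the fixed `mask_level` floor, and the nearest-level selection are all assumptions.

```python
def subtract_and_mask(power_spectrum, noise_estimate, mask_level):
    """Power spectral subtraction followed by a masking floor (illustrative)."""
    cleaned = []
    for p, n in zip(power_spectrum, noise_estimate):
        residual = max(p - n, 0.0)                 # subtract the noise estimate, clip at zero
        cleaned.append(max(residual, mask_level))  # hide residual noise below the floor
    return cleaned


def select_model_set(noise_level, model_sets):
    """Pick the model set whose masking level is closest to the estimated noise level.

    model_sets maps a masking level to a (hypothetical) model-set identifier.
    """
    best_level = min(model_sets, key=lambda level: abs(level - noise_level))
    return model_sets[best_level]


spectrum = subtract_and_mask([10.0, 2.0, 5.0], [1.0, 3.0, 4.5], mask_level=1.0)
chosen = select_model_set(7.0, {5.0: "low-mask models", 20.0: "high-mask models"})
```

With two entries in `model_sets`, the selection mirrors the two-masking-level configuration mentioned in the abstract, under the assumption that the closest level is the appropriate one.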

Feature Extraction Based on Speech Attractors in the Reconstructed Phase Space for Automatic Speech Recognition Systems

Shekofteh, Yasser;Almasganj, Farshad
- ETRI Journal, Vol. 35, No. 1, pp. 100-108, 2013
In this paper, a feature extraction (FE) method is proposed that is comparable to the traditional FE methods used in automatic speech recognition systems. Unlike the conventional spectral-based FE methods, the proposed method evaluates the similarities between an embedded speech signal and a set of predefined speech attractor models in the reconstructed phase space (RPS) domain. In the first step, a set of Gaussian mixture models is trained to represent the speech attractors in the RPS. Next, for a new input speech frame, a posterior-probability-based feature vector is evaluated, which represents the similarity between the embedded frame and the learned speech attractors. We conduct experiments for a speech recognition task utilizing a toolkit based on hidden Markov models, over FARSDAT, a well-known Persian speech corpus. Through the proposed FE method, we gain 3.11% absolute phoneme error rate improvement in comparison to the baseline system, which exploits the mel-frequency cepstral coefficient FE method.
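The two steps in this abstract (embedding a frame in the reconstructed phase space, then scoring it against learned attractor models) can be sketched as below. This is only a sketch under stated assumptions: the embedding dimension, the lag, the isotropic Gaussian components, and the averaging of per-point posteriors are illustrative choices, not details taken from the paper.

```python
import math


def embed(frame, dim=3, lag=2):
    """Time-delay embedding: each point is (x[i], x[i+lag], ..., x[i+(dim-1)*lag])."""
    n = len(frame) - (dim - 1) * lag
    return [tuple(frame[i + k * lag] for k in range(dim)) for i in range(n)]


def posterior_features(points, components):
    """Average posterior of each component over all embedded points.

    components: list of (weight, mean, variance) for isotropic Gaussians
    (a simplification of a trained Gaussian mixture model).
    """
    totals = [0.0] * len(components)
    for x in points:
        likes = []
        for weight, mean, var in components:
            d2 = sum((a - b) ** 2 for a, b in zip(x, mean))
            norm = (2.0 * math.pi * var) ** (len(x) / 2.0)
            likes.append(weight * math.exp(-0.5 * d2 / var) / norm)
        total = sum(likes) or 1.0  # avoid division by zero if every likelihood underflows
        for j, like in enumerate(likes):
            totals[j] += like / total
    return [t / len(points) for t in totals]


points = embed([0.0, 0.1, 0.2, 0.3, 0.4], dim=2, lag=1)
features = posterior_features(points, [(0.5, (0.0, 0.0), 1.0), (0.5, (5.0, 5.0), 1.0)])
```

The resulting vector has one entry per mixture component, which is one plausible reading of a "posterior-probability-based feature vector".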
https://doi.org/10.4218/etrij.13.0112.0074

Algorithm for Recognizing Bulb in Cluster

이철헌;설성욱;김효성
- 대한전자공학회논문지TE, Vol. 39, No. 1, pp. 37-45, 2002
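This paper proposes new features for recognizing bulbs in a vehicle instrument cluster. A feature commonly used in model-based object recognition is the set of polygonal approximation points of an object. When a matching method based on this feature is applied to small objects such as the bulbs in an instrument cluster, the matching rate is low. To raise the matching rate, this paper proposes two new features: the circular distribution of object pixels and the ratio of distances from the object's center to its boundary. To use all three features together, a new decision function is defined. The experiments compare the number of unmatched objects for the matching method based on polygonal approximation points with that for the matching method using all three features.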

Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters

정성윤;김민성;손종목;배건성;김상훈
- 대한음성학회:학술대회논문집, November 2002 conference proceedings of 대한음성학회, pp. 201-204, 2002
As a preliminary study for improving the recognition performance of connected digit telephone speech, we investigate feature parameters as well as channel compensation methods for telephone speech. CMN and RTCN are examined for telephone channel compensation, and MFCC, DWFBA, SSC, and their delta features are examined as feature parameters. Recognition experiments with a database we collected show that, at the feature level, DWFBA is better than MFCC and that, for channel compensation, RTCN is better than CMN. The DWFBA+Delta_Mel-SSC feature shows the highest recognition rate.
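Of the two channel compensation methods named here, CMN is the simpler to illustrate. The sketch below assumes CMN stands for the usual cepstral mean normalization, in which the per-utterance mean of each cepstral coefficient is subtracted from every frame; the abstract does not spell out the variant used, and RTCN is not sketched because its definition is not given.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient from every frame."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


normalized = cepstral_mean_normalize([[1.0, 4.0], [3.0, 8.0]])
```

A stationary channel adds a roughly constant offset in the cepstral domain, so removing the mean removes much of the channel's effect.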

Weighted filter bank analysis and model adaptation for improving the recognition performance of partially corrupted speech

조훈영;오영환
- 대한음성학회지:말소리, No. 44, pp. 157-169, 2002
We propose a weighted filter bank analysis and model adaptation (WFBA-MA) scheme to improve the utilization of uncorrupted or less severely corrupted frequency regions for robust speech recognition. A weighted mel-frequency cepstral coefficient is obtained by weighting log filter bank energies with reliability coefficients, and hidden Markov models are also modified to reflect the local reliabilities. Experimental results on the TIDIGITS database corrupted by band-limited noises and car noise indicate that the proposed WFBA-MA scheme utilizes the uncorrupted speech information well, significantly improving recognition performance in comparison to multi-band speech recognition systems.
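The feature-side step (weighting log filter bank energies with reliability coefficients before the cepstral transform) might look like the sketch below. It assumes a plain, unnormalized DCT-II as the cepstral transform and treats the reliability coefficients as given values in [0, 1]; how the paper estimates them, and how the HMMs are adapted, is not covered here.

```python
import math


def weighted_cepstrum(log_energies, reliabilities, num_ceps):
    """Weight each log filter bank energy by its reliability, then apply a DCT-II."""
    weighted = [r * e for r, e in zip(reliabilities, log_energies)]
    n = len(weighted)
    return [
        sum(w * math.cos(math.pi * k * (m + 0.5) / n) for m, w in enumerate(weighted))
        for k in range(num_ceps)
    ]


# A band with reliability 0.0 contributes nothing to any coefficient.
ceps = weighted_cepstrum([2.0, 2.0, 9.0, 2.0], [1.0, 1.0, 0.0, 1.0], num_ceps=3)
```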

The Third Year Students' Recognition Level for Dementia and Health Education Needs in Universities: Comparison between Health Major and Non-health Major

이준우
- 한국학교ㆍ지역보건교육학회지, Vol. 10, No. 1, pp. 35-46, 2009
Background & Objectives: The purpose of this study was to provide basic material for a correct understanding of dementia and of health education needs by comparing students' recognition levels of dementia. Methods: Three health major departments (nursing science, physical therapy, and occupational therapy) and three non-health major departments (English, early childhood education, and biology) were randomly selected from universities. A total of 180 junior students took part in this study, and their levels of educational experience and of recognition of dementia were analyzed. Results: There was no difference in recognition of social welfare services between students of health and non-health departments, but the two groups differed in other health education needs. Conclusion: Students of non-health majors, who study subjects unrelated to dementia, should receive education on dementia so that they can understand and recognize health education needs related to dementia.

Convolutional Neural Networks for Character-level Classification

Ko, Dae-Gun;Song, Su-Han;Kang, Ki-Min;Han, Seong-Wook
- IEIE Transactions on Smart Processing and Computing, Vol. 6, No. 1, pp. 53-59, 2017
Optical character recognition (OCR) automatically recognizes text in an image. OCR is still a challenging problem in computer vision. A successful solution to OCR has important device applications, such as text-to-speech conversion and automatic document classification. In this work, we analyze character recognition performance using three current state-of-the-art deep-learning structures: AlexNet, LeNet, and SPNet. For this, we have built our own dataset that contains digits and upper- and lower-case characters. We experiment in the presence of salt-and-pepper noise or Gaussian noise, and report the performance comparison in terms of recognition error. Experimental results from five-fold cross-validation indicate that the SPNet structure (our approach) outperforms AlexNet and LeNet in recognition error.
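The salt-and-pepper corruption mentioned in this abstract can be simulated as in the sketch below. The 8-bit grayscale range, the `density` parameter, and the even split between salt and pepper are assumptions for illustration; the abstract does not state the noise settings used in the experiments.

```python
import random


def add_salt_and_pepper(image, density, rng):
    """Return a copy of a grayscale image with a fraction of pixels set to 0 or 255.

    image: list of rows of integer pixel values in [0, 255]
    density: probability that any given pixel is corrupted
    rng: a random.Random instance, passed in so results are reproducible
    """
    noisy = [row[:] for row in image]  # copy so the input is left untouched
    for row in noisy:
        for j in range(len(row)):
            if rng.random() < density:
                row[j] = 255 if rng.random() < 0.5 else 0  # salt or pepper
    return noisy


clean = [[128] * 8 for _ in range(8)]
noisy = add_salt_and_pepper(clean, density=0.1, rng=random.Random(0))
```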
https://doi.org/10.5573/IEIESPC.2017.6.1.053

Comparison of Two Speech Estimation Algorithms Based on Generalized-Gamma Distribution Applied to Speech Recognition in Car Noisy Environment

김형국;이진호
- 한국ITS학회 논문지, Vol. 8, No. 4, pp. 28-32, 2009
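This paper compares two speech estimation algorithms based on the generalized-Gamma distribution, applied to a DFT-based single-microphone speech enhancement scheme. For speech enhancement, a noise estimate derived from the recursively averaged spectrum of the minimum noise components is combined with Gamma-distribution-based speech estimators for $\kappa$=1 and $\kappa$=2, respectively, to improve speech quality. The speech signals enhanced by each method are applied to speech recognition in a car environment, and their performance is compared.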

Performance comparison of Text-Independent Speaker Recognizer Using VQ and GMM

김성종;정훈;정익주
- 음성과학, Vol. 7, No. 2, pp. 235-244, 2000
This paper focuses on implementing a text-independent speaker recognizer using the VQ and GMM algorithms and on studying the characteristics of speaker recognizers that adopt these two algorithms. Because it is difficult to ascertain theoretically the effect the two algorithms have on the speaker recognizer, we performed recognition experiments with various parameters. The experiments show that the GMM algorithm has better recognition performance than the VQ algorithm in the following respects. The GMM performed better with small amounts of training data, and its recognition rate changed only slightly as the type of feature vector and the length of the input data varied. On the whole, the GMM showed better recognition performance than the VQ.

Improvements on MFCC by Elaboration of the Filter Banks and Windows

Lee, Chang-Young
- 음성과학, Vol. 14, No. 4, pp. 131-144, 2007
In an effort to improve the performance of mel frequency cepstral coefficients (MFCC), we investigate the effects of varying the parameters of the filter banks and their associated windows on speech recognition rates. Specifically, the mel and bark scales are combined with various types of filter bank windows. The suggested methods are compared and evaluated using two independent speech recognition methods and the Fisher discriminant objective function. It is shown that the Hanning window based on the bark scale yields a 28.1% relative improvement in speech recognition error rate over the triangular window with the mel scale. Further work on incorporating PCA and/or LDA as a postprocessor to MFCC extraction would be desirable.
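The two ingredients varied in this study, the frequency scale and the shape of each filter bank window, can be sketched as follows. The mel formula is the widely used 2595·log10(1 + f/700) form; the bark formula is Traunmüller's approximation, chosen here only as one common option, since the abstract does not say which bark formula or window parameters the paper uses.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Traunmueller's bark approximation (an assumption; other formulas exist)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def triangular(x):
    """Triangular filter shape over a band normalized to [0, 1]."""
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def hanning(x):
    """Hanning filter shape over a band normalized to [0, 1]."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * x) if 0.0 <= x <= 1.0 else 0.0


# Both shapes peak at the band center, but the Hanning shape is smoother at the band edges.
shapes = [(x / 8.0, triangular(x / 8.0), hanning(x / 8.0)) for x in range(9)]
```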

854 search results; processing time 0.032 s
