Integrated Search | Korea Science

ON IMPROVING THE PERFORMANCE OF CODED SPECTRAL PARAMETERS FOR SPEECH RECOGNITION

Choi, Seung-Ho;Kim, Hong-Kook;Lee, Hwang-Soo
- Acoustical Society of Korea: Conference Proceedings / Acoustical Society of Korea, 15th Workshop on Speech Communication and Signal Processing, 1998 (KSCSP 98, Vol. 15, No. 1) / pp. 250-253 / 1998
In digital communication networks, speech recognition systems conventionally reconstruct speech and then extract feature parameters. In this paper, we consider a useful approach that incorporates speech coding parameters directly into the speech recognizer. Most speech coders employed in the networks use line spectral pairs (LSPs) as spectral parameters. To improve the recognition performance of the LSP-based speech recognizer, we introduce two methods: one is to devise weighted distance measures of LSPs, and the other is to transform LSPs into a new feature set, named a pseudo-cepstrum (PCEP). Experiments on speaker-independent connected-digit recognition showed that the weighted distance measures gave significantly higher recognition accuracy than the unweighted LSP distance, and PCEP improved performance further. Compared to the conventional methods employing mel-frequency cepstral coefficients, the proposed methods achieved higher recognition accuracy.
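As a rough illustration of what a weighted LSP distance looks like, the sketch below compares an unweighted squared distance with a weighted one. The abstract does not give the paper's weighting functions, so the weights here (inverse spacing to the neighbouring LSPs, which emphasises closely spaced pairs near spectral peaks) are an assumption, and the function names are hypothetical.

```python
import math

def lsp_weights(lsp):
    """Assumed weighting, not the paper's: inverse spacing to both neighbours."""
    # Pad with the band edges 0 and pi so the first and last LSPs have neighbours.
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]

def unweighted_lsp_distance(test, ref):
    """Plain squared Euclidean distance between two LSP vectors (radians)."""
    return sum((t - r) ** 2 for t, r in zip(test, ref))

def weighted_lsp_distance(test, ref):
    """Squared distance with each term scaled by a weight derived from ref."""
    weights = lsp_weights(ref)
    return sum(w * (t - r) ** 2 for w, t, r in zip(weights, test, ref))
```

With this assumed weighting, the same deviation costs more when it falls on an LSP that sits close to its neighbour, whereas the unweighted distance treats every coefficient alike.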

Performance Analysis of Face Recognition by Distance according to Image Normalization and Face Recognition Algorithm

문해민;반성범
- Journal of the Korea Institute of Information Security & Cryptology / Vol. 23, No. 4 / pp. 737-742 / 2013
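Surveillance systems have recently been evolving into intelligent systems that use human recognition technology to make judgments and respond on their own. Existing face recognition technology performs well at short range, but its recognition rate falls as distance increases. For long-distance human recognition, this paper analyzes how the face recognition rate varies with the interpolation method and the face recognition algorithm when face images captured at each distance are used for training. Image normalization uses nearest-neighbor, bilinear, bicubic convolution, and Lanczos3 interpolation, and the face recognition algorithms are PCA and LDA. Experiments confirmed that bilinear interpolation for image normalization combined with LDA for face recognition gave superior performance.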
https://doi.org/10.13089/JKIISC.2013.23.4.737
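The image-normalization step named in the abstract above resizes face crops captured at different distances to a common size. The following is a minimal sketch of bilinear interpolation on a grayscale image stored as nested lists; the function name, the corner-aligned sampling grid, and the list-based image format are assumptions for illustration, not details taken from the paper.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    # Corner-aligned mapping: output corners land exactly on input corners.
    scale_y = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    scale_x = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * scale_y
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * scale_x
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Blend horizontally on the two bracketing rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

Each output pixel is a distance-weighted blend of the four nearest input pixels, which is what distinguishes bilinear from nearest-neighbor interpolation.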

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

이길호;윤재삼;오유리;김홍국
- Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori / No. 54 / pp. 27-43 / 2005
Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, the mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than the linear prediction coefficient (LPC), which is a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC is developing an efficient MFCC quantization scheme at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, and speech recognition using the proposed coder performs better than recognition using G.729.
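The combination of predictive quantization with a safety net can be sketched as follows. This is a simplified scalar illustration, not the coder described in the paper: the prediction coefficient, step sizes, index range, and function names are all assumed values chosen for the example. Each frame is coded both predictively (quantizing the residual against a prediction from the previous reconstructed frame) and directly (memoryless), and the mode with the lower error is kept, so that a memoryless frame can interrupt error propagation.

```python
def quantize(value, step, max_index=8):
    """Uniform quantizer with a limited index range (assumed 17 levels)."""
    index = max(-max_index, min(max_index, round(value / step)))
    return index * step

def encode_frame(frame, prev_recon, alpha=0.5, step_pred=0.25, step_direct=1.0):
    """Return (mode, reconstruction) for one frame of coefficients."""
    # Predictive path: fine steps, but only a narrow residual range is covered.
    predicted = [alpha * p for p in prev_recon]
    recon_pred = [p + quantize(x - p, step_pred) for x, p in zip(frame, predicted)]
    # Safety-net path: coarse steps, no dependence on the previous frame.
    recon_direct = [quantize(x, step_direct) for x in frame]
    err_pred = sum((x - r) ** 2 for x, r in zip(frame, recon_pred))
    err_direct = sum((x - r) ** 2 for x, r in zip(frame, recon_direct))
    if err_pred <= err_direct:
        return "predictive", recon_pred
    return "safety_net", recon_direct
```

When consecutive frames are similar the predictive path wins; after an abrupt change the residual overloads the predictive quantizer and the memoryless path is selected instead.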

A Study on Lip-reading Enhancement Using RASTA Filter

신도성;김진영;최승호;김상훈
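- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings / Korean Society of Phonetic Sciences and Speech Technology, November 2002 Conference Proceedings / pp. 191-194 / 2002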
Lip-reading is used in bimodal form to compensate for the degradation of speech recognition in noisy environments. The most important step in lip-reading is locating the lip region correctly, but stable performance is hard to guarantee in a dynamic environment. To compensate, we used the RASTA filter, which performs well at removing noise from speech. The filter, a digital filter applied in the time domain, improved performance. To observe recognition performance using image information alone, we selected 22 words usable in a service and ran recognition experiments in a car. A hidden Markov model was used as the recognition algorithm to compare recognition performance on these words.
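For readers unfamiliar with RASTA filtering, the sketch below applies a causal version of the classic RASTA band-pass filter, H(z) = 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), to the trajectory of one feature over successive frames. The abstract does not state the coefficients used in the study, so these values come from the speech-processing literature and are an assumption here, as is the function name.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one feature trajectory (one value per frame)."""
    # FIR numerator taps; they sum to zero, so constant offsets are removed.
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    for n in range(len(trajectory)):
        acc = sum(b * trajectory[n - k] for k, b in enumerate(numer) if n - k >= 0)
        if n > 0:
            acc += pole * out[n - 1]  # single-pole recursive (IIR) part
        out.append(acc)
    return out
```

Because the numerator taps sum to zero, a feature that stays constant over time is driven toward zero, while changes at moderate rates pass through; this is the property that suppresses slowly varying disturbances.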

Effects of Smart Factory Quality Characteristics and Dynamic Capabilities on Business Performance: Mediating Effect of Recognition Response

CHO, Ik-Jun;KIM, Jin-Kwon;YANG, Hoe-Chang;AHN, Tony-DongHui
- The Journal of Industrial Distribution & Business / Vol. 11, No. 12 / pp. 17-28 / 2020
Purpose: The purpose of this study is to confirm the strategic direction of the firm regarding the capabilities of the organization and its employees, in order to increase employee utilization and business performance at firms that introduce smart factories in the domestic manufacturing industry. Research design, data, and methodology: This study derived a structured research model to confirm the mediating effect of recognition responses between the quality characteristics of smart factories and dynamic capabilities. For the analysis, a total of 143 valid questionnaires were used, collected from 200 domestic SMEs that had introduced smart factories. Results: Quality Characteristics of Smart Factory and Dynamic Capabilities had a statistically significant effect on Usefulness. Recognition Response had a statistically significant mediating effect on the relationship between quality characteristics of smart factory and business performance. Recognition Response had a statistically significant effect on business performance. Conclusions: The results suggest that firms introducing smart factories should reflect these findings in their empowerment strategy, because the recognition responses of employees differ according to the quality characteristics and dynamic capabilities of smart factories. They also indicate that the information derived from the smart factory system is useful and effective for business performance and employees.
https://doi.org/10.13106/jidb.2020.vol11.no12.17

On Wavelet Transform Based Feature Extraction for Speech Recognition Application

Kim, Jae-Gil
- The Journal of the Acoustical Society of Korea / Vol. 17, No. 2E / pp. 31-37 / 1998
This paper proposes a feature extraction method using the wavelet transform for speech recognition. A speech recognition system generally carries out the recognition task based on speech features, which are usually obtained via time-frequency representations such as the Short-Time Fourier Transform (STFT) and Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing highly complex speech characteristics: they map the speech features with the same frequency resolution at all frequencies. The wavelet transform overcomes some of these limitations. It captures the signal with fine time resolution at high frequencies and fine frequency resolution at low frequencies, which may present a significant advantage when analyzing highly localized speech events. Based on this motivation, this paper investigates the effectiveness of the wavelet transform for feature extraction aimed at enhancing speech recognition. The proposed method is implemented using the Sampled Continuous Wavelet Transform (SCWT), and its performance is tested on a speaker-independent isolated-word recognizer that discerns 50 Korean words. In particular, the effect of the mother wavelet employed and the number of voices per octave on the performance of the proposed method is investigated. The influence of the size of the mother wavelet on the performance of the proposed method is also discussed. Throughout the experiments, the performance of the proposed method is compared with the most prevalent conventional method, MFCC (Mel-frequency Cepstral Coefficient). The experiments show that the recognition performance of the proposed method is better than that of MFCC. However, the improvement is marginal, while the computational load of the proposed method is substantially greater than that of MFCC because of the increase in dimensionality.

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
- ETRI Journal / Vol. 46, No. 1 / pp. 22-34 / 2024
Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
https://doi.org/10.4218/etrij.2023-0266

Assessment of Hygiene Knowledge and Recognition on Job Performance Levels for HACCP Implementation for Dieticians and Employees at Contract Foodservices

문혜경;전지영;류은순
- Journal of the Korean Dietetic Association / Vol. 10, No. 3 / pp. 261-271 / 2004
The purpose of this study was to provide basic data for practical HACCP training. A survey was conducted and analysed on 46 contract foodservices: 13 "Appointed" foodservices (appointed by Korean Food & Drug Administration), 17 "Voluntary Applying" foodservices (voluntarily applied HACCP, but not appointed), 16 "Non-applying" foodservices (not applied HACCP). Hygiene knowledge and recognition on job performance levels for HACCP application for 46 dieticians and 361 employees were surveyed. According to the survey, 61.5% of the "Appointed" dieticians took HACCP training from outside the company, 58.8% of "Voluntary Applying" dieticians took in-house HACCP training, and 62.4% of "Non-applying" dieticians have not taken any HACCP training. As for the comparison of hygiene knowledge, total mean of employees (6.38) showed significantly lower average than that of the dieticians (7.82) (p<0.001). From the result for recognition on job performance levels, total mean of dieticians (3.91) indicated generally good performance while employees (3.41) (p<0.001) showed considerably lower recognition. Hygiene knowledge and recognition on job performance levels of both dieticians and employees showed considerably close correlation (p<0.01 or p<0.05).

Use of Word Clustering to Improve Emotion Recognition from Short Text

Yuan, Shuai;Huang, Huan;Wu, Linjing
- Journal of Computing Science and Engineering / Vol. 10, No. 4 / pp. 103-110 / 2016
Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.
https://doi.org/10.5626/JCSE.2016.10.4.103

AI Multimodal Sensor-based Pedestrian Image Recognition Algorithm

신성윤;조승표;조광현
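- Korea Society of Computer and Information: Conference Proceedings / Proceedings of the 67th Winter Conference of the Korea Society of Computer and Information, 2023, Vol. 31, No. 1 / pp. 407-408 / 2023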
In this paper, we intend to develop a multimodal algorithm that secures recognition performance of over 95% in daytime illumination environments and secures recognition performance of over 90% in bad weather (rainfall and snow) and night illumination environments.
