Integrated Search | Korea Science

ON IMPROVING THE PERFORMANCE OF CODED SPECTRAL PARAMETERS FOR SPEECH RECOGNITION

Choi, Seung-Ho;Kim, Hong-Kook;Lee, Hwang-Soo
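- Acoustical Society of Korea: Conference Proceedings
- /
- Acoustical Society of Korea, 1998, 15th Workshop on Speech Communication and Signal Processing (KSCSP 98, Vol. 15, No. 1)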
- /
- pp.250-253
- /
- 1998
In digital communication networks, speech recognition systems conventionally reconstruct speech and then extract feature parameters. In this paper, we consider an alternative approach that incorporates speech coding parameters directly into the speech recognizer. Most speech coders employed in these networks use line spectral pairs (LSPs) as spectral parameters. To improve the recognition performance of the LSP-based speech recognizer, we introduce two methods: one devises weighted distance measures for LSPs, and the other transforms LSPs into a new feature set, named a pseudo-cepstrum (PCEP). Experiments on speaker-independent connected-digit recognition showed that the weighted distance measures significantly improved recognition accuracy over the unweighted LSP distance. PCEP improved performance further. Compared to conventional methods employing mel-frequency cepstral coefficients, the proposed methods achieved higher recognition accuracy.
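The abstract does not say which weights the authors used. As a rough illustration of what a weighted LSP distance can look like, the sketch below assumes a common heuristic in which each LSP is weighted by the inverse of its spacing to its neighbors, so that closely spaced LSPs (which tend to mark spectral peaks) count more. The function names and the weighting rule are assumptions for illustration, not the paper's method.

```python
import math

def lsp_weights(lsp):
    """Hypothetical weighting (an assumption, not taken from the paper):
    an LSP close to a neighbor gets a large weight."""
    # Pad with the band edges 0 and pi so every LSP has two neighbors.
    padded = [0.0] + list(lsp) + [math.pi]
    return [
        1.0 / (padded[i] - padded[i - 1]) + 1.0 / (padded[i + 1] - padded[i])
        for i in range(1, len(padded) - 1)
    ]

def unweighted_lsp_distance(ref, test):
    """Plain squared Euclidean distance between two LSP vectors."""
    return sum((r - t) ** 2 for r, t in zip(ref, test))

def weighted_lsp_distance(ref, test):
    """Squared distance with per-coefficient weights derived from ref."""
    weights = lsp_weights(ref)
    return sum(w * (r - t) ** 2 for w, r, t in zip(weights, ref, test))
```

With this weighting, the same small shift costs more when it is applied to an LSP that sits next to a close neighbor than when it is applied to an isolated one.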

Semantic Visual Place Recognition in Dynamic Urban Environment

사바 아르샤드;김곤우
- Journal of Korea Robotics Society
- /
- Vol. 17, No. 3
- /
- pp.334-338
- /
- 2022
In visual simultaneous localization and mapping (vSLAM), correct place recognition benefits relocalization and improves map accuracy. However, its performance is significantly affected by environmental conditions such as variation in lighting, viewpoint, and season, and by the presence of dynamic objects. This research addresses the problem of feature occlusion caused by dynamic objects, which degrades the performance of visual place recognition algorithms. To overcome this problem, this research analyzes the role of scene semantics in correctly detecting a place in challenging environments and presents a semantics-aided visual place recognition method. Because semantics are invariant to viewpoint changes and dynamic environments, they can improve the overall performance of the place matching method. The proposed method is evaluated on two benchmark datasets with dynamic environments and seasonal changes. Experimental results show improved performance of the visual place recognition method for vSLAM.
https://doi.org/10.7746/jkros.2022.17.3.334

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

이길호;윤재삼;오유리;김홍국
- Malsori: Journal of the Korean Society of Phonetic Sciences and Speech Technology
- /
- No. 54
- /
- pp.27-43
- /
- 2005
Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), which are a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main challenge in using MFCC is developing efficient MFCC quantization at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to 8 kbps G.729, while speech recognition using the proposed coder performs better than recognition using G.729.
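The abstract names predictive quantization with a safety net but gives no design details. The sketch below is a simplified illustration under stated assumptions: a first-order predictor with a hypothetical coefficient `alpha`, uniform scalar quantizers with hypothetical step sizes, and a safety net implemented as a memoryless quantizer that is selected whenever it reconstructs the frame better. The actual coder's predictor, codebooks, and bit allocation are not given in the source.

```python
def quantize(values, step):
    """Uniform scalar quantization of each value to a multiple of step."""
    return [step * round(v / step) for v in values]

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode_frame(frame, prev_recon, alpha=0.5, step_pred=0.25, step_direct=0.5):
    """Return (mode, reconstruction) for one MFCC frame.

    alpha, step_pred and step_direct are illustrative assumptions.
    """
    # Predictive path: quantize the residual after interframe prediction.
    predicted = [alpha * p for p in prev_recon]
    residual = [f - p for f, p in zip(frame, predicted)]
    recon_pred = [p + q for p, q in zip(predicted, quantize(residual, step_pred))]
    # Safety-net path: quantize the frame directly, with no memory, so a
    # channel error in an earlier frame cannot propagate through it.
    recon_direct = quantize(frame, step_direct)
    if squared_error(frame, recon_pred) <= squared_error(frame, recon_direct):
        return "predictive", recon_pred
    return "safety_net", recon_direct
```

In a real coder the chosen mode would be signaled to the decoder (for example with one flag bit per frame), and the decoder would keep its own copy of the previous reconstruction for the predictive path.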

A Study on Lip-reading Enhancement Using a RASTA Filter

신도성;김진영;최승호;김상훈
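- Korean Society of Phonetic Sciences and Speech Technology: Conference Proceedings
- /
- Korean Society of Phonetic Sciences and Speech Technology, November 2002 Conference Proceedings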
- /
- pp.191-194
- /
- 2002
Lip-reading technology is being studied as a way to compensate, in a bimodal configuration, for the degradation of speech recognition in noisy environments. The most important task in lip-reading is locating the lip region correctly, but stable performance is hard to guarantee in dynamic environments. To compensate, we used a RASTA filter, which performs well at removing noise from speech. The results show that applying this digital filter in the time domain improves performance. To observe recognition performance using image information alone, we selected 22 words usable in a service and ran recognition experiments in a car. We used a hidden Markov model as the recognition algorithm to compare recognition performance on these words.
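For readers unfamiliar with RASTA filtering, the sketch below applies the commonly cited RASTA band-pass filter, with numerator 0.1 × (2, 1, 0, −1, −2) and a single pole at 0.98, along the time axis of one feature trajectory. Using these coefficients here is an assumption: the abstract does not state which filter parameters were applied to the lip features, and the function name is hypothetical.

```python
def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one feature trajectory (one value per frame).

    Coefficients follow the commonly cited RASTA form; whether the paper
    used exactly these values is an assumption.
    """
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]  # 0.1 * (2, 1, 0, -1, -2)
    filtered = []
    prev_output = 0.0
    for n in range(len(trajectory)):
        # FIR part: a smoothed difference over the last five frames.
        fir = sum(
            b * trajectory[n - k]
            for k, b in enumerate(numerator)
            if n - k >= 0
        )
        # IIR part: a leaky integrator with the given pole.
        prev_output = pole * prev_output + fir
        filtered.append(prev_output)
    return filtered
```

Because the numerator coefficients sum to zero, a constant offset in the trajectory (for example a fixed lighting bias) decays toward zero in the output, while frame-to-frame changes pass through.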

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
- ETRI Journal
- /
- Vol. 46, No. 1
- /
- pp.22-34
- /
- 2024
Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
https://doi.org/10.4218/etrij.2023-0266

On Wavelet Transform Based Feature Extraction for Speech Recognition Application

Kim, Jae-Gil
- The Journal of the Acoustical Society of Korea
- /
- Vol. 17, No. 2E
- /
- pp.31-37
- /
- 1998
This paper proposes a feature extraction method using the wavelet transform for speech recognition. Speech recognition systems generally carry out the recognition task based on speech features that are usually obtained via time-frequency representations such as the Short-Time Fourier Transform (STFT) and Linear Predictive Coding (LPC). In some respects these methods may not be suitable for representing highly complex speech characteristics: they map the speech features with the same frequency resolution at all frequencies. The wavelet transform overcomes some of these limitations. It captures the signal with fine time resolution at high frequencies and fine frequency resolution at low frequencies, which may present a significant advantage when analyzing highly localized speech events. Based on this motivation, this paper investigates the effectiveness of the wavelet transform for feature extraction aimed at enhancing speech recognition. The proposed method is implemented using the Sampled Continuous Wavelet Transform (SCWT), and its performance is tested on a speaker-independent isolated word recognizer that discerns 50 Korean words. In particular, the effect of the mother wavelet employed and the number of voices per octave on the performance of the proposed method is investigated. The influence of the size of the mother wavelet on the performance of the proposed method is also discussed. Throughout the experiments, the performance of the proposed method is compared with the most prevalent conventional method, MFCC (Mel-frequency Cepstral Coefficient). The experiments show that the recognition performance of the proposed method is better than that of MFCC. However, the improvement is marginal, while the computational load of the proposed method is substantially greater than that of MFCC because of the increase in dimensionality.

Assessment of Hygiene Knowledge and Recognition on Job Performance Levels for HACCP Implementation for Dieticians and Employees at Contract Foodservices

문혜경;전지영;류은순
- Journal of the Korean Dietetic Association
- /
- Vol. 10, No. 3
- /
- pp.261-271
- /
- 2004
The purpose of this study was to provide basic data for practical HACCP training. A survey was conducted and analysed on 46 contract foodservices: 13 "Appointed" foodservices (appointed by Korean Food & Drug Administration), 17 "Voluntary Applying" foodservices (voluntarily applied HACCP, but not appointed), 16 "Non-applying" foodservices (not applied HACCP). Hygiene knowledge and recognition on job performance levels for HACCP application for 46 dieticians and 361 employees were surveyed. According to the survey, 61.5% of the "Appointed" dieticians took HACCP training from outside the company, 58.8% of "Voluntary Applying" dieticians took in-house HACCP training, and 62.4% of "Non-applying" dieticians have not taken any HACCP training. As for the comparison of hygiene knowledge, total mean of employees (6.38) showed significantly lower average than that of the dieticians (7.82) (p<0.001). From the result for recognition on job performance levels, total mean of dieticians (3.91) indicated generally good performance while employees (3.41) (p<0.001) showed considerably lower recognition. Hygiene knowledge and recognition on job performance levels of both dieticians and employees showed considerably close correlation (p<0.01 or p<0.05).

Effects of Smart Factory Quality Characteristics and Dynamic Capabilities on Business Performance: Mediating Effect of Recognition Response

CHO, Ik-Jun;KIM, Jin-Kwon;YANG, Hoe-Chang;AHN, Tony-DongHui
- Journal of Industrial Distribution & Business
- /
- Vol. 11, No. 12
- /
- pp.17-28
- /
- 2020
Purpose: The purpose of this study is to confirm the strategic direction of firms regarding the capabilities of the organization and its employees, in order to increase employee utilization and business performance at firms that introduce smart factories in the domestic manufacturing industry. Research design, data, and methodology: This study derived a structured research model to confirm the mediating effect of recognition responses between the quality characteristics of smart factories and dynamic capabilities. For the analysis, a total of 143 valid questionnaires were used, collected from 200 domestic SMEs that had introduced smart factories. Results: Quality Characteristics of Smart Factory and Dynamic Capabilities had a statistically significant effect on Usefulness. Recognition Response had a statistically significant mediating effect on the relationship between quality characteristics of smart factory and business performance. Recognition Response had a statistically significant effect on business performance. Conclusions: The results suggest that firms introducing smart factories should reflect these findings in their empowerment strategies, because the recognition responses of their employees differ according to the quality characteristics and dynamic capabilities of smart factories. They also indicate that the information derived from the smart factory system is useful and effective for business performance and employees.
https://doi.org/10.13106/jidb.2020.vol11.no12.17

Use of Word Clustering to Improve Emotion Recognition from Short Text

Yuan, Shuai;Huang, Huan;Wu, Linjing
- Journal of Computing Science and Engineering
- /
- Vol. 10, No. 4
- /
- pp.103-110
- /
- 2016
Emotion recognition is an important component of affective computing, and is significant in the implementation of natural and friendly human-computer interaction. An effective approach to recognizing emotion from text is based on a machine learning technique, which deals with emotion recognition as a classification problem. However, in emotion recognition, the texts involved are usually very short, leaving a very large, sparse feature space, which decreases the performance of emotion classification. This paper proposes to resolve the problem of feature sparseness, and largely improve the emotion recognition performance from short texts by doing the following: representing short texts with word cluster features, offering a novel word clustering algorithm, and using a new feature weighting scheme. Emotion classification experiments were performed with different features and weighting schemes on a publicly available dataset. The experimental results suggest that the word cluster features and the proposed weighting scheme can partly resolve problems with feature sparseness and emotion recognition performance.
https://doi.org/10.5626/JCSE.2016.10.4.103

A Study on the Implementation of Connected-Digit Recognition System and Changes of its Performance

윤영선;박윤상;채의근
- Malsori: Journal of the Korean Society of Phonetic Sciences and Speech Technology
- /
- No. 45
- /
- pp.47-61
- /
- 2003
In this paper, we consider the implementation of a connected-digit recognition system and several approaches to improving its performance. Implementing a fixed- or variable-length digit recognition system efficiently requires a finite state network (FSN). For fast search, we merge the word network algorithm that implements the FSN with the one-pass dynamic programming search algorithm used in general speech recognition systems. To find an efficient modeling approach for the digit recognition system, we perform experiments under various conditions that affect performance and summarize the results.

3,859 search results; processing time 0.035 seconds
