• Title/Summary/Keyword: Speech feature

Search Results: 711

Robust Speech Recognition Using Weighted Auto-Regressive Moving Average Filter (가중 ARMA 필터를 이용한 강인한 음성인식)

  • Ban, Sung-Min; Kim, Hyung-Soon
    • Phonetics and Speech Sciences / v.2 no.4 / pp.145-151 / 2010
  • In this paper, a robust feature compensation method is proposed for improving the performance of speech recognition. The proposed method builds on auto-regressive moving average (ARMA) based feature compensation: the weights of the ARMA filter are varied according to the degree of speech activity, and the normalized cepstral sequence is passed through this weighted ARMA filter. Additionally, when normalizing the cepstral sequences in training, the cepstral means and variances are estimated over all training utterances. Experimental results show that the proposed method significantly improves speech recognition performance in noisy and reverberant environments.
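As a rough illustration of the idea, the sketch below smooths a cepstral sequence with an ARMA-style filter and blends each smoothed frame with the raw frame using a per-frame speech-activity weight; the window length and the weighting rule are assumptions, not the authors' exact formulation.

```python
import numpy as np

def weighted_arma_smooth(cep, activity, order=2):
    """Smooth a (frames x dims) cepstral sequence with an ARMA-style filter and
    blend each smoothed frame with the raw frame using a speech-activity weight.
    A rough sketch of the idea, not the paper's exact formulation."""
    T = cep.shape[0]
    out = cep.astype(float).copy()
    for t in range(order, T - order):
        past = out[t - order:t].sum(axis=0)            # already-smoothed past frames (AR part)
        future = cep[t + 1:t + order + 1].sum(axis=0)  # raw future frames (MA part)
        smoothed = (past + cep[t] + future) / (2 * order + 1)
        w = float(activity[t])                         # hypothetical per-frame activity score in [0, 1]
        out[t] = w * cep[t] + (1.0 - w) * smoothed     # smooth low-activity frames more heavily
    return out
```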

An Optimality Theoretic Approach to the Feature Model for Speech Understanding

  • Kim, Kee-Ho
    • Speech Sciences / v.2 / pp.109-124 / 1997
  • This paper shows how a distinctive feature model can be effectively implemented for speech understanding within the framework of Optimality Theory (OT); that is, how distinctive features can be optimally extracted from given speech signals, and how segments can be chosen as the optimal ones among plausible candidates. The paper also shows how the resulting sequence of segments can be matched with optimal words in a lexicon.
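The selection step described here, choosing the optimal segment among plausible candidates under ranked constraints, can be illustrated with a toy OT tableau; the constraints and violation counts below are invented for illustration and are not taken from the paper.

```python
def optimal_candidate(tableau):
    """tableau: dict mapping a candidate segment to its violation counts, ordered
    from the highest-ranked constraint to the lowest. The OT winner is the candidate
    whose violation profile is lexicographically smallest."""
    return min(tableau, key=tableau.get)

# Invented tableau for an ambiguous segment; constraints and counts are illustrative only.
tableau = {
    "/p/": (0, 1, 0),
    "/b/": (1, 0, 0),
    "/f/": (2, 0, 1),
}
print(optimal_candidate(tableau))  # -> "/p/" under this hypothetical ranking
```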

Target Speech Segregation Using Non-parametric Correlation Feature Extraction in CASA System (CASA 시스템의 비모수적 상관 특징 추출을 이용한 목적 음성 분리)

  • Choi, Tae-Woong; Kim, Soon-Hyub
    • The Journal of the Acoustical Society of Korea / v.32 no.1 / pp.79-85 / 2013
  • Feature extraction in a CASA system relies on temporal continuity and cross-channel similarity, building a correlogram of auditory elements for this purpose. When channel similarity is measured with the cross-correlation coefficient, quantifying the correlation incurs a high computational cost. This paper therefore proposes a feature extraction method based on a non-parametric correlation coefficient that reduces the computational complexity, and evaluates it on target speech segregation in a CASA system. Measured by SNR (signal-to-noise ratio) for the evaluation of target speech segregation, the proposed method shows a slight improvement of 0.14 dB on average over the conventional method.
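A minimal sketch of the substitution described above, assuming channel similarity is computed between envelope (or correlogram) vectors of adjacent auditory channels: Spearman's rank correlation from SciPy stands in for the non-parametric coefficient, with the Pearson cross-correlation coefficient shown for comparison.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def channel_similarity(env_a, env_b, nonparametric=True):
    """Similarity between the envelope vectors of two adjacent auditory channels.
    The non-parametric (rank) correlation stands in for the cross-correlation
    coefficient; channel grouping and thresholds are not shown."""
    if nonparametric:
        rho, _ = spearmanr(env_a, env_b)
    else:
        rho, _ = pearsonr(env_a, env_b)
    return float(rho)

# hypothetical envelopes of two neighbouring gammatone channels
a = np.abs(np.random.randn(200)).cumsum()
b = a + 0.1 * np.random.randn(200)
print(channel_similarity(a, b), channel_similarity(a, b, nonparametric=False))
```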

Effective Feature Extraction in the Individual frequency Sub-bands for Speech Recognition (음성인식을 위한 주파수 부대역별 효과적인 특징추출)

  • 지상문
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.4 / pp.598-603 / 2003
  • This paper presents a sub-band feature extraction approach in which the feature extraction method for each frequency sub-band is chosen according to speech recognition accuracy. As in the multi-band paradigm, features are extracted independently in frequency sub-regions of the speech signal. Since the spectral shape is well structured in the low-frequency region, an all-pole model is effective there; in the high-frequency region, a non-parametric transform, the discrete cosine transform (DCT), is effective for extracting cepstra. With this sub-band-specific feature extraction, the linguistic information in each frequency sub-band can be extracted effectively for automatic speech recognition. The validity of the proposed method is shown by comparing speech recognition results for our method with those obtained using full-band feature extraction.
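A sketch of the band-specific choice described above, assuming the signal has already been split into a low-band and a high-band frame: an autocorrelation-method all-pole (LPC) fit for the low band and a DCT-of-log-spectrum cepstrum for the high band. The model orders and dimensions are placeholders; the paper's exact band split is not reproduced.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.fft import dct

def subband_features(low_frame, high_frame, lpc_order=10, n_cep=8):
    """Band-specific features: an all-pole (LPC) fit for the structured low band and
    a DCT cepstrum of the log spectrum for the high band."""
    # low band: autocorrelation-method LPC (all-pole model)
    r = np.correlate(low_frame, low_frame, mode="full")[len(low_frame) - 1:]
    lpc = solve_toeplitz((r[:lpc_order], r[:lpc_order]), r[1:lpc_order + 1])
    # high band: non-parametric cepstrum, the DCT of the log magnitude spectrum
    log_mag = np.log(np.abs(np.fft.rfft(high_frame)) + 1e-8)
    cep = dct(log_mag, type=2, norm="ortho")[:n_cep]
    return np.concatenate([lpc, cep])
```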

Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters (채널보상기법 및 특징파라미터에 따른 한국어 연속숫자음 전화음성의 인식성능 비교)

  • Jung Sung Yun; Kim Min Sung; Son Jong Mok; Bae Keun Sung; Kim Sang Hun
    • Proceedings of the KSPS conference / 2002.11a / pp.201-204 / 2002
  • As a preliminary study toward improving recognition performance on connected-digit telephone speech, we investigate feature parameters as well as channel compensation methods for telephone speech. CMN and RTCN are examined for telephone channel compensation, and MFCC, DWFBA, SSC, and their delta features are examined as feature parameters. Recognition experiments on a database we collected show that, at the feature level, DWFBA outperforms MFCC and that, for channel compensation, RTCN outperforms CMN. The DWFBA + delta Mel-SSC feature shows the highest recognition rate.
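Of the two compensation methods compared, CMN is simple enough to show in a few lines; the sketch below subtracts the per-utterance cepstral mean, which removes a stationary telephone-channel offset (RTCN, the other method named in the abstract, is not shown).

```python
import numpy as np

def cmn(cepstra):
    """Cepstral mean normalization: subtract each coefficient's utterance-level mean
    to remove a stationary channel offset. cepstra: (frames, dims) feature array."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# usage: normalized = cmn(mfcc_matrix) before training or decoding
```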

The Role of L1 Phonological Feature in the L2 Perception and Production of Vowel Length Contrast in English

  • Chang, Woo-Hyeok
    • Speech Sciences / v.15 no.1 / pp.37-51 / 2008
  • The main goal of this study is to examine whether Korean and Japanese L2 learners of English differ in their use of the vowel length cue when perceiving and producing postvocalic coda contrasts in English. Given that the Japanese subjects' performance on the identification and production tasks was much better than the Korean subjects' performance, the results support the prediction of the Feature Hypothesis, which maintains that L1 phonological features can facilitate the perception of L2 acoustic cues. Since vowel length contrast is a phonological feature in Japanese but not in Korean, the tasks, which assess L2 learners' ability to discriminate vowel length contrast in English, are much easier for the Japanese group than for the Korean group. Although the Japanese subjects performed better than the Korean subjects, the Japanese group's performance was still worse than that of the English control group. This finding implies that L2 learners, even Japanese learners, should be taught that the durational difference of the preceding vowel is the most important cue for differentiating postvocalic contrastive codas in English.

Speaker Adaptation Using ICA-Based Feature Transformation

  • Jung, Ho-Young; Park, Man-Soo; Kim, Hoi-Rin; Hahn, Min-Soo
    • ETRI Journal / v.24 no.6 / pp.469-472 / 2002
  • Speaker adaptation techniques are generally used to reduce speaker differences in speech recognition. In this work, we focus on features suited to linear regression-based speaker adaptation. These are obtained by a feature transformation based on independent component analysis (ICA), with transformation matrices estimated from the training data and the adaptation data. Since the adaptation data are not sufficient to reliably estimate the ICA-based feature transformation matrix, the matrix estimated from a new speaker's utterances must be adjusted. To cope with this problem, we propose a smoothing method based on linear interpolation between the speaker-independent (SI) and speaker-dependent (SD) feature transformation matrices. Our experiments show that the proposed method is particularly effective in the mismatched case, where the smoothed feature transformation matrix makes speaker adaptation with noisy speech more robust and thereby improves adaptation performance.
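A minimal sketch of the interpolation step, assuming FastICA from scikit-learn as the ICA estimator and a fixed smoothing weight; the paper's actual estimator, weight selection, and matrix dimensions are not specified in the abstract.

```python
import numpy as np
from sklearn.decomposition import FastICA

def smoothed_transform(si_matrix, adaptation_frames, alpha=0.7):
    """Interpolate between a speaker-independent ICA transform and one estimated
    from scarce adaptation data. alpha is a hypothetical smoothing weight."""
    ica = FastICA(n_components=si_matrix.shape[0], random_state=0)
    ica.fit(adaptation_frames)            # adaptation_frames: (frames, feature_dims)
    sd_matrix = ica.components_           # speaker-dependent unmixing matrix
    return alpha * si_matrix + (1.0 - alpha) * sd_matrix
```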

Non-Intrusive Speech Intelligibility Estimation Using Autoencoder Features with Background Noise Information

  • Jeong, Yue Ri; Choi, Seung Ho
    • International Journal of Internet, Broadcasting and Communication / v.12 no.3 / pp.220-225 / 2020
  • This paper investigates non-intrusive speech intelligibility estimation in noisy environments when the bottleneck feature of an autoencoder is used as input to a neural network. The bottleneck feature-based method suffers severe performance degradation when the noise environment changes. To overcome this problem, we propose a novel non-intrusive speech intelligibility estimation method that adds noise environment information, along with the bottleneck feature, to the input of a long short-term memory (LSTM) network whose output is a short-time objective intelligibility (STOI) score, a standard measure of intrusive speech intelligibility computed against reference speech signals. In experiments over various noise environments, the proposed method showed improved performance when the noise environment was the same, and the improvement over conventional methods was particularly significant in mismatched environments. We therefore conclude that the proposed method can be successfully used for non-intrusive speech intelligibility estimation in various noise environments.
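A sketch of the estimator's shape under stated assumptions: per-frame bottleneck features are concatenated with a fixed-length noise-environment vector and fed to an LSTM whose final state is mapped to a single STOI-like score. All dimensions and the sigmoid output are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class StoiEstimator(nn.Module):
    """LSTM regressor mapping per-frame bottleneck features, concatenated with a
    noise-environment vector, to a single STOI-like score in [0, 1]."""
    def __init__(self, bottleneck_dim=32, noise_dim=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(bottleneck_dim + noise_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, bottleneck, noise_info):
        # bottleneck: (batch, frames, bottleneck_dim); noise_info: (batch, noise_dim)
        noise = noise_info.unsqueeze(1).expand(-1, bottleneck.size(1), -1)
        x = torch.cat([bottleneck, noise], dim=-1)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.out(h[-1])).squeeze(-1)
```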

Speech Recognition Error Compensation using MFCC and LPC Feature Extraction Method (MFCC와 LPC 특징 추출 방법을 이용한 음성 인식 오류 보정)

  • Oh, Sang-Yeob
    • Journal of Digital Convergence / v.11 no.6 / pp.137-142 / 2013
  • When inaccurate vocabulary is input to a speech recognition system, errors in feature extraction can cause the utterance to go unrecognized or to be recognized as a similar phoneme. In this paper, we therefore propose a speech recognition error correction method that uses phoneme similarity rates and reliability measures based on the characteristics of the phonemes. Phoneme similarity rates are obtained from learning models built with MFCC and LPC feature extraction and are measured together with a reliability rate; by measuring the rate of similar phonemes and their reliability, errors from unrecognized speech are minimized, and error compensation is performed on erroneous speech during recognition. Applying the proposed system yielded a recognition rate of 98.3% and an error compensation rate of 95.5%.
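The abstract does not define the phoneme similarity rate exactly; as one plausible stand-in, the sketch below scores an observed feature vector (MFCC- or LPC-derived) against per-phoneme templates with cosine similarity and accepts the best match only if it clears a reliability threshold. Both the similarity measure and the threshold are assumptions.

```python
import numpy as np

def best_phoneme(observed, templates, reliability=0.8):
    """observed: feature vector for one segment; templates: dict phoneme -> mean
    feature vector from the learning model. Returns the most similar phoneme, or
    None when the similarity falls below the (hypothetical) reliability threshold."""
    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
    scores = {ph: cos(observed, t) for ph, t in templates.items()}
    ph = max(scores, key=scores.get)
    return ph if scores[ph] >= reliability else None
```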

Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface) (인간로봇 상호작용을 위한 잡음환경에 강인한 음성 끝점 검출 기법)

  • Park, Jin-Soo; Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.147-156 / 2013
  • In this paper, a new speech endpoint detection method for moving robot platforms in noisy environments is proposed. In the conventional method, the endpoint of speech is obtained by applying an edge detection filter that finds abrupt changes in the feature domain. However, since the frame-energy feature is unstable in such noisy environments, it is difficult to locate the endpoint of speech accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed, and the resulting feature is passed through an edge detection filter for effective detection of the speech endpoint. Representative experiments show a substantial improvement over the conventional method.
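A rough sketch of the two pieces named above, under the assumption that the TIFFT feature is the magnitude of an FFT applied to a frame's magnitude spectrum and that endpoints are flagged where a simple edge-detection kernel responds strongly; the statistical speech model is omitted.

```python
import numpy as np

def tifft_feature(frame, n_keep=16):
    """Twice-iterated FFT: take the FFT of a frame's magnitude spectrum and keep
    the low-order magnitudes as a feature; a rough reading of the TIFFT idea."""
    spectrum = np.abs(np.fft.rfft(frame))
    return np.abs(np.fft.rfft(spectrum))[:n_keep]

def endpoint_scores(frame_summary, kernel=(-1.0, -1.0, 0.0, 1.0, 1.0)):
    """Convolve a per-frame feature summary (e.g. the mean TIFFT value per frame)
    with an edge-detection kernel; large responses mark candidate endpoints."""
    return np.convolve(frame_summary, kernel, mode="same")
```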