Search | Korea Science

Speech Recognition Accuracy Measure using Deep Neural Network for Effective Evaluation of Speech Recognition Performance (효과적인 음성 인식 평가를 위한 심층 신경망 기반의 음성 인식 성능 지표)

Ji, Seung-eun;Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering / v.21 no.12 / pp.2291-2297 / 2017
This paper describes an algorithm for extracting speech measures to evaluate a speech database, and presents a method for generating a speech quality measure using a DNN (Deep Neural Network). In our previous study, to produce an effective speech quality measure, we proposed a combination of various speech measures that are highly correlated with WER (Word Error Rate). The new combination of speech quality measures in this study predicts speech recognition performance more effectively than each speech measure alone. In this paper, we describe the method of extracting a measure using the DNN, and we replace one of the combined measures, the GMM (Gaussian Mixture Model) score used in the previous study, with a DNN score. The combination with the DNN score shows a higher correlation with WER than the combination with the GMM score.
https://doi.org/10.6109/jkiice.2017.21.12.2291
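The abstract above ranks speech measures by how strongly they correlate with WER. As a rough illustration only, the sketch below computes WER as a word-level edit distance and then a Pearson correlation between a quality measure and WER. The function names `word_error_rate` and `pearson` are hypothetical, and nothing here reproduces the authors' DNN-based measure.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Before processing reference word i, prev[j] is the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In practice one would compute WER per utterance or per test set and correlate it with each candidate measure; a measure whose correlation is closer to +1 or -1 is a better predictor of recognition performance.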

A Content-Based Image Retrieval Technique Using the Shape and Color Features of Objects (객체의 모양과 색상특징을 이용한 내용기반 영상검색 기법)

박종현;박순영;오일환
- The Journal of Korean Institute of Communications and Information Sciences / v.24 no.10B / pp.1902-1911 / 1999
In this paper we present a content-based image retrieval algorithm using visual feature vectors that describe the spatial characteristics of objects. The proposed technique uses a Gaussian mixture model (GMM) to represent multi-colored objects, and the expectation-maximization (EM) algorithm is employed to estimate the maximum likelihood (ML) parameters of the model. After image segmentation is performed based on the GMM, the shape and color features are extracted from each object using Fourier descriptors and color histograms, respectively. Image retrieval consists of two steps: first, a shape-based query finds the candidate images whose objects have shapes similar to those in the query image; second, a color-based query follows. The experimental results show that the proposed algorithm is effective for image retrieval using the spatial and visual features of segmented objects.
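The EM estimation of GMM parameters mentioned in this abstract can be sketched in one dimension. This is a simplified illustration, not the authors' color-space model: the function `fit_gmm_1d`, its evenly spaced initialisation, and the synthetic two-cluster data are all assumptions made for the example.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM (illustrative only)."""
    lo, hi = min(data), max(data)
    # Assumed initialisation: means spread evenly over the data range.
    means = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    variances = [((hi - lo) / k) ** 2 or 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse
    return weights, means, variances


# Synthetic data: two well-separated clusters (an assumption for the demo).
random.seed(0)
data = ([random.gauss(0.0, 1.0) for _ in range(300)]
        + [random.gauss(10.0, 1.0) for _ in range(300)])
weights, means, variances = fit_gmm_1d(data)
```

For segmentation, each pixel would then be assigned to the component with the highest responsibility; a real color model would use multivariate Gaussians with full covariance matrices.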

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.7 no.1 / pp.3-10 / 2015
In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results using both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
https://doi.org/10.13064/KSSS.2015.7.1.003
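To make the frequency scales compared in this abstract concrete, here is a minimal sketch of two warping functions. It assumes the common 2595·log10(1 + f/700) mel formula and the first-order all-pass (bilinear) warping of normalized frequency; whether the paper uses exactly these forms is not stated in the abstract, and the parameter name `alpha` for the warping factor is an assumption.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale formula (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def bilinear_warp(omega: float, alpha: float) -> float:
    """Warp a normalized frequency omega in [0, pi] with a first-order
    all-pass transform. alpha = 0 leaves the axis linear; alpha > 0
    stretches the low-frequency region."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))
```

With `alpha = 0` the warped axis is the linear one used for LFCCs, so varying `alpha` gives a one-parameter family of scales between and beyond the linear and mel-like cases.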

A Statistically Model-Based Adaptive Technique to Unsupervised Segmentation of MR Images (자기공명영상의 비지도 분할을 위한 통계적 모델기반 적응적 방법)

Kim, Tae-Woo
- The Transactions of the Korea Information Processing Society / v.7 no.1 / pp.286-295 / 2000
We present a novel statistically adaptive method using the Minimum Description Length (MDL) principle for unsupervised segmentation of magnetic resonance (MR) images. In the method, Markov random field (MRF) modeling of tissue regions accounts for random noise. Intensity measurements in the local region defined by a window are modeled by a finite Gaussian mixture, which accounts for image inhomogeneities. The segmentation algorithm is based on the iterated conditional modes (ICM) algorithm, which approximately finds the maximum a posteriori (MAP) estimate, and it estimates model parameters on the local region. The size of the window for parameter estimation and segmentation is estimated from the image using the MDL principle. In the experiments, the technique reflected the image characteristics of the local region well and showed better results than conventional methods, especially in the segmentation of MR images with inhomogeneities.

Phoneme segmentation and Recognition using Support Vector Machines (Support Vector Machines에 의한 음소 분할 및 인식)

Lee, Gwang-Seok;Kim, Deok-Hyun
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.981-984 / 2010
In this paper, we use Support Vector Machines (SVMs), a learning method that the authors class among artificial neural networks, to segment continuous speech into phonemes (initial, medial, and final sounds), and then perform continuous speech recognition on the result. The decision boundary of a phoneme is determined by an algorithm based on the maximum frequency in a short interval. Speech recognition is performed by a Continuous Hidden Markov Model (CHMM), and we compare the results with those for phonemes segmented by visual inspection. From the simulation results, we confirm that the proposed SVM method is more effective for initial sounds than Gaussian Mixture Models (GMMs).

Regionalization using cluster probability model and copula based drought frequency analysis (클러스터 확률 모형에 의한 지역화와 코풀라에 의한 가뭄빈도분석)

Azam, Muhammad;Choi, Hyun Su;Kim, Hyeong San;Hwang, Ju Ha;Maeng, Seungjin
- Proceedings of the Korea Water Resources Association Conference / 2017.05a / pp.46-46 / 2017
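The reliability of quantile estimates in regional drought frequency analysis is determined by the long-term historical data and the analysis procedure used to delineate hydrologically homogeneous regions. However, extreme droughts occur very rarely, and the record length is often insufficient for a reliable regional frequency analysis. In addition, the complex topographic and climatic characteristics of Korea call for a statistical treatment to delineate homogeneous regions. The regional frequency analysis applied in this study identifies homogeneous regions by analyzing the hydro-meteorological characteristics of many sites, which involve multiple variables, and estimates the quantiles of each homogeneous region by jointly applying the main drought variables (duration and severity), thereby offering a solution for delineating homogeneous regions. In this study, a model-based clustering method built on the Gaussian Mixture Model was applied to delineate the optimal homogeneous regions, and the results were verified using the likelihood ratio test and other validity indices. To represent the parameters estimated by the Gaussian mixture model in a reduced-dimension space, the Gaussian mixture model dimension reduction (GMMDR) method was applied. For the drought frequency analysis, these variables were estimated and compared using the goodness of fit of various distributions and copulas. As a result, Korea was divided into four homogeneous regions. The Pearson type III (PE3) distribution combined with the Gaussian and Frank copulas was found to be suitable for estimating the joint distribution of drought duration and severity in Korea.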
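The Frank copula named in this abstract has a closed-form CDF, which can be sketched as follows. The function names and the example dependence parameter `theta` are assumptions for illustration; the abstract does not report fitted parameter values.

```python
import math


def frank_copula_cdf(u: float, v: float, theta: float) -> float:
    """Frank copula C(u, v) for a nonzero dependence parameter theta."""
    if theta == 0:
        raise ValueError("theta must be nonzero (theta -> 0 is independence)")
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    den = math.expm1(-theta)
    return -math.log1p(num / den) / theta


def joint_exceedance(u: float, v: float, theta: float) -> float:
    """P(D > d, S > s), where u = F_D(d) and v = F_S(s) are the marginal
    non-exceedance probabilities of drought duration and severity."""
    return 1.0 - u - v + frank_copula_cdf(u, v, theta)
```

Here `u` and `v` would come from the fitted marginal distributions (for example PE3), and the joint return period of an event exceeding both thresholds is proportional to the reciprocal of `joint_exceedance`.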

Detection of Gradual Transitions in MPEG Compressed Video using Hidden Markov Model (은닉 마르코프 모델을 이용한 MPEG 압축 비디오에서의 점진적 변환의 검출)

Choi, Sung-Min;Kim, Dai-Jin;Bang, Sung-Yang
- Journal of KIISE:Software and Applications / v.31 no.3 / pp.379-386 / 2004
Video segmentation is a fundamental task in video indexing, and it includes two kinds of shot change detection: abrupt transitions and gradual transitions. Abrupt shot boundaries are detected by computing the image-based distance between adjacent frames and comparing this distance with a pre-determined threshold value. However, gradual shot boundaries are difficult to detect with this approach. To overcome this difficulty, we propose a method that detects gradual transitions in MPEG compressed video using the HMM (Hidden Markov Model). We consider two different HMMs: a discrete HMM and a continuous HMM with a Gaussian mixture model. As image features for the HMM's observations, we use two distinct features: the difference between the histograms of the DC images of two adjacent frames, and the difference between the deviations of corresponding macroblocks in two adjacent frames, where deviation means the arithmetic difference between a macroblock's DC value and the mean of the DC values in the given frame. Furthermore, we obtain the DC sequences of P and B frames by a first-order approximation for fast and effective computation. Experimental results show that we obtain the best detection and classification performance for gradual transitions when a continuous HMM with one Gaussian model is used and the two image features are used together.
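The threshold test for abrupt boundaries and the histogram-difference feature described in this abstract can be sketched as below. Frames are represented as flat lists of 8-bit gray values standing in for DC images; the bin count, the threshold, and all function names are assumptions, and the HMM stage is not shown.

```python
def histogram(pixels, bins=8, max_value=256):
    """Count pixels per intensity bin (values assumed in [0, max_value))."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    return counts


def histogram_difference(frame_a, frame_b, bins=8):
    """Sum of absolute bin-wise differences between two frame histograms."""
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(abs(a - b) for a, b in zip(ha, hb))


def abrupt_boundaries(frames, threshold):
    """Indices i where the difference between frame i-1 and frame i
    exceeds the threshold."""
    return [i for i in range(1, len(frames))
            if histogram_difference(frames[i - 1], frames[i]) > threshold]
```

A gradual transition spreads the change over many frames, so each per-frame difference can stay under the threshold; that is why the sequence of differences is modelled rather than thresholded frame by frame.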

Text Independent Speaker Verification Using Dominant State Information of HMM-UBM (HMM-UBM의 주 상태 정보를 이용한 음성 기반 문맥 독립 화자 검증)

Shon, Suwon;Rho, Jinsang;Kim, Sung Soo;Lee, Jae-Won;Ko, Hanseok
- The Journal of the Acoustical Society of Korea / v.34 no.2 / pp.171-176 / 2015
We present a speaker verification method that extracts i-vectors based on the dominant state information of a Hidden Markov Model (HMM) - Universal Background Model (UBM). An ergodic HMM is used for estimating the UBM so that the various characteristics of individual speakers can be effectively classified. Unlike a Gaussian Mixture Model (GMM)-UBM based speaker verification system, the proposed system obtains i-vectors corresponding to each HMM state. Among them, the i-vector used as the feature is selected by extracting it from the specific state containing the dominant state information. Experiments are conducted to validate the performance of the proposed system using the National Institute of Standards and Technology (NIST) 2008 Speaker Recognition Evaluation (SRE) database. As a result, a 12% improvement is attained in terms of equal error rate.
https://doi.org/10.7776/ASK.2015.34.2.171
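The equal error rate used as the metric in this abstract is the operating point where the false acceptance and false rejection rates coincide. A minimal sketch, assuming higher scores mean "same speaker" and approximating the EER by scanning thresholds at the observed scores; the function name and the toy scores are hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from genuine-trial and impostor-trial scores."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score is at least the threshold t.
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejections
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptances
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With few trials the two rates may never be exactly equal, so this sketch returns their average at the threshold where they are closest; evaluation toolkits usually interpolate instead.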

HMM-Based Bandwidth Extension Using Baum-Welch Re-Estimation Algorithm (Baum-Welch 학습법을 이용한 HMM 기반 대역폭 확장법)

Song, Geun-Bae;Kim, Austin
- The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.259-268 / 2007
This paper contributes an improvement to the statistical bandwidth extension (BWE) system based on the Hidden Markov Model (HMM). First, the existing HMM training method for BWE, originally suggested by Jax, is analyzed in comparison with the general Baum-Welch training method. Next, based on this analysis, a new HMM-based BWE method is suggested that adopts the Baum-Welch re-estimation algorithm instead of Jax's method to train the HMM. In conclusion, the Baum-Welch re-estimation algorithm is a generalized form of Jax's training method. It is flexible and adaptive in modeling the statistical characteristics of the training data. Therefore, it produces a model that fits the training data better, which results in an enhanced BWE system. According to the experimental results, the new method performs much better than Jax's BWE system in all cases. Under the given test conditions, the RMS log spectral distortion (LSD) scores improved by between 0.31 dB and 0.8 dB, and by 0.52 dB on average.
https://doi.org/10.7776/ASK.2007.26.6.259
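The RMS log spectral distortion reported in this abstract can be sketched for a single frame as below. This assumes the usual definition over power spectra in dB; the frequency band and frame averaging used in the paper are not given in the abstract, and the function name is hypothetical.

```python
import math


def log_spectral_distortion(ref_power, est_power):
    """RMS log spectral distortion in dB between two power spectra
    sampled at the same frequency bins (all values must be positive)."""
    if len(ref_power) != len(est_power) or not ref_power:
        raise ValueError("spectra must be non-empty and of equal length")
    sq_db = [(10.0 * math.log10(r / e)) ** 2
             for r, e in zip(ref_power, est_power)]
    return math.sqrt(sum(sq_db) / len(sq_db))
```

A lower value means the estimated wideband spectrum is closer to the reference, so the improvements quoted above correspond to reductions in this score.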

Graph Cut-based Automatic Color Image Segmentation using Mean Shift Analysis (Mean Shift 분석을 이용한 그래프 컷 기반의 자동 칼라 영상 분할)

Park, An-Jin;Kim, Jung-Whan;Jung, Kee-Chul
- Journal of KIISE:Software and Applications / v.36 no.11 / pp.936-946 / 2009
The graph cuts method has recently attracted a lot of attention for image segmentation, as it can globally minimize energy functions composed of a data term, which reflects how well each pixel fits the prior information for each class, and a smoothness term, which penalizes discontinuities between neighboring pixels. In previous approaches to graph cuts-based automatic image segmentation, GMMs (Gaussian mixture models) are generally used, and the means and covariance matrices calculated by the EM algorithm serve as prior information for each cluster. However, this is practicable only for clusters with a hyper-spherical or hyper-ellipsoidal shape, as each cluster is represented by a covariance matrix centered on the mean. For arbitrary-shaped clusters, this paper proposes graph cuts-based image segmentation using mean shift analysis. As prior information to estimate the data term, we use the set of mean trajectories toward each mode from initial means randomly selected in the $L^*u^*v^*$ color space. Since the mean shift procedure is computationally expensive, we transform features in the continuous feature space into a 3D discrete grid, and use a 3D kernel based on the first moment in the grid, which is needed to move the means to the modes. In the experiments, we investigate the problems of the mean shift-based and normalized cuts-based image segmentation methods, which have recently become popular, and the proposed method showed better performance than these two methods and than graph cuts-based automatic image segmentation using GMM on the Berkeley segmentation dataset.
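The "mean trajectories toward each mode" in this abstract come from the mean shift iteration, which can be sketched in one dimension with a flat kernel. This is a simplification: the paper works on a 3D grid in color space, and the function name, bandwidth, and sample points here are assumptions.

```python
def mean_shift_trajectory(points, start, bandwidth, tol=1e-6, max_iter=100):
    """Return the sequence of means visited from `start` until convergence.

    Each step moves to the average of all points within `bandwidth`
    of the current position (flat kernel)."""
    trajectory = [start]
    x = start
    for _ in range(max_iter):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:
            break  # no support near x; stop where we are
        new_x = sum(window) / len(window)
        trajectory.append(new_x)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return trajectory
```

The last element of the trajectory approximates a mode of the point density, and starting points that converge to the same mode belong to the same cluster, whatever the cluster's shape.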

Search Result 417, Processing Time 0.026 seconds
