Search | Korea Science

Acoustic Modeling and Energy-Based Postprocessing for Automatic Speech Segmentation (자동 음성 분할을 위한 음향 모델링 및 에너지 기반 후처리)

Park Hyeyoung;Kim Hyungsoon
- MALSORI / no.43 / pp.137-150 / 2002
Phoneme-level speech segmentation is important for corpus-based text-to-speech synthesis. In this paper, we examine acoustic modeling methods to improve the performance of an automatic speech segmentation system based on the Hidden Markov Model (HMM). We compare monophone and triphone models and evaluate several model training approaches. In addition, we employ an energy-based postprocessing scheme to correct frequent boundary location errors between silence and speech sounds. Experimental results show that our system places 71.3% and 84.2% of boundaries correctly within tolerances of 10 ms and 20 ms, respectively.
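The tolerance-based figures in this abstract can be read as the share of automatically placed boundaries that fall within a given distance of the reference boundaries. The sketch below illustrates that reading only; the function name `boundary_accuracy`, the one-to-one pairing of boundaries, and the sample values are assumptions, not details from the paper.

```python
def boundary_accuracy(reference_ms, predicted_ms, tolerance_ms):
    """Return the fraction of predicted boundaries within tolerance_ms of the reference.

    Assumes (hypothetically) that the two lists are already paired one-to-one,
    as they would be after forced alignment against a known phoneme sequence.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("boundary lists must have the same length")
    if not reference_ms:
        raise ValueError("boundary lists must not be empty")
    hits = sum(
        1 for ref, pred in zip(reference_ms, predicted_ms)
        if abs(ref - pred) <= tolerance_ms
    )
    return hits / len(reference_ms)


# Made-up boundary times in milliseconds, for illustration only.
reference = [120, 250, 400, 610]
predicted = [125, 268, 395, 640]

acc_10ms = boundary_accuracy(reference, predicted, 10)
acc_20ms = boundary_accuracy(reference, predicted, 20)
```

Widening the tolerance can only keep or raise the score, which matches the pattern of the reported 10 ms and 20 ms results.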

A Study on Improving Prediction Accuracy by Modeling Multiple Similar Time Series (다중 유사 시계열 모델링 방법을 통한 예측정확도 개선에 관한 연구)

Cho, Young-Hee;Lee, Gye-Sung
- The Journal of the Institute of Internet, Broadcasting and Communication / v.10 no.6 / pp.137-143 / 2010
This research studies a method for improving prediction accuracy by processing time series data. We designed techniques to model multiple similar time series and thereby avoid the shortcomings of a single prediction model. We predicted future changes using effective rules derived from these models. Prediction accuracy was tested with three methods: fixed interval, sliding, and cumulative. Among the three, the cumulative method produced the highest accuracy.
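The abstract does not define the three testing methods, so the sketch below shows one common interpretation rather than the authors' procedure: a fixed training window used for every test block, a fixed-size window that slides forward, and a window that grows cumulatively. All function names and parameters are hypothetical.

```python
def _test_starts(n, train_size, test_size):
    # Start index of each successive test block.
    return range(train_size, n, test_size)


def fixed_interval_splits(n, train_size, test_size):
    """Assumed reading: always train on the first train_size points."""
    return [
        (range(0, train_size), range(s, min(s + test_size, n)))
        for s in _test_starts(n, train_size, test_size)
    ]


def sliding_splits(n, train_size, test_size):
    """Assumed reading: train on the train_size points just before each test block."""
    return [
        (range(s - train_size, s), range(s, min(s + test_size, n)))
        for s in _test_starts(n, train_size, test_size)
    ]


def cumulative_splits(n, train_size, test_size):
    """Assumed reading: train on every point observed before each test block."""
    return [
        (range(0, s), range(s, min(s + test_size, n)))
        for s in _test_starts(n, train_size, test_size)
    ]


# Index ranges for a series of 10 points, 4 training points, test blocks of 3.
fixed = fixed_interval_splits(10, 4, 3)
sliding = sliding_splits(10, 4, 3)
cumulative = cumulative_splits(10, 4, 3)
```

Under this reading, the cumulative scheme gives later test blocks the most training data.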

A Study on Continuous Digits Speech Recognition using Probabilistic Models (확률적 모델을 이용한 연속 숫자음 인식에 관한 연구)

Lee Ju-Sung;Lee Seong-Kwon;Kim Soon-Hyob
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.109-112 / 1999
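This study concerns Korean continuous speech recognition using phoneme-level CHMMs (Continuous Hidden Markov Models). Continuous digit recognition was performed for voice dialing in a laboratory environment. Initial models were built from the ETRI 445 data using ML (Maximum Likelihood) estimation, and maximum a posteriori (MAP) estimation was used for adaptation. For continuous digit recognition, a pronunciation dictionary was built that takes into account the acoustic characteristics of spoken Korean digits, and multiple entries were registered in the dictionary to cover all cases of the syllable-unit Korean digits. In addition, adaptation was carried out using a database of 21 types of 7-digit strings, designed with liaison between adjacent digits in mind, together with the digits obtained by segmenting these strings into syllable units. To verify the effectiveness of this approach, recognition experiments were performed on the list of 35 types of 4-digit continuous strings prepared by ETRI.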

Improvement of Semicontinuous Hiden Markov Models and One-Pass Algorithm for Recognition of Keywords in Korean Continuous Speech (한국어 연속음성중 키워드 인식을 위한 반연속 은닉 마코브 모델과 One-Pass 알고리즘의 개선방안)

최관선
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.358-363 / 1994
This paper presents improvements to the SCHMM using discrete VQ and to the One-Pass algorithm for keyword recognition in Korean continuous speech. The SCHMM using discrete VQ is a simple model composed of variable-mixture Gaussian probability density functions with a dynamic number of mixtures. The One-Pass algorithm is improved so that recognition rates are enhanced by fathoming any undesirable semisyllable with low likelihood and high duration penalty, and computation time is reduced by testing only frames that are dissimilar to the previously tested frame. In speaker-dependent recognition experiments, the improved One-Pass algorithm achieved recognition rates as high as 99.7% and reduced computation time by about 30% compared with the currently available One-Pass algorithm.

A Comparison of Discrete and Continuous Hidden Markov Models for Korean Digit Recognition (한국어 숫자음 인식을 위한 이산분포 HMM과 연속분포 HMM의 성능 비교 연구)

홍형진
- Proceedings of the Acoustical Society of Korea Conference / 1994.06c / pp.157-160 / 1994
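This paper compares the recognition performance of discrete HMMs and continuous HMMs for Korean digit recognition. In general, continuous HMMs require heavy computation and are very sensitive to initial values during training, but they can improve the recognition rate by removing the distortion caused by VQ in discrete HMMs. For the comparison, recognition performance was measured as a function of the mel-cepstrum analysis order, the codebook size of the discrete HMM, and the number of mixtures of the continuous HMM. In the experiments, the discrete HMM performed best with 14th-order mel-cepstrum vectors and a codebook size of 64, while the continuous HMM performed best with 16th-order mel-cepstrum vectors and 3 mixtures. In particular, when the amount of training data was small, the continuous HMM achieved a better recognition rate than the discrete HMM.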

Stereo Vision Neural Networks with Competition and Cooperation for Phoneme Recognition

Kim, Sung-Ill;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.22 no.1E / pp.3-10 / 2003
This paper describes two kinds of neural networks for stereoscopic vision, which have been applied to the identification of human speech. In speech recognition based on the stereoscopic vision neural networks (SVNN), similarities are first obtained by comparing input vocal signals with standard models. They are then passed to a dynamic process in which both competitive and cooperative processes are conducted among neighboring similarities. Through these dynamic processes, only one winner neuron is finally detected. In a comparative study, the average phoneme recognition accuracy of the two-layered SVNN was 7.7% higher than that of a Hidden Markov Model (HMM) recognizer with a single mixture and three states, and that of the three-layered SVNN was 6.6% higher. SVNN therefore outperformed the existing HMM recognizer in phoneme recognition.

Study on Efficient Generation of Dictionary for Korean Vocabulary Recognition (한국어 음성인식을 위한 효율적인 사전 구성에 관한 연구)

Lee Sang-Bok;Choi Dae-Lim;Kim Chong-Kyo
- Proceedings of the KSPS conference / 2002.11a / pp.41-44 / 2002
This paper addresses improving the speech recognition rate with an enhanced pronunciation dictionary. Modern large-vocabulary continuous speech recognition systems rely on pronunciation dictionaries. A pronunciation dictionary provides pronunciation information for each word in the vocabulary in phonemic units, which are modeled in detail by the acoustic models. However, most speech recognition systems based on the Hidden Markov Model disregard actual pronunciation variations. When a speech recognition system ignores these variations, the phonetic transcriptions in the dictionary do not match the actual occurrences in the database. In this paper, we propose adding an unvoiced-semivowel rule to the allophone rules applied to the pronunciation dictionary. Experimental results show that the speech recognition system performs better with the proposed dictionary than with existing pronunciation dictionaries.

BAYESIAN ROBUST ANALYSIS FOR NON-NORMAL DATA BASED ON A PERTURBED-t MODEL

Kim, Hea-Jung
- Journal of the Korean Statistical Society / v.35 no.4 / pp.419-439 / 2006
The article develops a new class of distributions by introducing a nonnegative perturbing function to the $t_\nu$ distribution with location and scale parameters. The class is obtained by using transformations and conditioning. The class strictly includes the $t_\nu$ and skew-$t_\nu$ distributions. It provides further models useful for selection modeling and robustness analysis. Analytic forms of the densities are obtained and distributional properties are studied. These developments are followed by an easy method for estimating the distribution by using Markov chain Monte Carlo. It is shown that the method is straightforward to specify distributionally and to implement computationally, with output readily adapted for constructing the required criteria. The method is illustrated by using a simulation study.

A BAYESIAN APPROACH FOR A DECOMPOSITION MODEL OF SOFTWARE RELIABILITY GROWTH USING A RECORD VALUE STATISTICS

Choi, Ki-Heon;Kim, Hee-Cheul
- Journal of applied mathematics & informatics / v.8 no.1 / pp.243-252 / 2001
The points of failure of a decomposition process are defined to be the union of the points of failure from two component point processes for software reliability systems. Because sampling from the likelihood function of the decomposition model is difficult, the Gibbs sampler can be applied in a straightforward manner. A Markov chain Monte Carlo method with data augmentation is developed to compute the features of the posterior distribution. For model determination, we explored the prequential conditional predictive ordinate criterion, which selects the best model with the largest posterior likelihood among models using all possible subsets of the component intensity functions. A numerical example with a simulated data set is given.

A Bayesian Inference for Power Law Process with a Single Change Point

Kim, Kiwoong;Inkwon Yeo;Sinsup Cho;Kim, Jae-Joo
- International Journal of Quality Innovation / v.5 no.1 / pp.1-9 / 2004
The nonhomogeneous Poisson process (NHPP) is often used to model repairable systems that are subject to a minimal repair strategy with negligible repair times. In this situation, the system can be characterized by its intensity function. Many NHPP models have been proposed, distinguished by their intensity functions. However, the intensity function of a system in use can change because of repair or aging. We consider the single change point model as a modification of the power law process. The shape parameter of its intensity function differs before and after the change point. We detect the presence of the change point using Bayesian methodology. Some numerical results are also presented.
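As a rough illustration of a power law process whose shape parameter switches at a change point, the sketch below uses the standard power law intensity (beta/theta) * (t/theta)^(beta - 1) and simply swaps beta after the change point. This parameterization, the shared scale `theta`, and all function names are assumptions; the paper may define the model differently.

```python
def plp_intensity(t, theta, beta_before, beta_after, change_point):
    """Power law intensity at time t, with the shape parameter switching at change_point.

    Assumed form: (beta / theta) * (t / theta) ** (beta - 1), sharing the scale theta.
    """
    if t <= 0 or theta <= 0:
        raise ValueError("t and theta must be positive")
    beta = beta_before if t <= change_point else beta_after
    return (beta / theta) * (t / theta) ** (beta - 1)


def expected_failures(horizon, theta, beta_before, beta_after, change_point):
    """Expected number of failures on (0, horizon]: the intensity integrated piecewise."""
    if horizon <= 0 or theta <= 0:
        raise ValueError("horizon and theta must be positive")
    if horizon <= change_point:
        return (horizon / theta) ** beta_before
    # The integral of the power law intensity over (a, b] is (b/theta)^beta - (a/theta)^beta.
    before = (change_point / theta) ** beta_before
    after = (horizon / theta) ** beta_after - (change_point / theta) ** beta_after
    return before + after


# Illustrative values only: the shape parameter rises from 1.0 to 2.0 at t = 2.
rate_after_change = plp_intensity(3.0, 1.0, 1.0, 2.0, 2.0)
count_no_change = expected_failures(5.0, 1.0, 1.0, 1.0, 2.0)
```

When both shape parameters equal 1, the process reduces to a homogeneous Poisson process, which gives a simple sanity check on the piecewise integral.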

