Search | Korea Science

Isolated Word Recognition Using Hidden Markov Models with Bounded State Duration (제한적 상태지속시간을 갖는 HMM을 이용한 고립단어 인식)

이기희;임인칠
- Journal of the Korean Institute of Telematics and Electronics B / v.32B no.5 / pp.756-764 / 1995
In this paper, we propose MLP (multilayer perceptron) based HMMs (hidden Markov models) with bounded state duration for isolated word recognition. The minimum and maximum state durations for each state of an HMM are estimated during the training phase and used as parameters that constrain state transitions in the recognition phase. The procedure for estimating these parameters and the recognition algorithm using the proposed HMMs are also described. Speaker-independent isolated word recognition experiments using a vocabulary of 10 city names and 11 digits indicate that the recognition rate can be improved by adjusting the minimum state durations.
PDF
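The duration-constrained decoding described in this abstract can be illustrated with a minimal sketch. It assumes a strict left-to-right HMM without skips and per-frame log-likelihoods that are already available (for example from an MLP); the function name `bounded_duration_viterbi` and its arguments are hypothetical, and this is not the authors' algorithm, only one way to enforce minimum and maximum state durations during a Viterbi-style search.

```python
import math

def bounded_duration_viterbi(log_emis, dmin, dmax):
    """Best left-to-right state path where state s lasts dmin[s]..dmax[s] frames.

    log_emis[t][s] is the log-likelihood of frame t under state s (assumed given).
    Durations are assumed to be at least 1. Returns (score, path); the path is
    empty when no segmentation satisfies the duration bounds.
    """
    T, N = len(log_emis), len(dmin)
    # prefix[s][t] = sum of log_emis[0..t-1][s], so any segment sum is O(1)
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for s in range(N):
        for t in range(T):
            prefix[s][t + 1] = prefix[s][t] + log_emis[t][s]

    NEG = -math.inf
    # best[j][t]: best score when the first j states cover exactly the first t frames
    best = [[NEG] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    best[0][0] = 0.0
    for j in range(1, N + 1):
        s = j - 1
        for t in range(1, T + 1):
            for d in range(dmin[s], min(dmax[s], t) + 1):
                prev = best[j - 1][t - d]
                if prev == NEG:
                    continue
                score = prev + prefix[s][t] - prefix[s][t - d]
                if score > best[j][t]:
                    best[j][t] = score
                    back[j][t] = d  # remember the duration chosen for state s

    if best[N][T] == NEG:
        return NEG, []
    # Trace back the chosen durations from the last state to the first
    durations, t = [], T
    for j in range(N, 0, -1):
        d = back[j][t]
        durations.append(d)
        t -= d
    durations.reverse()
    path = [s for s, d in enumerate(durations) for _ in range(d)]
    return best[N][T], path
```

Raising `dmin` for a state forces the path to stay in that state longer even when individual frames score better elsewhere, which is the kind of adjustment the abstract reports as helpful.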

A Study on the Syllable Recognition Using Neural Network Predictive HMM

Kim, Soo-Hoon;Kim, Sang-Berm;Koh, Si-Young;Hur, Kang-In
- The Journal of the Acoustical Society of Korea / v.17 no.2E / pp.26-30 / 1998
In this paper, we construct a neural network predictive HMM (NNPHMM) to provide the HMM with the dynamic features of the speech pattern. The NNPHMM is a hybrid of a neural network and an HMM. The neural network is trained to predict the future vector, which varies over time, and this prediction is used in place of the mean vector in the HMM. In the experiments, we compared recognition performance on one hundred Korean syllables while varying the hidden layer size, the number of states, and the prediction order of the NNPHMM. The hidden layer was varied from 10 to 30 dimensions, the number of states from 4 to 6, and the prediction order from second to fourth. The NNPHMM used in the experiments is composed of a multilayer perceptron with one hidden layer and a CMHMM. When the number of states was changed from 4 to 5, the average recognition rate increased by 3.5% for second-order prediction, by 4.0% for third-order prediction, and by 3.2% for fourth-order prediction. However, the recognition rate decreased when the number of states was changed from 5 to 6.
PDF

Performance Improvement of Continuous Digits Speech Recognition Using the Transformed Successive State Splitting and Demi-syllable Pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자 음 인식의 성능 향상)

Seo Eun-Kyoung;Choi Gab-Keun;Kim Soon-Hyob;Lee Soo-Jeong
- Journal of Korea Multimedia Society / v.9 no.1 / pp.23-32 / 2006
This paper describes the optimization of a language model and an acoustic model to improve speech recognition of Korean unit digits. Since the model is composed of a finite state network (FSN) with a disyllable, recognition errors of the language model were reduced by analyzing the grammatical features of Korean unit digits. The acoustic models use demi-syllable pairs to decrease recognition errors caused by inaccurate segmentation of phones or monosyllables, which results from short pronunciation time and articulation. We used the K-means clustering algorithm with transformed successive state splitting at the feature level to model the features of the recognition unit efficiently. In the experiments, the proposed language model raised the recognition rate by 10.5%, the demi-syllable pair acoustic model raised it by 12.5%, and transformed successive state splitting raised it by 1.5%.
PDF

A Study on the Korean Syllable As Recognition Unit (인식 단위로서의 한국어 음절에 대한 연구)

Kim, Yu-Jin;Kim, Hoi-Rin;Chung, Jae-Ho
- The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.64-72 / 1997
In this paper, we study and experiment with recognition units suitable for large-vocabulary recognition systems. Specifically, we select the phoneme, which is currently used as a recognition unit, and the syllable, which characterizes Korean well. By comparing recognition experiments, we examine whether the syllable can serve as the recognition unit of a Korean recognition system. To report an objective comparison, we collected speech data from a male speaker and built a speech database by hand-segmenting and labeling phoneme boundaries. For HMM-based training and recognition under identical conditions, we used HTK (HMM Tool Kit) 2.0, a commercial tool from Entropic Co. We applied two continuous HMM topologies to the training of each recognition unit: 5 states with 3 emitting states, and 8 states with 6 emitting states. We used 3 sets of PBW (Phonetically Balanced Words) and 1 set of POW (Phonetically Optimized Words) for training and another set of PBW for recognition, that is, "Speaker Dependent Medium Vocabulary Size Recognition." The experimental results show a recognition rate of 95.65% for the phoneme unit and 94.41% for the syllable unit, with decoding 25% faster for the syllable unit than for the phoneme unit.
PDF

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

오세진;김광동;노덕규;위석오;송민규;정현열
- Journal of the Institute of Convergence Signal Processing / v.4 no.4 / pp.30-39 / 2003
In this paper, we evaluate the performance of an HM-Net (Hidden Markov Network) speech recognition system on Korean speech databases. We construct acoustic models using HM-Nets, a modified form of HMMs (Hidden Markov Models), which are widely used as statistical modeling methods. In HM-Nets, state splitting in the contextual and temporal domains is performed by the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, a modification of the original SSS algorithm. In particular, for contextual-domain state splitting it adopts a phonetic decision tree to effectively express context information that does not appear in the training speech data. For temporal-domain state splitting, splitting is carried out to effectively represent the duration information of each phoneme, and the optimal model network of triphone types is then constructed. Speech recognition was performed using the one-pass Viterbi beam search algorithm with phone-pair and word-pair grammars for phoneme and word recognition, respectively, and using a multi-pass search algorithm with n-gram language models for sentence recognition. A tree-structured lexicon was used to decrease the number of nodes by sharing common prefixes among words. The HM-Net speech recognition system is evaluated under various recognition conditions. Through the experiments, we verified that it achieves better recognition performance than the previously introduced recognition system.
PDF
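The tree-structured lexicon mentioned in this abstract can be sketched as a prefix tree over phoneme sequences. The example below is only an illustration under assumed data: the three-word lexicon, its phoneme spellings, and the helper names `build_lexicon_tree` and `count_nodes` are hypothetical and do not come from the paper.

```python
def build_lexicon_tree(lexicon):
    """Build a prefix tree (nested dicts) from word -> phoneme-sequence entries."""
    root = {}
    for word, phonemes in lexicon.items():
        node = root
        for ph in phonemes:
            # Words with the same leading phonemes reuse the same nodes
            node = node.setdefault(ph, {})
        # Mark the end of a word; the key "#" is assumed not to be a phoneme
        node["#"] = word
    return root

def count_nodes(node):
    """Count phoneme nodes in the tree, ignoring the word-end markers."""
    total = 0
    for key, child in node.items():
        if key == "#":
            continue
        total += 1 + count_nodes(child)
    return total

# Hypothetical lexicon: two words share the prefix "s eo"
lexicon = {
    "seoul": ["s", "eo", "u", "l"],
    "seosan": ["s", "eo", "s", "a", "n"],
    "busan": ["b", "u", "s", "a", "n"],
}
tree = build_lexicon_tree(lexicon)
flat_nodes = sum(len(p) for p in lexicon.values())
tree_nodes = count_nodes(tree)
```

With this toy lexicon a flat, per-word layout needs 14 phoneme nodes while the tree needs 12, because the shared prefix is stored once; the saving grows with vocabulary size.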

Dependence of Molecular Recognition for a Specific Cation on the Change of the Oxidation State of the Metal Catalyst Component in the Hydrogel Network

Basavaraja, Chitragara;Park, Do-Young;Choe, Young-Min;Park, Hyun-Tae;Zhao, Yan Shuang;Yamaguchi, Tomohiko;Huh, Do-Sung
- Bulletin of the Korean Chemical Society / v.28 no.5 / pp.805-810 / 2007
Molecular recognition of a specific cation, as it depends on the oxidation state of the metal catalyst component contained in the hydrogel network, has been studied in a self-oscillating hydrogel. The self-oscillating hydrogels are synthesized by the copolymerization of N-isopropylacrylamide (NIPAAm), lead methacrylic acid (Pb(MAA)2), and Ru(bpy)3 2+ monomer as the metal catalyst component. The recognition of a specific cation (Ca2+ in this study) is characterized by the amount of Ca2+ adsorbed into the gel. The recognition of Ca2+ by the gels is higher at temperatures below the LCST, and higher in the oxidized state of the metal catalyst component than in the reduced state; the oxidized state corresponds to a more swollen gel. Moreover, a propagating wave induced by a periodic change of the oxidation state together with diffusion in the oscillating hydrogel suggests the possibility of temporal and site-specific molecular recognition through local swelling of the gel.
https://doi.org/10.5012/bkcs.2007.28.5.805

Korean Speech Recognition using Dynamic Multisection Model (DMS 모델을 이용한 한국어 음성 인식)

안태옥;변용규;김순협
- Journal of the Korean Institute of Telematics and Electronics / v.27 no.12 / pp.1933-1939 / 1990
In this paper, we propose an algorithm for speaker-independent Korean speech recognition using the DMS (Dynamic Multisection) model. The algorithm uses a backtracking method to obtain time information, and the DMS model is built from feature vectors and time information that represent similar features in word patterns spoken over a continuous time domain. Each state of the model represents a time sequence and holds time information and a feature vector. The typical feature vector of each state is chosen to minimize the distance between word patterns. DDD area names are selected as the recognition vocabulary, and 12th-order LPC cepstrum coefficients are used as the feature parameters. The model is divided into 8 sections, and a weight of 0.2 is used for the time information. In the experiments, the recognition rate of the DMS model is 94.8%, which is better than the recognition rate (89.3%) of the MSVQ (Multisection Vector Quantization) method.
PDF

Subword-based Lip Reading Using State-tied HMM (상태공유 HMM을 이용한 서브워드 단위 기반 립리딩)

Kim, Jin-Young;Shin, Do-Sung
- Speech Sciences / v.8 no.3 / pp.123-132 / 2001
In recent years, research on HCI technology has been very active, and speech recognition is used as one of its typical methods. Recognition performance, however, deteriorates as surrounding noise increases. To solve this problem, multimodal HCI is being actively studied. This paper describes automated lipreading for bimodal speech recognition based on image and speech information. It employs an audio-visual database containing 1,074 words from 70 voices, tri-visemes as the recognition unit, and state-tied HMMs as the recognition model. Recognition performance is evaluated for vocabularies of 22 to 1,000 words, and the 22-word recognizer achieves a word recognition rate of 60.5%.
PDF

CSI-based human activity recognition via lightweight compact convolutional transformers

Fahd Saad Abuhoureyah;Yan Chiew Wong;Malik Hasan Al-Taweel;Nihad Ibrahim Abdullah
- Advances in Computational Design / v.9 no.3 / pp.187-211 / 2024
WiFi sensing enables non-intrusive monitoring and is used in applications such as Human Activity Recognition (HAR), leveraging Multiple Input Multiple Output (MIMO) systems and Channel State Information (CSI) data for accurate signal monitoring in fields such as smart environments. The complexity of extracting relevant features from CSI data creates computational bottlenecks, hindering real-time recognition and limiting deployment on resource-constrained devices. Existing methods sacrifice accuracy for computational efficiency or vice versa, compromising the reliability of activity recognition in pervasive environments. The lightweight Compact Convolutional Transformer (CCT) algorithm proposed in this work offers a solution by streamlining the use of such complex CSI data for activity recognition. By combining the strengths of CNNs and transformer models, the CCT algorithm achieves state-of-the-art accuracy on various benchmarks and outperforms traditional algorithms. The model matches the computational efficiency of convolutional networks while retaining the modeling capabilities of transformers. The proposed model is evaluated on a self-collected dataset of CSI WiFi signals covering a few daily activities. The results demonstrate the improvement achieved by using CCT in real-time activity recognition, as well as its ability to operate on devices and networks with limited computational resources.
https://doi.org/10.12989/acd.2024.9.3.187

A phoneme duration modeling in a speech recognition system based on decision tree state tying (결정트리기반 음성인식 시스템에서의 음소지속시간 사용방법)

Koo Myoun-Wan;Kim Ho-Kyoung
- Proceedings of the KSPS conference / 2002.11a / pp.197-200 / 2002
In this paper, we propose a phoneme duration model for a speech recognition system based on decision tree state tying. We assume that phone duration follows a Gamma distribution. In training mode, we model the mean and variance of each state duration in a context-independent phone model based on decision tree state tying. In recognition mode, we obtain the mean and variance of each context-dependent phone duration from the state duration information obtained during training. We make a comparative study of the proposed method and conventional methods. Our method performs well compared with the conventional methods.
PDF
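The Gamma duration assumption in this abstract can be made concrete with a small sketch. It fits a Gamma distribution to a given mean and variance by the method of moments and scores a candidate duration. Two points are assumptions not stated in the abstract: that a phone's mean and variance are obtained by summing the per-state values (which treats state durations as independent), and the helper names themselves.

```python
import math

def phone_stats_from_states(state_means, state_vars):
    """Combine per-state duration statistics into phone-level mean and variance.

    Assumption: state durations are independent, so means and variances add.
    """
    return sum(state_means), sum(state_vars)

def gamma_params(mean, var):
    """Method-of-moments Gamma fit: shape k = mean^2 / var, scale theta = var / mean."""
    return mean * mean / var, var / mean

def gamma_log_pdf(d, shape, scale):
    """Log density of Gamma(shape, scale) at duration d > 0 (in frames)."""
    return ((shape - 1.0) * math.log(d)
            - d / scale
            - math.lgamma(shape)
            - shape * math.log(scale))

# Hypothetical three-state phone with durations measured in frames
mean, var = phone_stats_from_states([3.0, 4.0, 3.0], [6.0, 8.0, 6.0])
shape, scale = gamma_params(mean, var)
duration_score = gamma_log_pdf(8.0, shape, scale)
```

In a recognizer, a score such as `duration_score` would typically be added, with some weight, to the acoustic log-likelihood of a hypothesized phone segment; how the paper combines the two is not stated in the abstract.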

Search Result 1,016, Processing Time 0.027 seconds
