• Title/Summary/Keyword: hidden Markov model

English Phoneme Recognition using Segmental-Feature HMM (분절 특징 HMM을 이용한 영어 음소 인식)

  • Yun, Young-Sun
    • Journal of KIISE: Software and Applications, v.29 no.3, pp.167-179, 2002
  • In this paper, we propose a new acoustic model for characterizing segmental features, together with an algorithm based on the general framework of hidden Markov models (HMMs), in order to compensate for the weaknesses of the HMM assumptions. Because a single-frame feature cannot effectively represent the temporal dynamics of speech signals, the segmental features are represented as a trajectory of the observed vector sequence, modeled by a polynomial regression function. To apply the segmental features to pattern classification, we adopt the segmental HMM (SHMM), which is known as an effective method for representing the trend of speech signals. The SHMM separates the observation probability of a given state into extra- and intra-segmental variations, which capture long-term and short-term variability, respectively. To account for segmental characteristics in the acoustic model, we present the segmental-feature HMM (SFHMM), a modification of the SHMM. The SFHMM represents the external and internal variations as the observation probability of the trajectory in a given state and the trajectory estimation error for the given segment, respectively. We conducted several experiments on the TIMIT database to establish the effectiveness of the proposed method and the characteristics of the segmental features. From the experimental results, we conclude that the proposed method is valuable, even though it has more parameters than a conventional HMM, for its flexible and informative feature representation and its performance improvement.
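
The idea of representing a segment as a polynomial trajectory can be illustrated with a minimal sketch. It assumes a one-dimensional feature, time normalized to [0, 1] within the segment, and an ordinary least-squares fit; the function names `solve_linear` and `fit_trajectory` are hypothetical, and this is not the paper's actual estimation procedure. The returned mean squared residual plays the role of a trajectory estimation error for the segment.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_trajectory(frames, order=2):
    """Fit a polynomial trajectory to one segment of scalar frame features.

    Returns (coefficients, mean squared residual). Coefficients are ordered
    from the constant term upward. Requires len(frames) > order.
    """
    n = len(frames)
    times = [i / (n - 1) for i in range(n)]  # normalized time in [0, 1]
    design = [[t ** k for k in range(order + 1)] for t in times]
    size = order + 1
    # Normal equations (X^T X) c = X^T y of the least-squares problem.
    xtx = [[sum(row[r] * row[c] for row in design) for c in range(size)]
           for r in range(size)]
    xty = [sum(row[r] * y for row, y in zip(design, frames))
           for r in range(size)]
    coeffs = solve_linear(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coeffs, row)) for row in design]
    error = sum((y - f) ** 2 for y, f in zip(frames, fitted)) / n
    return coeffs, error
```

For multi-dimensional feature vectors, the same fit would be applied to each dimension independently under these assumptions.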

A study on the connected-digit recognition using MLP-VQ and Weighted DHMM (MLP-VQ와 가중 DHMM을 이용한 연결 숫자음 인식에 관한 연구)

  • Chung, Kwang-Woo;Hong, Kwang-Seok
    • Journal of the Korean Institute of Telematics and Electronics S, v.35S no.8, pp.96-105, 1998
  • This paper proposes the weighted DHMM (WDHMM), combined with MLP-VQ, to improve a speaker-independent connected-digit recognition system. Through the non-linear mapping between input patterns and learned patterns, the output distribution of an MLP neural network forms a probability distribution that expresses the degree of similarity between the input and each pattern class. The proposed MLP-VQ generates codewords from the index of the output node with the highest value in the MLP output distribution. Unlike conventional VQ, MLP-VQ allows the degree of similarity between the current input pattern and each learned class pattern to be reflected in the recognition model. The proposed WDHMM uses the MLP output distribution to weight the symbol generation probabilities of the DHMM. Because the symbol generation probability does not have to be modeled as a multi-dimensional normal distribution, as it is in the conventional SCHMM, the method shortens both HMM parameter estimation and recognition time. It also improves recognition performance by 14.7% over the DHMM at the cost of only a small increase in computation, because it reflects phone class relations in the recognition model. Experimental results show a speaker-independent connected-digit recognition rate of 84.22% using MLP-VQ and WDHMM.
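
One way to read the two ideas in this abstract is sketched below. The codeword selection follows the description directly (index of the largest MLP output). The weighting rule is an assumption, since the abstract does not give the formula: each symbol's discrete emission probability is weighted by the normalized MLP output for that symbol, and the results are summed. The function names are hypothetical.

```python
def mlp_vq_codeword(mlp_output):
    """Return the codeword: the index of the MLP output node with the highest value."""
    return max(range(len(mlp_output)), key=lambda k: mlp_output[k])


def weighted_emission(mlp_output, emission_row):
    """Weight one state's discrete symbol probabilities by the MLP outputs.

    ASSUMPTION: the weighted emission is sum_k w_k * b_j(k), where w is the
    MLP output distribution normalized to sum to 1 and b_j(k) is the DHMM
    probability of symbol k in state j. The paper may define it differently.
    """
    total = sum(mlp_output)
    return sum((w / total) * b for w, b in zip(mlp_output, emission_row))
```

With a one-hot MLP output, `weighted_emission` reduces to the ordinary DHMM lookup for the codeword chosen by `mlp_vq_codeword`, which is why this form is a natural generalization of hard vector quantization.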

Design and Implementation of a Real-Time Lipreading System Using PCA & HMM (PCA와 HMM을 이용한 실시간 립리딩 시스템의 설계 및 구현)

  • Lee, Chi-Geun;Lee, Eun-Suk;Jung, Sung-Tae;Lee, Sang-Seol
    • Journal of Korea Multimedia Society, v.7 no.11, pp.1597-1609, 2004
  • Many lipreading systems have been proposed to compensate for the drop in speech recognition rate in noisy environments. Previous lipreading systems work only under specific conditions, such as artificial lighting and a predefined background color. In this paper, we propose a real-time lipreading system that allows the speaker to move and relaxes the restrictions on color and lighting conditions. The proposed system extracts the face and lip regions, along with the essential visual information, in real time from an input video sequence captured with a common PC camera, and uses that visual information to recognize uttered words in real time. It uses a hue histogram model to extract the face and lip regions, the mean shift algorithm to track the face of a moving speaker, PCA (Principal Component Analysis) to extract the visual information for learning and testing, and an HMM (Hidden Markov Model) as the recognition algorithm. The experimental results show that the system achieves a recognition rate of 90% for speaker-dependent lipreading and, when combined with audio speech recognition, increases the speech recognition rate up to 40~85% depending on the noise level.

Automatic speech recognition using acoustic doppler signal (초음파 도플러를 이용한 음성 인식)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea, v.35 no.1, pp.74-82, 2016
  • In this paper, a new automatic speech recognition (ASR) method is proposed that uses ultrasonic Doppler signals instead of conventional speech signals. The proposed method has advantages over conventional speech-based and non-speech-based ASR, including robustness against acoustic noise and user comfort owing to the use of a non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal is radiated toward the mouth and the reflected ultrasonic signals are received. The frequency shift caused by the Doppler effect is used to implement ASR. Unlike the previous method, which employed a single-channel ultrasonic signal, the proposed method employs multi-channel ultrasonic signals acquired from various locations. PCA (Principal Component Analysis) coefficients are used as the ASR features, and a hidden Markov model (HMM) with a left-right topology is adopted. To verify the feasibility of the proposed ASR, a speech recognition experiment was carried out on 60 Korean isolated words obtained from six speakers. The experimental results showed that the overall word recognition rates were comparable with those of conventional speech-based ASR methods and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90% was maintained in noisy environments.
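
To give a sense of the size of the frequency shift involved, the sketch below uses the standard approximation for a wave reflected off a slowly moving surface, shift = 2 v f0 / c. The 40 kHz carrier comes from the abstract; the speed of sound (343 m/s) and the example velocity are assumptions for illustration, and the function name is hypothetical.

```python
def doppler_shift_hz(velocity_m_s, carrier_hz=40_000.0, sound_speed_m_s=343.0):
    """Approximate Doppler shift of a signal reflected from a moving surface.

    Valid when the surface speed is much smaller than the speed of sound.
    Positive velocity means the surface moves toward the sensor.
    ASSUMPTION: 343 m/s is the speed of sound in air at about 20 degrees C.
    """
    return 2.0 * velocity_m_s * carrier_hz / sound_speed_m_s


# A surface moving at 0.1 m/s (an illustrative value) shifts the 40 kHz
# carrier by roughly 23 Hz.
shift = doppler_shift_hz(0.1)
```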

A Design of the Emergency-notification and Driver-response Confirmation System(EDCS) for an autonomous vehicle safety (자율차량 안전을 위한 긴급상황 알림 및 운전자 반응 확인 시스템 설계)

  • Son, Su-Rak;Jeong, Yi-Na
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.14 no.2, pp.134-139, 2021
  • The autonomous vehicle market is currently commercializing level 3 autonomous vehicles, which still require the driver's attention. Beyond level 3, the most notable aspect of level 4 autonomous vehicles is vehicle stability, because, unlike level 3, vehicles at level 4 and above must drive autonomously even when the driver is careless. Therefore, in this paper, we propose the Emergency-notification and Driver-response Confirmation System (EDCS) for autonomous vehicle safety, which notifies the driver of an emergency situation and recognizes the driver's reaction when the driver is careless. The EDCS uses the emergency situation delivery module to convert the emergency situation into text and deliver it to the driver by voice; the driver response confirmation module recognizes the driver's reaction to the emergency situation and decides whether to pass driving authority to the driver. In the experiments, the HMM of the emergency delivery module learned speech 25% faster than an RNN and 42.86% faster than an LSTM. The Tacotron2 of the driver response confirmation module converted text to speech about 20 ms faster than Deep Voice and 50 ms faster than DeepMind. Therefore, the emergency notification and driver response confirmation system can efficiently learn the neural network model and check the driver's response in real time.

Estimation and Weighting of Sub-band Reliability for Multi-band Speech Recognition (다중대역 음성인식을 위한 부대역 신뢰도의 추정 및 가중)

  • 조훈영;지상문;오영환
    • The Journal of the Acoustical Society of Korea, v.21 no.6, pp.552-558, 2002
  • Multi-band speech recognition, based on Fletcher's model of human speech recognition (HSR), has recently been studied intensively. As a new automatic speech recognition (ASR) technique, multi-band speech recognition splits the frequency domain into several sub-bands and recognizes each sub-band independently. The likelihood scores of the sub-bands are weighted according to the reliability of each sub-band and recombined to make a final decision. This approach is known to be robust in noisy environments. When the noise is stationary, the sub-band SNR can be estimated from the noise information in non-speech intervals. However, when the noise is non-stationary, it is not feasible to obtain the sub-band SNR. This paper proposes inverse sub-band distance (ISD) weighting, in which a distance is calculated for each sub-band by stochastic matching of the input feature vectors against hidden Markov models, and the inverse of that distance is used as the sub-band weight. Experiments on 1500~1800 Hz band-limited white noise and classical guitar sound showed that the proposed method represents sub-band reliability effectively and improves performance under both stationary and non-stationary band-limited noise.
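
The weighting step described above can be sketched as follows. The abstract states only that the inverse of each sub-band distance is used as the weight; normalizing the weights to sum to 1 and combining the sub-band log-likelihoods as a weighted sum are assumptions made here for illustration, and the function names are hypothetical.

```python
def isd_weights(distances):
    """Turn per-sub-band distances into weights proportional to 1 / distance.

    ASSUMPTION: weights are normalized to sum to 1. All distances must be > 0.
    """
    inverses = [1.0 / d for d in distances]
    total = sum(inverses)
    return [inv / total for inv in inverses]


def combine_scores(log_likelihoods, distances):
    """Recombine sub-band log-likelihood scores with inverse-distance weights.

    ASSUMPTION: recombination is a weighted sum of log-likelihoods.
    """
    weights = isd_weights(distances)
    return sum(w * s for w, s in zip(weights, log_likelihoods))
```

A sub-band whose features match the models poorly (large distance) therefore contributes less to the final score, which matches the stated goal of down-weighting bands corrupted by noise.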

Genome-wide survey and expression analysis of F-box genes in wheat

  • Kim, Dae Yeon;Hong, Min Jeong;Seo, Yong Weon
    • Proceedings of the Korean Society of Crop Science Conference, 2017.06a, pp.141-141, 2017
  • The ubiquitin-proteasome pathway is the major regulatory mechanism for selective protein degradation in a number of cellular processes and involves three steps: (1) ATP-dependent activation of ubiquitin by the E1 enzyme, (2) transfer of the activated ubiquitin to E2, and (3) transfer of ubiquitin to the protein to be degraded by the E3 complex. F-box proteins are subunits of the SCF complex and determine specificity for the target substrate to be degraded. F-box proteins regulate many important biological processes, such as embryogenesis, floral development, plant growth and development, biotic and abiotic stress responses, hormonal responses, and senescence. However, little is known about the F-box genes in wheat. The draft genome sequence of wheat (IWGSC Reference Sequence v1.0 assembly) was used for a genome-wide survey of the F-box gene family in wheat. The Hidden Markov Model (HMM) profiles of the F-box (PF00646), F-box-like (PF12937), F-box-like 2 (PF13013), FBA (PF04300), FBA_1 (PF07734), FBA_2 (PF07735), FBA_3 (PF08268) and FBD (PF08387) domains were downloaded from the Pfam database and searched against the IWGSC Reference Sequence v1.0 assembly. RNA-seq paired-end libraries from different stages of wheat (seedling, tillering, booting, and 1, 10, 20, and 30 days after flowering (DAF)) were prepared and sequenced on an Illumina HiSeq2000 for expression analysis of F-box protein genes. Basic analyses, including HISAT, HTSeq, DESeq, gene ontology analysis, and KEGG mapping, were conducted to identify differentially expressed genes (DEGs) across stages and to annotate them. About 950 F-box domain proteins identified by Pfam were mapped to the wheat reference genome sequence by blastX (e-value < 0.05). Among them, more than 140 putative F-box protein genes were selected using cut-offs of fold change > 2, p-value < 0.01, and FDR < 0.01. Expression profiles of the selected F-box protein genes were shown by heatmap analysis, and the 144 putative F-box protein genes were clustered by expression pattern using average linkage and squared Euclidean distance. This work may provide valuable basic information for further investigation of the protein degradation mechanism of the ubiquitin-proteasome system involving F-box proteins during wheat developmental stages.
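
The selection step with the three stated cut-offs can be sketched as a simple filter. The record layout (`gene_id`, `fold_change`, `p_value`, `fdr`) and the function name are hypothetical, and the sketch assumes the fold change is given as a plain ratio; the abstract does not say whether down-regulated genes were handled through an absolute or log-scaled fold change.

```python
def select_degs(genes, min_fold_change=2.0, max_p_value=0.01, max_fdr=0.01):
    """Return IDs of genes passing all three cut-offs from the abstract.

    ASSUMPTION: each gene is a dict with the hypothetical keys 'gene_id',
    'fold_change' (a ratio), 'p_value', and 'fdr'.
    """
    return [
        gene["gene_id"]
        for gene in genes
        if gene["fold_change"] > min_fold_change
        and gene["p_value"] < max_p_value
        and gene["fdr"] < max_fdr
    ]
```

In practice these columns would come from the output table of the differential expression tool rather than hand-built dictionaries.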

Depth Image Poselets via Body Part-based Pose and Gesture Recognition (신체 부분 포즈를 이용한 깊이 영상 포즈렛과 제스처 인식)

  • Park, Jae Wan;Lee, Chil Woo
    • Smart Media Journal, v.5 no.2, pp.15-23, 2016
  • In this paper, we propose depth poselets based on body-part poses and a method for recognizing gestures with them. Since a gesture is composed of sequential poses, recognizing a gesture requires obtaining the time series of poses. However, distortion and the high degree of freedom of the body make it difficult to recognize a pose correctly. We therefore use partial poses to obtain pose features accurately without relying on the full-body pose. We define 16 gestures and generate depth images for training based on the defined gestures. The proposed depth poselets consist of the principal three-dimensional coordinates of the depth image and the depth image of the corresponding body part. In the training process, the defined gestures are captured with a depth camera, the depth poselets are generated by obtaining 3D joint coordinates, and part-gesture HMMs are constructed from the depth poselets. In the testing process, a test image is captured with a depth camera, the foreground is extracted, and the body parts of the input image are extracted by comparison with the depth poselets. The part gestures are then checked using the results of applying the HMMs in order to recognize the gesture. Gestures can be recognized efficiently using HMMs, and a recognition rate of about 89% was confirmed.
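
A common way to recognize a sequence with one HMM per class, which may or may not match the paper's exact procedure, is to score the observation sequence under each model with the forward algorithm and pick the best-scoring model. The sketch below assumes discrete observation symbols (for example, poselet indices) and models given as `(start, transition, emission)` tuples; all names are hypothetical.

```python
def forward_likelihood(observations, start, transition, emission):
    """Likelihood of a discrete observation sequence under one HMM.

    start[s] is the initial probability of state s, transition[p][s] the
    probability of moving from state p to state s, and emission[s][o] the
    probability of emitting symbol o in state s. Unscaled, so it is only
    suitable for short sequences.
    """
    n_states = len(start)
    alpha = [start[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify_gesture(observations, models):
    """Return the name of the model with the highest forward likelihood."""
    return max(
        models,
        key=lambda name: forward_likelihood(observations, *models[name]),
    )
```

For longer sequences, a scaled or log-domain forward pass would be needed to avoid numerical underflow.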

Prediction of the direction of stock prices by machine learning techniques (기계학습을 활용한 주식 가격의 이동 방향 예측)

  • Kim, Yonghwan;Song, Seongjoo
    • The Korean Journal of Applied Statistics, v.34 no.5, pp.745-760, 2021
  • Stock price prediction has long been a subject of interest in financial markets, and many studies have approached it from various directions. As the efficient market hypothesis, introduced in the 1970s, gained support, the majority opinion came to be that stock prices cannot be predicted. However, recent advances in predictive models have led to new attempts to predict future prices. Here, we summarize past studies on price prediction by evaluation measure and predict the direction of the stock prices of Samsung Electronics, LG Chem, and NAVER by applying various machine learning models. In addition to widely used technical indicators, accounting indicators such as the price-earnings ratio and price-to-book ratio, as well as outputs of a hidden Markov model, are used as predictors. From the results of our analysis, we conclude that no model shows significantly better accuracy and that the direction of stock prices cannot be predicted with the models used. Considering that the models with the extra predictors show relatively high test accuracy, a meaningful improvement in prediction accuracy may be possible if suitable variables reflecting the opinions and sentiments of investors are used.