• Title/Summary/Keyword: Speech rate

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions: Part B / v.19B no.1 / pp.19-26 / 2012
  • The rapid increase of information imposes new demands on content management. Automatic audio segmentation and classification aims to meet the rising need for efficient content management. For this reason, this paper proposes a high-accuracy algorithm that segments audio signals and classifies them into classes such as speech, music, silence, and environmental sounds. The proposed algorithm uses a support vector machine (SVM) to detect audio cuts, which are boundaries between different kinds of sounds, from the parameter sequence. We then extract feature vectors composed of statistical data and use them as input to a fuzzy c-means (FCM) classifier that partitions the audio segments into classes. To evaluate the proposed SVM-FCM based algorithm, we use precision and recall rates for segmentation and classification accuracy for classification. Furthermore, we compare the segmentation performance of the proposed algorithm with that of other methods, including binary and FCM classifiers. Experimental results show that the proposed algorithm outperforms the other methods in both precision and recall.
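The FCM classification step can be illustrated with a minimal fuzzy c-means sketch. It covers only the clustering stage, not the SVM audio-cut detector, and assumes each audio segment has already been reduced to a short statistical feature vector. The two-dimensional vectors below are made-up toy data, the function name `fuzzy_c_means` is hypothetical, and the fuzzifier `m = 2` is a common default rather than a value taken from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Cluster feature vectors with fuzzy c-means.

    points: list of equal-length feature vectors; c: number of clusters;
    m > 1: fuzzifier. Returns (centers, memberships), where memberships[i][j]
    is the degree to which point i belongs to cluster j (each row sums to 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ij^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            new_u.append([1.0 / sum((dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                                    for k in range(c)) for j in range(c)])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy 2-D segment features: two well-separated groups (invented data).
segments = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
centers, memberships = fuzzy_c_means(segments, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
print(labels)
```

The labels are cluster indices only; mapping them to classes such as speech or music would require labelled reference segments.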

A Study on Keyword Spotting System Using Pseudo N-gram Language Model (의사 N-gram 언어모델을 이용한 핵심어 검출 시스템에 관한 연구)

  • 이여송;김주곤;정현열
    • The Journal of the Acoustical Society of Korea / v.23 no.3 / pp.242-247 / 2004
  • Conventional keyword spotting systems use a connected-word recognition network consisting of keyword models and filler models. As a result, such systems cannot effectively construct language models of word occurrence for detecting keywords in large-vocabulary continuous speech recognition systems with large text data. To solve this problem, we propose a keyword spotting system that uses a pseudo N-gram language model for detecting keywords, and we investigate how the system's performance changes with the occurrence frequencies assigned to the keyword and filler models. When the unigram probabilities of the keyword and filler models were set to 0.2 and 0.8, respectively, the CA (correct acceptance of in-vocabulary words) and CR (correct rejection of out-of-vocabulary words) rates were 91.1% and 91.7%, respectively. This corresponds to a 14% improvement in average CA-CR performance over conventional methods, measured as error reduction rate (ERR).
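The effect of the unigram class probabilities can be sketched as a simple decision rule: each path's language-model log-probability is added to its acoustic log-likelihood, and the keyword path must beat the filler path. This is a hypothetical illustration, not the paper's decoder; `spot_keyword`, the keyword names, and the scores are invented, and a real system would apply these weights inside the recognition network rather than to precomputed scores.

```python
import math


def spot_keyword(keyword_scores, filler_score, p_keyword=0.2, p_filler=0.8):
    """Return the best keyword, or None if the filler path scores higher.

    keyword_scores: dict mapping keyword -> acoustic log-likelihood
    filler_score: acoustic log-likelihood of the best filler path
    p_keyword, p_filler: unigram probabilities of the keyword and filler
    classes (0.2 and 0.8 in the abstract).
    """
    best_kw, best_score = max(keyword_scores.items(), key=lambda kv: kv[1])
    keyword_total = best_score + math.log(p_keyword)
    filler_total = filler_score + math.log(p_filler)
    return best_kw if keyword_total > filler_total else None


# Hypothetical acoustic scores for one speech segment.
scores = {"seoul": -100.0, "busan": -120.0}
print(spot_keyword(scores, filler_score=-103.0))  # keyword path wins
print(spot_keyword(scores, filler_score=-101.0))  # the 0.8 filler prior tips it

# Average of the reported CA (91.1%) and CR (91.7%) rates.
average_ca_cr = (91.1 + 91.7) / 2
print(f"average CA-CR: {average_ca_cr:.1f}%")
```

Raising `p_filler` makes the system reject more readily, which trades CA against CR; the abstract reports the 0.2/0.8 split as its best setting.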

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 김동수;남기환;한준희;배철수;나상동
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1998.11a / pp.181-185 / 1998
  • Recently, research and development of communication systems has moved toward using voice data and facial images concurrently during speech, to achieve a higher recognition rate than with voice data alone. We therefore present a lipreading method for speech image sequences that uses a 3-D facial shape model. The method uses feature information from the face image, such as the degree of lip opening, the movement of the jaw, and the projection height of the lips. First, we fit the 3-D face model to the image sequence of the speaking face. Then, to obtain feature information, we compute the variance quantity of the fitted 3-D shape model over the image sequence and use it as the recognition parameter. To separate recognition units in the image sequence, we use the intensity inclination values obtained from the variance of the 3-D feature points. Finally, recognition is performed with a discrete HMM algorithm over multiple observation sequences that fully account for the variance of the 3-D feature points. In recognition experiments with 8 Korean vowels and 2 Korean consonants, we obtained a recognition rate of about 80% for the plosives and vowels.
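The recognition stage relies on discrete HMMs. The sketch below scores a sequence of quantized lip-feature symbols against each word model with the standard forward algorithm and picks the most likely model. The two-state models, the two-symbol alphabet (0 = small lip opening, 1 = wide lip opening), and the names `forward_probability` and `recognize` are hypothetical; the paper's feature quantization and its handling of multiple observation sequences are not reproduced.

```python
def forward_probability(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[i]: initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k]: probability that state i emits symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for symbol in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                 for j in range(n)]
    return sum(alpha)


def recognize(obs, models):
    """Return the name of the model with the highest forward probability."""
    return max(models, key=lambda name: forward_probability(obs, *models[name]))


# Hypothetical left-to-right word models: (start, trans, emit).
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "a": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.2, 0.8]]),  # wide opening
    "u": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.8, 0.2]]),  # small opening
}
print(recognize([1, 1, 1, 0], models))
```

For long sequences the raw probabilities underflow, so practical implementations scale `alpha` at each step or work in the log domain.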

Training Network Design Based on Convolution Neural Network for Object Classification in few class problem (소 부류 객체 분류를 위한 CNN기반 학습망 설계)

  • Lim, Su-chang;Kim, Seung-Hyun;Kim, Yeon-Ho;Kim, Do-yeon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.144-150 / 2017
  • Recently, deep learning has been used for intelligent data processing and for improving accuracy. It builds computational models composed of multiple data-processing layers that learn data representations through multiple levels of abstraction. The convolutional neural network (CNN), a category of deep learning, is used in various research fields such as human pose estimation, face recognition, image classification, and speech recognition. CNNs with deep layers and many classes perform well on image classification and achieve high classification rates, but they suffer from overfitting when only a small amount of data is available. We therefore design a training network based on a CNN and train it on our own image dataset for object classification in a few-class problem. Experiments show an average classification rate 7.06% higher than that of previous networks designed to classify objects in the 1000-class problem.
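The abstract does not list the layer configuration of the proposed network, so the sketch below only illustrates the generic building blocks a CNN stacks: a "valid" 2-D convolution (implemented, as in most deep-learning libraries, as cross-correlation), a ReLU, and 2x2 max pooling. The 5x5 image and the vertical-edge kernel are toy values chosen for illustration; in a trained network the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation: slide the kernel over every full overlap."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling (a trailing odd row/column is dropped)."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# Toy 5x5 image: dark left columns, bright right columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions

feature_map = relu(conv2d_valid(image, vertical_edge))  # 4x4
pooled = max_pool2x2(feature_map)                       # 2x2
print(pooled)
```

Each convolution-plus-pooling stage shrinks the spatial size while keeping the strongest responses, which is why deeper stacks with many parameters need more data to avoid the overfitting the abstract describes.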

Transcoding Algorithm for SMV and G.729A Vocoders via Direct Parameter Transformation (G.729A와 SMV 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬)

  • 장달원;서성호;이선일;유창동
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.6 / pp.71-83 / 2003
  • In this paper, a novel transcoding algorithm between the G.729A and Selectable Mode Vocoder (SMV) vocoders via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder directly into those of the other without going through the decoding and encoding processes. For the SMV-to-G.729A transcoder, an LSP conversion algorithm, a pitch delay conversion algorithm, and a transcoding algorithm for the lower rates are proposed; for the G.729A-to-SMV transcoder, an LSP conversion algorithm, a pitch delay conversion algorithm, and a rate selection algorithm are proposed. Evaluation results show that the proposed algorithm has lower computational complexity and delay than the tandem transcoding algorithm while producing equivalent or improved speech quality.
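The abstract does not spell out the LSP conversion rules. One issue a direct LSP conversion between these coders has to handle is the frame-rate mismatch: G.729A uses 10-ms frames, while SMV is assumed here to use 20-ms frames. The sketch below shows one hypothetical way to map line spectral frequency (LSF) vectors between the two frame rates by linear interpolation; the function names and the 0.5 weights are assumptions, not the paper's algorithm, and re-quantization with each coder's codebooks is omitted.

```python
def interpolate_lsf(a, b, w):
    """Element-wise (1 - w) * a + w * b for two LSF vectors in radians.

    A convex combination of two ascending vectors is itself ascending, so the
    result remains a valid (ordered) LSF set.
    """
    return [(1.0 - w) * x + w * y for x, y in zip(a, b)]


def g729a_to_smv_lsf(first_10ms, second_10ms):
    """Hypothetical rule: merge two 10-ms G.729A frames into one 20-ms SMV frame."""
    return interpolate_lsf(first_10ms, second_10ms, 0.5)


def smv_to_g729a_lsf(prev_20ms, cur_20ms):
    """Hypothetical rule: split one 20-ms SMV frame into two 10-ms G.729A frames.

    The first half is interpolated from the previous SMV frame; the second
    half reuses the current frame's LSFs.
    """
    return [interpolate_lsf(prev_20ms, cur_20ms, 0.5), list(cur_20ms)]


# Toy 4-element LSF vectors (real LSF vectors are longer).
a = [0.3, 0.8, 1.5, 2.4]
b = [0.5, 1.0, 1.7, 2.6]
print(g729a_to_smv_lsf(a, b))
print(smv_to_g729a_lsf(a, b))
```

Working in the LSF domain keeps the conversion cheap and preserves ordering, which is one reason direct parameter mapping can avoid the cost of full decoding and re-encoding.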

Analysis of Phonatory Aerodynamic & E.G.G. during Passaggio of the Trained Male Singers (남성성악가의 Vocal Register Transition(Passaggio)시 공기역학적 변화와 EGG의 변화 연구)

  • Nam, Do-Hyun;Choi, Seong-Hee;Choi, Jae-Nam;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.1 / pp.21-26 / 2004
  • Vocal register transition (passaggio) is one of the most important vocal techniques for classically trained male singers (tenors). Passaggio bridges the chest register to the head register without a noticeable voice break, so that vocalists get the feeling that the voice is not locked into a particular register. The purpose of this study was to clarify the differences among an easy tone ($B_3$), a high tone without passaggio ($F\#_4$), and a high tone with passaggio ($F\#_4$). We selected 6 trained singers (tenors) who had more than 12.6 years of experience and were well trained in the passaggio technique. Fundamental frequency (F0), mean flow rate (MFR), intensity (I), and subglottal pressure (Psub) were measured simultaneously with a phonatory function analyzer (Nagashima), and closed quotient (CQ), jitter, shimmer, and NHR were measured by electroglottography (EGG) with the Lx Speech Studio (Laryngograph Ltd, London, UK); vocal efficiency was calculated by Carroll's method. For each tenor, the target tone /a/ was measured in three conditions: 1) easy phonation: $B_3$, 2) high tone without passaggio: $F\#_4$, 3) high tone with passaggio: $F\#_4$. The results revealed that F0 of the target tones did not differ significantly between the non-passaggio and passaggio conditions, although higher F0 was accompanied by higher subglottal pressure. CQ, MFR, and Psub were also higher with passaggio than without, but these differences were not statistically significant. This study concluded that passaggio is a vocal technique for producing the same tone quality in the chest register and the head register in tenors.
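Jitter and shimmer are cycle-to-cycle perturbation measures. The sketch below uses the common "local" definitions (mean absolute difference between consecutive cycles divided by the mean value), as found in tools such as Praat; the Lx Speech Studio software used in the study may compute them differently, and NHR, CQ, and Carroll's vocal-efficiency calculation are not reproduced. The function names and the period and amplitude values are invented for illustration.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by their mean, in %."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def local_jitter(periods_ms):
    """Local jitter (%) from consecutive glottal cycle periods."""
    return local_perturbation(periods_ms)


def local_shimmer(peak_amplitudes):
    """Local shimmer (%) from consecutive cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)


# Invented cycle data for a tone near F#4 (about 370 Hz, period about 2.7 ms).
periods = [2.70, 2.72, 2.69, 2.71, 2.70]
amplitudes = [1.00, 0.97, 1.02, 0.99, 1.01]
print(f"jitter  {local_jitter(periods):.2f}%")
print(f"shimmer {local_shimmer(amplitudes):.2f}%")
```

Both measures are normalised by the mean, so they can be compared across the $B_3$ and $F\#_4$ conditions even though the absolute period lengths differ.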

Speaker-adaptive Word Recognition Using Mapped Membership Function (사상멤버쉽함수에 의한 화자적응 단어인식)

  • Lee, Ki-Yeong;Choi, Kap-Seok
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.40-52 / 1992
  • In this paper, we propose a speaker-adaptive word recognition method using a mapped membership function to absorb the variation caused by individual differences between speakers, which is a problem in speaker-independent speech recognition. In the training procedure, the mapped membership function is constructed by introducing fuzzy theory into a mapped codebook between an unknown speaker's spectral patterns and those of a standard speaker. In the recognition procedure, an input pattern from the unknown speaker is transformed by the mapped membership function into a pattern adapted to the standard speaker. To show the validity of this method, word recognition experiments were carried out on 28 DDD area names. The recognition rate of the conventional speaker-adaptive method using a mapped codebook built by VQ is 64.9%, and that of one built by fuzzy VQ is 76.2%. In experiments using the mapped membership function, we achieved a 95.4% recognition rate, which shows that the proposed method gives superior recognition performance. Moreover, this method does not need an iterative training procedure to build the mapped membership function, and its memory and computation requirements are reduced to 1/30 and 1/500, respectively, of those of the conventional method using a mapped codebook.
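The abstract does not define the mapped membership function precisely. The sketch below shows the general fuzzy-VQ mapping idea it builds on: an input vector's fuzzy memberships in the unknown speaker's codebook weight the paired standard-speaker codewords, producing an adapted vector. The codebooks, the fuzzifier `m = 2`, and both function names are hypothetical, and this is not claimed to be the paper's exact function.

```python
import math


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy-VQ membership of vector x in each codeword; the values sum to 1."""
    d = [math.dist(x, c) for c in codebook]
    if 0.0 in d:  # x coincides with a codeword: crisp assignment
        k = d.index(0.0)
        return [1.0 if j == k else 0.0 for j in range(len(codebook))]
    return [1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0)) for j in range(len(d)))
            for k in range(len(d))]


def map_to_standard_speaker(x, speaker_codebook, mapped_codebook, m=2.0):
    """Adapt x as a membership-weighted sum of the standard-speaker codewords
    paired with the unknown speaker's codewords."""
    u = fuzzy_memberships(x, speaker_codebook, m)
    dim = len(mapped_codebook[0])
    return [sum(u[k] * mapped_codebook[k][i] for k in range(len(u)))
            for i in range(dim)]


# Toy 2-D "spectral" codebooks: unknown-speaker codeword k maps to standard codeword k.
speaker_cb = [[0.0, 0.0], [2.0, 0.0]]
standard_cb = [[10.0, 10.0], [20.0, 10.0]]
print(map_to_standard_speaker([1.0, 0.0], speaker_cb, standard_cb))
print(map_to_standard_speaker([0.0, 0.0], speaker_cb, standard_cb))
```

Unlike hard VQ mapping, which snaps each input to a single mapped codeword, the weighted sum varies smoothly between codewords, which is the property fuzzy methods exploit to reduce mapping error.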

A Study on the Reduction of LSP(Line Spectrum Pair) Transformation Time in Speech Coder for CDMA Digital Cellular System (이동통신용 음성부호화기에서의 LSP 계산시간 감소에 관한 연구)

  • Min, So-Yeon
    • Journal of the Korea Academia-Industrial cooperation Society / v.8 no.3 / pp.563-568 / 2007
  • We propose a method for reducing the computation of the real root method used in the EVRC (Enhanced Variable Rate Codec) system. In the real root method, the real roots of the polynomial equations are found and transformed into LSPs. However, this method is computationally expensive because the root search proceeds sequentially across the frequency range. An important characteristic of LSPs is that most coefficients occur in a specific frequency region. Therefore, to reduce the computation time of the real root method, we used the mel scale, which is linear below 1 kHz and logarithmic above. To compare the real root method with the proposed method, we made two measurements. First, we compared the positions of the LSP (Line Spectrum Pair) parameters transformed by the proposed method with those obtained by the real root method. Second, we measured how much the computation time was reduced. The experimental results show that the search time was reduced by about 48% on average without any change in the LSP parameters.
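Searching on a mel-spaced grid can be sketched as follows: grid points are placed uniformly on the mel scale, so they are dense at low frequencies and sparse at high ones, and each sign change between neighbouring points is refined by bisection. The sketch uses the common approximation mel(f) = 2595 log10(1 + f/700), which is close to linear below 1 kHz and logarithmic above; the paper's exact grid is not given. The function `stand_in` is a substitute with known roots, not an actual LSP polynomial, and `mel_grid` and `find_roots` are hypothetical names.

```python
import math


def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_grid(f_max_hz, n_points):
    """n_points frequencies from 0 to f_max_hz, equally spaced on the mel scale."""
    top = hz_to_mel(f_max_hz)
    return [mel_to_hz(top * i / (n_points - 1)) for i in range(n_points)]


def find_roots(func, grid, bisections=30):
    """Find roots of func by scanning grid for sign changes, then bisecting."""
    roots = []
    for lo, hi in zip(grid, grid[1:]):
        f_lo, f_hi = func(lo), func(hi)
        if f_lo == 0.0:
            roots.append(lo)
        elif f_lo * f_hi < 0.0:
            for _ in range(bisections):
                mid = 0.5 * (lo + hi)
                f_mid = func(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    if func(grid[-1]) == 0.0:
        roots.append(grid[-1])
    return roots


# Substitute function with roots at 250, 750, ..., 3750 Hz (not an LSP polynomial).
def stand_in(f_hz):
    return math.cos(2.0 * math.pi * f_hz / 1000.0)


grid = mel_grid(4000.0, 200)
print([round(r, 1) for r in find_roots(stand_in, grid)])
```

The trade-off is that a coarse high-frequency grid could miss two roots falling in one interval; the approach relies on LSPs being concentrated in the lower-frequency region, as the abstract notes.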

Categorization of Korean News Articles Based on Convolutional Neural Network Using Doc2Vec and Word2Vec (Doc2Vec과 Word2Vec을 활용한 Convolutional Neural Network 기반 한국어 신문 기사 분류)

  • Kim, Dowoo;Koo, Myoung-Wan
    • Journal of KIISE / v.44 no.7 / pp.742-747 / 2017
  • In this paper, we propose a novel approach that improves the performance of a Convolutional Neural Network (CNN) model built on word2vec word embeddings by also using doc2vec in a document classification task. The Word Piece Model (WPM) was empirically shown to outperform other tokenization methods, such as phrase units and a part-of-speech tagger (classification rate: 79.5%). Further, we conducted an experiment classifying Korean news articles into ten categories by feeding word and document vectors generated with WPM tokenization into the baseline and the proposed model. In this experiment, the proposed model showed a higher classification rate (89.88%) than its counterpart model (86.89%), achieving a 22.80% improvement. This research demonstrates that applying doc2vec to document classification yields more effective results, because doc2vec generates similar document vector representations for documents belonging to the same category.
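The 22.80% figure is consistent with a relative reduction in classification error rather than a gain in accuracy points: the error falls from 13.11% to 10.12%. The short check below reproduces that arithmetic from the two reported classification rates; reading the figure this way is our interpretation, and the function name is illustrative.

```python
def relative_error_reduction(baseline_acc_pct, proposed_acc_pct):
    """Relative reduction (%) in error rate when accuracy rises from baseline to proposed."""
    baseline_err = 100.0 - baseline_acc_pct
    proposed_err = 100.0 - proposed_acc_pct
    return 100.0 * (baseline_err - proposed_err) / baseline_err


# Reported classification rates: counterpart model 86.89%, proposed model 89.88%.
print(f"{relative_error_reduction(86.89, 89.88):.2f}%")
```

This prints about 22.8%, in line with the reported 22.80%, whereas the absolute accuracy gain is only 2.99 percentage points.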

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's voice. In the proposed method, a speaker's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. The parameters of each probabilistic model are obtained by Maximum Likelihood (ML) estimation. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstral distance reduction ratio and the likelihood increase ratio. In a subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained owing to spectral contours that evolve smoothly over time.
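The MMSE mapping can be sketched in one dimension. For each mixture component i, a commonly used GMM-based conversion predicts the target coefficient as mu_y,i + (sigma_yx,i / sigma_xx,i)(x - mu_x,i) and weights these predictions by the posterior probability of component i given the source value x. The paper's exact conditional model may differ; the scalar setting (a single cepstral coefficient instead of a vector), all parameter values, and the function names are hypothetical, and the ML training step is omitted. The prosody rule follows the abstract's description of scaling by the ratio of average values.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_mmse_convert(x, weights, mu_x, mu_y, var_xx, cov_yx):
    """MMSE estimate of a target coefficient from a source coefficient x."""
    # Posterior P(i | x) of each component under the source-side GMM.
    likes = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, mu_x, var_xx)]
    total = sum(likes)
    posteriors = [lk / total for lk in likes]
    # Posterior-weighted sum of per-component linear regressions.
    return sum(p * (my + c / v * (x - mx))
               for p, my, c, v, mx in zip(posteriors, mu_y, cov_yx, var_xx, mu_x))


def convert_prosody(value, source_mean, target_mean):
    """Scale a pitch period or speaking rate by the ratio of speaker averages."""
    return value * target_mean / source_mean


# Hypothetical two-component model for one cepstral coefficient.
params = dict(weights=[0.5, 0.5], mu_x=[-1.0, 1.0], mu_y=[-0.5, 1.5],
              var_xx=[0.5, 0.5], cov_yx=[0.3, 0.3])
print(gmm_mmse_convert(0.8, **params))
print(convert_prosody(8.0, source_mean=8.0, target_mean=6.0))  # pitch period in ms
```

Because the posteriors change gradually with x, the converted values move smoothly between component regressions, which fits the abstract's remark about smoothly evolving spectral contours.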