Search | Korea Science

An Analysis on Phone-Like Units for Korean Continuous Speech Recognition in Noisy Environments (잡음환경하의 연속 음성인식을 위한 유사음소단위 분석)

Shen Guang-Hu;Lim Soo-Ho;Seo Jun-Bae;Kim Joo-Gon;Jung Ho-Youl;Chung Hyun-Yeol
- Proceedings of the Acoustical Society of Korea Conference
- /
- autumn
- /
- pp.123-126
- /
- 2004
본 논문은 잡음환경 하에서의 효율적인 문맥의존 음향 모델 구성에 대한 기초연구로서 잡음환경 하에서의 유사 음소단위 수에 따른 연속 음성인식 성능을 비교, 평가한 결과에 대한 보고이다. 기존의 연구[1,2]로부터 연속음성 인식의 경우 문맥종속모델은 변이음을 고려한 39유사음소를 이용한 경우가 48유사음소를 이용하는 것보다 더 좋은 인식성능을 나타냄을 알 수 있었다. 이 연구 결과를 바탕으로 본 연구에서는 잡음환경에서도 효율적인 문맥 의존 음향모델을 구성하기 위한 기초 연구를 수행하였다. 다양한 잡음환경을 고려하기 위해 White, Pink, LAB 잡음을 신호 대 잡음비(Signal to Noise Ratio) 5dB, 10dB, 15dB 레벨로 음성에 부가한 후 각 유사음소단위 수에 따른 연속음성인식 실험을 수행하였다. 그 결과, 39유사음소를 이용한 경우가 48유사음소를 이용한 경우보다 clear 환경인 경우에 약 $7\%$와 $17\%$ 향상된 단어인식률과 문장 인식률을 얻을 수 있었으며, 각 잡음환경에서도 39유사음소를 이용한 경우가 48유사음소를 이용한 경우보다 평균 적으로 $17\%$와 $28\%$ 향상된 단어인식률과 문장인식률을 얻을 수 있어 39유사음소 단위가 한국어 연속음성인식에 더 적합하고 잡음환경에서도 유효함을 확인할 수 있었다.
PDF

Speaker verification system combining attention-long short term memory based speaker embedding and I-vector in far-field and noisy environments (Attention-long short term memory 기반의 화자 임베딩과 I-vector를 결합한 원거리 및 잡음 환경에서의 화자 검증 알고리즘)

Bae, Ara;Kim, Wooil
- The Journal of the Acoustical Society of Korea
- /
- v.39 no.2
- /
- pp.137-142
- /
- 2020
Many studies based on I-vector have been conducted in a variety of environments, from text-dependent short-utterance to text-independent long-utterance. In this paper, we propose a speaker verification system employing a combination of I-vector with Probabilistic Linear Discriminant Analysis (PLDA) and speaker embedding of Long Short Term Memory (LSTM) with attention mechanism in far-field and noisy environments. The LSTM model's Equal Error Rate (EER) is 15.52 % and the Attention-LSTM model is 8.46 %, improving by 7.06 %. We show that the proposed method solves the problem of the existing extraction process which defines embedding as a heuristic. The EER of the I-vector/PLDA without combining is 6.18 % that shows the best performance. And combined with attention-LSTM based embedding is 2.57 % that is 3.61 % less than the baseline system, and which improves performance by 58.41 %.
https://doi.org/10.7776/ASK.2020.39.2.137 인용 PDF KSCI

Performance Improvement of Continuous Digits Speech Recognition Using the Transformed Successive State Splitting and Demi-syllable Pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자 음 인식의 성능 향상)

Seo Eun-Kyoung;Choi Gab-Keun;Kim Soon-Hyob;Lee Soo-Jeong
- Journal of Korea Multimedia Society
- /
- v.9 no.1
- /
- pp.23-32
- /
- 2006
This paper describes the optimization of a language model and an acoustic model to improve speech recognition using Korean unit digits. Since the model is composed of a finite state network (FSN) with a disyllable, recognition errors of the language model were reduced by analyzing the grammatical features of Korean unit digits. Acoustic models utilize a demisyllable pair to decrease recognition errors caused by inaccurate division of a phone or monosyllable due to short pronunciation time and articulation. We have used the K-means clustering algorithm with the transformed successive state splitting in the feature level for the efficient modelling of feature of the recognition unit. As a result of experiments, 10.5% recognition rate is raised in the case of the proposed language model. The demi-syllable fair with an acoustic model increased 12.5% recognition rate and 1.5% recognition rate is improved in transformed successive state splitting.
PDF

A study on the Method of the Keyword Spotting Recognition in the Continuous speech using Neural Network (신경 회로망을 이용한 연속 음성에서의 keyword spotting 인식 방식에 관한 연구)

Yang, Jin-Woo;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea
- /
- v.15 no.4
- /
- pp.43-49
- /
- 1996
This research proposes a system for speaker independent Korean continuous speech recognition with 247 DDD area names using keyword spotting technique. The applied recognition algorithm is the Dynamic Programming Neural Network(DPNN) based on the integration of DP and multi-layer perceptron as model that solves time axis distortion and spectral pattern variation in the speech. To improve performance, we classify word model into keyword model and non-keyword model. We make an experiment on postprocessing procedure for the evaluation of system performance. Experiment results are as follows. The recognition rate of the isolated word is 93.45% in speaker dependent case. The recognition rate of the isolated word is 84.05% in speaker independent case. The recognition rate of simple dialogic sentence in keyword spotting experiment is 77.34% as speaker dependent, and 70.63% as speaker independent.
PDF

Performance Improvement of Continuous Digits Speech Recognition using the Transformed Successive State Splitting and Demi-syllable pair (반음절쌍과 변형된 연쇄 상태 분할을 이용한 연속 숫자음 인식의 성능 향상)

Kim Dong-Ok;Park No-Jin
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.9 no.8
- /
- pp.1625-1631
- /
- 2005
This paper describes an optimization of a language model and an acoustic model that improve the ability of speech recognition with Korean nit digit. Recognition errors of the language model are decreasing by analysis of the grammatical feature of korean unit digits, and then is made up of fsn-node with a disyllable. Acoustic model make use of demi-syllable pair to decrease recognition errors by inaccuracy division of a phone, a syllable because of a monosyllable, a short pronunciation and an articulation. we have used the k-means clustering algorithm with the transformed successive state splining in feature level for the efficient modelling of the feature of recognition unit . As a result of experimentations, $10.5\%$ recognition rate is raised in the case of the proposed language model. The demi-syllable pair with an acoustic model increased $12.5\%$ recognition rate and $1.5\%$ recognition rate is improved in transformed successive state splitting.
PDF KSCI

A Study of Keyword Spotting System Based on the Weight of Non-Keyword Model (비핵심어 모델의 가중치 기반 핵심어 검출 성능 향상에 관한 연구)

Kim, Hack-Jin;Kim, Soon-Hyub
- The KIPS Transactions:PartB
- /
- v.10B no.4
- /
- pp.381-388
- /
- 2003
This paper presents a method of giving weights to garbage class clustering and Filler model to improve performance of keyword spotting system and a time-saving method of dialogue speech processing system for keyword spotting by calculating keyword transition probability through speech analysis of task domain users. The point of the method is grouping phonemes with phonetic similarities, which is effective in sensing similar phoneme groups rather than individual phonemes, and the paper aims to suggest five groups of phonemes obtained from the analysis of speech sentences in use in Korean morphology and in stock-trading speech processing system. Besides, task-subject Filler model weights are added to the phoneme groups, and keyword transition probability included in consecutive speech sentences is calculated and applied to the system in order to save time for system processing. To evaluate performance of the suggested system, corpus of 4,970 sentences was built to be used in task domains and a test was conducted with subjects of five people in their twenties and thirties. As a result, FOM with the weights on proposed five phoneme groups accounts for 85%, which has better performance than seven phoneme groups of Yapanel [1] with 88.5% and a little bit poorer performance than LVCSR with 89.8%. Even in calculation time, FOM reaches 0.70 seconds than 0.72 of seven phoneme groups. Lastly, it is also confirmed in a time-saving test that time is saved by 0.04 to 0.07 seconds when keyword transition probability is applied.
https://doi.org/10.3745/KIPSTB.2003.10B.4.381 인용 PDF KSCI

Recognizing Five Emotional States Using Speech Signals (음성 신호를 이용한 화자의 5가지 감성 인식)

Kang Bong-Seok;Han Chul-Hee;Woo Kyoung-Ho;Yang Tae-Young;Lee Chungyong;Youn Dae-Hee
- Proceedings of the Acoustical Society of Korea Conference
- /
- autumn
- /
- pp.101-104
- /
- 1999
본 논문에서는 음성 신호를 이용해서 화자의 감정을 인식하기 위해 3가지 시스템을 구축하고 이들의 성능을 비교해 보았다. 인식 대상으로 하는 감정은 기쁨, 슬픔, 화남, 두려움, 지루함, 평상시의 감정이고, 각 감정에 대한 감정 음성 데이터베이스를 직접 구축하였다. 피치와 에너지 정보를 감성 인식의 특징으로 이용하였고, 인식 알고리듬은 MLB(Maximum-Likelihood Bayes)분류기, NN(Nearest Neighbor)분류기 및 HMM(Hidden Markov Model)분류기를 이용하였다. 이 중 MLB 분류기와 NN 분류기에서는 특징벡터로 피치와 에너지의 평균과 표준편차, 최대값 등 통계적인 정보를 이용하였고, TMM 분류기에서는 각 프레임에서의 델타 피치와 델타델타 피치, 델타 에너지와 델타델타 에너지 등 시간적 정보를 이용하였다. 실험은 화자종속, 문장독립형 방식으로 하였고, 인식 실험 결과는 MLB를 이용해서 $68.9\%, NN을 이용해서 $66.7\%를 얻었고, HMM 분류기를 이용해서 $89.30\%를 얻었다.
PDF

Exploiting Implicit Parallelism for Single Loops in Java Programming Language (JAVA 프로그래밍 언어에서 단일루프구조의 무시적 병렬성 검출)

Kwon, Oh-Jin
- Journal of Information Management
- /
- v.29 no.3
- /
- pp.1-26
- /
- 1998
The loop is a fundamental for the parallelism exploiting as it has a large portion of execution time for sequential Java program on the parallel machine. This paper proposes the method of exploiting the implicit parallelism through the analysis of data dependence in the existing Java programming language having a single loop structure. The parallel code generation method through the restructuring compiler and the translation method of Java source program into multithread statement, which is supported in the level of the Java programming language, are also proposed here. The performance test of the program translated into the thread statement is conducted using the trip count of loop and the thread count as parameters. The restructuring compiler makes it possible for users to reduce overhead and exploit parallelism efficiently in the Java programming.
PDF

Scalarization of HPF FORALL Construct (HPF FORALL 구조의 스칼라화(Scalarization))

Koo, Mi-Soon
- Journal of the Korea Society of Computer and Information
- /
- v.12 no.5
- /
- pp.121-129
- /
- 2007
Scalarization is a process that a parallel construct like an array statement of Fortran 90 or FORALL of HPF is converted into sequential loops that maintain the correct semantics. Most compilers of HPF, recognized as a standard data parallel language, convert a HPF program into a Fortran 77 program inserted message passing primitives. During scalariztion, a parallel construct FORALL should be translated into Fortran 77 DO loops maintaining the semantics of FORALL. In this paper, we propose a scalarization algorithm which converts a FORALL construct into a DO loop with improved performance. For this, we define and use a relation distance vector to keep necessary dependence informations. Then we evaluate execution times of the codes generated by our method and by PARADIGM compiler method for various array sizes.
PDF

Measuring Method of Worst-case Execution Time by Analyzing Relation between Source Code and Executable Code (소스코드와 실행코드의 상관관계 분석을 통한 최악실행시간 측정 방법)

Seo, Yongjin;Kim, Hyeon Soo
- Journal of Internet Computing and Services
- /
- v.17 no.4
- /
- pp.51-60
- /
- 2016
Embedded software has requirements such as real-time and environment independency. The real-time requirement is affected from worst-case execution time of loaded tasks. Therefore, to guarantee real-time requirement, we need to determine a program's worst-case execution time using static analysis approach. However, the existing methods for worst-case execution time analysis do not consider the environment independency. Thus, in this paper, in order to provide environment independency, we propose a method for measuring task's execution time from the source codes. The proposed method measures the execution time through the control flow graph created from the source codes instead of the executable codes. However, the control flow graph created from the source code does not have information about execution time. Therefore, in order to provide this information, the proposed method identifies the relationships between statements in the source code and instructions in the executable code. By parameterizing those parts that are dependent on processors based on the relationships, it is possible to enhance the flexibility of the tool that measures the worst-case execution time.
https://doi.org/10.7472/jksii.2016.17.4.51 인용 PDF KSCI

Search Result 54, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)