• Title/Summary/Keyword: Korean speech processing (한국어 음성처리)

Language Modeling based on Inter-Word Dependency Relation (단어간 의존관계에 기반한 언어모델링)

  • Lee, Seung-Mi;Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 1998.10c / pp.239-246 / 1998
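  • Statistical language modeling assigns a sentence probability to a sequence of words and is an important component of many natural language processing applications, such as speech recognition and statistical machine translation. Existing approaches fall into two broad categories: n-gram-based and grammar-based. In general, n-gram models cannot represent long-distance dependencies well, while grammar-based approaches have difficulty acquiring a grammar with broad coverage. This paper presents a language modeling technique based on a simple form of dependency grammar. The dependency grammar consists of head-dependent relations between words and is learned automatically from a raw corpus using the dependency grammar reestimation algorithm introduced in this paper. In experiments, the proposed dependency-based model reduced entropy on the test corpus by about 11% to 11.5% compared with the tri-gram and bi-gram models, indicating improved performance.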
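
The entropy comparison reported in this abstract can be illustrated with a minimal sketch. It assumes entropy is measured as the average negative log2 probability per word on a test text; the probabilities below are invented for illustration and are not the paper's data, and the function names are hypothetical.

```python
import math

def per_word_entropy(word_probs):
    """Average negative log2 probability per word (bits per word)."""
    return -sum(math.log2(p) for p in word_probs) / len(word_probs)

def relative_reduction(baseline, proposed):
    """Fractional entropy reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline

# Hypothetical probabilities that two models assign to the same four words.
ngram_probs = [0.125, 0.25, 0.125, 0.25]
dependency_probs = [0.25, 0.25, 0.25, 0.25]

h_ngram = per_word_entropy(ngram_probs)
h_dep = per_word_entropy(dependency_probs)
reduction = relative_reduction(h_ngram, h_dep)
print(f"{h_ngram:.2f} -> {h_dep:.2f} bits/word, reduction {reduction:.1%}")
```

A model that assigns higher probability to the words that actually occur has lower entropy, which is the sense in which an entropy reduction indicates a better language model.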

Automatic sentence segmentation of subtitles generated by STT (STT로 생성된 자막의 자동 문장 분할)

  • Kim, Ki-Hyun;Kim, Hong-Ki;Oh, Byoung-Doo;Kim, Yu-Seop
    • Annual Conference on Human and Language Technology / 2018.10a / pp.559-560 / 2018
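  • Long Short-Term Memory (LSTM), which is based on the recurrent neural network (RNN), is a model that performs well in natural language processing. Services that generate subtitles with Speech to Text (STT), which converts speech into text, and simultaneously translate the generated subtitles into other languages are being actively developed. When subtitles are extracted with STT, the output is a single run of text with no periods, so accurate translation is not possible. This paper proposes a method that splits the text into sentences and generates periods in order to improve accuracy in the automatic translation of English subtitles. After training an LSTM on the data, testing showed that it predicted period positions with 62.3% accuracy.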

Performance compare by the processing unit of the automatic phoneme labelling system (음운 자동 레이블링 시스템의 처리단위에 의한 성능비교)

  • Park, Soon-Cheol;Kim, Tae-Hwan;Kim, Bong-Wan;Lee, Yong-Ju
    • Annual Conference on Human and Language Technology / 1999.10e / pp.173-177 / 1999
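  • To evaluate the demiphone, which was recently proposed as a basic unit for labeling systems [1], this paper builds labeling systems that use the monophone, triphone, and demiphone as units and evaluates the performance of the demiphone. For the speech database, recordings of the 452 PBW words from 30 male speakers were used for training, and recordings from 4 male speakers not used in training were used to evaluate the system. In the evaluation, for boundary errors of 20 ms or less, the demiphone outperformed the monophone by 6.31% and the triphone by 6.21%. For boundary errors of 40 ms or less, it improved performance by 4.33% and 3.68%, respectively.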

Consecutive Vowel Segmentation of Korean Speech Signal using Phonetic-Acoustic Transition Pattern (음소 음향학적 변화 패턴을 이용한 한국어 음성신호의 연속 모음 분할)

  • Park, Chang-Mok;Wang, Gi-Nam
    • Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.801-804 / 2001
  • This article addresses the automatic segmentation of two adjacent vowels in speech signals. Every type of transition between adjacent vowels can be characterized by its spectrogram. First, the voiced speech is extracted by histogram analysis of a vowel indicator composed of wavelet low-pass components. Second, given the phonetic transcription and the transition-pattern spectrogram, the voiced-speech portion containing consecutive vowels is automatically segmented by template matching. The cross-correlation function is adopted as the template matching method, and a modified correlation coefficient is calculated for every frame. The largest value in the modified correlation coefficient series indicates the boundary between the two consecutive vowel sounds. The experiment covers 154 vowel transition sets. The 154 spectrogram templates were gathered from 154 words (PRW Speech DB), and 161 test words (PBW Speech DB) uttered by 5 speakers were tested. The experimental results show the validity of the method.
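
The template-matching step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain Pearson correlation coefficient in place of the paper's "modified correlation coefficient" (whose definition the abstract does not give), the toy two-bin spectrogram is invented, and all function names are hypothetical.

```python
import math

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat segment: correlation undefined, treat as no match
    return cov / math.sqrt(var_a * var_b)

def flatten(frames):
    """Concatenate a list of per-frame spectra into one flat list."""
    return [value for frame in frames for value in frame]

def find_boundary(spectrogram, template):
    """Slide the template over the spectrogram and score every position.

    Returns the frame index at the centre of the best-matching window,
    together with the full list of correlation scores.
    """
    width = len(template)
    flat_template = flatten(template)
    scores = [
        correlation(flatten(spectrogram[start:start + width]), flat_template)
        for start in range(len(spectrogram) - width + 1)
    ]
    best_start = max(range(len(scores)), key=scores.__getitem__)
    return best_start + width // 2, scores

# Toy data: four frames of "vowel A" followed by four frames of "vowel B".
spectrogram = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
# Transition template: two frames of A followed by two frames of B.
template = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

boundary, scores = find_boundary(spectrogram, template)
print("boundary frame:", boundary)
```

In this toy case the template places the transition at its centre, so the centre of the best-matching window is taken as the vowel boundary.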

Query Normalization Using P-tuning of Large Pre-trained Language Model (Large Pre-trained Language Model의 P-tuning을 이용한 질의 정규화)

  • Suh, Soo-Bin;In, Soo-Kyo;Park, Jin-Seong;Nam, Kyeong-Min;Kim, Hyeon-Wook;Moon, Ki-Yoon;Hwang, Won-Yo;Kim, Kyung-Duk;Kang, In-Ho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.396-401 / 2021
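  • Few-shot learning with very large language models has performed well on many natural language processing problems. However, defining a problem through few-shot composition in a discrete space, rather than inferring it through additional training on data, limits how much performance can improve. To address this, data-driven additional training methods such as P-tuning have emerged, which either further train only a subset of the very large language model's parameters rather than all of them, or attach another neural network and perform inference in a continuous space. In this paper, we define the context-dependent query normalization problem specifically for a conversational voice search service and show that further training a very large language model with P-tuning improves accuracy over few-shot learning.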

The Language Change and Language Processing (언어 변화와 언어 처리 - '는게/는데' 문법 화와 자동 태깅 시스템-)

  • 최운호
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.35-43 / 1999
  • This paper examines language changes in modern Korean and their effect on language processing systems. In modern Korean, syntactic constructions such as [Adnominal Ending + Bound Noun (+ Postposition)] are changing into morphological constructions, and some of these constructions are reflected in the written language. For example, the syntactic construction [Adnominal Ending + '-de' (Bound Noun) (+ Postposition)] co-exists with the mixed form '-neunde', and [Adnominal Ending + 'geot' (Bound Noun) + '-i' (Postposition)] co-exists with '-neunge'. These constructions are used frequently in the spoken language. Like other verbal endings, these forms participate in the construction of complex sentences, and they have their own case function fused into them. The analytic approach to these forms can therefore have a great effect on automatic morphological analysis systems, automatic tagging systems, and syntactic analysis systems. Language change phenomena like these must thus be taken into consideration in the design phase of a language processing system.

Design and Implementation of Simple Text-to-Speech System using Phoneme Units (음소단위를 이용한 소규모 문자-음성 변환 시스템의 설계 및 구현)

  • Park, Ae-Hee;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea / v.14 no.3 / pp.49-60 / 1995
  • This paper describes the design and implementation of a Korean text-to-speech system intended for small, simple systems. Parametric synthesis is chosen as the speech synthesis method, using PARCOR (PARtial autoCORrelation) coefficients, which are obtained from LPC analysis. The phoneme, the basic unit of speech synthesis, is used as the synthesis unit. PARCOR coefficients, pitch, and amplitude are used as the synthesis parameters for voiced sounds, and the residual signal and PARCOR coefficients are used for unvoiced sounds. Using the residual signal as the excitation signal for unvoiced sounds yielded 60% intelligibility. The synthesis experiments show that word-level synthesis is feasible, while sentence-level synthesis requires control of phoneme duration. The synthesis system was built with a 486 PC, a 70 Hz to 4.5 kHz band-pass filter for speech input/output, an amplifier, and a TMS320C30 DSP board.

Extraction of MFCC feature parameters based on the PCA-optimized filter bank and Korean connected 4-digit telephone speech recognition (PCA-optimized 필터뱅크 기반의 MFCC 특징파라미터 추출 및 한국어 4연숫자 전화음성에 대한 인식실험)

  • 정성윤;김민성;손종목;배건성
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.279-283 / 2004
  • In general, triangular filters are used in the filter bank when MFCC feature parameters are extracted from the spectrum of a speech signal. A different approach, proposed by Lee et al. to improve the recognition rate, uses filter shapes that are optimized to the spectrum of the training speech data. Principal component analysis is used to obtain the optimized filter coefficients. In this paper, using a large 4-digit telephone speech database, we extract MFCCs based on the PCA-optimized filter bank and compare their recognition performance with that of conventional MFCCs and direct weighted filter bank based MFCCs. Experimental results show that the MFCCs based on the PCA-optimized filter bank give a slight improvement in recognition rate over the conventional MFCCs but fail to outperform the MFCCs based on direct weighted filter bank analysis. The experimental results are discussed along with our findings.
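
For reference, the conventional triangular filter bank that this abstract uses as its baseline can be sketched as below. This is a generic illustration, not the paper's PCA-optimized filters: the mel-scale formula is one commonly used variant, and the filter count, FFT size, and 8 kHz sampling rate are assumptions chosen to resemble telephone speech.

```python
import math

def hz_to_mel(hz):
    """One common mel-scale mapping (assumed here; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def triangular_filter_bank(num_filters, fft_size, sample_rate):
    """Return num_filters rows of weights over the fft_size // 2 + 1 spectrum bins."""
    num_bins = fft_size // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2)
    # num_filters + 2 points equally spaced on the mel scale: filter edges and centres.
    points_hz = [mel_to_hz(max_mel * i / (num_filters + 1))
                 for i in range(num_filters + 2)]
    # Fractional bin positions keep neighbouring points distinct.
    points_bin = [hz * fft_size / sample_rate for hz in points_hz]
    bank = []
    for f in range(num_filters):
        left, centre, right = points_bin[f:f + 3]
        row = []
        for b in range(num_bins):
            rising = (b - left) / (centre - left)
            falling = (right - b) / (right - centre)
            # Triangle: rises from left edge to centre, falls to right edge, zero outside.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank

bank = triangular_filter_bank(num_filters=20, fft_size=256, sample_rate=8000)
print(len(bank), "filters over", len(bank[0]), "bins")
```

Each row is multiplied element-wise with the power spectrum and summed to give one filter-bank energy; the approach described in the abstract replaces these fixed triangular weights with coefficients derived from the training data.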

Phonological phrase boundary and word frequency that influence the phonological word recognition (음운구 경계와 단어빈도가 한국어 음운단어 재인에 미치는 영향)

  • Kim, Jeahong;Shin, Hasun;Kim, Yeseul;Yun, Gwangyeol;Kim, Daseul;Shin, Jiyoung;Nam, Kichun
    • Phonetics and Speech Sciences / v.11 no.2 / pp.45-56 / 2019
  • This study investigated the interaction between phonological phrase boundary and word frequency in Korean speech processing. A word monitoring task was performed to examine the interference caused by the frequency effect of the target word depending on whether a phonological phrase is formed within the target word. Frequency of the target word (high vs. low) and phonological phrase boundary (within target word vs. between target words) were applied as between-subject and within-subject conditions, respectively. Our results showed a significant main effect of phonological phrase boundary and a significant interaction. In the post-hoc analysis, high-frequency target words were detected significantly faster than low-frequency target words only in the within phonological phrase boundary condition. No frequency effect appeared in the between phonological phrase boundary condition. The results indicate that phonological phrase boundary and word frequency play an important role in Korean speech processing. In particular, we discuss the possibility that word frequency is processed at the very early sensory information processing stage, based on the interaction of the two experimental factors.

Isolated Digit and Command Recognition in Car Environment (자동차 환경에서의 단독 숫자음 및 명령어 인식)

  • 양태영;신원호;김지성;안동순;이충용;윤대희;차일환
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.11-17 / 1999
  • This paper proposes an observation probability smoothing technique to improve the robustness of a speech recognizer based on the discrete hidden Markov model (DHMM). It also suggests, based on experimental results, noise-robust processing appropriate for the car environment. Noisy speech is often mislabeled during vector quantization. To reduce the effects of such mislabeling, the proposed technique increases the observation probability of similar codewords. For noise-robust processing in the car environment, liftering on the distance measure of feature vectors, high-pass filtering, and spectral subtraction are examined. Recognition experiments were performed on 14 isolated words consisting of Korean digits and command words. The database was recorded in stopped-car and running-car environments. The recognition rates of the baseline recognizer were 97.4% in the stopped condition and 59.1% in the running condition. With the proposed observation probability smoothing technique, liftering, high-pass filtering, and spectral subtraction, the recognition rates rose to 98.3% in the stopped condition and 88.6% in the running condition.
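
The idea of raising the observation probability of similar codewords can be sketched as follows. The abstract does not give the smoothing formula, so this is only one plausible reading under stated assumptions: similarity is taken to decay exponentially with Euclidean distance between codeword centroids, and the `scale` parameter, the toy codebook, and the function names are all hypothetical.

```python
import math

def similarity_weights(codebook, scale):
    """Row-normalised weights that decay with distance between codewords."""
    weights = []
    for c_k in codebook:
        row = [math.exp(-math.dist(c_k, c_m) / scale) for c_m in codebook]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights

def smooth_observation_probs(probs, weights):
    """Spread each codeword's probability onto similar codewords, then renormalise."""
    n = len(probs)
    smoothed = [sum(weights[m][k] * probs[m] for m in range(n)) for k in range(n)]
    total = sum(smoothed)
    return [p / total for p in smoothed]

# Toy codebook: codewords 0 and 1 are close together, codeword 2 is far away.
codebook = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
# One state's observation distribution, trained only on codeword 0.
probs = [1.0, 0.0, 0.0]

weights = similarity_weights(codebook, scale=1.0)
smoothed = smooth_observation_probs(probs, weights)
print([round(p, 4) for p in smoothed])
```

After smoothing, a noisy frame that is quantized to the neighbouring codeword 1 instead of codeword 0 still receives a substantial observation probability, while the distant codeword 2 stays close to zero.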
