• Title/Summary/Keyword: 단어길이 (word length)


Question Analysis based on Focus-words for Korean Question-Answering System (한국어 질의 응답 시스템을 위한 초점단어 기반 질의분석)

  • Kim, Won-Nam;Shin, Seung-Eun;Seo, Young-Hoon
    • Proceedings of the Korea Contents Association Conference / 2004.11a / pp.476-482 / 2004
  • A question-answering (QA) system must analyze the user's intention correctly in order to return a correct answer to the user's question. This paper proposes a focus-word-based question analysis approach for a Korean QA system. A focus word is a clue word that selects the question type. The question type is assigned to one of 75 subcategories using the semantics of the focus words. The proposed system achieved 97.18% accuracy for the main category and 95.31% accuracy for the subcategory in question classification.


A Recognition Time Reduction Algorithm for Large-Vocabulary Speech Recognition (대용량 음성인식을 위한 인식기간 감축 알고리즘)

  • Koo, Jun-Mo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.10 no.3 / pp.31-36 / 1991
  • We propose an efficient pre-classification algorithm that extracts candidate words to reduce the recognition time in a large-vocabulary recognition system, and we also propose spectral and temporal smoothing of the observation probability to improve its classification performance. The proposed algorithm computes a coarse likelihood score for each word in a lexicon using the observation probabilities of speech spectra and the duration information of recognition units. With the proposed approach, we could reduce the amount of computation by 74% with slight degradation of recognition accuracy in a 1160-word recognition system based on phoneme-level HMMs. We also observed that the proposed coarse likelihood score is a good estimator of the likelihood score computed by the Viterbi algorithm.


Variables affecting Korean word recognition: focusing on syllable shape (한글 단어 재인에 영향을 미치는 변인: 음절 형태를 중심으로)

  • Min, Suyoung;Lee, Chang H.
    • Korean Journal of Cognitive Science / v.29 no.4 / pp.193-220 / 2018
  • Recent studies have demonstrated that word frequency, word length, neighborhood, and word shape may play a role in visual word recognition. Shape information may affect word processing in different ways because the Korean writing system works differently from that of English. The purpose of this study was to apply the Gestalt principle of continuity to the Korean alphabetic script (hangul), to investigate the processing unit of hangul, and to verify whether syllable shape affects word recognition in hangul. In experiment 1, three-syllable words were used and two variables were manipulated: 1) syllable type (horizontal syllable shape, e.g., "가"; vertical syllable shape, e.g., "고") and 2) presentation direction (horizontal, vertical). Whereas "가" meets the criteria of the Gestalt principle of continuity, "고" does not. Based on lexical decision times, the horizontal syllable shape type showed significantly better performance than the vertical syllable shape type, regardless of presentation direction. In experiment 2, syllable type (horizontal syllable shape, vertical syllable shape) and the visual relationship between prime and target (identical, similar, different) were manipulated using masked priming. Performance differed significantly across the visual relationships between prime and target, and thus the effect of syllable shape was verified.

Comparison of vowel lengths of articles and monosyllabic nouns in Korean EFL learners' noun phrase production in relation to their English proficiency (한국인 영어학습자의 명사구 발화에서 영어 능숙도에 따른 관사와 단음절 명사 모음 길이 비교)

  • Park, Woojim;Mo, Ranm;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.12 no.3 / pp.33-40 / 2020
  • The purpose of this research was to determine the relation between Korean learners' English proficiency and the ratio of the length of the stressed vowel in a monosyllabic noun to that of the unstressed vowel in the article of a noun phrase (e.g., "a cup", "the bus", etc.). Generally, the vowels in monosyllabic content words are phonetically more prominent than those in monosyllabic function words because the former carry phrasal stress, which makes the vowels in content words longer in duration, higher in pitch, and greater in amplitude. Based on speech samples from the Korean-Spoken English Corpus (K-SEC) and the Rated Korean-Spoken English Corpus (Rated K-SEC), this study examined 879 English noun phrases, each composed of an article and a monosyllabic noun, taken from sentences rated on 4 levels of proficiency. The lengths of the vowels in these 879 target NPs were measured, and the ratio of the vowel lengths in nouns to those in articles was calculated. The higher the proficiency level, the greater the mean ratio of the vowels in nouns to the vowels in articles, which confirmed the hypothesis of the study. The study thus concluded that, for Korean learners of English, the higher the English proficiency level, the more conspicuous the length difference they produce between stressed and unstressed vowels.

Sentiment Analysis using Robust Parallel Tri-LSTM Sentence Embedding in Out-of-Vocabulary Word (Out-of-Vocabulary 단어에 강건한 병렬 Tri-LSTM 문장 임베딩을 이용한 감정분석)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal / v.10 no.1 / pp.16-24 / 2021
  • Existing word embedding methods such as word2vec represent only the words that occur in the raw training corpus as fixed-length vectors in a continuous vector space, so the out-of-vocabulary (OOV) problem often arises in morphologically rich languages. The same holds for sentence embedding: when the meaning of a sentence is represented as a fixed-length vector by combining the vectors of its words, OOV words make it difficult to represent the sentence meaningfully. In particular, because Korean is an agglutinative language in which lexical and grammatical morphemes combine, handling OOV words is an important factor in improving performance. In this paper, we propose a parallel Tri-LSTM sentence embedding that is robust to the OOV problem by extending the use of word-level morphological information to the sentence level. In a sentiment analysis task on a Korean corpus, we found empirically that the character unit is better than the morpheme unit as an embedding unit for Korean sentence embedding. We achieved 86.17% accuracy on the sentiment analysis task with the parallel bidirectional Tri-LSTM sentence encoder.

Improving Multinomial Naive Bayes Text Classifier (다항시행접근 단순 베이지안 문서분류기의 개선)

  • 김상범;임해창
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.259-267 / 2003
  • Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving the performance of these classifiers have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems in the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
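
For orientation, the baseline that this abstract sets out to improve can be sketched as follows. This is a minimal sketch of a standard multinomial naive Bayes classifier with add-one (Laplace) smoothing, not the authors' improved method: the document-model-based estimation, length normalization, and mutual-information weighting they propose are not shown. The function names `train` and `predict` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Collect counts from (tokens, label) pairs for a multinomial model."""
    class_docs = Counter()              # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(tokens, model):
    """Return the class with the highest log posterior for the tokens."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        class_total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for word in tokens:
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids zero probabilities.
                score += math.log(
                    (word_counts[label][word] + 1) / (class_total + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, invented for illustration only.
toy_docs = [
    (["goal", "match"], "sports"),
    (["vote", "election"], "politics"),
]
toy_model = train(toy_docs)
```

Because every token adds one log term, longer documents yield more extreme scores in this baseline, which is the kind of effect that length normalization is meant to temper.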

Psalm Text Generator Comparison Between English and Korean Using LSTM Blocks in a Recurrent Neural Network (순환 신경망에서 LSTM 블록을 사용한 영어와 한국어의 시편 생성기 비교)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.269-271 / 2022
  • In recent years, RNN networks with LSTM blocks have been used extensively in machine learning tasks that process sequential data. These networks have proven to be particularly good at sequential language processing tasks by being more able to accurately predict the next most likely word in a given sequence than traditional neural networks. This study trained an RNN / LSTM neural network on three different translations of 150 biblical Psalms - in both English and Korean. The resulting model is then fed an input word and a length number from which it automatically generates a new Psalm of the desired length based on the patterns it recognized while training. The results of training the network on both English text and Korean text are compared and discussed.


Vocal Tract Length Normalization for Speech Recognition (음성인식을 위한 성도 길이 정규화)

  • 지상문
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.7 / pp.1380-1386 / 2003
  • Speech recognition performance is degraded by the variation in vocal tract length among speakers. In this paper, we use a vocal tract length normalization method in which the frequency axis of the short-time spectrum of a speaker's speech is scaled to minimize the effect of the speaker's vocal tract length on speech recognition performance. To normalize vocal tract length, we tried several frequency warping functions, such as linear and piece-wise linear functions. A variable-interval piece-wise linear warping function is proposed to effectively model the variation of the frequency axis scale caused by large variation in vocal tract length. Experimental results on TIDIGITS connected digits showed a dramatic reduction of the word error rate from 2.15% to 0.53% with the proposed vocal tract normalization.
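
The idea of a piece-wise linear frequency warp can be illustrated with a small sketch. This is the common two-segment form, in which frequencies below a breakpoint are scaled by a warping factor and the remainder is mapped linearly so that the band edge stays fixed. It is not the variable-interval function proposed in the paper, and the name `warp_frequency`, the breakpoint of 3000 Hz, and the band edge of 4000 Hz are assumptions chosen for illustration.

```python
def warp_frequency(f, alpha, f_break=3000.0, f_max=4000.0):
    """Two-segment piece-wise linear warp of a frequency f in Hz.

    alpha is the speaker-specific warping factor (assumed to be given).
    Below f_break the axis is scaled by alpha; above it, a second linear
    segment is chosen so that f_max maps onto itself.
    """
    if not 0.0 <= f <= f_max:
        raise ValueError("f must lie in [0, f_max]")
    if alpha <= 0.0 or alpha * f_break >= f_max:
        raise ValueError("alpha * f_break must stay below f_max")
    if f <= f_break:
        return alpha * f
    # Slope of the upper segment, fixed by the two end points
    # (f_break, alpha * f_break) and (f_max, f_max).
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (f - f_break)
```

In practice the warping factor is usually chosen per speaker, for example by searching a small grid of values; that selection step is outside this sketch.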

Cursor Moving by Voice Command using DTW method (DTW방식을 이용한 음성 명령에 의한 커서 조작)

  • 추명경;손영선
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.1 / pp.82-87 / 2001
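  • In this paper, we implement an interface that moves the cursor on a Windows screen through fuzzy inference, taking voice commands as input instead of a mouse. Because the input utterances are generally short, we use the DTW method, which is strong at isolated-word recognition, to recognize them. One drawback of the DTW method is that, when a command of similar speech length is input, it is recognized as whichever reference pattern has the smallest error value. For example, when the utterance "아주 많이 이동해" ("move a great deal") is input, it is sometimes recognized as "아주 많이 오른쪽" ("a great deal to the right"), which has a similar speech length. To resolve such errors, a threshold is derived by fuzzy inference from the DTW error distance of each pattern and the speech length of the reference pattern, and this threshold determines whether the utterance is accepted as a command. Where the decision is ambiguous, the system queries the user and accepts or rejects the utterance according to the response.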
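
The DTW matching step described in this abstract can be sketched as below. This is a minimal sketch of classic dynamic time warping over one-dimensional feature sequences with nearest-template selection. The fuzzy-inference threshold and the user query used in the paper are not shown, and the names `dtw_distance` and `recognize`, the scalar features, and the toy templates are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two sequences of numbers."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(query, templates):
    """Return (label, distance) of the reference pattern closest to query."""
    scored = ((label, dtw_distance(query, ref)) for label, ref in templates.items())
    return min(scored, key=lambda pair: pair[1])


# Toy reference patterns, invented for illustration only.
toy_templates = {"left": [1, 2, 3], "right": [3, 2, 1]}
```

Note that `recognize` always returns the closest pattern, even for an utterance that matches nothing well. That is the weakness the abstract describes, and it is why an acceptance threshold on the returned distance is needed.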


A Study on the Foreign Accent of English Stressed Syllables (영어강세음절의 외국인어투에 관한 연구)

  • Park, Hee-Suk
    • Journal of Convergence Society for SMB / v.6 no.4 / pp.51-57 / 2016
  • This study investigates the lengths of eight stressed-syllable vowels produced by Korean college students and compares them with those of native English speakers. To do this, English sentences were uttered and recorded by twenty Korean subjects. Acoustic features were measured from sound spectrograms with the Praat software program and analyzed statistically. The results of the experiment showed that the differences in the lengths of stressed vowels in the first syllable were significant. In particular, native subjects pronounced the English low front vowel /æ/ significantly longer than Korean subjects did, and this result could be used as teaching material in pronunciation classes.