• Title/Summary/Keyword: Korean Language Model

Search Results: 1,555

Brain-Operated Typewriter using the Language Prediction Model

  • Lee, Sae-Byeok;Lim, Heui-Seok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.5 no.10 / pp.1770-1782 / 2011
  • A brain-computer interface (BCI) is a communication system that translates brain activity into commands for computers or other devices. In other words, BCIs create a new communication channel between the brain and an output device by bypassing conventional motor output pathways consisting of nerves and muscles. This is particularly useful for facilitating communication for people suffering from paralysis. Because of the low bit rate, however, translating brain activity into commands takes considerable time, and entering characters with BCI-based typewriters is especially slow. In this paper, we propose a brain-operated typewriter accelerated by a language prediction model. The proposed system uses three strategies to improve entry speed: word completion, next-syllable prediction, and next-word prediction. In our demonstration, the language prediction model roughly doubled the entry speed of the BCI-based typewriter.
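
As a rough illustration of the three strategies just listed, the sketch below builds a toy frequency/bigram predictor in Python. The class and method names, and the whitespace-tokenized training corpus, are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict

class PredictiveTypewriter:
    """Toy predictor: word completion, next-syllable, next-word (bigram counts)."""

    def __init__(self, corpus: str):
        words = corpus.split()
        self.word_freq = Counter(words)
        self.next_word = defaultdict(Counter)   # counts of w2 following w1
        self.next_syll = defaultdict(Counter)   # counts of char2 following char1
        for w1, w2 in zip(words, words[1:]):
            self.next_word[w1][w2] += 1
        for w in words:
            for c1, c2 in zip(w, w[1:]):
                self.next_syll[c1][c2] += 1

    def complete(self, prefix, k=3):
        """Word completion: most frequent known words starting with prefix."""
        cands = [w for w in self.word_freq if w.startswith(prefix)]
        return sorted(cands, key=self.word_freq.__getitem__, reverse=True)[:k]

    def predict_syllable(self, last_char, k=3):
        """Next-syllable prediction from the last typed character."""
        return [c for c, _ in self.next_syll[last_char].most_common(k)]

    def predict_word(self, prev_word, k=3):
        """Next-word prediction from the previously completed word."""
        return [w for w, _ in self.next_word[prev_word].most_common(k)]
```

Each accepted suggestion replaces several slow row/column selections in the speller grid, which is where a roughly twofold speed-up of the kind reported would come from.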

A Study on the Multilingual Speech Recognition using International Phonetic Language (IPA를 활용한 다국어 음성 인식에 관한 연구)

  • Kim, Suk-Dong;Kim, Woo-Sung;Woo, In-Sung
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.7 / pp.3267-3274 / 2011
  • Recently, speech recognition technology has developed dramatically with the growth of mobile user environments and the spread of speech recognition software. For multilingual speech recognition, however, limited understanding of multilingual lexical models and limited system capacity hold back the recognition rate. Speech in several languages is not easily embodied in a single acoustic model, and systems that use several acoustic models lower the recognition rate. It is therefore necessary to research and develop a multilingual speech recognition system that embodies speech from several languages in a single acoustic model. Building on research into multilingual acoustic models for mobile devices, this paper studies a system that recognizes both Korean and English through the International Phonetic Alphabet (IPA). Focusing on an IPA model that covers both Korean and English phonemes, we obtained recognition rates of 94.8% for Korean and 95.36% for English.
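
The core idea, mapping each language's phone labels onto one shared IPA inventory so that a single acoustic model can serve both languages, can be sketched as below. The mapping tables are small illustrative samples, not the phone sets used in the paper.

```python
# Illustrative (not the paper's) phone-to-IPA tables.
KOREAN_TO_IPA = {"ㅏ": "a", "ㅣ": "i", "ㅜ": "u", "ㄴ": "n", "ㅁ": "m", "ㅅ": "s"}
ENGLISH_TO_IPA = {"AA": "ɑ", "IY": "i", "UW": "u", "N": "n", "M": "m", "S": "s"}

def to_ipa(phones, table):
    """Map language-specific phone labels into the shared IPA inventory."""
    return [table[p] for p in phones]

# Both languages now share one label set, so a single acoustic model can be
# trained over the union of IPA symbols instead of one model per language.
shared_inventory = set(KOREAN_TO_IPA.values()) | set(ENGLISH_TO_IPA.values())
print(to_ipa(["IY", "N"], ENGLISH_TO_IPA), sorted(shared_inventory))
```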

Vocabulary assessment based on construct definition in task-based language learning (과제 중심 학습에서 어휘 능력의 구성요소와 평가)

  • Kim, Yeon-Jin
    • English Language & Literature Teaching / v.12 no.3 / pp.123-145 / 2006
  • The purpose of this study is to propose an efficient vocabulary assessment model for task-based language learning and to verify its viability. Bachman and Palmer (1996) pointed out that many language tests focus on just one area of language knowledge, and researchers have suggested that several analytic scales are needed so that different components of the tested language ability can be rated separately. Although many studies have tried to evaluate various aspects of vocabulary ability, most measured only one or two factors. Building on previous research, this study proposed an assessment model of the general construct of vocabulary ability and measured vocabulary ability in four separate areas. The subjects were two classes of university-level Korean EFL students who participated in small-group discussion via synchronous CMC. One class used a lexically focused task, proposed by Kim and Jeong (2006), and the other class used a non-lexically focused task. The results showed that the students with the lexically focused task significantly outperformed those with the non-lexically focused task in overall vocabulary ability as well as in the four subdivisions of vocabulary ability. In conclusion, the assessment model with separate ratings is a viable measure of vocabulary ability and supports an elaborated interpretation of it.
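
A minimal sketch of the separate-ratings idea follows: four analytic subscales rated independently and combined into an overall score. The abstract does not name the four areas, so the subscale labels here are placeholders, not the paper's constructs.

```python
from statistics import mean

# Hypothetical subscale names; the paper's four components are not listed
# in the abstract, so these labels are placeholders.
SUBSCALES = ("range", "accuracy", "collocation", "appropriateness")

def analytic_score(ratings):
    """ratings: dict mapping each subscale to a rating on a shared scale."""
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"unrated subscales: {missing}")
    # Separate ratings are kept alongside the aggregate, so each component
    # of vocabulary ability can still be interpreted on its own.
    return {**ratings, "overall": mean(ratings[s] for s in SUBSCALES)}

print(analytic_score({"range": 4, "accuracy": 3,
                      "collocation": 4, "appropriateness": 5}))
```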


A Language Model Approach to "The Vegetarian" (채식주의자: 랭귀지 모델 접근)

  • Kim, Jaejun;Kwon, Junhyeok;Kim, Yoolae;Park, Myung-Kwan;Song, Sanghoun
    • Annual Conference on Human and Language Technology / 2017.10a / pp.260-263 / 2017
  • This paper aims to broaden the spectrum of approaches to analyzing the Korean-written novel "The Vegetarian" with computational linguistic tools. By applying a language model of the kind commonly used for bi-gram analysis in corpus linguistics to the Man Booker International Prize-winning novel, the characteristics of "The Vegetarian" are investigated through comparison with the English-written novel "A Little Life".
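
A hedged sketch of the kind of bigram comparison the abstract describes: train add-one-smoothed bigram statistics on one novel's tokens and measure how well they predict the other's. The smoothing choice and tokenization are assumptions, not the authors' setup.

```python
import math
from collections import Counter

def bigram_model(tokens):
    """Return an add-one smoothed bigram log-probability function."""
    uni, bi = Counter(tokens), Counter(zip(tokens, tokens[1:]))
    vocab = len(uni)
    def logprob(w1, w2):
        return math.log((bi[(w1, w2)] + 1) / (uni[w1] + vocab))
    return logprob

def cross_entropy(logprob, tokens):
    """Average negative bigram log-probability of a text under a model."""
    pairs = list(zip(tokens, tokens[1:]))
    return -sum(logprob(w1, w2) for w1, w2 in pairs) / len(pairs)

# Train on one novel's token list, score the other: lower cross-entropy
# means the second text's bigram patterns are more predictable from the first.
model = bigram_model("the cat sat on the mat".split())
print(cross_entropy(model, "the cat sat".split()))
```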


Instruction Tuning for Controlled Text Generation in Korean Language Model (Instruction Tuning을 통한 한국어 언어 모델 문장 생성 제어)

  • Jinhee Jang;Daeryong Seo;Donghyeon Jeon;Inho Kang;Seung-Hoon Na
    • Annual Conference on Human and Language Technology / 2023.10a / pp.289-294 / 2023
  • Large language models, built on vast data and parameters, have achieved high performance in contextual understanding, but controlling sentence generation for human alignment remains an active research challenge. In this paper, we conduct sentence-generation control experiments via instruction tuning. Using natural language processing tools, we automatically construct an instruction dataset containing single or multiple constraints, fine-tune the Korean language model Polyglot-Ko on it, and verify whether the model's generations satisfy the constraints. The experiments show an average accuracy of 0.88 over four constraint types, confirming that effective controlled sentence generation is possible.
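
A minimal sketch, not the authors' pipeline, of how single- or multi-constraint instruction records could be auto-built and a generation checked against its constraints; the constraint types, prompt template, and helper names are assumptions. Records like these would then be used to fine-tune Polyglot-Ko.

```python
# Each constraint pairs a human-readable description (for the prompt)
# with a programmatic check (for scoring the model's output).
def length_at_most(n):
    return (f"길이는 {n} 어절 이하", lambda s: len(s.split()) <= n)

def must_include(word):
    return (f"'{word}' 포함", lambda s: word in s)

def build_record(constraints, target_sentence):
    """One instruction-tuning record: constraints up front, target after."""
    texts, checks = zip(*constraints)
    prompt = "다음 조건을 만족하는 문장을 생성하세요: " + "; ".join(texts)
    return {"instruction": prompt, "output": target_sentence, "checks": checks}

def constraint_accuracy(generated, checks):
    """Fraction of constraints the generated sentence satisfies."""
    return sum(check(generated) for check in checks) / len(checks)

rec = build_record([length_at_most(10), must_include("언어")],
                   "언어 모델은 문장을 생성한다")
print(rec["instruction"])
print(constraint_accuracy("언어 모델이 짧은 문장을 만든다", rec["checks"]))
```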


A Comparative Study of Second Language Acquisition Models: Focusing on Vowel Acquisition by Chinese Learners of Korean (중국인 학습자의 한국어 모음 습득에 대한 제2언어 습득 모델 비교 연구)

  • Kim, Jooyeon
    • Phonetics and Speech Sciences / v.6 no.4 / pp.27-36 / 2014
  • This study provided a longitudinal examination of Chinese learners' acquisition of Korean vowels. Specifically, I examined the Korean monophthongs /i, e, ɨ, ʌ, a, u, o/ produced by Chinese learners after 1 month and after 12 months of learning, and tried to verify empirically how they acquire Korean vowels in relation to their mother tongue, under the Perceptual Assimilation Model (henceforth PAM) of Best (Best, 1993; 1994; Best & Tyler, 2007) and the Speech Learning Model (henceforth SLM) of Flege (Flege, 1987; Bohn & Flege, 1992; Flege, 1995). Most of the present results are explained similarly by the PAM and the SLM; the only discrepancy between the two models is found in the 'similar' category of sounds shared by the learners' native language and the target language. Specifically, the acquisition pattern of /u/ and /o/ in Korean is well accounted for by the PAM, but not by the SLM. The SLM does not explain why the Chinese learners had difficulty acquiring the Korean vowel /u/, because under the SLM the vowel /u/ in Chinese (the native language) is matched either to /u/ or to /o/ in Korean (the target language); that is, there is only a one-to-one matching between the native language and the target language. In contrast, the learners' difficulty with the Korean vowel /u/ is well accounted for by the PAM, in that the Chinese vowel /u/ is matched to the Korean vowel pair /o, u/ rather than to a single vowel /o/ or /u/.
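
The /u/-/o/ argument above can be made concrete with a toy contrast of the two models' category mappings. The tables merely restate the abstract's claim (SLM: one-to-one match; PAM: one L1 category assimilating to an L2 pair); they are not data from the paper.

```python
# SLM: the Chinese vowel maps one-to-one onto a single Korean category.
SLM_MAP = {"u": {"u"}}
# PAM: the Chinese vowel assimilates to the Korean pair /o, u/ at once.
PAM_MAP = {"u": {"u", "o"}}

def predicts_difficulty(mapping, l1_vowel):
    """Difficulty is predicted when one L1 category covers 2+ L2 categories,
    since the learner must split a single perceptual category in two."""
    return len(mapping.get(l1_vowel, set())) > 1

print(predicts_difficulty(PAM_MAP, "u"))  # True: PAM expects /u/ vs /o/ trouble
print(predicts_difficulty(SLM_MAP, "u"))  # False: SLM predicts no such problem
```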

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;Kim, Hong-Kook
    • Proceedings of the KSPS conference / 2005.11a / pp.103-106 / 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The approach is based on an active learning framework that selects a text corpus from the large amount of text data required for language modeling, using perplexity as the selection measure. In recognition experiments on continuous Korean speech, the system employing the language model built by the proposed approach reduces the word error rate by about 6.6%, with less computational complexity than a system using a language model constructed from randomly selected texts.
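
A hedged sketch of perplexity-driven corpus selection in the active learning loop described above: score each candidate text under a seed model and keep the most predictable ones. A unigram model stands in here for whatever language model the authors actually used.

```python
import math
from collections import Counter

def perplexity(counts, total, vocab, tokens):
    """Per-token perplexity of a text under an add-one smoothed unigram model."""
    logp = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-logp / len(tokens))

def select_corpus(seed_tokens, candidates, k):
    """Keep the k candidate token lists the seed model finds most predictable."""
    counts = Counter(seed_tokens)
    total, vocab = sum(counts.values()), len(counts)
    return sorted(candidates,
                  key=lambda cand: perplexity(counts, total, vocab, cand))[:k]

seed = "음성 인식 시스템 을 위한 언어 모델".split()
pool = ["언어 모델 을 위한 텍스트".split(), "전혀 무관한 요리 조리법".split()]
print(select_corpus(seed, pool, k=1))  # the in-domain candidate wins
```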


A Concept Language Model combining Word Sense Information and BERT (의미 정보와 BERT를 결합한 개념 언어 모델)

  • Lee, Ju-Sang;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2019.10a / pp.3-7 / 2019
  • Natural language representation is the way information carried by natural language is expressed for a computer. Current natural language representations are not fixed vectors learned once; the vectors change with contextual information. Among such methods, BERT represents natural language using the encoder of the Transformer model. However, BERT takes a long time to train and requires large amounts of data. In this paper, we propose a concept language model that combines word sense information with BERT for fast natural language representation learning. As sense information, we abstractly represent the part-of-speech of each word and the semantic hierarchy of nouns. For the experiments, we compare against the Korean BERT model released by ETRI, training both models on named entity recognition. The two models produced similar named entity recognition results, confirming that sense information can be an important signal for natural language representation.
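
The abstract does not spell out how the sense information is combined with BERT, so the sketch below shows one plausible reading: POS and noun-hierarchy embeddings summed with token embeddings before the encoder, in the same way BERT sums its own input embeddings. All names and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class ConceptEmbedding(nn.Module):
    """Token embedding summed with POS and noun-hierarchy embeddings."""

    def __init__(self, vocab_size, n_pos, n_concepts, dim):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(n_pos, dim)           # part-of-speech tag
        self.concept = nn.Embedding(n_concepts, dim)  # noun's semantic class

    def forward(self, tok_ids, pos_ids, concept_ids):
        # Summing the three embeddings mirrors how BERT adds its token,
        # segment, and position embeddings before the Transformer encoder.
        return self.tok(tok_ids) + self.pos(pos_ids) + self.concept(concept_ids)

emb = ConceptEmbedding(vocab_size=30000, n_pos=40, n_concepts=500, dim=768)
ids = torch.zeros(1, 8, dtype=torch.long)
print(emb(ids, ids, ids).shape)  # torch.Size([1, 8, 768])
```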


A Study on Recognition of Citation Metadata using Bidirectional GRU-CRF Model based on Pre-trained Language Model (사전학습 된 언어 모델 기반의 양방향 게이트 순환 유닛 모델과 조건부 랜덤 필드 모델을 이용한 참고문헌 메타데이터 인식 연구)

  • Ji, Seon-yeong;Choi, Sung-pil
    • Journal of the Korean Society for Information Management / v.38 no.1 / pp.221-242 / 2021
  • This study applied reference metadata recognition using a bidirectional GRU-CRF model based on a pre-trained language model. To construct the experimental set, 161,315 references were extracted by rule from 53,562 academic documents in PDF format collected from 40 journals published in 2018. The study was conducted to automatically extract references from academic literature in PDF format. Through it, the language model with the highest performance was identified, and additional experiments on that model compared recognition performance according to the size of the training set. Finally, the performance on each metadata field was confirmed.
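
A minimal PyTorch sketch of the tagging architecture named in the title: pre-trained LM features fed to a bidirectional GRU with a CRF output layer. The feature extractor, tag set, and hyperparameters are not given in the abstract and are assumed here; the CRF comes from the pytorch-crf package.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiGRUCRFTagger(nn.Module):
    """Contextual LM features -> bidirectional GRU -> CRF over metadata tags
    (e.g. author/title/journal/year spans in a reference string)."""

    def __init__(self, feat_dim, hidden, num_tags):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, feats, tags, mask):
        """Negative CRF log-likelihood for training."""
        emissions = self.proj(self.gru(feats)[0])
        return -self.crf(emissions, tags, mask=mask)

    def decode(self, feats, mask):
        """Best tag sequence per reference string."""
        emissions = self.proj(self.gru(feats)[0])
        return self.crf.decode(emissions, mask=mask)

# feats would come from the chosen pre-trained language model; here they
# are random placeholders just to show the shapes.
model = BiGRUCRFTagger(feat_dim=768, hidden=256, num_tags=9)
feats = torch.randn(2, 20, 768)
mask = torch.ones(2, 20, dtype=torch.bool)
print(model.decode(feats, mask)[0][:5])
```

The CRF layer is what enforces valid tag transitions (e.g. a page-range tag should not interrupt an author span), which plain token-wise softmax over the GRU outputs would not guarantee.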