• Title/Summary/Keyword: Korean Language Model


A Statistical Model for Choosing the Best Translation of Prepositions. (통계 정보를 이용한 전치사 최적 번역어 결정 모델)

  • 심광섭
    • Language and Information / v.8 no.1 / pp.101-116 / 2004
  • This paper proposes a statistical model for the translation of prepositions in English-Korean machine translation. In the proposed model, statistical information acquired from unlabeled Korean corpora is used to choose the best translation from several possible translations. Such information includes functional word-verb co-occurrence information, functional word-verb distance information, and noun-postposition co-occurrence information. The model was evaluated with 443 sentences, each of which has a prepositional phrase, and we attained 71.3% accuracy.

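The selection step described in the abstract above can be illustrated with a minimal sketch. It is not the paper's model: the scoring formula (a product of add-one smoothed counts), the romanized toy counts, and the function name `choose_translation` are all assumptions made for illustration, and the distance information mentioned in the abstract is left out.

```python
from collections import Counter


def choose_translation(candidates, verb, noun, func_verb_counts,
                       noun_post_counts, smoothing=1.0):
    """Pick the candidate postposition with the highest co-occurrence score.

    The score is an assumed stand-in: the product of the smoothed
    (postposition, verb) count and the smoothed (noun, postposition) count.
    """
    def score(candidate):
        with_verb = func_verb_counts[(candidate, verb)] + smoothing
        with_noun = noun_post_counts[(noun, candidate)] + smoothing
        return with_verb * with_noun

    return max(candidates, key=score)


# Hypothetical counts, as if gathered from an unlabeled Korean corpus.
# "e" and "eseo" are romanized postpositions; "salda" (live), "gada" (go).
func_verb_counts = Counter({
    ("eseo", "salda"): 40, ("e", "salda"): 25,
    ("e", "gada"): 60, ("eseo", "gada"): 5,
})
noun_post_counts = Counter({("Seoul", "eseo"): 30, ("Seoul", "e"): 20})

# English "in Seoul" with "live", and "to Seoul" with "go".
best_for_live = choose_translation(["e", "eseo"], "salda", "Seoul",
                                   func_verb_counts, noun_post_counts)
best_for_go = choose_translation(["e", "eseo"], "gada", "Seoul",
                                 func_verb_counts, noun_post_counts)
print(best_for_live, best_for_go)
```

With these toy counts the verb decides the outcome: the same noun receives a different postposition depending on which verb it co-occurs with.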

Donguibogam-Based Pattern Diagnosis Using Natural Language Processing and Machine Learning (자연어 처리 및 기계학습을 통한 동의보감 기반 한의변증진단 기술 개발)

  • Lee, Seung Hyeon;Jang, Dong Pyo;Sung, Kang Kyung
    • The Journal of Korean Medicine / v.41 no.3 / pp.1-8 / 2020
  • Objectives: This paper investigates Donguibogam-based pattern diagnosis by applying natural language processing and machine learning. Methods: A database was constructed by gathering symptoms and pattern diagnoses from Donguibogam. The symptom sentences were tokenized into nouns, verbs, and adjectives with a natural language processing tool. To feed the symptom sentences into machine learning, a Word2Vec model was built to convert words into numeric vectors. Using pairs of symptom vectors and pattern diagnoses, a pattern prediction model was trained with logistic regression. Results: The Word2Vec model's best performance was obtained by optimizing its primary parameters: the number of iterations, the vector dimensions, and the window size. The resulting pattern diagnosis model showed 75% accuracy (chance level 16.7%) in predicting Six-Qi pattern diagnoses. Conclusions: In this study, we developed a pattern diagnosis prediction model based on the symptoms and pattern diagnoses in Donguibogam. The prediction accuracy could be increased by collecting more data as coverage of oriental medicine classics is expanded in the future.
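
As a rough illustration of the vectorization step, the sketch below turns a tokenized symptom sentence into a single vector by averaging word vectors. Averaging is an assumption (the abstract does not say how word vectors are combined), and the tiny two-dimensional embedding table is hypothetical rather than a trained Word2Vec model. The last line checks that the reported 16.7% chance level is consistent with six equally likely patterns.

```python
def sentence_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has a known embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical 2-dimensional embeddings standing in for Word2Vec output.
embeddings = {
    "fever": [1.0, 0.0],
    "thirst": [0.0, 1.0],
    "chill": [-1.0, 0.0],
}

# Unknown tokens are skipped rather than treated as zero vectors.
vec = sentence_vector(["fever", "thirst", "unknown"], embeddings)

# Six equally likely classes give a chance level of 1/6.
chance_level_percent = round(100 / 6, 1)
print(vec, chance_level_percent)
```

The resulting vectors, paired with pattern labels, are the kind of input a logistic regression classifier would be trained on.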

An MP Interpretation of EFL Learners' Linguistic Behaviour

  • Kang, Ae-Jin
    • Korean Journal of English Language and Linguistics / v.4 no.1 / pp.33-60 / 2004
  • This study attempts to present an appropriate way of interpreting L2 learners' linguistic behavior within the Universal Grammar (UG) framework. Based on Korean adult EFL learners' performance on Subjacency-violation sentences, the study suggests that EFL learners are able to acquire subtle knowledge of the target grammar and that their linguistic behavior should be interpreted with the most recent version of UG theory, the Minimalist Program (MP). The MP seems better able to accommodate incomplete L2 grammar while acknowledging UG-constrained interlanguage, which the previous version, the Principles and Parameters (P&P) approach, could not explain very well. The study observed no age effects among the Korean EFL learners in their linguistic competence as measured by performance on the UG-constraint violation sentences. Having suggested that the MP can be a more reasonable tool for explaining EFL learners' linguistic behavior, the study introduces comprehensive hypotheses such as the Constructionist Model (CM) and the Ontogeny Phylogeny Model (OPM).


Unpaired Korean Text Style Transfer with Masked Language Model (마스크 언어 모델 기반 비병렬 한국어 텍스트 스타일 변환)

  • Bae, Jangseong;Lee, Changki;Noh, Hyungjong;Hwang, Jeongin
    • Annual Conference on Human and Language Technology / 2021.10a / pp.391-395 / 2021
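  • Text style transfer is the task of converting text written in a source style into text in a target style while preserving its content. It can be treated as a sequence-to-sequence problem and solved with existing machine learning models, but the parallel corpora for each style that are needed to train such models are hard to obtain. Recent work has therefore studied methods that perform text style transfer with non-parallel corpora. Because these studies mainly use generative models with an encoder-decoder architecture, content of the input sentence may be dropped, or a sentence with different content may be generated. This paper proposes a text style transfer method that uses a masked language model to change the input text to the desired style while preserving its content, and applies it to Korean positive-negative and chat style-written style transfer.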


Hypernetwork Memory-Based Model for Infant's Language Learning (유아 언어학습에 대한 하이퍼망 메모리 기반 모델)

  • Lee, Ji-Hoon;Lee, Eun-Seok;Zhang, Byoung-Tak
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.983-987 / 2009
  • One of the critical themes in language acquisition is exposure to linguistic environments. The linguistic environments that interact with infants include not only human beings such as their parents but also artificially crafted linguistic media. An infant learns a language by exploring these extensive language environments around it. Based on such exposure to large amounts of linguistic data, we propose a machine learning method, grounded in cognitive mechanisms, that flexibly and appropriately simulates an infant's language learning. The infant's initial stage of language learning involves learning and creating sentences, which can be simulated by exposure to a language corpus. The core of the simulation is a memory-based learning model with a language hypernetwork structure. The language hypernetwork simulates developmental and progressive language learning from a stream of new data by making it possible to represent high-level connections between language components. In this paper, we simulate an infant's gradual and developmental learning progress by incrementally training a language hypernetwork on 32,744 sentences extracted from the video scripts of commercial animated movies for children.

Korean Sentence Comprehension of Korean/English Bilingual Children (한국어/영어 이중언어사용 아동의 한국어 문장이해: 조사, 의미, 어순 단서의 활용을 중심으로)

  • Hwang, Min-A
    • Speech Sciences / v.10 no.4 / pp.241-254 / 2003
  • The purpose of the present study was to investigate the sentence comprehension strategies used by Korean/English bilingual children when they listened to sentences in their first language, i.e., Korean. The framework of the competition model was employed to analyze the influence of the second language, i.e., English, during comprehension of Korean sentences. The participants included 10 bilingual children (ages 7;4-13;0) and 20 Korean-speaking monolingual children (ages 5;7-6;10) whose level of development in Korean was similar to that of the bilingual children. In an act-out procedure, the children were asked to determine the agent in sentences composed of two nouns and a verb under varying conditions of three cues (case marker, animacy, and word order). The results revealed that both groups of children used case-marker cues as the strongest of the three. The bilingual children relied on case-marker cues even more than the monolingual children. However, the bilingual children used animacy cues significantly less than the monolingual children. There were no significant differences between the groups in the use of word-order cues. The bilingual children appeared less effective in utilizing animacy cues in Korean sentence comprehension because of backward transfer from English, where the cue strength of animacy is very weak. The influence of the second language on the development of the first language in bilingual children is discussed.


Evaluating the Impact of Training Conditions on the Performance of GPT-2-Small Based Korean-English Bilingual Models

  • Euhee Kim;Keonwoo Koo
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.69-77 / 2024
  • This study evaluates the performance of second language acquisition models learning Korean and English using the GPT-2-Small model, analyzing the impact of various training conditions on performance. Four training conditions were used: monolingual learning, sequential learning, sequential-interleaved learning, and sequential-EWC (elastic weight consolidation) learning. The model was trained on Korean datasets from the National Institute of Korean Language and English datasets from the BabyLM Challenge, with performance measured by perplexity (PPL) and BLiMP accuracy. Results showed that monolingual learning performed best, with a PPL of 16.2 and a BLiMP accuracy of 73.7%. In contrast, sequential-EWC learning had the highest PPL, 41.9, and the lowest BLiMP accuracy, 66.3% (p < 0.05). Monolingual learning proved most effective for optimizing model performance. The EWC regularization in sequential-EWC learning degraded performance by limiting weight updates, which hindered learning of the new language. This research improves understanding of language modeling and contributes to cognitive similarity in AI language learning.
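
For readers unfamiliar with the two quantities in the abstract above, the sketch below shows how perplexity is commonly computed from per-token log-probabilities and what the standard EWC penalty looks like. It uses the textbook definitions, not the authors' code; the function names, the flat parameter lists, and all numbers are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """PPL = exp(-(1/N) * sum of natural-log token probabilities)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ewc_penalty(params, old_params, fisher, lam):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    A large Fisher value F_i makes it costly to move parameter i away from
    its value after the first language, which limits weight updates.
    """
    return 0.5 * lam * sum(
        f * (p - q) ** 2 for p, q, f in zip(params, old_params, fisher)
    )


# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)

# Only the first parameter moved, so only its Fisher weight contributes.
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=3.0)
print(ppl, penalty)
```

The penalty is added to the loss on the second language, which is the mechanism the abstract refers to when it says EWC limited weight updates.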

A study on the optimal task-based instructional model: Focused on Korean EFL classroom practice (효율적인 과업중심 교수.학습모형 연구: EFL 교실 상황을 중심으로)

  • Jeon, In-Jae
    • English Language & Literature Teaching / v.11 no.4 / pp.365-389 / 2005
  • The purpose of this study is to present the most effective task model for English language teaching, based on an investigation of task-based performance in Korean EFL classroom practice. The subjects were 538 high school students and 126 high school teachers, all of whom had used task-based activity materials for more than one year. The data were analyzed with SPSS WIN 11.0, using frequency distribution and chi-square analysis. The results of the questionnaire analysis showed that both teachers and students had a comparatively high level of satisfaction with task rationale, but gave mixed responses regarding input data, settings, and activity types. To conclude, a few suggestions are made for EFL teachers and material developers: a) task goals and rationale that encourage the learner's positive motivation; b) authenticity of input data based on the real-world context; c) a collaborative learning environment that enhances communicative interaction; d) proportional representation of creative problem-solving activities related to discussions and decision-making processes; e) systematic introduction of integrated language skills. The study also suggests that the multi-lateral task model, which has some advantages over previous task models, be introduced and applied in second language learning classrooms.
