• Title/Summary/Keyword: Lexical model


A Comparison Study between Human and Computational Model on Language Phenomena in the Korean Lexical Decision Task (한국어 어휘판단과제와 관련된 언어현상의 인간과 계산주의 모델의 비교)

  • Lim, Heui-Seok; Kwon, You-An; Park, Ki-Nam
    • Proceedings of the Korean Society for Cognitive Science Conference, 2006.06a, pp.33-37, 2006
  • This paper applies two of the language phenomena observed in the lexical decision task (LDT), the word frequency effect and the word similarity effect, to Korean, tests them on human participants and on a computational model, and compares the results. In the experiments, both the humans and the computational model exhibited the word frequency effect and the word similarity effect for Korean, and the human results showed a significant similarity to those of the computational model.


Korean Semantic Similarity Measures for the Vector Space Models

  • Lee, Young-In; Lee, Hyun-jung; Koo, Myoung-Wan; Cho, Sook Whan
    • Phonetics and Speech Sciences, v.7 no.4, pp.49-55, 2015
  • It is argued in this paper that, in determining semantic similarity, Korean words should be recategorized with a focus on their semantic relation to an ontology, in light of cross-linguistic morphological variation. It is proposed, in particular, that Korean semantic similarity should be measured on three tracks: a human-judgement track, a relatedness track, and a cross-part-of-speech relations track. As demonstrated in Yang et al. (2015), GloVe, an unsupervised learning model of semantic similarity, is applicable to Korean, with its performance compared against human judgement results. Based on this compatibility, it was further thought that the model's performance would most likely vary with different kinds of specific relations in different languages. An attempt was made to analyze them in terms of two major Korean-specific categories involved in lexical and cross-POS relations. It is concluded that languages must be analyzed with varying methods so that semantic components across languages may allow varying semantic distances in vector space models.
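As a rough illustration of similarity in a vector space model, cosine similarity between word vectors can be computed as below; the three-dimensional vectors are toy values chosen for the example, not actual GloVe embeddings:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings" for illustration only.
vectors = {
    "school": [0.9, 0.1, 0.3],
    "teacher": [0.8, 0.2, 0.4],
    "banana": [0.1, 0.9, 0.2],
}

sim_related = cosine_similarity(vectors["school"], vectors["teacher"])
sim_unrelated = cosine_similarity(vectors["school"], vectors["banana"])
assert sim_related > sim_unrelated  # related words lie closer in the space
```

Real embedding models differ in how the vectors are learned, but the distance computation at evaluation time is typically this cosine measure.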

A Comparison Study between Human and Computational Model on Language Phenomena in the Korean Lexical Decision Task (한국어 어휘판단과제와 관련된 언어현상의 인간과 계산주의 모델의 비교)

  • Park, Ki-Nam; Lim, Heui-Seok
    • Proceedings of the KAIS Fall Conference, 2006.05a, pp.391-393, 2006
  • This paper applies two of the language phenomena observed in the lexical decision task (LDT), the word frequency effect and the word similarity effect, to Korean, tests them on human participants and on a computational model, and compares the results. In the experiments, both the humans and the computational model exhibited the word frequency effect and the word similarity effect for Korean, and the human results showed a significant similarity to those of the computational model.


Cognitive-Neuro Computational Model of Lexical Acquisition in Korean (인지신경기반의 한국어 어휘습득 계산주의적 모델)

  • Yu, Won-Hee; Park, Ki-Nam; Lyu, Ki-Gon; Lim, Heui-Seok; Nam, Ki-Chun
    • Proceedings of the KAIS Fall Conference, 2007.11a, pp.89-91, 2007
  • This paper designs a hybrid computational model of the human lexical acquisition process and, through repeated experiments, implements and tests a lexical acquisition model based on cognitive neuroscience. The study was able to simulate the human lexical acquisition process, and it can thereby contribute to the development of computational models of automatic lexical acquisition, mental lexicon representation, and word recognition for a cognitive-neuroscience-based lexical information processing system.


One-Class Classification Model Based on Lexical Information and Syntactic Patterns (어휘 정보와 구문 패턴에 기반한 단일 클래스 분류 모델)

  • Lee, Hyeon-gu; Choi, Maengsik; Kim, Harksoo
    • Journal of KIISE, v.42 no.6, pp.817-822, 2015
  • Relation extraction is an important information extraction technique that can be widely used in areas such as question answering and knowledge-base population. Previous studies on relation extraction have been based on supervised machine learning models that need a large amount of training data manually annotated with relation categories. Recently, to reduce the manual annotation effort for constructing training data, distant supervision methods have been proposed. However, these methods suffer from a drawback: it is difficult to use them to collect the negative training data necessary for resolving classification problems. To overcome this drawback, we propose a one-class classification model that can be trained without negative data. The proposed model determines whether an input item belongs to the inner category by using a similarity measure based on lexical information and syntactic patterns in a vector space. In the experiments conducted in this study, the proposed model showed higher performance (an F1-score of 0.6509 and an accuracy of 0.6833) than a representative one-class classification model, the one-class SVM (Support Vector Machine).
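The one-class idea, deciding category membership from positive examples alone via a similarity threshold, can be sketched minimally as below. This assumes a centroid-plus-cosine-threshold decision rule on already-built feature vectors; it is not the authors' actual lexical and syntactic feature representation:

```python
import math

def centroid(vectors):
    """Mean vector of the positive training examples."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

class OneClassCentroid:
    """Train on positive data only; accept inputs similar to the centroid."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.center = None

    def fit(self, positive_vectors):
        self.center = centroid(positive_vectors)

    def predict(self, vector):
        return cosine(vector, self.center) >= self.threshold

clf = OneClassCentroid(threshold=0.9)
clf.fit([[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]])
assert clf.predict([0.95, 0.1])      # near the positive region: accepted
assert not clf.predict([0.0, 1.0])   # far from it: rejected
```

No negative examples appear anywhere in training, which is the property the abstract highlights.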

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim, Sung-Dong
    • Journal of KIISE: Software and Applications, v.32 no.5, pp.385-395, 2005
  • Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model that increases the coverage and accuracy of segmentation. We construct rules for choosing candidate segmentation positions by a learning method that uses the lexical context of words tagged as segmentation positions, and we generate a model that assigns a probability to each candidate position. The lexical contexts are extracted from a corpus tagged with segmentation positions and incorporated into the probability model. We construct training data from Wall Street Journal sentences and evaluate intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.
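The log-linear scoring at the heart of a maximum entropy model can be sketched as below; the feature names and weights here are hypothetical illustrations, not values trained from the WSJ data the paper uses:

```python
import math

def maxent_prob(features, weights):
    """Log-linear probability that a candidate position is a segmentation point.

    For the binary case, the maximum entropy model reduces to the
    logistic (sigmoid) of the summed feature weights.
    """
    score = sum(weights.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical lexical-context features and weights for illustration.
weights = {
    "prev_word=clause_end_comma": 1.8,
    "curr_tag=subordinating_conj": 1.2,
    "next_tag=pronoun": 0.4,
    "prev_word=determiner": -2.0,
}

candidate = ["prev_word=clause_end_comma", "curr_tag=subordinating_conj"]
p = maxent_prob(candidate, weights)
# Segment at this position if p exceeds a chosen threshold, e.g. 0.5.
```

In a real system the weights would be estimated from the tagged corpus (e.g. by iterative scaling or gradient methods) rather than set by hand.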

A Study on the Multilingual Speech Recognition for On-line International Game (온라인 다국적 게임을 위한 다국어 혼합 음성 인식에 관한 연구)

  • Kim, Suk-Dong; Kang, Heung-Soon; Woo, In-Sung; Shin, Chwa-Cheul; Yoon, Chun-Duk
    • Journal of Korea Game Society, v.8 no.4, pp.107-114, 2008
  • Demand for multilingual speech recognition in games, and the need for a multilingual system that expresses the phonetics of many different languages in a single phonetic model, have been increasing in the game industry. Accordingly, research is needed on a multinational language system that can represent speech composed of various languages in a single lexical model. This paper is basic research toward an integrated system built on a multilingual lexical model; it presents a system that recognizes Korean and English speech using the IPA (International Phonetic Alphabet). We focused on finding an IPA model that satisfies Korean and English phonemes simultaneously. As a result, we obtained a speech recognition rate of 90.62% for Korean and 91.71% for English.


Recognition of Answer Type for WiseQA (WiseQA를 위한 정답유형 인식)

  • Heo, Jeong; Ryu, Pum Mo; Kim, Hyun Ki; Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering, v.4 no.7, pp.283-290, 2015
  • In this paper, we propose a hybrid method for the recognition of answer types in the WiseQA system. The answer types are classified into two categories: the lexical answer type (LAT) and the semantic answer type (SAT). This paper proposes two models for LAT detection: a rule-based model using question focuses, and a machine learning model based on sequence labeling. We also propose two models for SAT classification: a machine learning model based on multiclass classification, and a filtering-rule model based on the lexical answer type. LAT detection and SAT classification achieve an F1-score of 82.47% and a precision of 77.13%, respectively. Compared with IBM Watson on LAT performance, the precision is 1.0% lower and the recall is 7.4% higher.

Sentence-Chain Based Seq2seq Model for Corpus Expansion

  • Chung, Euisok; Park, Jeon Gue
    • ETRI Journal, v.39 no.4, pp.455-466, 2017
  • This study focuses on a method for sequential data augmentation in order to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied for addressing language generation issues; it has the ability to generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. For training the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are used for the encoder of the seq2seq model, while the last sentence becomes a target sequence for the decoder. Using only internal resources, evaluation results show an improvement of approximately 7.6% relative perplexity over a baseline language model of Korean text. Additionally, from a comparison with a previous study, the sentence chain approach reduces the size of the training data by 38.4% while generating 1.4-times the number of n-grams with superior performance for English text.
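The sentence-chain triples described above (two consecutive sentences for the encoder, the next sentence as the decoder target) amount to a simple windowing step over the corpus; the function name and toy corpus below are illustrative:

```python
def sentence_chains(sentences):
    """Build ((s1, s2) -> s3) training triples from consecutive sentences."""
    triples = []
    for i in range(len(sentences) - 2):
        encoder_input = (sentences[i], sentences[i + 1])
        decoder_target = sentences[i + 2]
        triples.append((encoder_input, decoder_target))
    return triples

corpus = ["S1.", "S2.", "S3.", "S4."]
for src, tgt in sentence_chains(corpus):
    print(src, "->", tgt)
```

A corpus of k sentences yields k - 2 overlapping triples, so even a modest corpus provides many encoder/decoder pairs for training the seq2seq model.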

A Constant Time Algorithm for Deterministic Finite Automata Problem on a Reconfigurable Mesh (재구성 가능한 메쉬에서 결정적 유한 자동장치 문제에 대한 상수시간 알고리즘)

  • Kim, Yeong-Hak
    • The Transactions of the Korea Information Processing Society, v.6 no.11, pp.2946-2953, 1999
  • A finite automaton is a mathematical model for representing a system with discrete inputs and outputs. Finite automata are a useful tool for solving problems such as text editors, lexical analyzers, and switching circuits. In this paper, given a deterministic finite automaton with m states and an input string of length n, we propose a constant-time parallel algorithm that represents the transition states of the automaton and determines acceptance of the input string on a reconfigurable mesh of size ⌈nm/2⌉ × 2m.
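For contrast with the constant-time parallel algorithm on the reconfigurable mesh, the standard sequential DFA simulation runs in O(n) time for an input of length n; a minimal sketch:

```python
def dfa_accepts(transitions, start, accepting, string):
    """Simulate a deterministic finite automaton on an input string."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]  # exactly one move per symbol
    return state in accepting

# Example DFA over {0, 1} accepting strings with an even number of 1s.
transitions = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd",   ("odd", "1"): "even",
}
print(dfa_accepts(transitions, "even", {"even"}, "1010"))  # two 1s -> True
print(dfa_accepts(transitions, "even", {"even"}, "111"))   # three 1s -> False
```

The paper's contribution is to collapse this sequential scan into constant time by wiring the n transitions through the mesh's reconfigurable buses.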
