• Title/Abstract/Keywords: Word order

A Study on Word Sense Disambiguation Using Bidirectional Recurrent Neural Network for Korean Language

  • Min, Jihong;Jeon, Joon-Woo;Song, Kwang-Ho;Kim, Yoo-Sung
    • 한국컴퓨터정보학회논문지 / Vol. 22, No. 4 / pp. 41-49 / 2017
  • Word sense disambiguation (WSD), which determines the exact meaning of a homonym that can carry different meanings in a single form, is very important for understanding the semantics of a text document. Many recent studies on WSD have used the NNLM (Neural Network Language Model), in which a neural network represents a document as vectors and analyzes its semantics. Among previous NNLM-based WSD studies, the RNN (Recurrent Neural Network) model performs better than other models because it can reflect the order in which words occur in a document in addition to which words appear. However, since the RNN model uses only the forward order of word occurrences, it cannot reflect the characteristic of natural language that later words can affect the meanings of preceding words. In this paper, we propose a WSD scheme using a Bidirectional RNN that reflects not only the forward order but also the backward order of word occurrences in a document. In the experiments, the accuracy of the proposed model is higher than that of the previous RNN-based method. Hence, bidirectional order information about word occurrences is confirmed to be useful for WSD in Korean.
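
The contrast this abstract draws between a forward-only RNN and a bidirectional RNN can be illustrated with a minimal sketch. It assumes a plain Elman-style recurrence with tanh activation and hand-picked toy weights; the names `rnn_pass` and `bidirectional_states` are hypothetical, and the authors' actual architecture, embeddings, and training procedure are not given in the abstract.

```python
import math


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple Elman RNN over a list of input vectors.

    Returns the hidden state at every position, in input order.
    """
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        # h_t = tanh(W_in x_t + W_rec h_{t-1})
        h = [
            math.tanh(
                sum(w_in[i][j] * x[j] for j in range(len(x)))
                + sum(w_rec[i][k] * h[k] for k in range(hidden_size))
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states


def bidirectional_states(inputs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states per position."""
    forward = rnn_pass(inputs, *fwd_params)
    # Read the sequence right to left, then restore the original order.
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy weights (assumptions, not trained values).
W_IN = [[0.5, -0.3], [0.2, 0.8]]
W_REC = [[0.1, 0.0], [0.0, 0.1]]
PARAMS = (W_IN, W_REC)

# Two toy "sentences" of 2-dimensional word vectors that differ only in the last word.
SENT_A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
SENT_B = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
```

In a forward-only pass, the state at the first word is identical for `SENT_A` and `SENT_B`, because nothing to its right has been read yet. The concatenated bidirectional state at the same position differs, which is the property the abstract relies on for disambiguating a word from later context.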

청각 단어 재인에서 나타난 한국어 단어길이 효과 (The Korean Word Length Effect on Auditory Word Recognition)

  • 최원일;남기춘
    • 대한음성학회:학술대회논문집 / 대한음성학회 2002년도 11월 학술대회지 / pp. 137-140 / 2002
  • This study was conducted to examine the effects of Korean word length on auditory word recognition. Linguistically, word length can be defined in terms of several sublexical units such as letters, phonemes, and syllables. To investigate which units are used in auditory word recognition, a lexical decision task was used. Experiments 1 and 2 showed that syllable length affected response time and that syllable length interacted with word frequency. These results indicate that syllable length is an important variable in auditory word recognition.

Word Order and Cliticization in Sakizaya: A Corpus-based Approach

  • Lin, Chihkai
    • 아시아태평양코퍼스연구 / Vol. 1, No. 2 / pp. 41-56 / 2020
  • This paper investigates how word order interacts with cliticization in Sakizaya, a Formosan language. It examines nominative and genitive case markers using a corpus-based approach. The data are collected from an online dictionary of Sakizaya and classified into two word orders: nominative case marker preceding genitive case marker, and vice versa. The data are also divided into three categories according to the demarcation of the case markers: right, left, or no demarcation. The corpus includes 700 sentences in the construction predicate + noun phrase + noun phrase. The results suggest that the two case markers tend to be parsed into the preceding word and show right demarcation. The results also reveal a type difference and a distance effect of the case markers on cliticization. Nominative case markers show more right demarcation than genitive case markers do in the corpus. Also, the closer the case markers are to the predicate, the more likely they are to undergo cliticization.

한국어의 어순 구조를 고려한 Two-Path 언어모델링 (Two-Path Language Modeling Considering Word Order Structure of Korean)

  • 신중휘;박재현;이정태;임해창
    • 한국음향학회지 / Vol. 27, No. 8 / pp. 435-442 / 2008
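  • The n-gram model is well suited to languages such as English, in which word order is grammatically constrained, but it is not well suited to Korean, in which word order is relatively free. Previous work proposed a twoply HMM that reflects the difficulty of taking word order between eojeols (spacing units) into account in Korean, but it failed to reflect the word order structure between adjacent eojeols. In this paper, to reflect the word order characteristics between adjacent eojeols that appear between predicate morphemes, we define a segment unit that combines two eojeols and propose a two-path language model that estimates probabilities differently depending on the context within the proposed segment unit. As a result, the proposed two-path language model reduced perplexity by 25.68% compared with the previous Korean language model, and by 94.03% at predicate morphemes, which are the boundaries where eojeols are combined.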
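
The evaluation measure in this abstract, perplexity (혼잡도), and the sensitivity of an n-gram model to word order can be sketched as follows. This is a generic bigram model with add-one smoothing, not the authors' two-path model; the function names are hypothetical, and reading the reported percentages as relative reductions is an assumption.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, len(vocab)


def bigram_prob(prev, cur, model):
    """Add-one smoothed estimate of P(cur | prev)."""
    contexts, bigrams, vocab_size = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)


def perplexity(sentences, model):
    """exp of the negative mean log-probability per predicted token."""
    log_sum, count = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            log_sum += math.log(bigram_prob(prev, cur, model))
            count += 1
    return math.exp(-log_sum / count)


def relative_reduction(baseline_ppl, new_ppl):
    """Percentage by which new_ppl is lower than baseline_ppl."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Toy data (an assumption for illustration): the same two tokens in two orders.
model = train_bigram([["a", "b"], ["a", "b"]])
seen_order = perplexity([["a", "b"]], model)
swapped_order = perplexity([["b", "a"]], model)
```

Here `swapped_order` is higher than `seen_order` even though both sentences contain the same tokens, which illustrates why a model keyed on adjacent word order fits a free word order language poorly.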

Ranking Translation Word Selection Using a Bilingual Dictionary and WordNet

  • Kim, Kweon-Yang;Park, Se-Young
    • 한국지능시스템학회논문지 / Vol. 16, No. 1 / pp. 124-129 / 2006
  • This paper presents a method of ranking translation word selection for Korean verbs based on lexical knowledge contained in a bilingual Korean-English dictionary and WordNet, both of which are easily obtainable knowledge resources. We focus on deciding which translation of the target word is the most appropriate, using a measure of semantic relatedness through the 45 extended relations between the possible translations of the target word and indicative clue words that serve as predicate-arguments in the source language text. To reduce the weight given to possibly unwanted senses, we rank the possible word senses for each translation word by measuring the semantic similarity between the translation word and its near synonyms. We report an average accuracy of 51% on ten ambiguous Korean verbs. The evaluation suggests that our approach outperforms the default baseline and previous work.

Ordering a Left-branching Language: Heaviness vs. Givenness

  • Choi, Hye-Won
    • 한국언어정보학회지:언어와정보 / Vol. 13, No. 1 / pp. 39-56 / 2009
  • This paper investigates ordering alternation phenomena in Korean using dative construction data from the Sejong Corpus of Modern Korean (Kim, 2000). The paper first shows that syntactic weight and information structure are distinct and independent factors that influence word order in Korean. Moreover, it reveals that heaviness and givenness compete with each other and exert diverging effects on word order, which contrasts with the converging effects of these factors in the word orders of right-branching languages like English. The typological variation in the syntactic weight effect poses interesting theoretical and empirical questions, which are discussed in relation to processing efficiency in ordering.

한국어 어휘 의미망(alias. KorLex)의 지식 그래프 임베딩을 이용한 문맥의존 철자오류 교정 기법의 성능 향상 (Performance Improvement of Context-Sensitive Spelling Error Correction Techniques using Knowledge Graph Embedding of Korean WordNet (alias. KorLex))

  • 이정훈;조상현;권혁철
    • 한국멀티미디어학회논문지 / Vol. 25, No. 3 / pp. 493-501 / 2022
  • This paper studies context-sensitive spelling error correction. It uses the Korean WordNet (KorLex)[1], which defines the relationships between words as a graph, to improve the performance of correction[2] based on the vector information of the words embedded in the correction technique. The Korean WordNet replaced WordNet[3], developed at Princeton University in the United States, and was additionally constructed for Korean. To learn a semantic network in graph form, or to use it alongside learned vector information, it must be transformed into vector form through embedding learning. For this transformation, a limited number of nodes from the network-form graph are listed in a line, like a sentence, before being used as training input. One learning technique that uses this strategy is DeepWalk[4]. DeepWalk is used to learn the graph between words in the Korean WordNet. The graph embedding information is concatenated with the word vector information of the learned language model for correction, and the final correction word is determined by the cosine distance between the vectors. To test whether graph embedding information improves the performance of context-sensitive spelling error correction, confused word pairs were constructed and tested from the perspective of Word Sense Disambiguation (WSD). In the experimental results, the average correction performance over all confused word pairs improved by 2.24% compared to the baseline correction performance.
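
Three steps named in this abstract can be sketched in a few lines: listing graph nodes in a line "like a sentence" (the random-walk stage of DeepWalk), concatenating a graph vector with a word vector, and comparing vectors by cosine distance. The toy graph, the vectors, the function names, and the rule of choosing the candidate with the smallest distance are all assumptions for illustration; the embedding training that DeepWalk performs on the walks is omitted.

```python
import math
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate fixed-length random walks that read like token sequences."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: keep the shorter walk
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def concat_vectors(word_vec, graph_vec):
    """Join a language-model word vector and a graph embedding vector."""
    return list(word_vec) + list(graph_vec)


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pick_correction(context_vec, candidates):
    """Return the candidate whose vector is closest to the context vector."""
    return min(candidates, key=lambda w: cosine_distance(context_vec, candidates[w]))


# Hypothetical semantic network as an adjacency list.
TOY_GRAPH = {
    "bank": ["river", "money"],
    "river": ["bank"],
    "money": ["bank", "loan"],
    "loan": ["money"],
}
walks = random_walks(TOY_GRAPH, walk_length=5, walks_per_node=3, seed=1)
```

Each walk in `walks` can then be fed to a word-embedding trainer exactly as a sentence would be, which is how a graph is turned into per-node vectors under this strategy.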

영한 기계 번역에서 한국어 부사의 어순 결정에 관한 연구 (A Study of Korean Adverb Ordering in English-Korean Machine Translation)

  • 이신원;안동언;정성종
    • 대한전자공학회:학술대회논문집 / 대한전자공학회 2001년도 하계종합학술대회 논문집(3) / pp. 203-206 / 2001
  • In the EKMT system, the Korean generation component produces Korean sentences using information obtained in the transfer component. In Korean generation, the conventional EKMT system does not arrange word order hierarchically and orders only modifier words. This paper proposes a Korean adverb ordering rule for an English-Korean Machine Translation system that generates Korean sentences.

핵심어 인식기에서 단어의 음소레벨 로그 우도 비율의 패턴을 이용한 발화검증 방법 (Utterance Verification using Phone-Level Log-Likelihood Ratio Patterns in Word Spotting Systems)

  • 김정현;권석봉;김회린
    • 말소리와 음성과학 / Vol. 1, No. 1 / pp. 55-62 / 2009
  • This paper proposes an improved method to verify a keyword segment produced by a word spotting system. First, a baseline word spotting system is implemented. To improve the performance of word spotting systems, we use a two-pass structure consisting of a word spotting system and an utterance verification system. Using the basic likelihood ratio test (LRT) based utterance verification system to verify keywords causes certain problems that degrade performance. We therefore propose a method that uses phone-level log-likelihood ratio (PLLR) patterns in computing confidence measures for each keyword. The proposed method generates weights according to the PLLR patterns and assigns a different weight to each phone when generating confidence measures for the keywords. The proposed method has been shown to be better suited to word spotting systems and improves final word spotting accuracy.
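
The idea of weighting phones differently when combining phone-level log-likelihood ratios into a keyword confidence can be sketched as below. The abstract does not state how the weights are derived from the PLLR patterns, so the exponential weighting in `emphasis_weights` is purely a hypothetical stand-in, as are the function names and the per-frame normalization.

```python
import math


def phone_llr(logp_target, logp_anti, num_frames):
    """Frame-normalized log-likelihood ratio for one phone segment."""
    return (logp_target - logp_anti) / num_frames


def weighted_confidence(llrs, weights):
    """Weighted mean of phone-level LLRs for one keyword."""
    total = sum(weights)
    return sum(w * l for w, l in zip(weights, llrs)) / total


def emphasis_weights(llrs, alpha=1.0):
    """Hypothetical scheme: phones with lower LLR receive larger weight."""
    return [math.exp(-alpha * l) for l in llrs]


# Toy PLLR pattern for a three-phone keyword (made-up numbers).
llrs = [2.0, 1.0, -3.0]
uniform_score = weighted_confidence(llrs, [1.0, 1.0, 1.0])
emphasized_score = weighted_confidence(llrs, emphasis_weights(llrs))
```

With uniform weights, one badly matched phone can be averaged away by well matched ones; a non-uniform weighting such as this stand-in lets that phone pull the keyword's confidence down, which is one plausible motivation for per-phone weights.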

한의학 고문헌 데이터 분석을 위한 단어 임베딩 기법 비교: 자연어처리 방법을 적용하여 (Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method)

  • 오준호
    • 대한한의학원전학회지 / Vol. 32, No. 1 / pp. 61-74 / 2019
  • Objectives: The purpose of this study is to help select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods: Based on prescription data that imply traditional methods in traditional East Asian medicine, we examined 4 count-based and 2 prediction-based word embedding methods. To compare these word embedding methods intuitively, we proposed a "prescription generating game" and compared its results with those from the application of the 6 methods. Results: When the adjacent vectors are extracted, the count-based word embedding methods derive the main herbs that are frequently used in conjunction with each other. In the prediction-based word embedding methods, on the other hand, the synonyms of the herbs were derived. Conclusions: Count-based word embedding methods seem to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among count-based word embedding methods, the TF-vector method tends to exaggerate the frequency effect, and hence the TF-IDF vector or co-word vector may be a more reasonable choice. Also, the t-score vector may be recommended when searching for unusual information that could not be found from frequency alone. Prediction-based embedding, on the other hand, seems to be effective when deriving the bases of similar meanings in context.
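
The remark that a TF vector exaggerates the frequency effect, while TF-IDF tempers it, can be seen in a small sketch. It uses the textbook weighting tf × log(N / df) over a toy set of prescriptions; the herb lists and function names are hypothetical, and the study's exact vector construction is not given in the abstract.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Raw term-frequency vector per document over a sorted vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """TF-IDF vectors using idf = log(N / document frequency)."""
    vocab, tf_rows = tf_vectors(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for doc in docs if term in doc) for term in vocab]
    idf = [math.log(n_docs / df) for df in doc_freq]
    rows = [[tf * w for tf, w in zip(row, idf)] for row in tf_rows]
    return vocab, rows


# Toy prescriptions (made-up combinations, for illustration only).
PRESCRIPTIONS = [
    ["ginseng", "licorice"],
    ["licorice", "ginger"],
    ["licorice"],
]
vocab, tf_rows = tf_vectors(PRESCRIPTIONS)
_, tfidf_rows = tfidf_vectors(PRESCRIPTIONS)
```

In this toy set, the herb that appears in every prescription has a nonzero TF entry in every row but a TF-IDF weight of zero, so it no longer dominates comparisons between prescriptions.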