• Title/Summary/Keyword: allomorph

Search Result 7

A Point-Of-Interest Allomorph Database Construction System (POI 이형태 데이타베이스 구축 시스템)

  • Yang, Seung-Weon;Lee, Hyun-Young;Wang, Ji-Hyun
    • Journal of KIISE:Software and Applications / v.36 no.3 / pp.226-235 / 2009
  • People search for points of interest (POIs) in a navigation system using various kinds of information, such as name, category, address, and phone number. Most users search for their POI by name and category. However, they often do not know the exact name stored in the POI DB provided by the maker, and instead use an abbreviated or generalized name as the search keyword. For these reasons, the hit ratio has been very low. In this paper, we propose an additional DB construction system to raise the hit ratio. It generates an allomorph DB linked to the POI names in the original DB. We classified the POI names in the original DB into seven types of allomorph by analyzing patterns gathered from a POI DB with over 650,000 entries. To generate the allomorphs automatically, we wrote 577 rules based on the classified types. For low-frequency entries for which rules are difficult to write, we generated the allomorphs manually. The generated allomorphs account for 35.8% of the original DB. The hit ratio under the proposed system is 89%.
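The rule-based generation described in this abstract can be illustrated with a minimal Python sketch. The abstract does not list the 577 rules or the seven allomorph types, so the three rewrite rules, the function names `generate_allomorphs` and `build_index`, and the sample POI names below are all hypothetical stand-ins rather than the authors' system.

```python
import re

# Hypothetical rewrite rules (pattern, replacement). The paper's actual rules
# are not given in the abstract; these only illustrate the general idea.
RULES = [
    (re.compile(r"\bInternational Airport\b"), "Airport"),
    (re.compile(r"\bUniversity\b"), "Univ"),
    (re.compile(r"\s+Branch$"), ""),
]


def generate_allomorphs(name):
    """Return alternative surface forms of a POI name, excluding the name itself."""
    variants = set()
    for pattern, replacement in RULES:
        candidate = pattern.sub(replacement, name).strip()
        if candidate and candidate != name:
            variants.add(candidate)
    return sorted(variants)


def build_index(poi_names):
    """Map each original name and each generated variant back to the original entry."""
    index = {}
    for name in poi_names:
        index.setdefault(name, set()).add(name)
        for variant in generate_allomorphs(name):
            index.setdefault(variant, set()).add(name)
    return index


index = build_index(["Incheon International Airport", "Seoul National University"])
print(index.get("Incheon Airport"))
```

Keeping the variants in a separate index that points back to the original entry mirrors the abstract's description of an extra DB linked to the original POI names, so the original DB stays unchanged.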

Automatic Construction of Alternative Word Candidates to Improve Patent Information Search Quality (특허 정보 검색 품질 향상을 위한 대체어 후보 자동 생성 방법)

  • Baik, Jong-Bum;Kim, Seong-Min;Lee, Soo-Won
    • Journal of KIISE:Software and Applications / v.36 no.10 / pp.861-873 / 2009
  • Information retrieval can fail to return appropriate information for many reasons. Allomorphs are one cause of search failure due to keyword mismatch. This research proposes a method for automatically constructing alternative word candidates in order to minimize search failures caused by keyword mismatch. Assuming that two words have similar meanings if they have similar co-occurring words, the proposed method uses the concept of concentration, association word sets, cosine similarity between association word sets, and a filtering technique based on confidence. The performance of the proposed method is evaluated using a manually extracted list of alternative words. The evaluation results show that the proposed method outperforms context window overlapping in precision and recall.
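The central assumption in this abstract, that two words are similar if their co-occurring words are similar, can be sketched as a cosine similarity between co-occurrence count vectors. This is a simplified illustration only: the window size, the function names `association_vector` and `cosine`, and the toy documents are assumptions, and the paper's concentration measure and confidence-based filtering are not reproduced here.

```python
import math
from collections import Counter


def association_vector(term, documents, window=2):
    """Count the words that co-occur with `term` within `window` tokens on either side."""
    counts = Counter()
    for tokens in documents:
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy tokenized documents (hypothetical data).
docs = [
    ["lcd", "display", "panel", "device"],
    ["screen", "display", "panel", "device"],
    ["engine", "fuel", "injection", "valve"],
]

sim_close = cosine(association_vector("lcd", docs), association_vector("screen", docs))
sim_far = cosine(association_vector("lcd", docs), association_vector("engine", docs))
print(sim_close, sim_far)
```

Word pairs whose similarity exceeds some threshold would become alternative word candidates; how that threshold and the subsequent filtering are set is specific to the paper and is not stated in the abstract.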

A Study on Some Forms that Originated from the Dependent Noun "것" [kət] (의존 명사 '것'으로부터 도출된 몇몇 형식에 대한 고찰)

  • Lee, Eun-Sup
    • Cross-Cultural Studies / v.41 / pp.245-273 / 2015
  • The purpose of this paper is to investigate the characteristics of the forms "거" [kə], "게" [ke], and "걸" [kəl], which originated from the dependent noun "것" [kət]. Some recent studies have argued that these forms are allomorphs of "것" [kət]. However, they are not allomorphs, because they are not in complementary distribution with "것" [kət]. Moreover, we should not treat "게" [ke] and "걸" [kəl] as being on the same level as "것" [kət] and "거" [kə], because they consist of "거" [kə] plus the subject case marker "이" [i] and the accusative case marker "ㄹ" [l], respectively. In other words, they function as a constituent of a sentence. Therefore, only "거" [kə] and "걸" [kəl] remain to be discussed with respect to variation. In particular, "거" [kə] alternates almost freely with "것" [kət], whereas the "걸" [kəl] that is not a KP (N + case marker) is very restricted, even though it appears to derive from "거" [kə]. Of course, the restrictions they show do not constitute a condition corresponding to the concept of alternation. In conclusion, "거" [kə] is merely an optional variant morph of "것" [kət], whereas "걸" [kəl] is an optional variant morph of "거" [kə]. None of the forms that originated from "것" [kət] is an allomorph of it.

A Rule-Based Analysis from Raw Korean Text to Morphologically Annotated Corpora

  • Lee, Ki-Yong;Markus Schulze
    • Language and Information / v.6 no.2 / pp.105-128 / 2002
  • Morphologically annotated corpora are the basis for many tasks in computational linguistics. Most current approaches use statistically driven methods of morphological analysis that provide only POS tags. While this is sufficient for some applications, many others need a rule-based full morphological analysis that also yields lemmatization and segmentation. This work thus aims at (1) introducing a rule-based Korean morphological analyzer called Kormoran, based on the principle of linearity, which prohibits any combination of left-to-right and right-to-left analysis as well as backtracking, and (2) showing how it can be used as a POS tagger by adopting an ordinary preprocessing technique and by filtering out irrelevant morpho-syntactic information from the analyzed feature structures. It is shown that, besides providing a basis for subsequent syntactic or semantic processing, full morphological analyzers like Kormoran have greater power to resolve ambiguities than simple POS taggers. The focus of our present analysis is on Korean text.

Semantic Alternation of Korean Case Markers '에e' and '에게ege', and '에서eseo' and '에게서 egeseo'

  • Kim, Jungnam;Shim, Yanghee
    • Cross-Cultural Studies / v.36 / pp.271-291 / 2014
  • In this paper, we maintain that the case markers '에e' and '에게ege', and '에서eseo' and '에게서egeseo', are not two separate morphemes in each pair but are simply allomorphs of the same morpheme. When '에e' and '에게ege' are used as dative markers, they have exactly the same semantic function and are in complementary distribution according to the semantic features of the preceding noun: '에게ege' is used if the preceding noun is animate, and '에e' is used otherwise. Likewise, '에게서egeseo' and '에서eseo', as ablative and locative case markers, have exactly the same semantic function and are in complementary distribution depending on whether the preceding noun is animate or inanimate. Therefore, we assume that these markers are semantically conditioned allomorphs.
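The complementary distribution described in this abstract can be expressed as a simple selection rule keyed on animacy. The sketch below is illustrative only: the tiny `ANIMATE_NOUNS` lexicon and the function names are assumptions, and a real system would need an actual animacy resource.

```python
# Hypothetical animacy lexicon; any noun not listed is treated as inanimate.
ANIMATE_NOUNS = {"친구", "선생님", "고양이"}


def dative_marker(noun):
    """Select the dative allomorph: '에게' after animate nouns, '에' otherwise."""
    return "에게" if noun in ANIMATE_NOUNS else "에"


def source_marker(noun):
    """Select the ablative/locative allomorph: '에게서' after animate nouns, '에서' otherwise."""
    return "에게서" if noun in ANIMATE_NOUNS else "에서"


def attach(noun, select_marker):
    """Attach the allomorph chosen by `select_marker` to the noun."""
    return noun + select_marker(noun)


for noun in ["친구", "학교"]:
    print(attach(noun, dative_marker), attach(noun, source_marker))
```

Because the choice depends only on a semantic feature of the preceding noun and never overlaps, each pair behaves like one morpheme with two conditioned forms, which is the abstract's claim.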

Improve Performance of Phrase-based Statistical Machine Translation through Standardizing Korean Allomorph (한국어의 이형태 표준화를 통한 구 기반 통계적 기계 번역 성능 향상)

  • Lee, Won-Kee;Kim, Young-Gil;Lee, Eui-Hyun;Kwon, Hong-Seok;Jo, Seung-U;Cho, Hyung-Mi;Lee, Jong-Hyeok
    • 한국어정보학회:학술대회논문집 / 2016.10a / pp.285-290 / 2016
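  • Korean is morphologically classified as an inflectional language: the form of a word determines its grammatical function in the sentence, and because the language is morphologically rich, function words such as particles and endings combine with content words in many different ways. These characteristics aggravate the data sparseness problem in phrase-based statistical machine translation systems for Korean. However, some Korean particles and endings exist as allomorphs that have the same meaning but take two different forms depending on the content word they attach to. In this paper, we therefore standardize these allomorphs into a single form to alleviate the data sparseness problem, and show that performance improves in Vietnamese-Korean statistical machine translation.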
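A minimal sketch of this kind of standardization follows, assuming the text has already been segmented into (form, tag) pairs by a morphological analyzer. The abstract does not say which allomorphs were standardized or which form was chosen as canonical, so the `CANONICAL` table, the choice of canonical forms, and the use of Sejong-style tags (JKS, JX, JKO, JC) are assumptions for illustration.

```python
# Hypothetical mapping from (surface form, POS tag) to a single canonical form.
# The pairs below are well-known Korean particle alternations; which pairs the
# paper actually standardized is not stated in the abstract.
CANONICAL = {
    ("가", "JKS"): "이",  # subject particle: 이 / 가
    ("는", "JX"): "은",   # topic particle: 은 / 는
    ("를", "JKO"): "을",  # object particle: 을 / 를
    ("과", "JC"): "와",   # conjunctive particle: 와 / 과
}


def standardize(morphemes):
    """Replace each listed allomorph with its canonical form; leave everything else as is."""
    return [(CANONICAL.get((form, tag), form), tag) for form, tag in morphemes]


# "학교가 간다", pre-segmented by hand for this example.
sentence = [("학교", "NNG"), ("가", "JKS"), ("가", "VV"), ("ㄴ다", "EF")]
print(standardize(sentence))
```

Keying the table on the tag as well as the form keeps the verb stem "가" untouched while the subject particle "가" is mapped to "이". Collapsing both forms into one token reduces the number of distinct phrase pairs the translation model has to estimate.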

Improve Performance of Phrase-based Statistical Machine Translation through Standardizing Korean Allomorph (한국어의 이형태 표준화를 통한 구 기반 통계적 기계 번역 성능 향상)

  • Lee, Won-Kee;Kim, Young-Gil;Lee, Eui-Hyun;Kwon, Hong-Seok;Jo, Seung-U;Cho, Hyung-Mi;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2016.10a / pp.285-290 / 2016
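  • Korean is morphologically classified as an inflectional language: the form of a word determines its grammatical function in the sentence, and because the language is morphologically rich, function words such as particles and endings combine with content words in many different ways. These characteristics aggravate the data sparseness problem in phrase-based statistical machine translation systems for Korean. However, some Korean particles and endings exist as allomorphs that have the same meaning but take two different forms depending on the content word they attach to. In this paper, we therefore standardize these allomorphs into a single form to alleviate the data sparseness problem, and show that performance improves in Vietnamese-Korean statistical machine translation.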