• Title/Summary/Keyword: 표제어복원 (headword restoration)

Restoring an Elided title for Encyclopedia QA System (백과사전 질의응답을 위한 생략된 표제어 복원에 관한 연구)

  • Lim Soojong;Lee Changi;Jang Myoung-Gil
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.541-543 / 2005
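  • To find answers in an encyclopedia, the structure of its sentences is analyzed, but Korean encyclopedias omit information about the headword (the entry title) from the sentences. Because the headword serves as the subject or object of a sentence, an answer to a query cannot be given unless the omitted information is restored. To restore the omitted headword, this study uses the headword's semantic category information, case frames, and a Maximum Entropy model to classify each case into one of three outcomes: restore the headword as subject, restore the headword as object, or do not restore. When the headword's semantic category shows a restoration tendency above a certain level, the Maximum Entropy information is consulted and case frames are used to decide whether to restore. If the decision cannot be made even with the semantic category information and the case frames, a statistical method based on the Maximum Entropy model is applied to make it. To offset the weaknesses of each method, the rule-based components (the headword's semantic category information and the case frame information) are supplemented with the statistical ME model.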
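
The decision flow in this abstract (rules first, statistics when the rules cannot decide) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the `tendency` and `case_frames` tables, the 0.8 threshold, and the way the rule stage resolves a single missing case are all hypothetical. The Maximum Entropy model is represented by a caller-supplied `fallback` function.

```python
from typing import Callable, Dict, Set

RESTORE_SUBJECT = "restore_subject"
RESTORE_OBJECT = "restore_object"
NO_RESTORE = "no_restore"

# Hypothetical signature for the statistical stage (e.g. a trained ME model).
Fallback = Callable[[str, Set[str]], str]


def decide_restoration(
    category: str,
    predicate: str,
    present_cases: Set[str],
    tendency: Dict[str, float],
    case_frames: Dict[str, Set[str]],
    fallback: Fallback,
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> str:
    """Return one of the three outcomes for a headword and a predicate."""
    # Cases the predicate's case frame requires but the sentence lacks.
    missing = case_frames.get(predicate, set()) - present_cases

    # Rule stage: trust the rules only for categories with a strong
    # restoration tendency and exactly one missing case.
    if tendency.get(category, 0.0) >= threshold:
        if missing == {"subject"}:
            return RESTORE_SUBJECT
        if missing == {"object"}:
            return RESTORE_OBJECT

    # Statistical stage: the rules could not decide.
    return fallback(category, missing)


# Toy data, invented for illustration only.
tendency = {"person": 0.9, "event": 0.3}
case_frames = {"be_born": {"subject"}, "invent": {"subject", "object"}}


def never_restore(category: str, missing: Set[str]) -> str:
    return NO_RESTORE


print(decide_restoration("person", "be_born", set(),
                         tendency, case_frames, never_restore))
```

Keeping the statistical stage behind a plain callable makes it possible to swap in any classifier without touching the rule stage.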

Restoring Encyclopedia Title Words Using a Zero Anaphora Resolution Technique (무형대용어 해결 기술을 이용한 백과사전 표제어 복원)

  • Hwang, Min-Kook;Kim, Young-Tae;Ra, Dongyul;Lim, Soojong
    • Annual Conference on Human and Language Technology / 2014.10a / pp.65-69 / 2014
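  • In Korean sentences, a case argument of a predicate is often omitted when it can be inferred from context, a phenomenon known as zero anaphora. Finding the antecedent (a noun phrase) that can fill a zero anaphor is the same kind of problem as anaphora resolution. Such omission also occurs frequently in encyclopedic documents such as encyclopedias and Wikipedia. It is especially frequent when the headword (the entry title) can serve as the antecedent. Encyclopedic documents are widely used as the source for answer extraction in question answering (QA) systems, but they can hardly provide useful information unless the omitted headwords are restored. This paper proposes a system for restoring omitted headwords that is based on zero anaphora resolution.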

Implementation of Compressing a Korean Lexicon (한국어 사전의 압축 구현)

  • 임한규;박상호
    • Proceedings of the Korea Society for Industrial Systems Conference / 1997.11a / pp.395-403 / 1997
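  • To build an efficient lexicon for morphological analysis, which is fundamental to Korean language processing, we present an algorithm that compresses and restores the lexicon based on the number of repeated syllables in each headword. The size of the lexicon was reduced by 25%, and the number of lookups needed to search for a headword was reduced by 36%. In addition, a binary lexicon organized by offsets was built for fast search.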
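
One common way to compress a sorted lexicon by repeated syllables is front coding: each headword is stored as the number of leading syllables it shares with the previous headword, plus the remaining suffix. The sketch below assumes this reading of the abstract; the paper's actual scheme may differ. It also assumes precomposed Hangul, where one Python character is one syllable. The function names and the sample words are illustrative.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (syllables shared with previous headword, suffix)


def compress(sorted_words: List[str]) -> List[Entry]:
    """Front-code a sorted list of headwords."""
    entries: List[Entry] = []
    prev = ""
    for word in sorted_words:
        shared = 0
        limit = min(len(prev), len(word))
        # Count leading syllables repeated from the previous headword.
        while shared < limit and prev[shared] == word[shared]:
            shared += 1
        entries.append((shared, word[shared:]))
        prev = word
    return entries


def restore(entries: List[Entry]) -> List[str]:
    """Rebuild the original headwords from front-coded entries."""
    words: List[str] = []
    prev = ""
    for shared, suffix in entries:
        word = prev[:shared] + suffix
        words.append(word)
        prev = word
    return words


words = ["가게", "가격", "가격표", "가구"]
packed = compress(words)
print(packed)  # [(0, '가게'), (1, '격'), (2, '표'), (1, '구')]
assert restore(packed) == words
```

Because each entry depends on the one before it, restoring a headword in the middle requires decoding from the start of the list (or of a block). This may be part of why a separate offset-based index is useful for fast search.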

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.131-150 / 2015
  • Omission of noun phrases for obligatory cases is a common phenomenon in sentences of Korean and Japanese, which is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is more easily omitted in encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent application systems such as information retrieval and question answering systems. However, omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem that our system deals with is very similar to zero anaphora resolution, which is one of the important problems in natural language processing. A noun phrase existing in the text that can be used for restoration is called an antecedent. An antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in the case of zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage is in charge of detecting the zero anaphor. In the second stage, antecedent search is carried out by considering the candidates. If antecedent search fails, an attempt is made, in the third stage, to use the title as the antecedent. The main characteristic of our system is that it makes use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of the zero anaphor comprise the search space. The main technique used in the methods proposed in previous research is to perform binary classification for all the noun phrases in the search space. The noun phrase classified as an antecedent with the highest confidence is selected as the antecedent. However, we propose in this paper that antecedent search be viewed as the problem of assigning antecedent indicator labels to a sequence of noun phrases. In other words, sequence labeling is employed in antecedent search in the text. We are the first to suggest this idea. To perform sequence labeling, we suggest using a structural SVM which receives a sequence of noun phrases as input and returns the sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm, which exploits a subgradient descent methodology used for optimization problems. To train and test our system we selected a set of Wikipedia texts and constructed an annotated corpus in which gold-standard answers are provided, such as zero anaphors and their possible antecedents. Training examples are prepared using the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and omitted subject or object cases are identified. Thus the performance of our system is dependent on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor. This is based on binary classification using the regular SVM. The experiment showed that our system's performance is F1 = 68.58%. This means that a state-of-the-art system can be developed with our technique. It is expected that future work that enables the system to utilize semantic information can lead to a significant performance improvement.
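
To make the subgradient idea concrete, here is a minimal sketch of the plain Pegasos update for a binary linear SVM, the kind of classifier the abstract mentions for the title-restoration stage. It is not the modified structural variant the authors use for sequence labeling, and it omits the optional projection step of the original algorithm. The two-dimensional toy features, the regularization constant, and the function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {+1, -1})


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def pegasos_train(data: List[Example], lam: float = 0.1,
                  steps: int = 1000, seed: int = 0) -> List[float]:
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    for t in range(1, steps + 1):
        x, y = rng.choice(data)          # one random training example
        eta = 1.0 / (lam * t)            # Pegasos step size
        violated = y * dot(w, x) < 1.0   # margin check with the old weights
        shrink = 1.0 - 1.0 / t           # equals 1 - eta * lam
        w = [shrink * wi for wi in w]
        if violated:
            # Hinge-loss subgradient step for a margin violation.
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if dot(w, x) >= 0.0 else -1


# Toy, linearly separable data (hypothetical features).
data: List[Example] = [
    ((2.0, 1.0), 1), ((1.0, 2.0), 1),
    ((-2.0, -1.0), -1), ((-1.0, -2.0), -1),
]
w = pegasos_train(data)
print([predict(w, x) for x, _ in data])  # [1, 1, -1, -1]
```

In a structural SVM the same loop applies, but the margin check is replaced by a search for the most violating label sequence. That part is not shown here.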

Formative Characteristics of Water Space and Scenic Spot of Baegun-dong Wonlim in Gangjin Aaun Village (강진 안운마을 백운동원림의 승경과 수공간의 조영 특성)

  • Park, Yool-Jin;Kim, Hong-Gyun;Rho, Jae-Hyun;Kim, Hwa-Ok;Goh, Yea-Bin
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.29 no.2 / pp.99-107 / 2011
  • This study aims to acquire data for enhancing the authenticity of restoration by reviewing the external and internal scenery of Baegun-dong Wonlim, such as the Baegun-dong 8 Objects of Scenery (pines, bamboos, chrysanthemum, orchid, maehwa blossom, magnolia, fiddle and crane) and the 12 Scenic Spots, as well as its views and planting scenery. For the Baegun-dong 8 Young, which sang of the scenic spots of Baegun-dong Wonlim, the head words appear to have been formed by borrowing rhyming words from the caption of Baegun-dong Yuseogi (白雲洞幽棲記). Accordingly, this scenery appears to have secured its status as Wonlim from the beginning. In particular, the words fiddle and crane in 琴棋書畵 (Geumgisuhha) imply that playing the Komungo and brush writing were firmly rooted as romantic pursuits of the classical scholars of that time. Considering the distance to Okpanbong, one of the 12 scenic spots of Baegun-dong, the radius of the outer circumference is estimated to be around 1.6 km. From Okpanbong, the epicenter, Sandagyeong, Baegokmae, Hongokpok, Pungdan and others correspond to transitional space. The inner scenery was formed around a hub of thatched cottages and bowers surrounded by chrysanthemums, peonies, rhododendron, Phyllostachys bambusoides, pines, and upper and lower water paths. Thus there appears to have been a scenic structure of a centrifugal nature as well as of multiplicity. Most water paths with residual structures found in the country have streamlined forms; the Baegun-dong water paths, on the other hand, are straight lines that almost dominate the inner garden in scale and form, revealing an extraordinary idea and design. To promote an authentic restoration of Baegun-dong Wonlim, it will be necessary to take a management perspective that ensures the presentation of gradual scenery with the scenic elements for the outer view among the 12 Scenic Spots.