• Title/Summary/Keyword: Korean thesaurus

Search Result 224, Processing Time 0.026 seconds

Two-Level Clausal Segmentation using Sense Information (의미 정보를 이용한 이단계 단문분할)

  • Park, Hyun-Jae;Woo, Yo-Seop
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.9
    • /
    • pp.2876-2884
    • /
    • 2000
  • Clausal segmentation is the method that parses Korean sentences by segmenting one long sentence into several phrases according to the predicates. So far most of researches could be useful for literary sentences, but long sentences increase complexities of the syntax analysis. Thus this paper proposed Two-Level Clausal Segmentation using sense information which was designed and implemented to solve this problem. Analysis of clausal segmentation and understanding of word senses can reduce syntactic and semantic ambiguity. Clausal segmentation using Sense Information is necessary because there are structural ambiguity of sentences and a frequent abbreviation of auxiliary word in common sentences. Two-Level Clausal Segmentation System(TLCSS) consists of Complement Selection Process(CSP) and Noncomplement Expansion Process(NEP). CSP matches sentence elements to subcategorization dictionary and noun thesaurus. As a result of this step, we can find the complement and subcategorization pattern. Secondly, NEP is the method that uses syntactic property and the others methods for noncomplement increase of growth. As a result of this step, we acquire segmented sentences. We present a technique to estimate the precision of Two-Level Clausal Segmentation System, and shows a result of Clausal Segmentation with 25,000 manually sense tagged corpus constructed by ETRl-KONAN group. An Two-Level Clausal Segmentation System shows clausal segmentation precision of 91.8%.

  • PDF

An Efficient Index Structure for Semantic-based XML Keyword Search (의미 기반의 XML키워드 검색을 위한 효율적인 인덱스 구조)

  • Lee, Hyung-Dong;Kim, Sung-Jin;Kim, Hyoung-Joo
    • Journal of KIISE:Databases
    • /
    • v.33 no.5
    • /
    • pp.513-525
    • /
    • 2006
  • Search results of XML keyword search are defined generally as the most specific elements containing all query keywords in the literature. The labels of XML elements and semantic information such as ontology, conceptual model, thesaurus, and so on, are used to improve the preciseness of the search results. This paper presents a hierarchical index for an efficient XML keyword query processing on the condition that returnable search concepts are defined and users' query concepts can be interpreted with the help of the semantic information. The hierarchical index separately stores the XML elements containing a keyword on the basis of the hierarchical relations of the concepts that the XML elements belong to, and makes it possible to obtain least common ancestors, which are candidates for the search results, with selectively reading the elements belonging to the concepts relevant to query concepts and without considering all the combinations of the elements having been read. This paper deals with how to organize the hierarchical index and how to process XML keyword queries with the index. In our experiment with the DBLP XML document and the XML documents in the INEX2003 test set, the hierarchical index worked well.

Definition and Extraction of Causal Relations for Question-Answering on Fault-Diagnosis of Electronic Devices (전자장비 고장진단 질의응답을 위한 인과관계 정의 및 추출)

  • Lee, Sheen-Mok;Shin, Ji-Ae
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.5
    • /
    • pp.335-346
    • /
    • 2008
  • Causal relations in ontology should be defined based on the inference types necessary to solve problems specific to application as well as domain. In this paper, we present a model to define and extract causal relations for application ontology for Question-Answering (QA) on fault-diagnosis of electronic devices. Causal categories are defined by analyzing generic patterns of QA application; the relations between concepts in the corpus belonging to the causal categories are defined as causal relations. Instances of casual relations are extracted using lexical patterns in the concept definitions of domain, and extended incrementally with information from thesaurus. On the evaluation by domain specialists, our model shows precision of 92.3% in classification of relations and precision of 80.7% in identifying causal relations at the extraction phase.

Query Processing Model Using Two-level Fuzzy Knowledge Base (2단계 퍼지 지식베이스를 이용한 질의 처리 모델)

  • Lee, Ki-Young;Kim, Young-Un
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.4 s.36
    • /
    • pp.1-16
    • /
    • 2005
  • When Web-based special retrieval systems for scientific field extremely restrict the expression of user's information request, the process of the information content analysis and that of the information acquisition become inconsistent. Accordingly, this study suggests the re-ranking retrieval model which reflects the content based similarity between user's inquiry terms and index words by grasping the document knowledge structure. In order to accomplish this, the former constructs a thesaurus and similarity relation matrix to provide the subject analysis mechanism and the latter propose the algorithm which establishes a search model such as query expansion in order to analyze the user's demands. Therefore, the algorithm that this study suggests as retrieval utilizing the information structure of a retrieval system can be content-based retrieval mechanism to establish a 2-step search model for the preservation of recall and improvement of accuracy which was a weak point of the previous fuzzy retrieval model.

  • PDF

A Study on the Effective Database Integration Methodology - The Identification of Name Conflict - (데이터베이스의 효과적인 통합방안에 관한 연구 - Name Conflict의 식별을 중심으로-)

  • Lee Hong-Girl;Higa Kunihiko;Fujikawa Takayuki
    • Journal of Navigation and Port Research
    • /
    • v.29 no.5 s.101
    • /
    • pp.457-464
    • /
    • 2005
  • Database integration has been recognized as a critical issue for effective logistics service in logistics environment. However, research related to effective methodology for this have been little studied, and also, prominent achievements have yet to be suggested. The aim of this paper is to present a quantitative methodology for the identification of conflict that is a representative problem on database integration. To achieve this aim, we suggested a quantitative methodology that can efficiently fine troubles such as name conflicts when schema integration, based on the level of semantic similarity between attributes and entities. And, in order to measure these semantic similarities, we used a thesaurus dictionary that proposed previous research. Finally, we presented effectiveness of the proposed methodology through some typical examples.

Determining the Specificity of Terms using Compositional and Contextual Information (구성정보와 문맥정보를 이용한 전문용어의 전문성 측정 방법)

  • Ryu Pum-Mo;Bae Sun-Mee;Choi Key-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.7
    • /
    • pp.636-645
    • /
    • 2006
  • A tenn with more domain specific information has higher level of term specificity. We propose new specificity calculation methods of terms based on information theoretic measures using compositional and contextual information. Specificity of terms is a kind of necessary conditions in tenn hierarchy construction task. The methods use based on compositional and contextual information of terms. The compositional information includes frequency, $tf{\cdot}idf$, bigram and internal structure of the terms. The contextual information of a tenn includes the probabilistic distribution of modifiers of terms. The proposed methods can be applied to other domains without extra procedures. Experiments showed very promising result with the precision of 82.0% when applied to the terms in MeSH thesaurus.

A Study on metadata structuralization for context representation of women's oral life history (여성구술생애기록물 맥락 표현을 위한 메타데이터 구조화에 관한 연구)

  • Lee, Jung Yeon;LEe, Jung Yeoun;Ryoo, Jong Duk;Lee, Jong Yoon
    • The Korean Journal of Archival Studies
    • /
    • no.30
    • /
    • pp.57-88
    • /
    • 2011
  • Oral history is the work to make the record of the verbal content recreated by the memories of the survivors. Oral history recording is accomplished through the collaboration of the interviewee, the interviewer, the cameraman, the recorder, the transcriber and etc. Therefore, it is important for the context at the time of the production to be expressed. So planning for the collection of oral records, the collection of oral records, and their preservation and maintenance should be managed systematically. This study, being started from this sense of problem, designed conceptual model of metadata to well reflect the contextual characteristics of the oral records of the women life of among the oral records and extracted the elements through this. The whole process of records management including from planning, production, preservation, management, and leading to use, related to the oral records of the women life, was classified into a hierarchy. It also proposed the system which can express the characteristics of the 'gender' through authority records and subject thesaurus.

Automatic Extraction of Opinion Words from Korean Product Reviews Using the k-Structure (k-Structure를 이용한 한국어 상품평 단어 자동 추출 방법)

  • Kang, Han-Hoon;Yoo, Seong-Joon;Han, Dong-Il
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.6
    • /
    • pp.470-479
    • /
    • 2010
  • In relation to the extraction of opinion words, it may be difficult to directly apply most of the methods suggested in existing English studies to the Korean language. Additionally, the manual method suggested by studies in Korea poses a problem with the extraction of opinion words in that it takes a long time. In addition, English thesaurus-based extraction of Korean opinion words leaves a challenge to reconsider the deterioration of precision attributed to the one to one mismatching between Korean and English words. Studies based on Korean phrase analyzers may potentially fail due to the fact that they select opinion words with a low level of frequency. Therefore, this study will suggest the k-Structure (k=5 or 8) method, which may possibly improve the precision while mutually complementing existing studies in Korea, in automatically extracting opinion words from a simple sentence in a given Korean product review. A simple sentence is defined to be composed of at least 3 words, i.e., a sentence including an opinion word in ${\pm}2$ distance from the attribute name (e.g., the 'battery' of a camera) of a evaluated product (e.g., a 'camera'). In the performance experiment, the precision of those opinion words for 8 previously given attribute names were automatically extracted and estimated for 1,868 product reviews collected from major domestic shopping malls, by using k-Structure. The results showed that k=5 led to a recall of 79.0% and a precision of 87.0%; while k=8 led to a recall of 92.35% and a precision of 89.3%. Also, a test was conducted using PMI-IR (Pointwise Mutual Information - Information Retrieval) out of those methods suggested in English studies, which resulted in a recall of 55% and a precision of 57%.

Generalization of error decision rules in a grammar checker using Korean WordNet, KorLex (명사 어휘의미망을 활용한 문법 검사기의 문맥 오류 결정 규칙 일반화)

  • So, Gil-Ja;Lee, Seung-Hee;Kwon, Hyuk-Chul
    • The KIPS Transactions:PartB
    • /
    • v.18B no.6
    • /
    • pp.405-414
    • /
    • 2011
  • Korean grammar checkers typically detect context-dependent errors by employing heuristic rules that are manually formulated by a language expert. These rules are appended each time a new error pattern is detected. However, such grammar checkers are not consistent. In order to resolve this shortcoming, we propose new method for generalizing error decision rules to detect the above errors. For this purpose, we use an existing thesaurus KorLex, which is the Korean version of Princeton WordNet. KorLex has hierarchical word senses for nouns, but does not contain any information about the relationships between cases in a sentence. Through the Tree Cut Model and the MDL(minimum description length) model based on information theory, we extract noun classes from KorLex and generalize error decision rules from these noun classes. In order to verify the accuracy of the new method in an experiment, we extracted nouns used as an object of the four predicates usually confused from a large corpus, and subsequently extracted noun classes from these nouns. We found that the number of error decision rules generalized from these noun classes has decreased to about 64.8%. In conclusion, the precision of our grammar checker exceeds that of conventional ones by 6.2%.

A Curricular Study on AI & ES in Library and Information Science (문헌정보학에서의 인공지능과 전문가시스템 교육과정 연구)

  • Koo Bon-Young;Park Mi-Young
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.32 no.2
    • /
    • pp.211-232
    • /
    • 1998
  • It is the purpose of this study to specify contents of Library and Information Science to train information professional to meet environment change of technology and system. Among them. recognizing necessity of present Artificial Intelligence and Export System (AI and ES) required by changing environment of latest Information technology, it is also the purpose of this work to figure out fundamental data and the way of solution how to introduce what contents out of AI and ES to Library and Information Science. The briefed results are as follows. 1. Due to rapid change of high Information technology and computer application it is the most important essential points, In order of Importance, in finding available network source, In indexing on-line data base, in analysing and design information system. and in computer application ability. 2. In contents of AI and ES, most Important training portion for Library and Information Science are : data base treating, thesaurus, natural language processing. and knowledge representation. 3. Library and information science professors recognize It necessary for bigger number of Library and Information Science students to be educated artificial intelligence and expert system. 4. During forthcoming age it shows more important reorganization that artificial intelligence and expert system improves information professional in reference service, cataloging, classification, information retrieval, and documentation delivery 5. According to library and information science professors more important reorganization on the subject of AI and ES, the curricular on AI and ES is, forthcoming, to be Introduced to curricular on library and information science in the nation, In order of importance, (see 1. above).

  • PDF