• Title/Summary/Keyword: lexical retrieval

An Efficient Index Term Extraction Method in IR using Lexical Chains (정보검색에서 어휘체인을 이용한 효과적인 색인어 추출 방안)

  • Kang, Bo-Yeong;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications / v.29 no.8 / pp.584-594 / 2002
  • In information retrieval and digital libraries, one of the most important factors is finding exactly the information that users need. In this paper, we present an efficient index term extraction method that makes it possible to infer the content of documents and retrieve information more precisely. To find index terms in a document, we use lexical chains. Before generating lexical chains, we roughly disambiguate the senses of the nouns in a document using a concept we call the semantic window: we look ahead at the semantic relations of neighboring nouns to disambiguate the sense of each noun. After generating lexical chains from the sense-disambiguated nouns, we identify strong chains using several metrics and extract index terms from a few of the strongest chains. We evaluated our system against the results of KEA, a key phrase extraction system. The system works on documents from general domains, including information retrieval and digital libraries.
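The abstract does not say which metrics decide chain strength. As one hedged illustration, the sketch below swaps in the scoring used in Barzilay and Elhadad's lexical-chain summarizer: a chain's score is its length times a homogeneity index (1 minus the ratio of distinct nouns to chain length), and a chain counts as strong when its score exceeds the mean by `k` standard deviations (their criterion uses k = 2). Chain construction and the semantic window are not reproduced; chains are assumed to arrive as lists of already sense-disambiguated nouns, and every function name, including the "most frequent member" rule for picking index terms, is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def chain_score(chain: list[str]) -> float:
    """Score = length x homogeneity, where homogeneity = 1 - distinct / length."""
    if not chain:
        return 0.0
    homogeneity = 1.0 - len(set(chain)) / len(chain)
    return len(chain) * homogeneity


def strong_chains(chains: list[list[str]], k: float = 2.0) -> list[list[str]]:
    """Keep chains whose score exceeds mean + k * (population) standard deviation."""
    if not chains:
        return []
    scores = [chain_score(c) for c in chains]
    threshold = mean(scores) + k * pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]


def index_terms(chains: list[list[str]], k: float = 2.0) -> list[str]:
    """Hypothetical term choice: the most frequent noun of each strong chain."""
    return [Counter(c).most_common(1)[0][0] for c in strong_chains(chains, k)]


chains = [
    ["retrieval", "retrieval", "search", "retrieval", "index", "retrieval"],
    ["library", "library"],
    ["user"],
    ["window"],
]
print(index_terms(chains, k=1.0))  # ['retrieval']
```

With only four toy chains, k = 2 would mark none of them as strong (among four values the largest possible z-score is √3), which is why the example lowers k.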

Implementation of Very Large Hangul Text Retrieval Engine HMG (대용량 한글 텍스트 검색 엔진 HMG의 구현)

  • 박미란;나연묵
    • Journal of Korea Multimedia Society / v.1 no.2 / pp.162-172 / 1998
  • In this paper, we implement HMG (Hangul MG), a Hangul text retrieval engine for gigabyte-scale collections, based on the English text retrieval engine MG (Managing Gigabytes) and the Hangul lexical analyzer HAM (Hangul Analysis Module). To support Hangul, we use the KSC 5601 code in the database construction and query processing stages. The lexical analyzer, parser, and index construction module of the MG system were modified to handle Hangul text. To demonstrate the usefulness of HMG, we implemented NOD (Novel On Demand), a system that supports retrieval of Hangul novels on the WWW. HMG can be used to build large-scale full-text information retrieval systems that support Hangul.
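MG's own index is a compressed inverted file, and HMG's modifications to it are not detailed in the abstract. The toy sketch below only illustrates the data flow described: documents and queries stored in a KSC 5601-based encoding (here Python's `euc_kr` codec, which covers the KS X 1001 character set, formerly KS C 5601) are decoded, tokenized, and mapped to posting lists. The whitespace `tokenize` function is a hypothetical stand-in for HAM's morphological analysis, and the index is an uncompressed in-memory dictionary rather than MG's on-disk format.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for a Hangul morphological analyzer such as HAM:
    # it only splits on whitespace and does not strip particles or endings.
    return text.split()


def build_index(docs: list[bytes]) -> dict[str, list[int]]:
    """Map each term to the sorted IDs of the EUC-KR documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, raw in enumerate(docs):
        for term in tokenize(raw.decode("euc_kr")):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def and_query(index: dict[str, list[int]], query: bytes) -> list[int]:
    """Boolean AND over the terms of an EUC-KR encoded query."""
    terms = tokenize(query.decode("euc_kr"))
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = [s.encode("euc_kr") for s in ["소설 검색 엔진", "한글 소설", "검색 엔진"]]
index = build_index(docs)
print(and_query(index, "검색 엔진".encode("euc_kr")))  # [0, 2]
```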

The Relationship between Lexical Retrieval and Coverbal Gestures (어휘인출과 구어동반 제스처의 관계)

  • Ha, Ji-Wan;Sim, Hyun-Sub
    • Korean Journal of Cognitive Science / v.22 no.2 / pp.123-143 / 2011
  • At what point in the process of speech production are gestures involved? According to the Lexical Retrieval Hypothesis, gestures are involved in lexicalization in the formulating stage. According to the Information Packaging Hypothesis, gestures are involved in the conceptual planning of messages in the conceptualizing stage. We tested these hypotheses using a game situation from a TV program that required the players to engage in both lexicalization and conceptualization simultaneously. The transcription of the verbal utterances was augmented with all arm and hand gestures produced by the players. Coverbal gestures were classified into two types: lexical gestures and motor gestures. Concrete words elicited lexical gestures significantly more frequently than abstract words, and abstract words elicited motor gestures significantly more frequently than concrete words. The difficulty of conceptualizing concrete words was significantly correlated with the amount of lexical gestures. However, neither the number of words nor word frequency was correlated with the amount of either type of gesture. These results support the Information Packaging Hypothesis. Most notably, the finding that abstract words elicited motor gestures more frequently than concrete words points to the importance of motor gestures. Motor gestures, which have been considered unrelated to verbal production, were excluded from analysis in many gestural studies; this study suggests that they may be connected to abstract conceptualization.

Extraction of Thematic Roles from Dictionary Definitions

  • McHale, Michael L.;Myaeng, Sung H.
    • Proceedings of the Korean Society for Language and Information Conference / 1996.02a / pp.137-146 / 1996
  • Our research goal has been the development of a domain independent natural language processing (NLP) system suitable for information retrieval. As part of that research, we have investigated ways to automatically extend the semantics of a lexicon derived from machine-readable lexical sources. This paper details the extraction of thematic roles derived from lexical patterns in a machine-readable dictionary.

English Floating Quantifiers and Lexical Specification of Quantifier Retrieval

  • Yoo, Eun-Jung
    • Language and Information / v.5 no.1 / pp.1-15 / 2001
  • Floating quantifiers (FQs) in English exhibit both universal and language-specific properties. This paper discusses how their syntactic and semantic characteristics can be explained by a constraint-based, lexical approach to the floating quantifier construction within the framework of Head-Driven Phrase Structure Grammar (HPSG). Based on the assumption that FQs are base-generated VP modifiers, this paper proposes an account in which the semantic contribution of FQs consists of a "lexically retrieved" universal quantifier taking scope over the VP meaning.

Korean Lexical Disambiguation Based on Statistical Information (통계정보에 기반을 둔 한국어 어휘중의성해소)

  • 박하규;김영택
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.2 / pp.265-275 / 1994
  • Lexical disambiguation is one of the most basic problems in natural language processing, underlying applications such as speech recognition/synthesis, information retrieval, and corpus tagging. This paper describes a Korean lexical disambiguation mechanism in which disambiguation is performed on the basis of statistical information collected from corpora. In this mechanism, the token tags corresponding to the results of morphological analysis are used instead of part-of-speech tags to allow finer-grained disambiguation. The proposed lexical selection function shows considerably high accuracy because it reflects lexical characteristics of Korean such as the concordance of endings and postpositions. Two disambiguation methods, a unique selection method and a multiple selection method, are provided so that they can be applied appropriately depending on the application area.
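The abstract does not give the form of the lexical selection function. The sketch below is only a hedged illustration of how unique selection and multiple selection can differ, assuming each candidate analysis of a word (written as a token-tag sequence) has a corpus count; probabilities use add-one smoothing, and the tag names, the counts, and the `ratio` cut-off for multiple selection are all hypothetical.

```python
def selection_probs(counts: dict[str, int], smoothing: float = 1.0) -> dict[str, float]:
    """Add-one (Laplace) smoothed relative frequencies of candidate analyses."""
    total = sum(counts.values()) + smoothing * len(counts)
    return {cand: (n + smoothing) / total for cand, n in counts.items()}


def unique_selection(counts: dict[str, int]) -> str:
    """Unique selection: return the single most probable analysis."""
    probs = selection_probs(counts)
    return max(probs, key=probs.get)


def multiple_selection(counts: dict[str, int], ratio: float = 0.1) -> list[str]:
    """Multiple selection: every analysis with probability >= ratio * best, best first."""
    probs = selection_probs(counts)
    best = max(probs.values())
    kept = [cand for cand, p in probs.items() if p >= ratio * best]
    return sorted(kept, key=lambda cand: probs[cand], reverse=True)


# Toy counts for the ambiguous eojeol "나는" (hypothetical, not from a real corpus).
counts = {"나/NP+는/JX": 90, "날/VV+는/ETM": 8, "나/VV+는/ETM": 2}
print(unique_selection(counts))                # 나/NP+는/JX
print(multiple_selection(counts, ratio=0.05))  # ['나/NP+는/JX', '날/VV+는/ETM']
```

Unique selection suits applications that need one answer (e.g. tagging), while multiple selection keeps plausible alternatives for later stages to resolve; the cut-off shown here is an arbitrary illustration.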

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.153-156 / 2003
  • Voice input is often required in new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low accuracy of speech recognition makes it difficult to extend its use to new fields. A popular approach to improving recognition accuracy is post-processing of the recognition results, but previous post-processing error-correction approaches have been mainly lexical-oriented. We propose a new semantic-oriented approach that corrects both semantic-level and lexical errors and is especially accurate for domain-specific speech error correction. Through extensive experiments with a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and several advantages over previous lexical-oriented approaches.

A Study on Semantic Based Indexing and Fuzzy Relevance Model (의미기반 인덱스 추출과 퍼지검색 모델에 관한 연구)

  • Kang, Bo-Yeong;Kim, Dae-Won;Gu, Sang-Ok;Lee, Sang-Jo
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.238-240 / 2002
  • An information retrieval system that comprehends the semantic content of documents and knows users' preferences could search for information on the Internet more effectively and improve IR performance. We therefore propose an IR model that combines semantic-based indexing with a fuzzy relevance model. In addition to the statistical approach, we adopt a semantic approach to indexing, lexical chains, because we expect it to improve the performance of index term extraction. We combine this semantic-based indexing with a fuzzy model that determines the relevance between user preferences and index terms. The proposed system works as follows. First, it indexes documents with an efficient index term extraction method based on lexical chains. Then, when a user retrieves information from the indexed document collection, the extended IR model calculates and ranks the relevance of the user query, user preferences, and index terms using several metrics. Experiments on each module, semantic-based indexing and the extended fuzzy model, gave noticeable results. The combination of these modules is expected to improve information retrieval performance.

Vocabulary Recognition Retrieval Optimized System using MLHF Model (MLHF 모델을 적용한 어휘 인식 탐색 최적화 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.217-223 / 2009
  • Vocabulary recognition systems on mobile terminals use statistical methods for vocabulary recognition, including N-gram-based statistical grammar recognition. Because memory and arithmetic processing capacity are limited, as the vocabulary grows the recognition algorithm becomes more complex and requires a large search space and long processing time, which can make processing infeasible. This study proposes optimizing vocabulary recognition with the MLHF system. Using FLaVoR, MLHF separates acoustic search from lexical search. Acoustic search extracts feature vectors from the speech signal using an HMM, and lexical search performs recognition using the Levenshtein distance algorithm. The system achieved a vocabulary-dependent recognition rate of 98.63%, a vocabulary-independent recognition rate of 97.91%, and a recognition time of 1.61 seconds.
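The abstract states that lexical search scores candidates with the Levenshtein distance. The sketch below assumes the acoustic (HMM) stage has already produced a recognized symbol string and simply picks the vocabulary entry with the smallest edit distance; FLaVoR and the MLHF pipeline are not reproduced, and the romanized vocabulary and function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lexical_search(hypothesis: str, vocabulary: list[str]) -> str:
    """Return the vocabulary word closest to the acoustic hypothesis."""
    return min(vocabulary, key=lambda word: levenshtein(hypothesis, word))


print(levenshtein("kitten", "sitting"))                     # 3
print(lexical_search("seul", ["seoul", "busan", "daegu"]))  # seoul
```

Keeping only two rows of the dynamic-programming table bounds memory by the length of the shorter loop dimension, which matters on memory-limited terminals; a real system would also prune the vocabulary rather than scan it linearly.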

Parallel Information Retrieval with Query Expansion (질의 확장을 이용한 병렬 정보 검색)

  • 정유진
    • Proceedings of the Korean Information Science Society Conference / 2002.04b / pp.103-105 / 2002
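  • This paper describes the design and implementation of an information retrieval (IR) system that uses query expansion in a PC cluster environment. The system stores a document collection indexed with an inverted index file (IIF), uses the vector model as its ranking method, and uses cosine similarity for query expansion. Query expansion improves retrieval effectiveness by adding words related to the user's original query. In the proposed parallel IR system, the inverted index file is split into several partitions using two methods: lexical partitioning and greedy partitioning. When a user query arrives, query expansion produces an expanded query consisting of several words; each word of the expanded query is sent to the node that holds the IIF associated with that word, and the words are processed in parallel. Experiments show how the performance of the parallel IR system is affected by query expansion and by the two IIF partitioning methods. The experiments used the standard Korean test collections EKSET and KTSET. According to the experiments, greedy partitioning achieved about a 20% performance improvement over lexical partitioning.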
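The abstract names the two partitioning methods and cosine-similarity expansion without defining them, so the sketch below fills them in with stated assumptions: lexical partitioning gives each node a contiguous alphabetical range of terms, greedy partitioning assigns terms in decreasing posting-list length to the currently least-loaded node (load = total postings held), and term similarity is the cosine between binary document-occurrence vectors. The posting lists, the 0.8 threshold, and all function names are hypothetical, and vector-model ranking on each node is omitted.

```python
import heapq
import math


def lexical_partition(postings: dict[str, list[int]], n_nodes: int) -> dict[str, int]:
    """Split the alphabetically sorted term list into contiguous ranges."""
    terms = sorted(postings)
    size = max(1, math.ceil(len(terms) / n_nodes))
    return {term: i // size for i, term in enumerate(terms)}


def greedy_partition(postings: dict[str, list[int]], n_nodes: int) -> dict[str, int]:
    """Longest posting lists first, each to the currently least-loaded node."""
    heap = [(0, node) for node in range(n_nodes)]  # (load, node id)
    assignment = {}
    for term in sorted(postings, key=lambda t: (-len(postings[t]), t)):
        load, node = heapq.heappop(heap)
        assignment[term] = node
        heapq.heappush(heap, (load + len(postings[term]), node))
    return assignment


def node_loads(assignment: dict[str, int], postings: dict[str, list[int]], n_nodes: int) -> list[int]:
    """Total number of postings stored on each node."""
    loads = [0] * n_nodes
    for term, node in assignment.items():
        loads[node] += len(postings[term])
    return loads


def cosine(a: set[int], b: set[int]) -> float:
    """Cosine similarity of two binary document-occurrence vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def expand_query(query: list[str], postings: dict[str, list[int]], threshold: float = 0.8) -> list[str]:
    """Append every indexed term whose cosine with a query term meets the threshold."""
    expanded = list(query)
    for q in query:
        if q not in postings:
            continue
        for term in sorted(postings):
            if term not in expanded and cosine(set(postings[q]), set(postings[term])) >= threshold:
                expanded.append(term)
    return expanded


def route(terms: list[str], assignment: dict[str, int]) -> dict[int, list[str]]:
    """Group expanded query terms by the node that holds their posting lists."""
    by_node: dict[int, list[str]] = {}
    for term in terms:
        if term in assignment:
            by_node.setdefault(assignment[term], []).append(term)
    return by_node


postings = {"retrieval": [0, 1, 2, 3], "search": [0, 1, 2], "index": [2, 3],
            "cluster": [4], "node": [4, 5]}
greedy = greedy_partition(postings, 2)
print(node_loads(lexical_partition(postings, 2), postings, 2))  # [5, 7]
print(node_loads(greedy, postings, 2))                          # [6, 6]
print(route(expand_query(["search"], postings), greedy))        # {1: ['search'], 0: ['retrieval']}
```

The toy example only shows that greedy assignment can even out per-node load where contiguous lexical ranges do not; it does not reproduce the paper's experiments or its reported improvement.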
