• Title/Summary/Keyword: Automatic Information Retrieval Algorithm

A Study of the Automatic Extraction of Hypernyms and Hyponyms from the Corpus (코퍼스를 이용한 상하위어 추출 연구)

  • Pang, Chan-Seong;Lee, Hae-Yun
    • Korean Journal of Cognitive Science / v.19 no.2 / pp.143-161 / 2008
  • The goal of this paper is to extract hyponymy relations between words from a corpus. Adopting the basic algorithm of Hearst (1992), we propose a pattern-based method for extracting semantic relations from the corpus. To this end, we compiled a list of hypernym-hyponym pairs from the Sejong Electronic Dictionary and supplemented it with the superordinate-subordinate terms of CoreNet. We then extracted all corpus sentences that contain a hypernym-hyponym pair from the list and, from these, collected the sentences containing meaningful constructions that occur systematically in the corpus. This yielded 21 generalized patterns. Using a Perl program, we collected the sentences matching each of the 21 patterns; 57% of these sentences turned out to express a hyponymy relation. The proposed method is simpler and more advanced than that of Cederberg and Widdows (2003), in that using a wordnet or an electronic dictionary is generally considered efficient for information retrieval. The extracted patterns help in finding appropriate documents during information retrieval and can be used to expand concept networks such as ontologies and thesauruses. However, because Korean word order is relatively free, it is difficult to capture the various expressions of a fixed pattern. Future work should investigate semantic relations beyond hyponymy so that more varied patterns can be extracted from the corpus.

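The pattern-based extraction described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' Perl program: the two English patterns are in the style of Hearst (1992) and stand in for the paper's 21 Korean patterns, which the abstract does not list. The names `PATTERNS`, `extract_pairs`, and `precision` are hypothetical.

```python
import re

# Illustrative English patterns in the style of Hearst (1992). These are
# assumptions for the sketch; the paper's 21 Korean patterns are not given
# in the abstract and are not reproduced here.
PATTERNS = [
    # "X such as Y": X is the hypernym, Y the hyponym
    (re.compile(r"(\w+) such as (\w+)"), "hyper_first"),
    # "Y and other X": X is the hypernym, Y the hyponym
    (re.compile(r"(\w+) and other (\w+)"), "hypo_first"),
]


def extract_pairs(sentence):
    """Return (hypernym, hyponym) pairs matched by any pattern."""
    pairs = []
    text = sentence.lower()
    for regex, order in PATTERNS:
        for a, b in regex.findall(text):
            pairs.append((a, b) if order == "hyper_first" else (b, a))
    return pairs


def precision(candidates, gold):
    """Fraction of extracted pairs found in a gold set of known pairs."""
    if not candidates:
        return 0.0
    return sum(1 for pair in candidates if pair in gold) / len(candidates)
```

A gold set built from a dictionary or wordnet plays the role of the seed list in the abstract; `precision` is one simple way to estimate how many pattern matches actually express hyponymy, comparable in spirit to the 57% figure reported.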

A News Video Mining based on Multi-modal Approach and Text Mining (멀티모달 방법론과 텍스트 마이닝 기반의 뉴스 비디오 마이닝)

  • Lee, Han-Sung;Im, Young-Hee;Yu, Jae-Hak;Oh, Seung-Geun;Park, Dai-Hee
    • Journal of KIISE:Databases / v.37 no.3 / pp.127-136 / 2010
  • With the rapid growth of information and computer communication technologies, the number of digital documents, including multimedia data, has recently exploded. In particular, news video databases and news video mining have become the subject of extensive research aimed at developing effective and efficient tools for manipulating and analyzing news videos, because of their information richness. However, most research focuses on browsing, retrieval, and summarization of news videos; discovering and analyzing the rich latent semantic knowledge in news videos is still at a relatively early stage. In this paper, we propose a news video mining system based on a multi-modal approach and text mining, which uses the visual-textual information of news video clips and their scripts. The proposed system automatically constructs a taxonomy of news video stories using a hierarchical clustering algorithm, one of the text mining methods. It then analyzes the topics of news video stories from multiple angles by means of a time-cluster trend graph, a weighted cluster growth index, and network analysis. To validate our approach, we analyzed the news videos on "The Second Summit of South and North Korea in 2007".
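
The taxonomy-building step mentioned in the abstract above (hierarchical clustering over news scripts) might look something like the sketch below. The abstract does not state the term weighting, similarity measure, or linkage criterion, so raw term frequencies, cosine similarity, and average linkage are assumptions here, and `tf_vector`, `build_taxonomy`, and the other names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for one news script."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def leaves(tree):
    """Document indices under a tree node (int leaf or (left, right) pair)."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def average_similarity(c1, c2, vectors):
    """Average-linkage similarity between two lists of document indices."""
    total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def build_taxonomy(scripts):
    """Agglomerative clustering; returns a nested tuple over script indices."""
    if not scripts:
        raise ValueError("need at least one script")
    vectors = [tf_vector(s) for s in scripts]
    clusters = list(range(len(scripts)))
    while len(clusters) > 1:
        best = None
        # Find the most similar pair of current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = average_similarity(
                    leaves(clusters[i]), leaves(clusters[j]), vectors
                )
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]
```

The nested tuple returned by `build_taxonomy` is a binary merge tree; cutting it at a chosen depth would give topic groups at that level of the taxonomy.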