• Title/Summary/Keyword: paragraph retrieval

Search Result 7, Processing Time 0.019 seconds

A Study of Retrieval Model Providing Relevant Sentences in Storytelling on Semantic Web (시맨틱 웹 환경에서 적합한 문장을 제공하는 이야기 쓰기 도우미에 관한 연구)

  • Lee, Tae-Young
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.4
    • /
    • pp.7-34
    • /
    • 2009
  • Structures of stories, paragraphs, and sentences and inferences applied to indexing and searching were studied to construct the full-text and sentence retrieval system for storytelling. The system designed the database of stories, paragraphs, and sentences and the knowledge-base of inference rules to aid to write the story. The Knowledge-base comprised the files of story frames, paragraph scripts, and sentence logics made by mark-up languages like SWRL etc. able to operate in semantic web. It is necessary to establish more precise indexing language represented the sentences and to create a mark-up languages able to construct more accurate inference rules.

A Construction of Indexing System for Sentence Retrieval (문장 검색을 위한 색인시스템 구축 : 초 .중등 학생의 한국어 및 영어 문장을 중심으로)

  • 이태영
    • Journal of the Korean Society for information Management
    • /
    • v.20 no.1
    • /
    • pp.145-163
    • /
    • 2003
  • An indexing language were studied to construct the sentences and paragraphs providing system aided to write a Korean or English composition. The indexing language includes the index terms like noun, predicate, and adverb. and also various index symbols. The subject name and the keyword Included the symbols, which Indicate the connectives between clauses in a sentence, is used as the access point. The search results show this system will be effective with large database and developed retrieval methods.

Construction of Full-text Database by SGML (문서기술언어 SGML에 의한 전문 데이터베이스의 구축)

  • Kim, Chang-Bong
    • Journal of Information Management
    • /
    • v.27 no.4
    • /
    • pp.35-56
    • /
    • 1996
  • SGML(Standard Generalized Markup Language) and its application to full-text database including a table, a figure and a picture are explained. A structure of SGML based full-text database Is defined by DTD(document type definition) written in SGML, and full-text itself is described with generalized markup depending on DTD. This article explains how to represent a document structure : a hierarchical structure like a chapter, a section, or a paragraph, or non-hierarchical(referencial) structure like a note, a table, a figure or a picture. Merits of SGML, electronic publishing, a retrieval system or hypertext and SGML tools are also described.

  • PDF

Paragraph Retrieval Model for Machine Reading Comprehension using IN-OUT Vector of Word2Vec (Word2Vec의 IN-OUT Vector를 이용한 기계독해용 단락 검색 모델)

  • Kim, Sihyung;Park, Seongsik;Kim, Harksoo
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.326-329
    • /
    • 2019
  • 기계독해를 실용화하기 위해 단락을 검색하는 검색 모델은 최근 기계독해 모델이 우수한 성능을 보임에 따라 그 필요성이 더 부각되고 있다. 그러나 기존 검색 모델은 질의와 단락의 어휘 일치도나 유사도만을 계산하므로, 기계독해에 필요한 질의 어휘의 문맥에 해당하는 단락 검색을 하지 못하는 문제가 있다. 본 논문에서는 이러한 문제를 해결하기 위해 Word2vec의 입력 단어열의 벡터에 해당하는 IN Weight Matrix와 출력 단어열의 벡터에 해당하는 OUT Weight Matrix를 사용한 단락 검색 모델을 제안한다. 제안 방법은 기존 검색 모델에 비해 정확도를 측정하는 Precision@k에서 좋은 성능을 보였다.

  • PDF

Relevant Image Retrieval of Korean Documents based on Sentence and Word Importance (문장 및 단어 중요도를 통한 한국어 문서 연관 이미지 검색)

  • Kim, Nam-Gyu;Kang, Shin-Jae
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.3
    • /
    • pp.43-48
    • /
    • 2019
  • While reading text-only documents and finding unknown words, readers will become the focus disturbed and not be able to understand the content of the documents. Because children have little experience, it is difficult to understand correctly if the description in context is unfamiliar or ambiguous. In this paper, in order to help understand the text and increase the interest of the readers, we analyze the texts of documents and select the contents that are considered important, and implement a system that displays the most relevant images automatically from the web and links the texts and the images together. The implementation of the system divides the article into paragraphs, analyzes the text, selects important sentences for each paragraph and the important words that best represent the meaning of the important sentences, searches for images related to the words on the web, and then links the images to each of the previous paragraphs. Experiments have shown how to select important sentences and how to select important words in the sentences. As a result of the experiment, we could get 60% performance by evaluating the accuracy of the relation between three selected images and corresponding important sentences.

A Study on the Structure and Content of Abstracts: Focused on Journal of the Korean Society for Information Management (초록의 구조와 내용에 관한 연구: 정보관리학회지를 중심으로)

  • Chang, Woo-Kwon
    • Journal of the Korean Society for information Management
    • /
    • v.33 no.3
    • /
    • pp.107-131
    • /
    • 2016
  • This study aims to identify the characteristics of abstracts by analyzing the status of abstracts published in 'Journal of the Korean Society for Information Management.' To this end, the study analyzed the components of and the types of abstracts. Target abstracts were those published in the journal from 1984 to 2015. The journal published 1,168 articles with indicative abstracts accounting for 96.6%, informative abstracts 3.4%, and abstracts written in both English and Korean 99.5%. As for research methods, case study through literature review was 52.8%, surveys 21.1%, and experimentation 26.1%. The percentage of abstracts consisting of one paragraph was 92.1%, more than two paragraphs were 7.9%, fewer than 5 sentences were 79%, and 6 sentences or more were 21%. The use of the first person was 90.5%. In terms of topic areas, library and information center management was 19.4%, information services 17.3%, information technology 16.2%, information retrieval 15.1%, and informetrics 9.6%, etc.

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB
    • /
    • v.12B no.6 s.102
    • /
    • pp.703-714
    • /
    • 2005
  • Researches in text categorization have been confined to whole-document-level classification, probably due to lacks of full-text test collections. However, full-length documents availably today in large quantities pose renewed interests in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. In order to reflect the sub-topic structure of a document, we propose a new passage-level or passage-based text categorization model, which segments a test document into several Passages, assigns categories to each passage, and merges passage categories to document categories. Compared with traditional document-level categorization, two additional steps, passage splitting and category merging, are required in this model. By using four subsets of Routers text categorization test collection and a full-text test collection of which documents are varying from tens of kilobytes to hundreds, we evaluated the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show simple windows are best for all test collections tested in these experiments. We also found that passages have different degrees of contribution to main topic(s), depending on their location in the test document.