• Title/Summary/Keyword: Full-text information

Automatic Construction of Korean Unknown Word Dictionary using Occurrence Frequency in Web Documents (웹문서에서의 출현빈도를 이용한 한국어 미등록어 사전 자동 구축)

  • Park, So-Young
    • Journal of the Korea Society of Computer and Information / v.13 no.3 / pp.27-33 / 2008
  • In this paper, we propose a method for automatically constructing a dictionary by extracting unknown words from given eojeols, in order to improve the performance of a Korean morphological analyzer. The proposed method consists of two dictionary construction phases: one based on full-text analysis and one based on web document frequency. The first phase recognizes unknown words from strings that occur repeatedly in a given full text, while the second phase handles strings that occur only once in the text and recognizes unknown words based on how frequently each string is retrieved from web documents. Experimental results show that, by utilizing web document frequency, the proposed method improves recall by 32.39% compared with a previous method.

Implementation of One-Stop Service System on Domestic & Foreign Technology Information (국내외 기술정보의 연계 서비스 체제 구축)

  • Seo, Jin-Ny;Noh, Kyung-Ran
    • Journal of Information Management / v.32 no.1 / pp.1-22 / 2001
  • In a traditional environment, users must search each journal OPAC, bibliographic DB, full-text DB, and e-journal separately until they find the scientific and technical information they need. The purpose of this study is to build a one-click journal service system that supports integrated search. By integrating databases and electronic journals, the system provides functions such as journal browsing, journal search, article search, alerts, My Library, and document delivery. Users search all information sources through the journal OPAC and obtain journal full text through a single interface.

A Study on Patent Structure in Patent Full-text Retrieval (특허정보 전문검색을 위한 문헌구조화 연구)

  • 권영숙;이두영
    • Proceedings of the Korean Society for Information Management Conference / 1999.08a / pp.29-32 / 1999
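  • Patent information differs in character from general scientific and technical information, so accuracy and currency are absolutely essential. Taking these characteristics into account, this study examines the structure of patent documents as basic data for building a patent information retrieval system that can satisfy users' information needs and support effective searching.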

Automatic Text Categorization Using Passage-based Weight Function and Passage Type (문단 단위 가중치 함수와 문단 타입을 이용한 문서 범주화)

  • Joo, Won-Kyun;Kim, Jin-Suk;Choi, Ki-Seok
    • The KIPS Transactions:PartB / v.12B no.6 s.102 / pp.703-714 / 2005
  • Research in text categorization has been confined to whole-document-level classification, probably due to the lack of full-text test collections. However, the full-length documents now available in large quantities have renewed interest in text classification. A document is usually written in an organized structure to present its main topic(s). This structure can be expressed as a sequence of sub-topic text blocks, or passages. To reflect the sub-topic structure of a document, we propose a new passage-based text categorization model, which segments a test document into several passages, assigns categories to each passage, and merges the passage categories into document categories. Compared with traditional document-level categorization, this model requires two additional steps: passage splitting and category merging. Using four subsets of the Reuters text categorization test collection and a full-text test collection whose documents range from tens to hundreds of kilobytes, we evaluated the proposed model, especially the effectiveness of various passage types and the importance of passage location in category merging. Our results show that simple windows perform best on all test collections used in these experiments. We also found that passages contribute to the main topic(s) to different degrees, depending on their location in the test document.
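
The three steps named in this abstract (passage splitting, per-passage category assignment, and category merging) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the fixed-size word windows, the toy keyword-count scorer, the `weight` callback standing in for passage-location weighting, and all function names.

```python
from collections import defaultdict


def split_into_windows(words, size):
    """Split a word list into consecutive, non-overlapping windows (passages)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def score_passage(passage, keywords):
    """Toy per-passage classifier (assumption): count keyword hits per category."""
    return {cat: sum(1 for w in passage if w in kws)
            for cat, kws in keywords.items()}


def categorize(text, keywords, size=50, weight=lambda idx, total: 1.0):
    """Merge passage-level scores into a ranked list of document categories.

    `weight(idx, total)` is a hypothetical hook for location-dependent
    passage weights; the default treats every passage equally.
    """
    words = text.lower().split()
    passages = split_into_windows(words, size)
    totals = defaultdict(float)
    for idx, passage in enumerate(passages):
        for cat, score in score_passage(passage, keywords).items():
            totals[cat] += weight(idx, len(passages)) * score
    # Rank by descending score, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cat for cat, score in ranked if score > 0]
```

Passing a different `weight` function, for example one that favors early passages, changes how much each location contributes to the merged result.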

An Analysis on the Operational state of Distance Universities' Electronic Libraries through the Life-long Education Law (평생교육법령하의 원격대학 전자도서관의 운영 실태 분석)

  • Lee Jong-Moon
    • Journal of Korean Library and Information Science Society / v.36 no.4 / pp.99-113 / 2005
  • The purpose of this research is to analyze the operational state of distance universities' electronic libraries under the Lifelong Education Law and to identify the related problems. The investigation focused on how the libraries are operated and on the usage levels of the full-text service. The data were collected by accessing the URLs of the 17 distance universities authorized as of 2005. The results show that every university operates a library, either on its own (17.7%) or through links to external libraries (82.4%). However, only 35.3% of the surveyed universities provide a full-text service over the Internet. Thus, in order to establish fourth-generation distance universities based on the Internet and the Web, the construction and operation standards for electronic libraries in the Lifelong Education Law urgently need to be improved.

A Study of the Behaviours in Searching Full-Text Databases - Subject Specialists vs. Professional Searchers - (전문데이터베이스의 탐색특성에 관한 연구 - 주제전문가와 탐색전문가 -)

  • Lee Eung-Bong
    • Journal of the Korean Society for Library and Information Science / v.30 no.2 / pp.51-86 / 1996
  • The primary purpose of this study is to verify the differences in behavioural characteristics between subject specialists and professional searchers when searching full-text databases. The major findings are summarized as follows. The two groups differ significantly in three areas: search questions (the degree of understanding of the search questions, the degree of difficulty in selecting terms, and the degree of expectation of search results); search processes (the number of search terms used, the number of Boolean operators and qualifiers used, the number of documents browsed, and the search time, measured as connect time, time spent per output document, and time spent per relevant output document); and search results (the searching efficiency, measured as the number of relevant documents, the recall ratio, and the precision ratio; the search cost, measured as the total search cost, the search cost per output document, and the search cost per relevant output document; and the degree of satisfaction with the search results).

The Extraction of Effective Index Database from Voice Database and Information Retrieval (음성 데이터베이스로부터의 효율적인 색인데이터베이스 구축과 정보검색)

  • Park Mi-Sung
    • Journal of Korean Library and Information Science Society / v.35 no.3 / pp.271-291 / 2004
  • Information service providers such as digital libraries are increasingly asked to offer services over atypical multimedia databases of images, voice, and VOD/AOD. This study proposes components for voice processing: a word-phrase generator, a syllable recoverer, a morphological analyzer, and a corrector. The proposed voice processing technique transforms a voice database into a text database and then extracts an index database from the text database. In addition, the study suggests an information retrieval model for use with the extracted index database: voice full-text information retrieval.

Comparisons of Practical Performance for Constructing Compressed Suffix Arrays (압축된 써픽스 배열 구축의 실제적인 성능 비교)

  • Park, Chi-Seong;Kim, Min-Hwan;Lee, Suk-Hwan;Kwon, Ki-Ryong;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.34 no.5_6 / pp.169-175 / 2007
  • Suffix arrays, fundamental full-text index data structures, can be used efficiently when patterns are queried many times. Although many useful full-text index data structures have been proposed, their O(n log n)-bit space consumption motivates researchers to develop more space-efficient ones. Space-efficient versions such as the compressed suffix array and the FM-index have been developed, but they cannot reduce the practical working space because their construction is based on an existing suffix array. Recently, two algorithms have been proposed that construct compressed suffix arrays directly from the text, without first constructing the suffix array. In this paper, we compare the practical performance of these compressed suffix array algorithms with that of various suffix array construction algorithms by measuring construction times, peak memory usage during construction, and the sizes of the final outputs.
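
For readers unfamiliar with the plain (uncompressed) suffix array that this abstract takes as its baseline, the following is a minimal sketch. It is not one of the construction algorithms compared in the paper: it uses a naive sort-based construction chosen only for clarity, and the function names are hypothetical. It shows why the structure suits repeated pattern queries: once built, each query is a pair of binary searches.

```python
def build_suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin.

    Simple but slow (each comparison may scan a whole suffix); practical
    construction algorithms are far more efficient.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, in ascending order."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                    # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

The array stores one integer per text position, which is the O(n log n)-bit space cost the abstract refers to.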