• Title/Abstract/Keywords: Full-text information

Construction of Full-Text Database and Implementation of Service Environment for Electronic Theses and Dissertations (학위논문 전문데이터베이스 구축 및 서비스환경 구현)

  • Lee, Kyi-Ho;Kim, Jin-Suk;Yoon, Wha-Muk
    • The Transactions of the Korea Information Processing Society / v.7 no.1 / pp.41-49 / 2000
  • Since the mid-1990s, most universities in Korea have required their students to submit not only the original printed copies but also Electronic Theses and Dissertations (ETD) for master's and doctoral degrees. The ETDs submitted by students are usually produced with various word processors such as MS-Word, LaTeX, and HWP. Since there is not yet a standard ETD format that unifies these different formats, it is difficult to construct an integrated database that provides full-text service. In this paper, we transform three different ETD formats into a unified one, construct a full-text database, and implement a full-text retrieval system for effective search in the Internet environment.

Users' Perception on Theses and Dissertation Services (학위논문 이용현황과 활성화 방안에 관한 연구)

  • Shin, Yu-Ri;Chung, Eun-Kyung
    • Journal of Information Management / v.40 no.1 / pp.29-46 / 2009
  • Theses and dissertations (TD) have been considered a valuable scholarly resource, although there have been limitations in collecting, organizing, and providing them. The purpose of this study is to investigate users' perceptions of six TD services from five institutions and to propose improvement strategies. The six TD services are the National Library, the National Assembly Library, RISS, dCollections, NDSL, and the Council of Theses and Dissertation Common Use. Based on survey results from 151 users, the study found that the National Assembly Library and RISS were preferred by users. In addition, users preferred keyword, full text, department, abstract, and table of contents, as well as title and author, over other bibliographic information. More importantly, users' needs centered on whether a specific TD service provides full text. When full text cannot be provided, users prefer full-text link information. Accordingly, service promotion activities, diverse access points, and full-text provision are desirable for improving TD services.

GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

  • Oh, So-Yeon;Kim, Ji-Hyeon;Kim, Seo-Jin;Nam, Hee-Jo;Park, Hyun-Seok
    • Genomics & Informatics / v.16 no.3 / pp.75-77 / 2018
  • Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal annotated with various levels of linguistic information would be a valuable resource, as information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language Toolkit)-based text mining script. This preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions in future biomedical text mining.

Copyright issues in building a full-text DB (Full-text DB의 구축과 저작권 문제)

  • 이제환;황혜선
    • Journal of Korean Library and Information Science Society / v.26 / pp.169-204 / 1997
  • With the rapid digitalization of information media, the philosophy and principles of traditional copyright law have been widely challenged. This study explores how copyright issues can be dealt with in such a rapidly changing information environment. In detail, this study discusses (1) the basic philosophy and principles of copyright law from both domestic and international perspectives and (2) how that philosophy and those principles should change to adapt to the rapidly changing information environment. In addition, this study identifies the copyright-related problems that may arise when a full-text DB is built and used. Finally, legal methods to resolve such problems are suggested.

Study on Improved Decryption Method of WeChat Messenger and Deleted Message Recovery Using SQLite Full Text Search Data (WeChat 메신저의 향상된 복호화 방안과 SQLite Full Text Search 데이터를 이용한 삭제된 메시지 복구에 관한 연구)

  • Hur, Uk;Park, Myungseo;Kim, Jongsung
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.3 / pp.405-415 / 2020
  • With the increase in smartphone users, mobile forensics has become an essential element of modern digital forensic investigation. Mobile messenger data is very important in mobile forensics because it can reveal information such as a user's life patterns and mental state. To analyze messenger data, a technique for decrypting encrypted messenger data is required. Since most messengers provide a message deletion function, a technique for recovering deleted messages is also required. WeChat Messenger, used by about 1 billion people around the world, uses IMEI (International Mobile Equipment Identity) information to encrypt data and provides a message deletion function. In this paper, we propose a data decryption method for cases where IMEI information is absent, and a method for recovering deleted messages using the FTS (Full Text Search) database created for the full-text search function of the SQLite database.
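
The recovery idea in this abstract rests on the fact that a SQLite full-text index keeps its own copy of the indexed text. The following is a minimal sketch of that effect only, not the paper's method: the table names `message` and `message_fts` are hypothetical, WeChat's real schema and encryption are not modeled, and the sketch assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3


def demo_fts_residue():
    """Show that text deleted from a main table can still be found in an FTS index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical main message table and a separate FTS5 index over the same text.
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE VIRTUAL TABLE message_fts USING fts5(body)")

    rows = [(1, "meet at the station at noon"), (2, "see you tomorrow")]
    conn.executemany("INSERT INTO message (id, body) VALUES (?, ?)", rows)
    conn.executemany("INSERT INTO message_fts (rowid, body) VALUES (?, ?)", rows)

    # Assumption: the application deletes only from the main table.
    conn.execute("DELETE FROM message WHERE id = 1")

    remaining = [r[0] for r in conn.execute("SELECT body FROM message ORDER BY id")]
    recovered = [
        r[0]
        for r in conn.execute(
            "SELECT body FROM message_fts WHERE message_fts MATCH ?", ("station",)
        )
    ]
    conn.close()
    return remaining, recovered
```

Calling `demo_fts_residue()` returns the rows still present in the main table and the rows still searchable through the index, which makes the gap between the two visible.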

Application of the 2-Poisson Model to Full-Text Information Retrieval System (2-포아송 모형의 전문검색시스템 응용에 관한 연구)

  • 문성빈
    • Journal of the Korean Society for Information Management / v.16 no.3 / pp.49-63 / 1999
  • The purpose of this study is to investigate whether query terms are distributed according to the 2-Poisson model in documents represented by abstract/title or by full text. Retrieval experiments using the binary independence model and the 2-Poisson independence model, both based on probability theory, were conducted to see whether the 2-Poisson distribution of query terms influences retrieval effectiveness, particularly in full-text information retrieval systems.
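
For readers unfamiliar with the model named here: the 2-Poisson model treats the within-document frequency of a term as a mixture of two Poisson distributions, one for documents that are about the term and one for the rest. The sketch below shows that mixture only; the function and parameter names (`pi`, `lam_elite`, `lam_non_elite`) are illustrative choices and are not taken from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k occurrences under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def two_poisson_pmf(k, pi, lam_elite, lam_non_elite):
    """Mixture of two Poissons: weight pi on the 'elite' class, 1 - pi on the rest."""
    return pi * poisson_pmf(k, lam_elite) + (1 - pi) * poisson_pmf(k, lam_non_elite)
```

For example, `two_poisson_pmf(3, 0.3, 4.0, 0.5)` gives the probability that a term occurs three times in a document when 30% of documents are assumed to be about the term, with mean frequencies of 4.0 and 0.5 in the two classes.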

Design for Creating Full-Text Database of Korean Dissertation (대학도서관의 학위논문 전문DB구축방안)

  • 방준필
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.9 no.1 / pp.39-52 / 1998
  • The purpose of this study is to design a solution for creating a full-text database of Korean dissertations. After considering file formats for text-based and image-based databases, the viewer, search, copyright, abstracts and indexes, and the situation of the Korea University Library, the study sets out principles for creating the database. It then proposes a database design for the Korea University Library that allows easy file format conversion when new technology is introduced in the future.

A Hybrid Information Retrieval Model Using Metadata and Text (메타데이타와 텍스트 정보의 통합검색 모델)

  • Yoo, Jeong-Mok;Myaeng, Sung-Hyon;Kim, Sung-Soo;Lee, Mann-Ho
    • Journal of KIISE:Databases / v.34 no.3 / pp.232-243 / 2007
  • The metadata IR model has high precision and low recall because its queries are strict, that is, a query can express the user's information need exactly, while the full-text IR model has low precision and high recall because its queries are simple keyword queries that express the user's information need only roughly. If users can translate their information need into a well-formed structured query, the retrieval result improves. However, it is hardly possible to write a relevant query without understanding the characteristics of the metadata. Unfortunately, most users are not interested in metadata, so they cannot construct well-formed structured queries. In addition, metadata contains less information than the text. In this paper, we propose a hybrid IR model using metadata and text, which can provide users with many relevant documents by retrieving from the metadata fields and the text field complementarily.
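
One simple way to picture retrieval that draws on metadata and text complementarily is a weighted combination of two per-field scores. The sketch below is an illustration under that assumption and is not the model proposed in the paper: the term-overlap scoring, the `alpha` weight, and the document layout (`metadata` dict plus `text` string) are all hypothetical.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple stand-in)."""
    return text.lower().split()


def overlap_score(query_terms, text):
    """Fraction of query terms that appear in the given text."""
    if not query_terms:
        return 0.0
    tokens = set(tokenize(text))
    return sum(1 for term in query_terms if term in tokens) / len(query_terms)


def hybrid_score(query, doc, alpha=0.5):
    """Blend a metadata-field score and a full-text score with weight alpha."""
    terms = tokenize(query)
    meta_text = " ".join(str(value) for value in doc["metadata"].values())
    meta = overlap_score(terms, meta_text)
    body = overlap_score(terms, doc["text"])
    return alpha * meta + (1 - alpha) * body


def rank(query, docs, alpha=0.5):
    """Return documents ordered by descending hybrid score."""
    return sorted(docs, key=lambda d: hybrid_score(query, d, alpha), reverse=True)
```

Setting `alpha` closer to 1 favors the strict metadata match, while a value closer to 0 favors the looser full-text match.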

Inverted Index based Modified Version of K-Means Algorithm for Text Clustering

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.2 / pp.67-76 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors and the k-means algorithm is modified to operate on string vectors for text clustering. Traditionally, when the k-means algorithm is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text clustering, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the k-means algorithm to be adaptable to string vectors for text clustering.
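
To make the terms in this abstract concrete, the sketch below encodes a text as a string vector (an ordered list of words rather than numbers) and builds an inverted index from words to the documents containing them. It is one plausible reading, not the paper's algorithm: choosing the most frequent words as vector elements and using Jaccard overlap of posting sets as the word similarity are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def to_string_vector(text, size=3):
    """Encode a text as its `size` most frequent words (ties broken alphabetically)."""
    counts = Counter(text.lower().split())
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return ranked[:size]


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def word_similarity(index, a, b):
    """Jaccard overlap of the two words' posting sets (an assumed similarity)."""
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0
```

A string-vector variant of k-means would then compare documents through word-level similarities like this one instead of through distances between numerical vectors.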

Inverted Index based Modified Version of KNN for Text Categorization

  • Jo, Tae-Ho
    • Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
  • This research proposes a new strategy in which documents are encoded into string vectors and KNN is modified to operate on string vectors for text categorization. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text categorization, encoding full texts given as raw data into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm to be adaptable to string vectors for text categorization.
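
The sketch below shows how a KNN classifier can run directly on string vectors once some similarity between two vectors is defined. The position-wise exact-match similarity used here is a hypothetical stand-in chosen for brevity; the paper derives its similarity from an inverted index, which is not reproduced here.

```python
from collections import Counter


def string_vector_similarity(u, v):
    """Share of positions holding the same word (a simple assumed similarity)."""
    if not u or not v:
        return 0.0
    matches = sum(1 for a, b in zip(u, v) if a == b)
    return matches / max(len(u), len(v))


def knn_classify(query, labeled, k=3):
    """Label a string vector by majority vote among its k most similar examples."""
    ranked = sorted(
        labeled,
        key=lambda item: string_vector_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Here `labeled` is a list of `(string_vector, label)` pairs, so no numerical encoding of the documents is needed at any step.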