• Title/Summary/Keyword: Full-text information

Search results: 274

XQuery Full-Text Search in RDBMS (관계형 데이터베이스를 이용한 XQuery 전문 검색)

  • Cheon, Yun-Woo; Hong, Dong-Kweon
    • Proceedings of the Korea Information Processing Society Conference / 2003.11c / pp.1339-1342 / 2003
  • As XML has become the standard for representing and exchanging digital information on the Internet, inverted-index techniques for storing and retrieving XML have been actively studied. This paper proposes a new inverted-index structure for XML full-text search. It extends the keyword-search capability of previously studied inverted-index techniques and implements the full-text search features recently added by the W3C (a minimal relational sketch of this idea follows this entry).

  • PDF
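
The entry above maps XQuery full-text search onto an inverted index stored in a relational database. A minimal sketch of that general idea in Python, assuming an SQLite schema with element paths and word positions (the table layout and the query are illustrative, not the paper's actual design):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE inverted_index (
            term      TEXT,
            doc_id    INTEGER,
            elem_path TEXT,    -- XML element containing the term, e.g. /article/abstract
            position  INTEGER  -- word offset, enabling phrase/proximity predicates
        )
    """)
    postings = [
        ("xml", 1, "/article/title", 0),
        ("retrieval", 1, "/article/abstract", 4),
        ("xml", 2, "/article/abstract", 7),
    ]
    conn.executemany("INSERT INTO inverted_index VALUES (?, ?, ?, ?)", postings)

    # An element-scoped keyword query (e.g. an XQuery "ftcontains" over abstracts)
    # can then be answered from the posting table:
    rows = conn.execute("""
        SELECT DISTINCT doc_id FROM inverted_index
        WHERE term = ? AND elem_path LIKE '%/abstract'
    """, ("xml",)).fetchall()
    print(rows)  # [(2,)]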

A Study on Internet Knowledge Markets and Copyright Issues in Korea (인터넷 지식거래소와 저작권에 관한 연구)

  • Noh, Young-Hee
    • Journal of the Korean Society for Information Management / v.24 no.1 s.63 / pp.121-145 / 2007
  • This study aims to identify copyright issues regarding the knowledge content currently circulated through knowledge exchange markets in the Republic of Korea. The content providers of knowledge exchange markets comprise government and public institutions, full-text database companies, publishers, and individuals. It is worth noting that the commercial trade of copyrighted content among academic journals, database companies, and knowledge exchange markets essentially excludes the individual authors who are the actual copyright holders. In principle, the original author owns the copyright whether or not an explicit notice is present. Unless the author or owner officially agrees to transfer the copyright, including the right to so-called "derivative works" (content based on or derived from the copyrighted material), digitization of the copyrighted work, its registration in a full-text database, and its circulation through knowledge markets are illegal.

Toward the Effective Utilization of Usage Statistics for the Management of Electronic Journals (전자저널 관리를 위한 이용통계의 효과적 활용 방안)

  • Kim, Sung-Jin
    • Journal of Information Management / v.41 no.4 / pp.69-91 / 2010
  • Libraries are encountering hostile conditions around journal licensing, such as limited budgets, high package prices, and vendor-led negotiation. They need to collect and analyze usage data for electronic journals in order to develop electronic journal collections appropriate for their own circumstances. The purpose of this study is to suggest a practical guideline for librarians' analysis of electronic journal usage statistics. To this end, the study reviewed related previous studies and examined the current state of usage statistics provided by COUNTER Release 3-compliant vendors. Finally, this study proposed five core statistics, namely full-text article requests per journal, journal usage rate, price per full-text article request, the most-used group, and the low-use group, and discussed how to use them effectively for electronic journal management.
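
A minimal sketch of how the five core statistics listed above might be computed from COUNTER-style usage reports; the field names, the input format, and the low-use threshold are illustrative assumptions rather than values from the paper:

    from dataclasses import dataclass

    @dataclass
    class JournalUsage:
        title: str
        fulltext_requests: int      # COUNTER full-text article requests (JR1-style)
        subscription_price: float

    usage = [
        JournalUsage("Journal A", 1200, 900.0),
        JournalUsage("Journal B", 35, 1500.0),
        JournalUsage("Journal C", 480, 600.0),
    ]

    LOW_USE_THRESHOLD = 50  # assumed cut-off, not taken from the paper

    for j in usage:
        cost_per_request = j.subscription_price / j.fulltext_requests
        print(f"{j.title}: {j.fulltext_requests} full-text requests, "
              f"{cost_per_request:.2f} per request")

    used = [j for j in usage if j.fulltext_requests > 0]
    print("journal usage rate:", len(used) / len(usage))
    print("most-used journal:", max(usage, key=lambda j: j.fulltext_requests).title)
    print("low-use journals:", [j.title for j in usage if j.fulltext_requests < LOW_USE_THRESHOLD])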

Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.47-54 / 2021
  • Given a document, keyphrase extraction is the task of automatically extracting words or phrases that topically represent the content of the document. In unsupervised keyphrase extraction approaches, candidate words or phrases are first extracted from the input document, scores are calculated for the keyphrase candidates, and final keyphrases are selected based on those scores. Regarding the computation of candidate scores in unsupervised keyphrase extraction, this study proposes a method of adjusting the scores of keyphrase candidates according to their type: word-type or phrase-type. To this end, the type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document, and these values are employed in adjusting the scores of keyphrase candidates. In experiments using four keyphrase extraction evaluation datasets constructed for full-text articles in English, the proposed method performed better than a baseline method and comparison methods on three of the datasets.
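
The abstract above adjusts candidate scores by candidate type using type-token ratios, but does not give the exact formula; the sketch below shows one plausible reading, with the weighting scheme itself an assumption:

    from collections import Counter

    def adjust_scores(candidates, base_scores):
        """Adjust keyphrase-candidate scores by type (word vs. phrase)."""
        freq = Counter(candidates)
        words = [c for c in freq if " " not in c]      # word-type candidates
        phrases = [c for c in freq if " " in c]        # phrase-type candidates

        def type_token_ratio(group):
            tokens = sum(freq[c] for c in group)
            return len(group) / tokens if tokens else 0.0

        ttr = {"word": type_token_ratio(words), "phrase": type_token_ratio(phrases)}

        adjusted = {}
        for cand, score in base_scores.items():
            kind = "phrase" if " " in cand else "word"
            # Assumption: a higher type-token ratio means candidates of that type
            # are more varied, so each individual candidate is weighted down a bit.
            adjusted[cand] = score * (1.0 - 0.5 * ttr[kind])
        return adjusted

    cands = ["index", "index", "inverted index", "search", "inverted index"]
    base = {"index": 0.8, "inverted index": 0.9, "search": 0.4}
    print(adjust_scores(cands, base))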

The DTD Development through Document Structure Analysis of Journals (학술지 논문기사의 문헌구조 분석을 통한 DTD개발)

  • Yoon, So-Young
    • Journal of Information Management / v.28 no.2 / pp.20-53 / 1997
  • To use SGML, the international standard markup language, for constructing full-text databases in digital libraries, a DTD must be developed first, based on an analysis of document structure. This study develops an SGML DTD for Korean documents through a document structure analysis of the Journal of the Korean Society for Information Management (a minimal DTD sketch follows this entry).

  • PDF
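
A minimal DTD sketch in the spirit of the entry above, validated with lxml; the element set is purely illustrative and is not the DTD actually developed in the paper:

    from io import StringIO
    from lxml import etree

    # A toy article DTD capturing the kind of structure such an analysis yields.
    dtd = etree.DTD(StringIO("""
    <!ELEMENT article (title, author+, abstract, body)>
    <!ELEMENT title    (#PCDATA)>
    <!ELEMENT author   (#PCDATA)>
    <!ELEMENT abstract (#PCDATA)>
    <!ELEMENT body     (#PCDATA)>
    """))

    doc = etree.fromstring(
        "<article><title>T</title><author>A</author>"
        "<abstract>S</abstract><body>B</body></article>"
    )
    print(dtd.validate(doc))  # True when the document follows the declared structure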

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. It also interests many analysts because the amount of text data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which assigns documents to predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main content of one or several documents, have been actively studied. In particular, text summarization is actively applied in business, for example through news summary services and privacy policy summary services. Much research has also been done in academia on both the extractive approach, which selectively presents the main elements of the document, and the abstractive approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not progressed as much as automatic text summarization itself. Most existing studies on the quality evaluation of summarization rely on manually prepared summaries, using them as reference documents and measuring the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text using various techniques, and the result is compared with the reference document, which serves as an ideal summary, to measure the quality of the automatic summarization. Reference documents are obtained in two major ways; the most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes considerable time and cost, and the evaluation result may differ depending on the subjectivity of the summarizer. To overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. A representative attempt is a recently devised method that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more often frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means condensing a large amount of content while minimizing omissions, a "good summary" based only on frequency does not always mean a good summary in this essential sense. To overcome the limitations of these previous studies on summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the original content is missing from the summary.
In this paper, we propose a method for the automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by varying the sentence-similarity threshold.
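
A minimal sketch of the completeness/succinctness idea described above, using bag-of-words cosine similarity between sentences; the similarity measure, the threshold, and the exact definitions are assumptions, and the paper's precise formulation may differ:

    import math
    from collections import Counter

    def cosine(a, b):
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[t] * vb[t] for t in va)
        na = math.sqrt(sum(v * v for v in va.values()))
        nb = math.sqrt(sum(v * v for v in vb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def completeness(full_sents, summary_sents, threshold=0.3):
        # fraction of full-text sentences covered by at least one summary sentence
        covered = sum(1 for s in full_sents
                      if any(cosine(s, t) >= threshold for t in summary_sents))
        return covered / len(full_sents)

    def succinctness(summary_sents, threshold=0.3):
        # 1 minus the fraction of summary-sentence pairs that duplicate each other
        pairs = [(i, j) for i in range(len(summary_sents))
                 for j in range(i + 1, len(summary_sents))]
        if not pairs:
            return 1.0
        dup = sum(1 for i, j in pairs
                  if cosine(summary_sents[i], summary_sents[j]) >= threshold)
        return 1.0 - dup / len(pairs)

    def f_score(c, s):
        # harmonic mean integrating the two measures in their trade-off
        return 2 * c * s / (c + s) if (c + s) else 0.0

    full = ["the room was clean", "staff were friendly",
            "breakfast was cold", "location is great"]
    summ = ["room was clean and staff friendly", "location is great"]
    c, s = completeness(full, summ), succinctness(summ)
    print(c, s, f_score(c, s))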

User Access and Preferences to Full-text Databases When Searching Individual and Integrated Databases (데이터베이스통합이 유용성과 이용자선호도에 미치는 영향)

  • 박소연
    • Proceedings of the Korean Society for Information Management Conference / 1999.08a / pp.157-162 / 1999
  • This study compared and analyzed usefulness, user preference, and user satisfaction when users searched multiple databases individually and when they searched them through an integrated interface in a distributed environment. Twenty-eight graduate students enrolled in the School of Communication, Information, and Library Studies at Rutgers University participated in the study. There were statistically significant differences in user preference and satisfaction between the two systems: many participants preferred the separate interfaces to the integrated interface and were more satisfied with the search results of the separate interfaces. Despite the convenience and efficiency of the integrated interface, one of the main reasons participants preferred the separate interfaces was that they could select and control the databases themselves.

  • PDF

Newspaper Thesaurus Construction in Theory and Practice (신문 시소러스 개발의 이론과 실제)

  • Chung, Young-Mee
    • Journal of the Korean Society for Library and Information Science / v.25 / pp.51-82 / 1993
  • Effective indexing systems are required to enhance the performance of full-text retrieval systems. An analysis of index terms selected by human indexers without a newspaper thesaurus indicates that a controlled indexing language is necessary for effective and consistent indexing of newspaper articles. In this paper, basic principles for keyword selection from Korean newspapers are established, and significant problems identified in the process of developing a newspaper thesaurus are discussed in depth.

  • PDF

Fast Construction of Suffix Arrays for DNA Strings (DNA 스트링에 대하여 써픽스 배열을 구축하는 빠른 알고리즘)

  • Jo, Jun-Ha; Kim, Nam-Hee; Kwon, Ki-Ryong; Kim, Dong-Kyue
    • Journal of KIISE: Computer Systems and Theory / v.34 no.8 / pp.319-326 / 2007
  • To perform fast searching in massive data such as DNA strings, the most efficient method is to construct full-text index data structures over the given strings. The widely used full-text index structures are suffix trees and suffix arrays. Since the suffix array uses less space than the suffix tree, it is more suitable for DNA strings. Previously developed suffix array construction algorithms are not well suited to DNA strings, since they are designed for integer alphabets. We propose a fast algorithm to construct suffix arrays for DNA strings, whose alphabet size is fixed at 4. We reduce the construction time by improving the encoding and merging steps of Kim et al.'s algorithm [1]. Experimental results show that our algorithm constructs suffix arrays on DNA strings 1.3-1.6 times faster than Kim et al.'s algorithm, and faster than other algorithms in most cases.
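
For concreteness, a naive suffix-array construction and lookup for a DNA string is sketched below; it simply sorts all suffixes and is far from the specialized construction improved in the paper, serving only to illustrate the data structure:

    def build_suffix_array(s: str) -> list:
        # indices of all suffixes, sorted by the suffix text (naive O(n^2 log n) version)
        return sorted(range(len(s)), key=lambda i: s[i:])

    def contains(s: str, sa: list, pattern: str) -> bool:
        # binary search over the sorted suffixes for full-text pattern lookup
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if s[sa[mid]:sa[mid] + len(pattern)] < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(sa) and s[sa[lo]:sa[lo] + len(pattern)] == pattern

    dna = "ACGTACGAT"
    sa = build_suffix_array(dna)
    print(sa)                        # [4, 0, 7, 5, 1, 6, 2, 8, 3]
    print(contains(dna, sa, "CGA"))  # True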

Research and Development of Document Recognition System for Utilizing Image Data (이미지데이터 활용을 위한 문서인식시스템 연구 및 개발)

  • Kwag, Hee-Kue
    • The KIPS Transactions: Part B / v.17B no.2 / pp.125-138 / 2010
  • The purpose of this research is to enhance a document recognition system, which is essential for developing a full-text retrieval system over the document image data stored in the digital library of a public institution. To achieve this purpose, the main tasks of this research are: 1) analyzing the document image data and developing image preprocessing and document structure analysis technologies for it, and 2) building a specialized knowledge base consisting of document layouts and properties, character models, and a word dictionary. In addition, by developing a management tool for this knowledge base, the document recognition system can handle various types of document image data. We have developed a prototype document recognition system that combines the specialized knowledge base and the document structure analysis library, adapted to the document image data housed in the National Archives of Korea. Building on these results, we plan to build a test bed and evaluate the performance of the document recognition system in order to maximize the utility of the full-text retrieval system.
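
A minimal sketch of the knowledge-base organization described above (document layouts and properties, character models, word dictionary) and of how a recognizer might consult it; all names and the OCR call are illustrative assumptions, since the abstract does not describe the system's actual interfaces:

    from dataclasses import dataclass, field

    @dataclass
    class LayoutModel:
        doc_type: str
        regions: list                # expected regions, e.g. ["title", "date", "body"]

    @dataclass
    class KnowledgeBase:
        layouts: dict = field(default_factory=dict)          # doc_type -> LayoutModel
        character_models: dict = field(default_factory=dict)
        word_dictionary: set = field(default_factory=set)    # for post-OCR checking

    def ocr_region(image, region, character_models):
        # placeholder for a real character-recognition engine
        return f"sample text for {region}"

    def recognize(image, doc_type, kb):
        """Structure analysis driven by the layout knowledge, then per-region OCR."""
        layout = kb.layouts[doc_type]
        result = {}
        for region in layout.regions:
            raw = ocr_region(image, region, kb.character_models)  # hypothetical OCR step
            # keep tokens verbatim; flag those absent from the word dictionary
            result[region] = [(t, t in kb.word_dictionary) for t in raw.split()]
        return result

    kb = KnowledgeBase(word_dictionary={"sample", "text"})
    kb.layouts["record"] = LayoutModel("record", ["title", "body"])
    print(recognize(None, "record", kb))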