• Title/Summary/Keyword: Web Document Retrieval

Implementation of an XML-Based Editor/Transformer for Large Volume of Similar Documents (XML 기반의 대용량 유사 문서 편집기/변환기 구현)

  • 황인준
    • The Journal of Society for e-Business Studies / v.9 no.1 / pp.21-38 / 2004
  • With its recent popularity, the Web is now considered a huge repository of information. Most documents on the Web have been created using HTML (HyperText Markup Language). Even though HTML is simple and easy to learn, it has several features that are obstacles to efficient information retrieval. XML (eXtensible Markup Language) can provide a solution to such problems and, in fact, has already been used in many applications. XML is a standard markup language for exchanging data on the Web. It can describe a document structure freely by defining its DTD, which enables efficient integration and retrieval of data on the Web. In this paper, we propose a versatile and efficient XML document manager. Its features include (i) a form-based XML editor that enables easy creation of new XML documents, (ii) an automatic document converter that can transform HTML documents with similar structure into XML documents, and (iii) a GUI-based DTD editor.

A Dynamic Ontology-based Multi-Agent Context-Awareness User Profile Construction Method for Personalized Information Retrieval

  • Gao, Qian;Cho, Young Im
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.4 / pp.270-276 / 2012
  • With the increase in the amount of data and information available on the Web, there has been high demand for personalized information retrieval services that provide context-aware services to Web users. This paper proposes a novel dynamic multi-agent context-awareness user profile construction method that uses an ontology to incorporate concepts and properties into the user profile model. The method considers both the frequency and the specificity of a concept in a document and in its corresponding domain ontology when constructing the user profile. On this basis, a fuzzy c-means clustering method is adopted to cluster the user's interest domains, and a dynamic update policy is adopted to continuously track changes in the user's interests. The simulation results show that, as the user profile is gradually refined, the proposed system outperforms a traditional semantic-based retrieval system in terms of recall ratio and precision ratio.

A Study on Paper Retrieval System based on OWL Ontology (OWL 온톨로지를 기반으로 하는 논문 검색 시스템에 관한 연구)

  • Sun, Bok-Keun;We, Da-Hyun;Han, Kwang-Rok
    • Journal of the Korea Society of Computer and Information / v.14 no.2 / pp.169-180 / 2009
  • Conventional paper retrieval relies on keyword-based search, and as a huge amount of data is published, it becomes increasingly difficult for this kind of search to retrieve the information that users want. To search for information according to the user's intent, we need the Semantic Web, which represents the semantics of Web document resources on the Internet as an ontology and enables computers to understand that ontology. In this paper, we therefore describe a paper retrieval system based on OWL (Web Ontology Language) ontology reasoning. We build the paper ontology in OWL, a recently popular ontology language for the Semantic Web, and represent the correlations among diverse paper properties as DL (description logic) queries. The system then infers the correct results from the paper ontology using the DL queries, which makes it possible to retrieve information intelligently. Finally, we compare our experimental results with those of conventional retrieval.

The Avata Construction System for Image Lossless Scaling (이미지 손실없는 확대/축소가 가능한 아바타 생성 시스템)

  • 김원중;장미화
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.2 / pp.181-189 / 2002
  • In this paper, we design and implement an avatar construction system using XML (eXtensible Markup Language) and SVG (Scalable Vector Graphics). Web characters created with the avatar (or Web character) construction system are displayed identically, without image degradation, regardless of terminal type, and users can easily modify the image into the form they want. Compared with existing Web character systems, the system proposed in this paper greatly increases the reusability of Web character part elements. Because SVG is described as text, graphics retrieval is convenient and applications can easily use SVG documents. In addition, SVG can create Web graphics documents dynamically from a database, because all graphic primitives such as lines, polygons, text, and images are easily accessible. Beyond Web characters, the findings of this study may be used to develop technology applicable to other content on the World Wide Web.

A Study on Document Retrieval of Web Using Relevance Feedback (적합성 피드백을 이용한 웹 문서검색에 관한 연구)

  • 김영천;이성주
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.3 / pp.597-604 / 2001
  • In conventional Boolean retrieval systems, document ranking is not supported and similarity coefficients cannot be computed between queries and documents. The MMM, Paice, and P-norm models have been proposed in the past to support ranking in Boolean retrieval systems. They share the property of interpreting Boolean operators softly. In this paper, we propose a new soft evaluation method for information retrieval using a query-splitting relevance feedback model. We also show through a performance comparison that query-splitting relevance feedback (QSRF) is more efficient and effective than MMM, Paice, and P-norm.
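
As background for the soft Boolean models named in this abstract, the following is a minimal Python sketch of how MMM and P-norm score a document against an unweighted OR or AND query, given the document's term weights in [0, 1]. It does not implement the paper's QSRF method, which the abstract does not specify. The function names and the default coefficient values (`c_or`, `c_and`, `p`) are illustrative assumptions.

```python
def mmm_or(weights, c_or=0.6):
    """MMM score for an OR query: mostly the max weight, softened by the min.

    c_or is an assumed softness coefficient; c_or = 1.0 gives strict Boolean OR.
    """
    return c_or * max(weights) + (1 - c_or) * min(weights)


def mmm_and(weights, c_and=0.6):
    """MMM score for an AND query: mostly the min weight, softened by the max.

    c_and is an assumed softness coefficient; c_and = 1.0 gives strict Boolean AND.
    """
    return c_and * min(weights) + (1 - c_and) * max(weights)


def pnorm_or(weights, p=2.0):
    """P-norm score for an unweighted OR query (p = 1 reduces to the mean)."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)


def pnorm_and(weights, p=2.0):
    """P-norm score for an unweighted AND query (p = 1 reduces to the mean)."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)


if __name__ == "__main__":
    # Hypothetical weights of two query terms in one document.
    doc_weights = [0.2, 0.8]
    print("MMM OR:", mmm_or(doc_weights))
    print("MMM AND:", mmm_and(doc_weights))
    print("P-norm OR:", pnorm_or(doc_weights))
    print("P-norm AND:", pnorm_and(doc_weights))
```

Because every function returns a graded score rather than a 0/1 match, documents can be ranked by the result, which is the ranking facility that strict Boolean evaluation lacks.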

An Ontology-based Knowledge Management System - Integrated System of Web Information Extraction and Structuring Knowledge -

  • Mima, Hideki;Matsushima, Katsumori
    • Proceedings of the CALSEC Conference / 2005.03a / pp.55-61 / 2005
  • We introduce a new Web-based knowledge management system, currently in progress, in which XML-based Web information extraction and our knowledge-structuring technologies are combined using ontology-based natural language processing. Our aim is to provide efficient access to heterogeneous information on the Web, enabling users to draw effortlessly on a wide range of textual and non-textual resources, such as newspapers and databases, and so accelerate knowledge acquisition from these sources. To achieve efficient knowledge management, we first propose an XML-based Web information extraction method that includes a sophisticated control language for extracting data from Web pages. Because the system uses standard XML technologies, our approach makes information extraction easy by (a) separating rules from processing, (b) restricting the target of processing, and (c) supporting interactive operations for developing extraction rules. We then propose a knowledge-structuring system that includes (1) automatic term recognition, (2) domain-oriented automatic term clustering, (3) similarity-based document retrieval, (4) real-time document clustering, and (5) visualization. The system supports integrating different types of databases (textual and non-textual) and retrieving different types of information simultaneously. By further explaining the specification and implementation technique of the system, we demonstrate how it can accelerate knowledge acquisition on the Web, even for novice users of the field.

Optimization Model on the World Wide Web Organization with respect to Content Centric Measures (월드와이드웹의 내용기반 구조최적화)

  • Lee Wookey;Kim Seung;Kim Hando;Kang Sukho
    • Journal of the Korean Operations Research and Management Science Society / v.30 no.1 / pp.187-198 / 2005
  • The structure of a Web site can keep search robots and crawling agents from getting lost in the huge forest of Web pages. We formalize a view of the World Wide Web and generalize it as a hierarchy of Web objects: the Web as a set of Web sites, and a Web site as a directed graph with Web nodes and Web edges. Our approach yields the optimal hierarchical structure that maximizes the tf-idf (term frequency and inverse document frequency) weight, one of the most widely accepted content-centric measures in the information retrieval community, so that the measure can be used to embody the semantics of a search query. The experimental results show that the optimization model is an effective alternative to conventional heuristic approaches in a dynamically changing Web environment.

Design and Implementation of Supporting System of a Self-Directed Learning using Virtual Document Concept (가상문서를 개념을 활용한자기 주도적 학습지원 시스템의 설계 및 구현)

  • Noh, Jin-Soon;Lee, Yong-Bae;Myaeng, Sung-Hyon
    • Journal of The Korean Association of Information Education / v.6 no.2 / pp.234-245 / 2002
  • A new era has come where high quality educational materials can be acquired easily through the World Wide Web. These materials, however, need to be refined and streamlined to maximize their effect on education. In order to provide such a streamlined flow, we need to be able to re-organize documents, which exist independent of each other on the Web, in a way that maintains their appropriate order in the right context to satisfy educational purposes. In addition, we should be able to provide supplementary explanations or missing information to the organized materials for smooth connections among them. In order to meet the requirements, we employed the virtual document concept that allows us to reuse existing documents for educational purposes. By providing a retrieval engine for virtual documents, we attempt to induce self-directed learning based on document retrieval, suitable for the level and purpose of students.

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Recently, the volume of data has been growing exponentially with the rapid expansion of the Internet. Because users need only part of this information, they bear a heavy workload in examining and discovering the necessary content. Information retrieval must therefore provide hierarchical class information and a priority order for examination by evaluating the similarity between queries and documents. In this paper, we propose a clustering-based multi-class support vector machine (SVM) model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. Combining hierarchical document categorization with an SVM classifier gives high performance in the analytical classification of Web documents, whose number increases exponentially as the document hierarchy is extended. We expect more information retrieval systems to adopt the proposed model in their development and thereby provide an accurate and rapid information retrieval service.

An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

  • Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki
    • Industrial Engineering and Management Systems / v.11 no.1 / pp.87-93 / 2012
  • This paper discusses a new weighting method for text analysis from the viewpoint of supervised learning. The term frequency and inverse document frequency measure (tf-idf measure) is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an empirical weighting method for information retrieval whose effectiveness has not been established theoretically. Therefore, a more effective weighting measure may be obtainable for document classification problems. In this study, we propose an optimal weighting method for document classification problems from the viewpoint of supervised learning. Because it makes use of training data, the proposed measure is better suited to text classification than the tf-idf measure. The effectiveness of our proposal is demonstrated by simulation experiments on text classification of newspaper articles and customer reviews posted on a Web site.
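
For reference, the tf-idf baseline that this abstract compares against can be sketched in Python as follows. This is one common textbook variant (length-normalized term frequency times the natural log of inverse document frequency), not the weighting proposed in the paper. The function name `tf_idf` and the sample documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a list of tokenized documents.

    docs: list of token lists.
    Returns one dict per document mapping term -> tf-idf weight, using
    tf = count / document length and idf = ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    # Hypothetical toy corpus of two tokenized documents.
    corpus = [["web", "search", "web"], ["web", "ontology"]]
    for doc_weights in tf_idf(corpus):
        print(doc_weights)
```

A term that occurs in every document receives weight zero under this variant, which reflects the point that tf-idf is fixed by corpus statistics alone and does not use class labels from training data.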