• Title/Summary/Keyword: relevant information retrieval

A Study on Query Refinement by Online Relevance Feedback in an Information Filtering System (온라인 이용자 피드백을 사용한 정보필터링 시스템의 수정질의 최적화에 관한 연구)

  • Choi, Kwang;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.20 no.4 s.50 / pp.23-48 / 2003
  • In this study, an information filtering system was implemented and a series of relevance feedback experiments was conducted using the system. For the relevance feedback, the original queries were searched against the database and the results were reviewed by the researchers. Based on users' online relevance judgments, a pair of 17 refined queries was generated using two methods, the 'co-occurrence exclusion method' and the 'lower frequencies exclusion method'. To generate them, the original queries and the descriptors and category codes that appeared in either the relevant or the irrelevant document sets were used as elements. Users' relevance judgments on the search results of the refined queries were compared and analyzed against those for the original queries.

Comparison and Evaluation of Web-based Image Search Engines (이미지정보 탐색을 위한 웹 검색엔진의 비교 평가)

  • Kim, Hyo-Jung
    • Journal of Information Management / v.31 no.4 / pp.50-70 / 2000
  • As internet resources have come to include text, images, and sound, a variety of Web-based image search engines have been developed. This diversity of multimedia content has made searching for and retrieving relevant information very difficult. The purpose of the study is to compare and evaluate the special features and performance of existing image search engines in order to help users select appropriate ones. The study selected AV Photo Finder, Lycos MultiMedia, Amazing Picture Machine, Image Surfer, WebSeek, and Ditto for comparison and evaluation because of their popularity among users of image search engines. The methodology was to analyze the related literature and establish criteria for evaluating image search engines. The study investigated the characteristics, indexing methods, search capabilities, screen displays, and user interfaces of the search engines in order to compare their performance. Finally, the study measured relative recall and precision ratios to evaluate their retrieval effectiveness in an experimental setup. The results of the comparative analysis of search performance are as follows. AV Photo Finder ranked highest among the image search engines. Ditto and WebSeek also showed comparatively high precision ratios. Lycos MultiMedia and Image Surfer followed them. Amazing Picture Machine ranked lowest.
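
The relative recall and precision ratios mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common pooling definition of relative recall (relevant items found by one engine divided by the relevant items found by all engines together); the abstract does not state the exact formula the study used. The engine names and image identifiers are hypothetical.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that were judged relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


def relative_recall(retrieved, pooled_relevant):
    """Fraction of the pooled relevant items that this engine retrieved."""
    pooled = set(pooled_relevant)
    if not pooled:
        return 0.0
    return len(set(retrieved) & pooled) / len(pooled)


# Hypothetical result lists for one query (not data from the study).
results = {
    "engine_a": ["img1", "img2", "img3", "img4"],
    "engine_b": ["img2", "img5"],
}
# Hypothetical relevance judgments for the same query.
judged_relevant = {"img1", "img2", "img5"}

# Pool: every relevant item retrieved by at least one engine.
pooled = set()
for hits in results.values():
    pooled |= set(hits) & judged_relevant

for name, hits in results.items():
    print(name, precision(hits, judged_relevant), relative_recall(hits, pooled))
```

Relative recall is used in this kind of comparison because the full set of relevant images on the Web is unknown, so the pool of relevant items found by any engine stands in for it.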

A Study on the Law2Vec Model for Searching Related Law (연관법령 검색을 위한 워드 임베딩 기반 Law2Vec 모형 연구)

  • Kim, Nari;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.18 no.7 / pp.1419-1425 / 2017
  • The ultimate goal of legal knowledge search is to obtain optimal legal information based on laws and precedents. Text mining research is actively being undertaken to meet the need for efficient retrieval from large-scale data. A typical method is to use a word embedding algorithm based on a neural network. This paper demonstrates how to search for relevant information by applying word embedding to Korean law information. First, we extract the reference laws from precedents in order and take them as the input to Law2Vec. The model learns a law by predicting its surrounding context laws. The algorithm then moves over each law in the corpus and repeats the training step. After training finished, we could infer the relationships between laws via the embedding. Search performance was evaluated by precision and recall, which are computed from how closely the results are associated with the search terms. The test results showed that the proposed approach is much more useful than existing systems that rely only on keyword search for extracting related laws.
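
The training step described above (learning a law by predicting its surrounding context laws) resembles skip-gram style word embedding. The sketch below only shows how (center, context) training pairs could be generated from the ordered list of laws cited in one precedent. The window size and the law identifiers are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Tuple


def context_pairs(sequence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every law in a citation sequence.

    Each law is paired with the laws that appear within `window`
    positions before or after it in the same precedent.
    """
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


# Hypothetical ordered reference laws extracted from a single precedent.
precedent_laws = ["law_a", "law_b", "law_c"]

for center, context in context_pairs(precedent_laws, window=1):
    print(center, "->", context)
```

An embedding model would then be trained on such pairs so that laws cited in similar contexts end up with similar vectors.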

Document Clustering using Term reweighting based on NMF (NMF 기반의 용어 가중치 재산정을 이용한 문서군집)

  • Lee, Ju-Hong;Park, Sun
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.11-18 / 2008
  • Document clustering is an important method for document analysis and is used in many different information retrieval applications. This paper proposes a new document clustering model that uses term re-weighting based on NMF (non-negative matrix factorization) to cluster documents relevant to a user's requirement. The proposed model re-weights terms using user feedback to reduce the gap between the user's requirement for document classification and the document clusters produced by the machine. The proposed method can improve the quality of document clustering because the re-weighted terms, the semantic feature matrix, and the semantic variable matrix, which are used in document clustering, can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the document clustering methods alone.

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output. Also, current search tools cannot retrieve the documents related to a retrieved document from a gigantic collection of documents. The most important problem for many current search systems is to increase the quality of search, that is, to provide related documents and to reduce the number of unrelated documents in the search results as far as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In detail, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the article with the cited works.
Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language and of the words in the title, keywords, or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). But CiteSeer cannot index links between articles that researchers do not make, because it indexes only the links that researchers create when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems led us to design a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. A document is converted into a tabular form in which each extracted predicate is checked against the possible subjects and objects. We build a hierarchical graph of a document from the table and then integrate the graphs of the documents. From the graph of the entire document set, the area of each document is calculated relative to the integrated documents, and the relations among documents are marked by comparing their areas. The paper also proposes a method for the structural integration of documents that retrieves documents from the graph, which makes it easier for the user to find information. We compared the performance of the proposed approaches with the Lucene search engine using its ranking formulas. As a result, the F-measure is about 60%, which is about 15% better.
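
The backward and forward navigation that a citation index allows can be sketched with two lookup tables. This is a generic illustration of the idea, not CiteSeer's implementation; the class name, method names, and article identifiers are hypothetical.

```python
from collections import defaultdict


class CitationIndex:
    """Minimal citation index: stores citing -> cited links in both directions."""

    def __init__(self):
        self._cites = defaultdict(set)     # article -> articles it cites
        self._cited_by = defaultdict(set)  # article -> articles that cite it

    def add_citation(self, citing, cited):
        """Record that `citing` contains a reference to `cited`."""
        self._cites[citing].add(cited)
        self._cited_by[cited].add(citing)

    def backward(self, article):
        """Navigate backward in time: the works this article cites."""
        return sorted(self._cites.get(article, ()))

    def forward(self, article):
        """Navigate forward in time: later works that cite this article."""
        return sorted(self._cited_by.get(article, ()))


# Hypothetical articles and citation links.
index = CitationIndex()
index.add_citation("paper_2005", "paper_1998")
index.add_citation("paper_2005", "paper_2001")
index.add_citation("paper_2010", "paper_2005")

print(index.backward("paper_2005"))
print(index.forward("paper_2005"))
```

The sketch also shows the limitation the abstract points out: two articles that are related but never cite each other have no link in either table.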

An Experimental Study on the Performance Improvement of Automatic Classification for the Articles of Korean Journals Based on Controlled Keywords in International Database (해외 데이터베이스의 통제키워드에 기초한 국내 학술지 논문의 자동분류 성능 향상에 관한 실험적 연구)

  • Kim, Pan Jun;Lee, Jae Yun
    • Journal of the Korean Society for Library and Information Science / v.48 no.3 / pp.491-510 / 2014
  • As a major factor in the efficient management and retrieval of articles in databases, keywords are classified into uncontrolled keywords and controlled keywords. Most Korean scholarly databases fail to provide controlled vocabularies for indexing research articles, which would help users retrieve relevant papers exhaustively. In this paper, we carried out automatic descriptor assignment experiments on Korean articles using automatic classifiers trained with descriptors from an international database. The results of the experiments show that classifiers trained with descriptors from an international database can potentially offer controlled vocabularies for Korean scholarly articles that have English abstracts. We also sought to improve the performance of automatic descriptor assignment by using various classifiers and combinations of them.

Building a Philosophy Ontology based on Content of Texts and its Application to Learning (텍스트 내용 기반의 철학 온톨로지 구축 및 교육에의 응용)

  • Chung, Hyun-Sook;Choi, Byung-Il
    • Journal of The Korean Association of Information Education / v.9 no.2 / pp.257-270 / 2005
  • Researchers in the humanities, including philosophy, acquire knowledge by understanding their texts. They spend a lot of time and effort retrieving, reading, and understanding many texts relevant to their research fields using metadata-based text retrieval systems. In this paper, we develop a philosophy ontology that enables researchers to retrieve knowledge from the content of philosophy texts. Our philosophy ontology includes concepts and their hierarchical and associative relationships as defined by philosophy researchers. We propose a methodology for constructing a text-based ontology that comprises three phases and fourteen steps. This methodology may be used to construct other ontologies for learning. We also introduce a case study in which our philosophy ontology is applied to acquire and exchange knowledge of philosophy between a professor and students during philosophy classes.

User Profile based Personalized Web Agent (사용자 프로파일 기반 개인 웹 에이전트)

  • So, Young-Jun;Park, Young-Tack
    • Journal of KIISE:Software and Applications / v.27 no.3 / pp.248-256 / 2000
  • This paper presents a personalized web agent that constructs a user profile consisting of the user's preferences on the web and recommends relevant information to the user. The personalized web agent consists of a monitor agent, a user profile construction agent, and a user profile refinement agent. The monitor agent has the user describe his or her preferences directly, creates a database of preferred documents, and finally performs several keyword extraction steps to increase the accuracy of the database. The user profile construction agent transforms the extracted keywords into a user profile that can be confirmed and edited by the user, and the refinement agent refines the user profile by recursively learning from and processing user feedback. In this paper, we describe several keyword weighting and inductive learning techniques in detail. Finally, we describe the adaptive web retrieval and push agents that provide adaptive services to the user.

Design and Evaluation of a Gateway to Faculty Syllabi in Computer Science (인터넷 대학강의안 전산학분야 메타데이터 시스템 구축 및 평가)

  • 이은경;오삼균
    • Journal of the Korean Society for Information Management / v.18 no.1 / pp.65-84 / 2001
  • The purpose of this study was to design and evaluate a metadata system for internet-based syllabi in computer science. The study constructed two prototype systems for the experiment. One was constructed using only Dublin Core (DC) elements, and the other was a DC-expanded system with eight additional elements that are not part of the DC element set. Thirty subjects were chosen from among computer science majors. Two retrieval tasks were assigned to them. One was to find syllabi in which they were interested, and the other was to find relevant course syllabi for a given course title. After the search, they were asked to evaluate the systems in terms of efficiency, accuracy, and their satisfaction with the system. The result of the first experiment indicates that the DC-based system performed significantly better in terms of search time, and the DC-expanded system in terms of the satisfaction measure. An additional experiment was conducted to test the efficiency of the browsing categories. Interviews with the subjects were carried out to find any difficulties associated with the current browsing scheme. The subjects expressed much satisfaction with the assignment of a course to multiple browsing categories.

Standard Translation of Terms of Korean Medicine through Consideration of Chinese-Korean Collated Medical Classics - With focus on 『Eonhaeduchangjipyo』, 『Eonhaegugeupbang』 and 『Eonhaetaesanjipyo』 - (언해의서 비교고찰을 통한 한의학용어의 번역표준안 - 『언해두창집요』, 『언해구급방』, 『언해태산집요』를 중심으로)

  • Ku, Hyunhee;Kim, Hyunkoo;Lee, JungHyun;Oh, Junho;Kwon, Ohmin
    • Korean Journal of Oriental Medicine / v.18 no.3 / pp.49-61 / 2012
  • This article set out to develop an old Chinese - modern Korean collated terminology by analyzing and aligning, at the level of the minimum unit of meaning, Chinese-Korean translated terms relevant to Korean medicine from "Eonhaeduchangjipyo", "Eonhaegugeupbang" and "Eonhaetaesanjipyo". These works are composed of original Chinese texts and their corresponding Korean translations. The article attempts to compile a list of translation standards for Korean medicine terms by classifying cases of translational ambiguity in terms of disease, body position, thumbnail-pressing acupuncture method, and disease-curing method. The above-mentioned ancient books are medical classics written by Huh Jun, the representative medical physician, and published by the Joseon government. Thus, they are appropriate as historically legitimate medical documents from which words and terms can be drawn to form an old Chinese - modern Korean collation dictionary. This collation glossary will contribute to the increased relevance of data mining, or information retrieval, in database systems and search engines for massive Korean medical records by providing a novel way to obtain synchronized results between the original old Chinese writings and their modern Korean translations. The glossary will also promote the collective but consistent translation of numerous old archives of Korean medicine and of other related fields.