• Title/Summary/Keyword: 색인화 (indexing)

Search Result 272, Processing Time 0.025 seconds

Design and Implementation of Distributed Web Crawler Using Globus Environment (글로버스를 이용한 분산 웹 크롤러의 설계 및 구현)

  • 이지선;김양우;이필우
    • Proceedings of the Korean Information Science Society Conference / 2004.04a / pp.712-714 / 2004
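  • Most web search engines and many specialized search tools rely on web crawlers to collect large numbers of web pages as a preprocessing step for indexing and analyzing those pages. A typical web crawler gathers page information by interacting with millions of hosts over a period of weeks or months. To improve crawler performance and run crawlers efficiently, this paper proposes a distributed crawler built on the Globus Toolkit, a grid middleware. Execution of the crawler is divided into three stages: connecting the host servers through Globus for distributed processing, authenticating them, and assigning work; running the crawler program to collect data; and finally returning the collected web page information to the system that issued the original command. As a result, the collection work could be distributed further, several low-cost systems achieved the performance of a high-cost, high-specification server, and an easily extensible, robust crawler program and system environment could be built.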

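The three stages described in the abstract above (assign work, crawl, return results) can be sketched in miniature. This is only an illustration under stated assumptions: it does not use the Globus Toolkit, the worker names are hypothetical, threads stand in for remote hosts, and `fetch` is a stand-in that performs no network I/O. URLs are partitioned by a stable hash of their host name so that each host is handled by exactly one worker.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


def assign(urls, workers):
    """Stage 1: partition URLs across workers by a stable hash of the host."""
    plan = defaultdict(list)
    for url in urls:
        host = urlparse(url).netloc
        digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
        plan[workers[int(digest, 16) % len(workers)]].append(url)
    return dict(plan)


def fetch(url):
    """Stand-in for a real page download (assumption: no network access)."""
    return f"<html>{url}</html>"


def crawl(worker, urls):
    """Stage 2: one worker collects the pages assigned to it."""
    return worker, {url: fetch(url) for url in urls}


def run(urls, workers):
    """Stage 3: gather every worker's pages back on the submitting system."""
    plan = assign(urls, workers)
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        results = list(pool.map(lambda item: crawl(*item), plan.items()))
    merged = {}
    for _, pages in results:
        merged.update(pages)
    return plan, merged


if __name__ == "__main__":
    seeds = ["http://a.example/1", "http://a.example/2", "http://b.example/x"]
    plan, pages = run(seeds, ["node-a", "node-b"])
    for worker, assigned in plan.items():
        print(worker, assigned)
    print(len(pages), "pages returned")
```

Hashing on the host rather than the full URL keeps all requests to one site on one worker, which makes per-host politeness limits easier to enforce; a real deployment would replace the thread pool with authenticated remote job submission.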

Proximate Word Filtering by Hierarchical Clustering (계층적 군집화를 이용한 근사 단어 필터링 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.1101-1104 / 2012
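  • Word filtering is a basic function for blocking harmful information. However, malicious users deliberately alter banned words to bypass filtering systems. Proximate (approximate) word filtering counters this by filtering while tolerating a certain amount of error. String indexing methods for approximate word search mostly map words into a Euclidean space using reference words (pivots), an approach with a fundamental structural limitation when applied to word filtering. In this paper, we cluster the set of words to be filtered to build a hierarchical data structure, define a filtering query for word filtering, and describe a search procedure suited to that query. Experiments showed a search speed improvement of 16.9% to 26.6% over the existing pivot-based indexing method.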
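
For orientation, the pivot-based baseline that the abstract compares against can be sketched as follows. This is a minimal sketch of generic pivot pruning with edit distance, not the authors' hierarchical clustering method; the class name, the single-pivot design, and the sample word list are assumptions made for illustration. The distance from each banned word to the pivot is precomputed, and the triangle inequality lets a query skip any word whose pivot distance differs from the query's by more than the allowed error `k`.

```python
def edit_distance(a, b):
    """Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class PivotIndex:
    """Single-pivot index over a banned-word list (illustrative only)."""

    def __init__(self, words, pivot):
        self.pivot = pivot
        # Precompute each word's distance to the pivot once.
        self.entries = [(w, edit_distance(w, pivot)) for w in words]

    def search(self, query, k):
        """Return (words within distance k, number of full distance checks)."""
        dq = edit_distance(query, self.pivot)
        hits, computed = [], 0
        for word, dw in self.entries:
            # Triangle inequality: |d(q,p) - d(w,p)| <= d(q,w).
            if abs(dq - dw) > k:
                continue  # cannot be within k, so skip the expensive check
            computed += 1
            if edit_distance(query, word) <= k:
                hits.append(word)
        return hits, computed


if __name__ == "__main__":
    index = PivotIndex(["spam", "scam", "viagra", "lottery"], pivot="spam")
    print(index.search("sp4m", k=1))
```

The pruning step never discards a true match, so the result equals a brute-force scan; only the number of full distance computations changes.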

Building a Database and Retrieval System for Broadcast Materials (방송자료의 데이터베이스와 검색시스템 구축)

  • 김종태;조창익
    • Broadcasting and Media Magazine / v.1 no.4 / pp.12-19 / 1996
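  • This article describes, through a case study, a system that digitizes audio and video materials and stores them in a database so that they can be indexed, searched, and managed effectively. In particular, it introduces the process required to design and build the system, the necessary components, and related matters, and finally discusses important points to consider during design.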

Video retrieval method using non-parametric based motion classification (비-파라미터 기반의 움직임 분류를 통한 비디오 검색 기법)

  • Kim Nac-Woo;Choi Jong-Soo
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.2 s.308 / pp.1-11 / 2006
  • In this paper, we propose a novel video retrieval algorithm that uses non-parametric motion classification within a shot-based video indexing structure. The proposed system first obtains the key frame and motion information from each shot segmented by a scene change detection method, and then extracts visual features and non-parametric motion information from them. Finally, we construct a real-time retrieval system that supports similarity comparison of these spatio-temporal features. After the normalized motion vector fields are created from the MPEG compressed stream, the non-parametric motion feature is extracted by discretizing each normalized motion vector into angle bins and considering the mean, variance, and direction of these bins. We use an edge-based spatial descriptor to extract the visual features in key frames, and R*-tree structures to index the feature vectors. Experimental evidence shows that our algorithm outperforms other video retrieval methods for image indexing and retrieval.
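
The angle-bin step in the abstract above can be illustrated with a short sketch. The abstract does not give the bin count or the exact statistics layout, so the eight bins and the per-bin `(count, mean magnitude, variance)` tuple below are assumptions chosen for illustration, not the authors' feature definition.

```python
import math


def angle_bin(dx, dy, n_bins=8):
    """Map a motion vector's direction to one of n_bins equal angular sectors."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    # The final modulo guards against an angle that rounds up to exactly 2*pi.
    return int(angle / (2 * math.pi / n_bins)) % n_bins


def motion_feature(vectors, n_bins=8):
    """Per-bin (count, mean magnitude, magnitude variance) for a vector field."""
    bins = [[] for _ in range(n_bins)]
    for dx, dy in vectors:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0:
            continue  # a zero vector has no direction to classify
        bins[angle_bin(dx, dy, n_bins)].append(magnitude)
    feature = []
    for magnitudes in bins:
        if not magnitudes:
            feature.append((0, 0.0, 0.0))
            continue
        mean = sum(magnitudes) / len(magnitudes)
        variance = sum((m - mean) ** 2 for m in magnitudes) / len(magnitudes)
        feature.append((len(magnitudes), mean, variance))
    return feature


if __name__ == "__main__":
    field = [(1, 0), (2, 0), (0, 1), (0.1, -1), (0, 0)]
    for i, stats in enumerate(motion_feature(field)):
        print(i, stats)
```

Because the feature is a fixed-length vector regardless of how many motion vectors a shot contains, it can be stored in a multidimensional index such as the R*-tree the abstract mentions.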

A Comparative Analysis of Content-based Music Retrieval Systems (내용기반 음악검색 시스템의 비교 분석)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Information Management / v.30 no.3 / pp.23-48 / 2013
  • This study compared and analyzed 15 CBMR (Content-based Music Retrieval) systems accessible on the web in terms of DB size and type, query type, access point, input and output type, and search functions. It also reviewed the features of music information and the techniques used for transforming or transcribing music sources, extracting and segmenting melodies, extracting and indexing music features, and matching in CBMR systems. The analysis found the following: the application of text information retrieval techniques (inverted indexing, N-gram indexing, Boolean search, truncation, keyword and phrase search, normalization, filtering, browsing, exact matching, similarity measures using edit distance, sorting, etc.) to enhance CBMR; efforts to increase DB size and usability; and problems in extracting melodies, deleting stop notes in queries, and using solfege as pitch information.
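
To make the N-gram and inverted indexing techniques named above concrete, here is a minimal sketch of how they might be applied to melodies. It is not taken from any of the surveyed systems: the pitch-interval encoding, the use of MIDI note numbers, the class name `MelodyIndex`, and the sample melodies are all assumptions for illustration.

```python
from collections import defaultdict


def intervals(pitches):
    """Encode a melody as successive pitch differences (transposition-invariant)."""
    return tuple(b - a for a, b in zip(pitches, pitches[1:]))


def ngrams(seq, n=3):
    """All contiguous length-n windows of a sequence."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


class MelodyIndex:
    """Inverted index from interval N-grams to the songs that contain them."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, song_id, pitches):
        for gram in ngrams(intervals(pitches), self.n):
            self.postings[gram].add(song_id)

    def search(self, pitches):
        """Rank songs by the number of distinct query N-grams they share."""
        scores = defaultdict(int)
        for gram in set(ngrams(intervals(pitches), self.n)):
            for song_id in self.postings.get(gram, ()):
                scores[song_id] += 1
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    index = MelodyIndex()
    index.add("a", [60, 62, 64, 65, 67])          # hypothetical melody
    index.add("b", [60, 60, 67, 67, 69, 69, 67])  # hypothetical melody
    print(index.search([62, 64, 66, 67, 69]))     # melody "a" transposed up
```

Indexing intervals instead of absolute pitches means a query sung in a different key still matches, which is one reason interval encodings are a common choice for melody search.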

A Study on Spatial-temporal indexing for querying current and past positions (현재와 과거 위치 질의를 위한 시공간 색인에 관한 연구)

  • Jun, Bong-Gi
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.6 / pp.1250-1256 / 2004
  • The movement of continuously changing positions should be stored and indexed to support queries on current and past positions. A simple extension of the original R-tree that adds time as another dimension, called the 3D R-tree, does not handle current position queries and does not address the low space utilization caused by high overlap of index nodes. In this paper, I propose a dynamic splitting policy for the 3D R-tree that improves the space utilization of split nodes. I also extend the original 3D R-tree by introducing a new tagged index structure that can query both the current and past positions of moving objects. I found that my extension, called the tagged dynamic 3D R-tree, outperforms both the 3D R-tree and 75-tree when querying current and past positions.
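
The idea of treating time as a third dimension can be illustrated with a small sketch. It shows only the bounding-box view of movement segments, not the tagged dynamic structure or the splitting policy the abstract proposes. The names are hypothetical, a linear scan stands in for tree traversal, and representing a current position with an open-ended `math.inf` end time is an assumption made here to show why current positions are awkward for a plain 3D R-tree (such boxes are unbounded in time).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned box over (x, y, t); time is just a third coordinate."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_min: float
    t_max: float

    def intersects(self, other):
        return (self.x_min <= other.x_max and other.x_min <= self.x_max
                and self.y_min <= other.y_max and other.y_min <= self.y_max
                and self.t_min <= other.t_max and other.t_min <= self.t_max)


def segment_box(start, end, t_start, t_end=math.inf):
    """Bound one movement segment; an open-ended t_end marks a current position."""
    (x1, y1), (x2, y2) = start, end
    return Box3D(min(x1, x2), max(x1, x2), min(y1, y2), max(y1, y2),
                 t_start, t_end)


def query(entries, region, t_from, t_to):
    """Ids of objects whose boxes overlap the region during [t_from, t_to].

    A linear scan stands in for the tree traversal (assumption for brevity).
    """
    x_min, x_max, y_min, y_max = region
    window = Box3D(x_min, x_max, y_min, y_max, t_from, t_to)
    return sorted({obj for obj, box in entries if box.intersects(window)})


if __name__ == "__main__":
    entries = [
        ("car1", segment_box((0, 0), (5, 5), 0, 10)),   # past movement
        ("car1", segment_box((5, 5), (5, 5), 10)),      # current position
        ("car2", segment_box((20, 20), (22, 22), 0, 10)),
    ]
    print(query(entries, (4, 6, 4, 6), 8, 9))
    print(query(entries, (4, 6, 4, 6), 100, 100))
```

Note that the test is at the bounding-box level, so a hit is a candidate that a full implementation would refine against the actual segment geometry.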

An Automatic Text Categorization Theories and Techniques for Text Management (문서관리를 위한 자동문서범주화에 대한 이론 및 기법)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of Information Management / v.33 no.2 / pp.19-32 / 2002
  • With the growth of digital libraries and the use of the Internet, the amount of online text information has increased rapidly, and the need for efficient data management and retrieval techniques has grown with it. An automatic text categorization system assigns text documents to predefined categories, which reduces the manual labor of text categorization. To classify text documents, good features should be selected from the documents, and the documents are then indexed with those features. This paper introduces each step of text categorization and several techniques used in each step.
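
The pipeline the abstract outlines (select features, index documents with them, classify) can be sketched end to end. The abstract does not say which techniques the paper covers at each step, so the choices below are assumptions for illustration: document-frequency feature selection, term-count indexing, and a multinomial naive Bayes classifier with add-one smoothing. The training sentences are made up.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def select_features(docs, k):
    """Keep the k terms with the highest document frequency (ties: alphabetical)."""
    df = Counter()
    for text, _ in docs:
        df.update(set(tokenize(text)))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def index(text, features):
    """Represent a document as counts of the selected feature terms only."""
    return Counter(t for t in tokenize(text) if t in features)


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, features):
        self.features = features
        self.class_docs = Counter(label for _, label in docs)
        self.term_counts = defaultdict(Counter)
        for text, label in docs:
            self.term_counts[label].update(index(text, features))
        return self

    def predict(self, text):
        vec = index(text, self.features)
        total_docs = sum(self.class_docs.values())
        best, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.features)
            score = math.log(self.class_docs[label] / total_docs)
            for term, n in vec.items():
                score += n * math.log((counts[term] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    train = [
        ("goal match team win", "sports"),
        ("team player goal", "sports"),
        ("election vote party", "politics"),
        ("party vote policy", "politics"),
    ]
    features = select_features(train, k=6)
    model = NaiveBayes().fit(train, features)
    print(model.predict("team goal"), model.predict("vote party"))
```

Keeping feature selection separate from the classifier mirrors the step-by-step structure the abstract describes and makes it easy to swap in another selection criterion.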

Reordering Scheme of Location Identifiers for Indexing RFID Tags (RFID 태그의 색인을 위한 위치 식별자 재순서 기법)

  • Ahn, Sung-Woo;Hong, Bong-Hee
    • Journal of KIISE:Databases / v.36 no.3 / pp.198-214 / 2009
  • Trajectories of RFID tags can be modeled as lines, called tag intervals, that are captured by an RFID reader and indexed in a three-dimensional domain whose axes are the tag identifier (TID), the location identifier (LID), and the time (TIME). The distribution of tag intervals in the domain space is an important factor in efficiently processing queries that trace tags, and it changes according to how the coordinates of each domain are arranged. In particular, the arrangement of LIDs affects the performance of queries that retrieve the traces of tags over time, because it provides the location information of tags. It is therefore necessary to determine the optimal ordering of LIDs so that queries retrieving tag intervals from the index run efficiently. To do this, we propose LID proximity for reordering previously assigned LIDs to new LIDs, and define the LID proximity function so that tag intervals accessed together when a query is processed are stored close together in index nodes. To determine the sequence of LIDs in the domain, we also propose a reordering scheme for LIDs based on LID proximity. Our experiments show that the proposed reordering scheme considerably improves the performance of queries for tracing tag locations compared with the previous method of assigning LIDs.

Development of Similar Bibliographic Retrieval System based on Neighboring Words and Keyword Topic Information (인접한 단어와 키워드 주제어 정보에 기반한 유사 문헌 검색 시스템 개발)

  • Kim, Kwang-Young;Kwak, Seung-Jin
    • Journal of Korean Library and Information Science Society / v.40 no.3 / pp.367-387 / 2009
  • In a similar bibliographic retrieval system, the search results differ considerably depending on which of the extracted index terms are selected. This research presents a method that minimizes errors in selecting the extracted candidate index terms. It uses information on the words adjacent to the candidate index terms extracted from similar documents, together with keyword topic information. By also using related author information and a method for reranking the search results, a similar bibliographic retrieval system with high accuracy was developed. We conducted experiments with the system on a collection of Korean journal articles in science and technology. Its performance was demonstrated through experiments and user evaluation.

Leveled Spatial Indexing Technique supporting Map Generalization (지도 일반화를 지원하는 계층화된 공간 색인 기법)

  • Lee, Ki-Jung;WhangBo, Taeg-Keun;Yang, Young-Kyu
    • Journal of Korea Spatial Information System Society / v.6 no.2 s.12 / pp.15-22 / 2004
  • Map services for cellular phones face an implementation problem: the limited screen size. To represent map data effectively on a cellular phone screen, a process is needed that uses map generalization to translate detailed map data into less detailed data, and zooming in and out should be handled quickly by organizing the generalized data into levels. However, current spatial indexing methods that support map generalization do not support all map generalization operations. In this paper, we propose a leveled spatial indexing method, the LMG-tree, that supports map generalization, and present the results of a performance evaluation.