• Title/Summary/Keyword: index data structure (인덱스자료구조)

Analysis of Construction and Searching Algorithms for Compressed Index Data Structures (압축된 인덱스 자료구조를 위한 구축 및 검색 알고리즘의 성능 분석)

  • 이분녀;김동규
    • Proceedings of the Korea Multimedia Society Conference / 2004.05a / pp.640-643 / 2004
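  • Compressed index data structures have been proposed as a way to efficiently store and search the vast, exponentially growing volume of data, and they are now an active research topic. A compressed index data structure indexes the data in a suitable way and stores the index in compressed form, reducing storage space without degrading search performance. This paper examines a representative method, the FM-index proposed by Ferragina and Manzini. We implemented the method and experimentally analyzed the factors that affect its overall performance. From these experiments we analyzed the correlations among the parameters and tested for ideal settings.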

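
To make the FM-index search step concrete, here is a minimal Python sketch of backward search over the Burrows-Wheeler transform. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the suffix array is built by naive sorting, the occurrence table is stored in full rather than compressed or sampled, and the names `build_fm_index` and `count_and_locate` are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and a full Occ table."""
    assert "$" not in text
    text += "$"  # unique sentinel, assumed smaller than every other character
    # Naive suffix sorting; fine for a demo, far too slow for real inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each suffix (text[-1] wraps to the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i] (uncompressed here).
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def count_and_locate(pattern, sa, c_table, occ):
    """Return the sorted start positions of pattern using backward search."""
    lo, hi = 0, len(sa)  # half-open range of suffix array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Practical FM-indexes compress the BWT and keep only sampled counts; the toy above skips both so that the search logic stays visible.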

Some Characteristics of the Performance in Comparison with Indexing techniques for File Organization (화일조직을 위한 인덱싱 기법의 성능 특성 비교)

  • Lee, Gu-Nam
    • Journal of The Korean Association of Information Education / v.1 no.1 / pp.49-59 / 1997
  • In this thesis, to provide a basis for effective data access methods, the performance of several generally used indexing techniques is compared. The techniques are classified as primary-key or multikey methods. For primary-key methods, a comparative analysis is made of static indexes, dynamic indexes, and hashing. For multikey indexing methods, the characteristics of the K-d tree, K-d-B tree, inverted file, and grid file are compared. In many applications multikey indexing is in greater demand, but it is not sufficiently supported. Therefore, to meet users' demands for faster and more exact access, and to keep pace with the trend toward very large database systems, further study of multikey data access methods is needed.

File Content Retrieval Program Using HashMap-based Trie (HashMap 기반의 트라이를 이용한 파일 내용 검색 프로그램)

  • Kim, Sung Wan;Lee, Woosoon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.467-468 / 2014
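  • In this paper, we design and implement a program that searches files by their content. The design uses an inverted index structure and is implemented without any separate information retrieval library. The index file uses a trie data structure that we designed and implemented ourselves, realized as nested HashMap structures in Java. To test the usefulness of the developed system, we used a set of text files generated at random from about 3,300 words in a GRE vocabulary list.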

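
The nested-map trie described in this abstract can be sketched as follows. This is a hedged illustration rather than the authors' program: Python dicts stand in for the nested Java HashMap structures, whitespace tokenization and lowercasing are assumptions, and the `END` key and the function names are hypothetical.

```python
END = "$files"  # hypothetical multi-character key; cannot clash with a single letter


def add_word(trie, word, filename):
    """Insert word into the trie and record filename in its posting set."""
    node = trie
    for ch in word:
        node = node.setdefault(ch, {})  # descend, creating child maps on demand
    node.setdefault(END, set()).add(filename)


def build_index(files):
    """Build an inverted index (word -> set of file names) stored as a trie."""
    trie = {}
    for name, content in files.items():
        for word in content.lower().split():
            add_word(trie, word, name)
    return trie


def search(trie, word):
    """Return the set of files containing word exactly (empty set if none)."""
    node = trie
    for ch in word.lower():
        node = node.get(ch)
        if node is None:
            return set()
    return set(node.get(END, set()))
```

Keeping the posting set under a reserved key in the last node means a prefix such as "ab" matches nothing unless it was indexed as a word in its own right.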

Implementation of Rank/Select Data Structure using Alphabet Frequency (문자의 빈도수를 고려한 Rank/Select 자료구조 구현)

  • Kwon, Yoo-Jin;Lee, Sun-Ho;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.36 no.4 / pp.283-290 / 2009
  • The rank/select data structure is a basic tool in succinct representations of data structures such as trees, graphs, and text indexes. For a given string, it answers queries about the occurrences of a character up to a given position. Previous studies proposed theoretical rank/select data structures, but they did not achieve practical running time and space. In this paper, we propose a simple solution for implementing rank/select data structures efficiently. According to experiments, our methods, which avoid complex encodings, achieve the theoretical size of $nH_0 + O(n)$ bits and perform rank/select operations faster than the original HSS data structure.
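
As a reminder of what the two operations compute, the following Python sketch answers rank and select on a character sequence using counts sampled at block boundaries. It is a plain sampling scheme chosen for clarity, not the frequency-aware structure proposed in the paper, and it makes no attempt to reach the $nH_0 + O(n)$-bit bound; the class name and the `block` parameter are assumptions.

```python
from bisect import bisect_left


class RankSelect:
    """Toy rank/select over a string with per-character sampled counts."""

    def __init__(self, seq, block=4):
        self.seq, self.block = seq, block
        # samples[c][b] = occurrences of c in seq[:b * block]
        self.samples = {c: [0] for c in set(seq)}
        running = {c: 0 for c in self.samples}
        for i, ch in enumerate(seq, 1):
            running[ch] += 1
            if i % block == 0:
                for c in self.samples:
                    self.samples[c].append(running[c])

    def rank(self, c, i):
        """Number of occurrences of c in seq[:i]."""
        if c not in self.samples:
            return 0
        b = i // self.block
        # Sampled count up to the block boundary plus a short in-block scan.
        return self.samples[c][b] + self.seq[b * self.block:i].count(c)

    def select(self, c, k):
        """Index of the k-th occurrence of c (k >= 1), or -1 if there is none."""
        if c not in self.samples or k < 1 or k > self.rank(c, len(self.seq)):
            return -1
        # Last block boundary whose cumulative count is still below k.
        b = bisect_left(self.samples[c], k) - 1
        seen = self.samples[c][b]
        for pos in range(b * self.block, len(self.seq)):
            if self.seq[pos] == c:
                seen += 1
                if seen == k:
                    return pos
        return -1
```

A larger `block` stores fewer samples but lengthens the in-block scan, which is the basic time/space trade-off in structures of this kind.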

Comparisons of Practical Performance for Constructing Compressed Suffix Arrays (압축된 써픽스 배열 구축의 실제적인 성능 비교)

  • Park, Chi-Seong;Kim, Min-Hwan;Lee, Suk-Hwan;Kwon, Ki-Ryong;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.34 no.5_6 / pp.169-175 / 2007
  • Suffix arrays, fundamental full-text index data structures, can be used efficiently when patterns are queried many times. Although many useful full-text index data structures have been proposed, their $O(n \log n)$-bit space consumption motivates researchers to develop more space-efficient ones. Space-efficient versions such as the compressed suffix array and the FM-index have been developed, but they cannot reduce the practical working space because their construction is based on an existing suffix array. Recently, two algorithms have been proposed that construct compressed suffix arrays directly from the text without first constructing the suffix array. In this paper, we compare the practical performance of these compressed suffix array construction algorithms with that of various suffix array construction algorithms by measuring construction time, peak memory usage during construction, and the size of the final output.
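
For readers unfamiliar with the uncompressed baseline, the sketch below builds a plain suffix array by naive sorting and answers a pattern query with two binary searches. It is only an illustration of why one index serves many queries; it is not one of the construction algorithms compared in the paper, and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern using the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

The array holds one integer per text position, which is where the $O(n \log n)$-bit space cost comes from.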

Adaptive Path Index for Efficient XML Query Processing (효율적인 XML 질의 처리를 위한 적응형 경로 인덱스)

  • 민준기;심규석;정진완
    • Journal of KIISE:Databases / v.31 no.1 / pp.61-71 / 2004
  • XML can describe a wide range of data, from regular to irregular and from flat to deeply nested. Thus, XML is rapidly emerging as the de facto standard for the Web document format, since it supports efficient data exchange and integration. To retrieve data represented in XML, several XML query languages have been proposed. XML query languages such as XPath and XQuery use path expressions to traverse irregularly structured data composed of XML elements. To evaluate path expressions, various path indexes have been proposed. However, traditional path indexes are constructed using only the XML data structure. Therefore, in this paper, we propose an adaptive path index that utilizes query workloads as well as the XML data structure. To improve query performance, the proposed adaptive path index manages the frequently used paths and the structural summary of the XML data using a hash tree and a graph structure. Experimental results show that the adaptive path index typically improves query performance by a factor of 2 to 69 compared with existing indexes.
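
The basic idea of a path index, answering a path expression by lookup instead of traversal, can be sketched in Python as below. This is a simplified, non-adaptive stand-in: it uses an ordinary dict keyed by full label paths rather than the hash tree and graph structure of the paper, it ignores query workloads, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_path_index(xml_text):
    """Map every root-to-element label path to the elements it reaches."""
    index = {}

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        index.setdefault(path, []).append(elem)  # preorder keeps document order
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index


def lookup(index, path):
    """Evaluate a simple absolute path such as /lib/book/title by one lookup."""
    return [elem.text for elem in index.get(path, [])]
```

An adaptive index in the sense of the abstract would additionally decide which paths to keep based on how often queries use them; that part is not modeled here.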

An XML Indexing Technique based on DTD's Element Types in RDBMS (RDBMS를 이용한 DTD 엘리먼트 타입 기반의 문서 색인 기법)

  • Park Kwan-Soon;Kim Tack-Gon;Kim Woo-Saeng
    • Proceedings of the Korean Information Science Society Conference / 2006.06c / pp.55-57 / 2006
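  • As XML documents have recently emerged as a standard for storing and exchanging data among Internet-based applications, the storage and management of XML documents has been actively studied. However, many studies on XML document retrieval index every XML element path, so the index grows large and search performance degrades in proportion. To improve on this, this paper extends the traditional inverted index method to suit XML documents on the basis of element types, and designs, on top of an RDBMS, an index that uses a structural numbering method for the data of hierarchically structured XML documents. The index tables comprise an element type table holding element type information, a path table holding the paths of the XML documents, a Term table organized as an inverted index, and a Term path table representing Term paths. The approach simplifies the representation of all paths in an XML document that appears in earlier work on XML indexing, and thereby aims to achieve better search performance.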

An Index Data Structure for String Search in External Memory (외부 메모리에서 문자열을 효율적으로 탐색하기 위한 인덱스 자료 구조)

  • Na, Joong-Chae;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.598-607 / 2005
  • We propose a new external-memory index data structure, the Suffix B-tree. The Suffix B-tree is a B-tree in which the key is a string, like the String B-tree. While a node in the String B-tree is implemented with a Patricia trie, a node in the Suffix B-tree is implemented with an array, so the Suffix B-tree is simpler and easier to implement than the String B-tree. Nevertheless, the branching algorithm of the Suffix B-tree is as efficient as that of the String B-tree. Consequently, the Suffix B-tree requires the same number of worst-case disk accesses as the String B-tree to solve the string matching problem, which is fundamental and important in the area of string algorithms.

A Study on Improvement of Digital National Survey Map System (디지털국토통계지도 시스템 개선에 관한 연구)

  • Lee, Jong-Yong;An, Jung-Cheon;Cho, Sung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.9 no.4 / pp.60-70 / 2006
  • The national atlas map, which provides various information, is one part of the National Territorial Statistics Survey, but the 2004 national atlas map lacked stability and capability. The 2005 National Territorial Statistics Survey has eight times as much data as the 2004 survey but only one map, and this single map is intended to provide stability and capability. We do not use a DBMS; instead, we build a similar structure in a file-based program. We implemented a system that dynamically links data with spatial data. For this dynamically linked system, we built a search engine based on an index structure and a combo-box search system. The spatial data hold only index codes (year, national territorial indicator, area). When the spatial data request specified data, the search engine looks up the index code and returns the DB data. The new system is an intermediate step toward using a DBMS. We redrew the map to display the actual territory of Korea (including Dokdo), so the new map closely matches the shape of Korea.

Fast Construction of Suffix Arrays for DNA Strings (DNA 스트링에 대하여 써픽스 배열을 구축하는 빠른 알고리즘)

  • Jo, Jun-Ha;Kim, Nam-Hee;Kwon, Ki-Ryong;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.34 no.8 / pp.319-326 / 2007
  • To perform fast searching in massive data such as DNA strings, the most efficient method is to construct full-text index data structures for the given strings. The most widely used full-text index structures are suffix trees and suffix arrays. Since the suffix array uses less space than the suffix tree, the suffix array is well suited to DNA strings. Previously developed suffix array construction algorithms are not suitable for DNA strings because they are designed for integer alphabets. We propose a fast algorithm for constructing suffix arrays on DNA strings, whose alphabet size is fixed at 4. We reduce the construction time by improving the encoding and merging steps of the algorithm of Kim et al. [1]. Experimental results show that our algorithm constructs suffix arrays on DNA strings 1.3-1.6 times faster than the algorithm of Kim et al., and faster than other algorithms in most cases.
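
A fixed alphabet of size 4 allows each base to be packed into 2 bits, which the Python sketch below uses to bucket suffixes by their first k bases before sorting inside each bucket. This only illustrates the encoding idea under stated assumptions; it is not the algorithm of Kim et al. or the improved algorithm of this paper, the final in-bucket sort is naive, and the names `CODE`, `kmer_code`, and `build_dna_suffix_array` are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit code per base, in alphabet order


def kmer_code(dna, i, k):
    """Pack dna[i:i+k] into an integer, 2 bits per base.

    Positions past the end are padded with 0 (the smallest code), so bucket
    order never contradicts lexicographic suffix order.
    """
    code = 0
    for j in range(i, i + k):
        code = (code << 2) | (CODE[dna[j]] if j < len(dna) else 0)
    return code


def build_dna_suffix_array(dna, k=3):
    """Suffix array of a DNA string via k-mer bucketing plus in-bucket sorting."""
    buckets = {}
    for i in range(len(dna)):
        buckets.setdefault(kmer_code(dna, i, k), []).append(i)
    sa = []
    for code in sorted(buckets):  # integer order equals k-mer order
        # Ties inside a bucket (including padded tails) are resolved naively.
        sa.extend(sorted(buckets[code], key=lambda i: dna[i:]))
    return sa
```

Comparing packed integers replaces character-by-character comparison for the first k bases, which is the kind of saving a DNA-specific encoding step can offer.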