Search | Korea Science

Efficient Indexing for Large DNA Sequence Databases (대용량 DNA 시퀀스 데이타베이스를 위한 효율적인 인덱싱)

Won Jung-Im;Yoon Jee-Hee;Park Sang-Hyun;Kim Sang-Wook
- Journal of KIISE:Databases / v.31 no.6 / pp.650-663 / 2004
In molecular biology, DNA sequence searching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, a fast indexing mechanism is essential for processing DNA sequence searches efficiently. In this paper, we first identify the problems of the suffix tree with respect to storage overhead, search performance, and integration with DBMSs. Then, we propose a new index structure that solves those problems. The proposed index consists of two parts: the primary part represents the trie as bit strings without any pointers, and the secondary part provides fast access to the leaf nodes of the trie that must be visited during post-processing. We also suggest an efficient algorithm based on that index for DNA sequence searching. To verify the superiority of the proposed approach, we conducted a performance evaluation via a series of experiments. The results revealed that the proposed approach, which requires less storage space, achieves a 13- to 29-fold performance improvement over the suffix tree.

KDBcs-Tree : An Efficient Cache Conscious KDB-Tree for Multidimentional Data (KDBcs-트리 : 캐시를 고려한 효율적인 KDB-트리)

Yeo, Myung-Ho;Min, Young-Soo;Yoo, Jae-Soo
- Journal of KIISE:Databases / v.34 no.4 / pp.328-342 / 2007
We propose a new cache-conscious index structure for processing frequently updated data efficiently. The proposed index structure is based on the KDB-Tree, one of the representative index structures based on space partitioning. In this paper, we propose a data compression technique and a pointer elimination technique to increase the utilization of a cache line. To show the superiority of the proposed index structure, we compare it with variants of the CR-tree (e.g., the FF CR-tree and the SE CR-tree) in a variety of environments. Our experimental results show that the proposed index structure achieves about 85%, 97%, and 86% performance improvements over the existing index structures in terms of insertion, update, and cache utilization, respectively.

A DNA Index Structure using Frequency and Position Information of Genetic Alphabet (염기문자의 빈도와 위치정보를 이용한 DNA 인덱스구조)

Kim Woo-Cheol;Park Sang-Hyun;Won Jung-Im;Kim Sang-Wook;Yoon Jee-Hee
- Journal of KIISE:Databases / v.32 no.3 / pp.263-275 / 2005
In a large DNA database, indexing techniques are widely used for rapid approximate sequence searching. However, most indexing techniques require more space than the original database and are difficult to integrate seamlessly with a DBMS. In this paper, we suggest a space-efficient, disk-based indexing and query processing algorithm for approximate DNA sequence searching, specifically exact match queries, wildcard match queries, and k-mismatch queries. Our indexing method places a sliding window at every possible location of a DNA sequence and extracts its signature by considering the occurrence frequency of each nucleotide. It then stores the set of signatures in a multi-dimensional index, such as an R*-tree. In particular, by assigning a weight to each position of a window, it prevents signatures from being concentrated around a few spots in the index space. Our query processing algorithm converts a query sequence into a multi-dimensional rectangle and searches the index for the signatures that overlap the rectangle. Experiments with real biological data sets revealed that the proposed method is at least three times, twice, and several orders of magnitude faster than the suffix-tree-based method for exact match, wildcard match, and k-mismatch queries, respectively.
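The window-signature idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain nucleotide counts as the signature, omits the per-position weights, and replaces the R*-tree with a linear scan. All function names are hypothetical. The query rectangle relies on one fact: if two windows of equal length differ in at most k positions, each nucleotide count differs by at most k.

```python
from collections import Counter

ALPHABET = "ACGT"

def signature(window):
    """Return the (A, C, G, T) occurrence counts of one window."""
    counts = Counter(window)
    return tuple(counts[base] for base in ALPHABET)

def window_signatures(sequence, w):
    """Slide a window of length w over every position and collect signatures."""
    return [(i, signature(sequence[i:i + w]))
            for i in range(len(sequence) - w + 1)]

def query_rectangle(query, k):
    """Rectangle that contains the signature of every window within k mismatches."""
    sig = signature(query)
    low = tuple(max(0, c - k) for c in sig)
    high = tuple(min(len(query), c + k) for c in sig)
    return low, high

def candidates(sequence, query, k):
    """Positions whose signature falls inside the query rectangle.

    A linear scan stands in for the multi-dimensional index lookup.
    """
    low, high = query_rectangle(query, k)
    hits = []
    for pos, sig in window_signatures(sequence, len(query)):
        if all(lo <= s <= hi for lo, s, hi in zip(low, sig, high)):
            hits.append(pos)
    return hits

def k_mismatch_search(sequence, query, k):
    """Filter by signature, then verify each candidate character by character."""
    result = []
    for pos in candidates(sequence, query, k):
        window = sequence[pos:pos + len(query)]
        if sum(a != b for a, b in zip(window, query)) <= k:
            result.append(pos)
    return result
```

The filter step admits false positives (for example, any permutation of the query has the same counts), which is why each candidate is verified afterwards. It never drops a true match.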

EPR : Enhanced Parallel R-tree Indexing Method for Geographic Information System (EPR : 지리 정보 시스템을 위한 향상된 병렬 R-tree 색인 기법)

Lee, Chun-Geun;Kim, Jeong-Won;Kim, Yeong-Ju;Jeong, Gi-Dong
- The Transactions of the Korea Information Processing Society / v.6 no.9 / pp.2294-2304 / 1999
The purpose of this paper is to improve query processing performance in a GIS (Geographic Information System) by enhancing I/O performance through parallel I/O and efficient disk access. By packing adjacent spatial data, which are very likely to be referenced together, into one block or into contiguous disk blocks, the number of disk accesses and the disk access overhead for query processing can be decreased, which in turn reduces I/O time. In this paper, we propose the EPR (Enhanced Parallel R-tree) indexing method, which integrates the parallel I/O method of the previous Parallel R-tree (PR) method with a packing-based clustering method. The major characteristics of the EPR method are as follows. First, the EPR method arranges spatial data in order of proximity by using the Hilbert space-filling curve and builds a packed R-tree in a bottom-up manner. Second, with packing-based clustering, in which the arranged spatial data are clustered into contiguous disk blocks, the EPR method generates spatial data clusters. Third, the EPR method distributes EPR index nodes and spatial data clusters over multiple disks through round-robin striping. Experimental results show that the EPR method achieves gains of 30% or more over the PR method in query processing speed. In particular, the larger the disk blocks and the smaller the spatial data objects, the better the query processing performance of the EPR method.
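The three steps listed in this abstract (Hilbert ordering, packing, round-robin striping) can be sketched in Python as follows. This is an illustration under stated assumptions, not the EPR implementation: objects are reduced to integer grid points, only the leaf level is packed, and the names `pack_leaves` and `stripe_round_robin` are hypothetical. `hilbert_index` follows the standard iterative grid-to-curve mapping.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next level is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def pack_leaves(points, n, capacity):
    """Sort points along the Hilbert curve and fill leaves to capacity, in order."""
    ordered = sorted(points, key=lambda p: hilbert_index(n, p[0], p[1]))
    leaves = []
    for i in range(0, len(ordered), capacity):
        group = ordered[i:i + capacity]
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        # Minimum bounding rectangle as (min_x, min_y, max_x, max_y).
        leaves.append({"mbr": (min(xs), min(ys), max(xs), max(ys)),
                       "points": group})
    return leaves

def stripe_round_robin(leaves, num_disks):
    """Assign leaf i to disk i mod num_disks."""
    return {i: i % num_disks for i in range(len(leaves))}
```

Upper index levels would be built by repeating the same grouping over the leaf MBRs. Because consecutive leaves cover neighboring regions, round-robin placement tends to spread the leaves touched by one range query across different disks.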

Fast Hilbert R-tree Bulk-loading Scheme using GPGPU (GPGPU를 이용한 Hilbert R-tree 벌크로딩 고속화 기법)

Yang, Sidong;Choi, Wonik
- Journal of KIISE / v.41 no.10 / pp.792-798 / 2014
In spatial databases, the R-tree is one of the most widely used index structures, and many variants have been proposed to improve its performance. Among these variants, the Hilbert R-tree is a representative method that uses the Hilbert curve to process large amounts of data without the high-cost split techniques otherwise needed to construct the R-tree. The Hilbert R-tree, however, is hardly applicable to large-scale applications in practice, mainly because of its high pre-processing cost and slow bulk-loading time. To overcome these limitations, we propose a novel approach that parallelizes the Hilbert mapping and thus accelerates bulk-loading of the Hilbert R-tree in GPU memory. The GPU-based Hilbert R-tree improves bulk-loading performance by applying the inversed-cell method and exploiting parallelism when packing the R-tree structure. Our experimental results show that the proposed scheme is up to 45 times faster than traditional CPU-based bulk-loading schemes.
https://doi.org/10.5626/JOK.2014.41.10.792

Performance Evaluation of a Spatial Index Structure Supporting the Circular Property in Spatial Database Systems (공간 데이타베이스 시스템에서 순환 속성을 지원하는 공간색인구조의 성능평가)

김홍기;선휘준
- Journal of Korea Multimedia Society / v.4 no.3 / pp.197-204 / 2001
To increase the performance of spatial database systems, a spatial indexing method is needed that manages spatial objects efficiently in both dynamic and static environments. To improve retrieval performance, the indexing method must take spatial locality into account, and spatial locality is related to the location property of objects. Previous spatial indexing methods did not consider the circular location property of objects. In this paper, we introduce the CR-Tree, a spatial index structure for clustering spatially adjacent objects in which the search space is constructed from circular and linear domains. Through simulation, we show that a spatial index structure that considers the circular location property of objects increases the hit ratio and bucket utilization.

A Study on Natural Language Keyword Indexing for Web-based Information Retrieval (웹기반 정보검색을 위한 자연어 키워드 색인에 관한 연구)

윤성희
- Journal of the Korea Computer Industry Society / v.4 no.12 / pp.1103-1111 / 2003
Information retrieval systems whose indexing matches single keywords are simple and popular. However, single-keyword matching makes it very hard to represent the exact meaning of documents, and the set of retrieved documents is very large, so it cannot satisfy the users of information retrieval systems. This paper proposes a phrase-based indexing system built on the phrase, a syntactic unit larger than a single keyword. Because Web documents contain many syntactic errors, a high-quality natural language parser cannot be expected on the Web. Partial trees produced by fully bottom-up parsing, even when no full tree is obtained, are still useful for extracting phrases, and phrases are much more discriminative than single keywords as index terms. This helps the information retrieval system improve its efficiency and reduce its processing overhead.

T-Tree Index Structures Utilizing Prefetch Methods (프리패치 기법을 적용한 T.트리 인덱스 구조)

Lee, Ig-Hoon;Shim, Jun-Ho
- The Journal of Society for e-Business Studies / v.14 no.4 / pp.119-131 / 2009
Over the past decade, e-Commerce environments supporting real-time transaction processing have grown larger. In telecommunication and financial environments, main memory database systems have been researched and built to support real-time transaction processing. Research on indexing for fast transaction support focuses on reducing cache misses or on reducing memory access latency when cache misses occur. In this paper, we propose a prefetch method for tree index structures to reduce memory access latency. We present a prefetch-efficient pCST-tree and show the superiority of the proposed tree through experiments.

Adapting R-Tree to Fuzzy Indexing (R-Tree를 이용한 퍼지 인덱스)

Min, Kyoung-In;Shin, Yae-Ho;Kim, Hong-Ki
- Proceedings of the Korea Information Processing Society Conference / 2001.10a / pp.51-54 / 2001
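The problem with imprecise boundaries, a general characteristic of fuzzy data, is that existing database systems, which can manage data only on the premise that it is always crisp, cannot store and manage such data effectively. Many real-world phenomena do not resolve to crisp values and quite often exist in an imprecise state. Accordingly, as part of the effort to reflect such imprecise states within database systems, many studies have addressed techniques for representing, storing, and managing fuzzy data. However, existing studies are mainly suited to static environments in which data rarely changes state, and are not suited to dynamic environments in which values are updated frequently. This paper therefore proposes a fuzzy data indexing method based on the R-Tree as a way to manage fuzzy data with imprecise boundaries effectively in dynamic environments where data is updated frequently.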

Indexing of Moving Objects Based on Uncertainty for Telematics (텔레매틱스를 위한 불확실성 기반의 이동체 색인)

진희규;김동현;임덕성;조대수;홍봉희
- Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.100-102 / 2004
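The TPR-tree, in which a moving object reports its position whenever its speed or direction changes, predicts object positions with small error. However, when positions are reported at long time intervals, the uncertainty of position prediction grows and the prediction error increases. When moving objects with high uncertainty are indexed in a TPR-tree, the index search cost for updating position information increases and the accuracy of query results decreases. This paper presents an extended TPR-tree that uses an uncertainty region to account for the positional uncertainty that arises when moving objects report their positions at long time intervals. To handle position data for highly uncertain moving objects, it defines an uncertainty region, the area the object may have moved into, computed from the position prediction error, and when computing a node's bounding rectangle (BR) for search, it expands the BR using the uncertainty region.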
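The bounding-rectangle expansion described in this abstract can be sketched in Python. The sketch rests on assumptions the abstract does not confirm: the uncertainty region is modeled as an axis-aligned square whose half-width equals the prediction error, and rectangles are static snapshots rather than the time-parameterized rectangles of a real TPR-tree. All names are hypothetical.

```python
def uncertainty_region(predicted, error):
    """Square region around a predicted (x, y) position, sized by the prediction error.

    Assumption: the error is a single non-negative bound applied to both axes.
    """
    x, y = predicted
    return (x - error, y - error, x + error, y + error)

def union(a, b):
    """Smallest rectangle (min_x, min_y, max_x, max_y) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def node_bounding_rectangle(entries):
    """Bounding rectangle of a node whose entries are (predicted_position, error) pairs."""
    regions = [uncertainty_region(pos, err) for pos, err in entries]
    br = regions[0]
    for region in regions[1:]:
        br = union(br, region)
    return br

def intersects(a, b):
    """True if rectangles a and b overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```

With zero error the node rectangle covers only the predicted positions. With a larger error it grows, so a range query near the edge still reaches objects that may have drifted there since their last report.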

