A Practical Approximate Sub-Sequence Search Method for DNA Sequence Databases (DNA 시퀀스 데이타베이스를 위한 실용적인 유사 서브 시퀀스 검색 기법)

  • Won, Jung-Im;Hong, Sang-Kyoon;Yoon, Jee-Hee;Park, Sang-Hyun;Kim, Sang-Wook
    • Journal of KIISE:Databases / v.34 no.2 / pp.119-132 / 2007
  • In molecular biology, approximate subsequence search is one of the most important operations. In this paper, we propose an accurate and efficient method for approximate subsequence search in large DNA databases. The proposed method adopts a binary trie as its primary structure and stores all the window subsequences extracted from a DNA sequence. For approximate subsequence search, it traverses the binary trie in breadth-first order and, using dynamic programming, retrieves all the matched subsequences along the traversed paths within the trie. However, because the method stores only window subsequences of a predetermined length, it incurs a large post-processing time for long query sequences. To overcome this problem, we divide a query sequence into shorter pieces, search for those pieces, and then merge the results. To verify the superiority of the proposed method, we conducted a series of performance experiments. The results reveal that the proposed method, which requires less storage space, achieves a 4- to 17-fold performance improvement over the suffix tree-based method. Even when the query sequence is long, our method is more than an order of magnitude faster than the suffix tree-based method and the Smith-Waterman algorithm.
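
The matching step, finding every substring that lies within a given edit distance of the query by dynamic programming, can be illustrated with the classic semi-global edit-distance recurrence. This is a minimal sketch assuming unit costs for insertions, deletions, and substitutions and a maximum distance `k`; `approximate_matches` is a hypothetical name, and the sketch scans the text directly rather than traversing the paper's binary trie, so it shows the recurrence, not the proposed index.

```python
def approximate_matches(text, query, k):
    """End offsets j such that some substring text[i:j] is within edit distance k of query.

    Unit edit costs are an assumption of this sketch.
    """
    m = len(query)
    # prev[i]: smallest edit distance between query[:i] and any substring of text
    # ending at the previous position; initially i (query prefix vs. empty string).
    prev = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any text position
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in the text
                         cur[i - 1] + 1)      # character missing from the text
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

For example, `approximate_matches("ACGTACGT", "ACG", 0)` reports the two exact occurrences by their end offsets, 3 and 7.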

Reordering Scheme of Location Identifiers for Indexing RFID Tags (RFID 태그의 색인을 위한 위치 식별자 재순서 기법)

  • Ahn, Sung-Woo;Hong, Bong-Hee
    • Journal of KIISE:Databases / v.36 no.3 / pp.198-214 / 2009
  • Trajectories of RFID tags can be modeled as lines, called tag intervals, captured by RFID readers and indexed in a three-dimensional domain whose axes are the tag identifier (TID), the location identifier (LID), and the time (TIME). The distribution of tag intervals in the domain space is an important factor for efficiently processing queries that trace tags, and it changes depending on how the coordinates of each axis are arranged. In particular, the arrangement of LIDs affects the performance of queries that retrieve the traces of tags over time, because LIDs carry the location information of tags. Therefore, an optimal ordering of LIDs must be determined to efficiently process queries that retrieve tag intervals from the index. To this end, we propose LID proximity for reordering previously assigned LIDs into new LIDs, and we define an LID proximity function so that tag intervals accessed together by a query are stored close together in index nodes. To determine the sequence of LIDs in the domain, we also propose a reordering scheme of LIDs based on LID proximity. Our experiments show that the proposed reordering scheme considerably improves the performance of queries that trace tag locations compared with the previous method of assigning LIDs.
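
The abstract does not define the LID proximity function, so the sketch below assumes one purely for illustration: two LIDs are close if tags often move directly from one location to the other, and LIDs are then chained greedily so that frequently connected locations receive adjacent new LIDs. `lid_proximity` and `reorder_lids` are hypothetical names, not the paper's functions.

```python
from collections import Counter


def lid_proximity(trajectories):
    """Hypothetical proximity: how often tags move directly between two LIDs."""
    prox = Counter()
    for path in trajectories:
        for a, b in zip(path, path[1:]):
            if a != b:
                prox[frozenset((a, b))] += 1
    return prox


def reorder_lids(trajectories):
    """Greedy chain: start at the LID with the highest total proximity, then keep
    appending the unplaced LID closest to the last placed one.

    Returns a mapping from old LID to new LID (its position on the LID axis).
    """
    prox = lid_proximity(trajectories)
    lids = sorted({lid for path in trajectories for lid in path})
    total = Counter()
    for pair, weight in prox.items():
        for lid in pair:
            total[lid] += weight
    # max() returns the first maximum, so ties go to the smallest LID
    order = [max(lids, key=lambda lid: total[lid])]
    remaining = set(lids) - {order[0]}
    while remaining:
        last = order[-1]
        nxt = max(sorted(remaining), key=lambda lid: prox[frozenset((last, lid))])
        order.append(nxt)
        remaining.remove(nxt)
    return {lid: new for new, lid in enumerate(order)}
```

With this assumed proximity, locations that tags frequently travel between end up next to each other on the LID axis, so a query tracing those tags touches fewer, denser index regions.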

A Node Relocation Strategy of Trajectory Indexes for Efficient Processing of Spatiotemporal Range Queries (효율적인 시공간 영역 질의 처리를 위한 궤적 색인의 노드 재배치 전략)

  • Lim Duksung;Cho Daesoo;Hong Bonghee
    • Journal of KIISE:Databases / v.31 no.6 / pp.664-674 / 2004
  • The trajectory preservation property, which stores only one trajectory in a leaf node, is the most important feature of index structures such as the TB-tree for retrieving the moving paths of objects in spatiotemporal space. Such an index performs well for trajectory-related queries such as navigational queries and combined queries. However, the MBRs of non-leaf nodes in the TB-tree contain large amounts of dead space, because trajectory preservation is achieved at the expense of the spatial locality of trajectories. As dead space increases, the overlap between nodes also increases, and thus the cost of classical range queries increases. To improve range query performance without degrading the performance of trajectory-related queries, we present a new split policy and entry relocation policies. To reduce the dead space of a non-leaf node's MBR as much as possible, the Maximal Area Reduction (MAR) policy is used as the split policy for non-leaf nodes. The entry relocation policy makes entries in non-leaf nodes exchange places with each other to reduce the dead space in these nodes. We propose two algorithms for the entry relocation policy and evaluate the performance of the new algorithms against the TB-tree under a varying set of spatiotemporal queries.
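
A minimal sketch of an area-driven split for an overflowing non-leaf node, assuming MAR means choosing, among sorted splits along each axis, the partition whose two MBRs have the smallest total area (equivalently, the largest area reduction relative to the overflowing node's MBR). The minimum fill `min_fill` and the function names are assumptions; the paper's exact candidate set and its entry relocation algorithms are not reproduced. Boxes are `(low_corner, high_corner)` tuples of any dimension, so 3-D (x, y, t) MBRs work as well.

```python
from math import prod


def mbr(boxes):
    """Smallest box enclosing all boxes; a box is (low_corner, high_corner)."""
    dims = range(len(boxes[0][0]))
    low = tuple(min(b[0][d] for b in boxes) for d in dims)
    high = tuple(max(b[1][d] for b in boxes) for d in dims)
    return low, high


def area(box):
    """Volume of a box in any number of dimensions."""
    low, high = box
    return prod(hi - lo for lo, hi in zip(low, high))


def mar_split(entries, min_fill=2):
    """Return (left, right) minimizing area(mbr(left)) + area(mbr(right))
    over sorted splits along every axis (an assumed reading of MAR).

    Requires at least 2 * min_fill entries.
    """
    best = None
    for d in range(len(entries[0][0])):
        ordered = sorted(entries, key=lambda b: (b[0][d], b[1][d]))
        for k in range(min_fill, len(ordered) - min_fill + 1):
            left, right = ordered[:k], ordered[k:]
            cost = area(mbr(left)) + area(mbr(right))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best[1], best[2]
```

Minimizing the summed area directly targets the dead space the abstract describes, since any area inside a node's MBR that no child covers is dead space.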

Linear Resource Sharing Method for Query Optimization of Sliding Window Aggregates in Multiple Continuous Queries (다중 연속질의에서 슬라이딩 윈도우 집계질의 최적화를 위한 선형 자원공유 기법)

  • Baek, Seong-Ha;You, Byeong-Seob;Cho, Sook-Kyoung;Bae, Hae-Young
    • Journal of KIISE:Databases / v.33 no.6 / pp.563-577 / 2006
  • A stream processor uses resource sharing to make efficient use of limited resources across multiple continuous queries. Previous methods process aggregate queries using a level structure, so an insert operation incurs the cost of reconstructing the level structure, and a search operation incurs the cost of searching the aggregate information for each sliding window size. Therefore, this paper uses a linear structure to optimize sliding window aggregates. The method consists of three phases performed in sequence: deciding the pane size, generating panes, and deleting panes. The decision phase determines the optimal pane size for holding accurate aggregate information. The generation phase stores, for each pane, the aggregate information of the data from the stream buffer. In the deletion phase, panes that are no longer used are deleted. Because it uses a linear data format, the proposed method uses fewer resources than methods that use level structures as their data structures. The input cost of aggregate information is reduced because aggregates are computed only once per pane of data, even when a large amount of stream data arrives, and the search cost of aggregate information is also reduced by linear search, even when the sliding windows differ in size. In experiments, the proposed method used little memory and increased query processing speed.
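
The three phases map naturally onto the well-known pane technique for sliding-window aggregates. The sketch assumes count-based windows given as `(range, slide)`, a SUM aggregate, and a pane size equal to the greatest common divisor of all ranges and slides; that rule is a common choice but an assumption here, since the abstract does not state how the optimal pane size is decided. `pane_size` and `sliding_sums` are hypothetical names.

```python
from collections import deque
from functools import reduce
from math import gcd


def pane_size(windows):
    """Decision phase (assumed rule): the largest pane dividing every range and slide."""
    return reduce(gcd, [v for rng, slide in windows for v in (rng, slide)])


def sliding_sums(stream, windows):
    """Answer several count-based SUM windows, each given as (range, slide),
    from one shared, linear sequence of pane partial aggregates."""
    p = pane_size(windows)
    # Deletion phase: a bounded deque drops panes that no window can still use.
    panes = deque(maxlen=max(rng for rng, _ in windows) // p)
    results = {w: [] for w in windows}
    partial = 0
    for i, x in enumerate(stream, start=1):
        partial += x  # each tuple is aggregated exactly once, into its pane
        if i % p == 0:  # generation phase: close the current pane
            panes.append(partial)
            partial = 0
            for rng, slide in windows:
                if i % slide == 0 and i >= rng:
                    recent = list(panes)[-(rng // p):]  # linear scan of the newest panes
                    results[(rng, slide)].append(sum(recent))
    return results
```

For the stream 1 to 8 and windows `(4, 2)` and `(6, 2)`, the pane size is 2 and both windows are answered from the same four pane sums (3, 7, 11, 15) instead of re-aggregating raw tuples per window.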

A Genetic Algorithm for Materialized View Selection in Data Warehouses (데이터웨어하우스에서 유전자 알고리즘을 이용한 구체화된 뷰 선택 기법)

  • Lee, Min-Soo
    • The KIPS Transactions:PartD / v.11D no.2 / pp.325-338 / 2004
  • A data warehouse stores information collected from multiple, heterogeneous information sources for the purpose of complex querying and analysis. Information in the warehouse is typically stored in the form of materialized views, which represent pre-computed portions of frequently asked queries. One of the most important tasks in designing a warehouse is selecting the materialized views to be maintained in it. The goal is to select a set of views that minimizes the total query response time over all queries, given a limited amount of time for maintaining the views (the maintenance-cost view selection problem). In this paper, we propose an efficient solution to the maintenance-cost view selection problem that uses a genetic algorithm to compute a near-optimal set of views. Specifically, we explore the maintenance-cost view selection problem in the context of OR view graphs. We show that our approach represents a dramatic improvement in time complexity over existing search-based approaches that use heuristics. Our analysis shows that the algorithm consistently yields a solution whose query cost is only 10% higher than the optimal query cost, while its execution time increases only linearly. We have implemented a prototype version of our algorithm, which we use to evaluate our approach.
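
The maintenance-cost view selection problem can be sketched as a genetic algorithm over bit vectors, one bit per candidate view. This is an illustration under stated assumptions, not the paper's algorithm: each query is modeled as a base cost plus a cost for each view that can answer it (OR semantics, so any one selected view suffices), infeasible chromosomes are repaired by dropping random views until the maintenance-time budget holds, and the population size, generation count, mutation rate, and all cost figures are arbitrary.

```python
import random


def query_cost(selected, queries):
    """Total query cost under OR semantics: each query uses the cheapest of its
    base cost and the costs via any selected view that can answer it."""
    return sum(min([base] + [c for v, c in alts.items() if selected[v]])
               for base, alts in queries)


def repair(bits, maint, budget, rng):
    """Deselect random views until total maintenance time fits the budget (budget >= 0)."""
    bits = list(bits)
    while sum(m for b, m in zip(bits, maint) if b) > budget:
        chosen = [i for i, b in enumerate(bits) if b]
        bits[rng.choice(chosen)] = 0
    return bits


def ga_select_views(maint, queries, budget, pop=30, gens=60, p_mut=0.05, seed=0):
    """Return a bit vector (1 = materialize view i) found by a simple GA."""
    rng = random.Random(seed)
    n = len(maint)

    def fitness(bits):  # lower total query cost is better
        return query_cost(bits, queries)

    population = [repair([rng.randint(0, 1) for _ in range(n)], maint, budget, rng)
                  for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness)
        nxt = population[:2]  # elitism: carry the two best over unchanged
        while len(nxt) < pop:
            # tournament selection: best of 3 random individuals, twice
            a = min(rng.sample(population, 3), key=fitness)
            b = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(repair(child, maint, budget, rng))
        population = nxt
    return min(population, key=fitness)


if __name__ == "__main__":
    maint = [3, 2, 2, 5]  # hypothetical maintenance times per candidate view
    queries = [(100, {0: 10, 3: 5}), (80, {1: 20}), (60, {2: 15, 3: 10})]
    best = ga_select_views(maint, queries, budget=5)
    print(best, query_cost(best, queries))
```

For this toy instance, enumerating all 16 subsets shows that the best feasible choice is views 0 and 1 (total query cost 90); the GA is not guaranteed to return it, and it is only worthwhile when the candidate set is too large to enumerate.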

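Management of Integrated Summary Data Based on the Wavelet Transform for Internet Query Processing (인터넷 질의 처리를 위한 웨이블릿 변환에 기반한 통합 요약정보의 관리)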

  • Joe, Moon-Jeung;Whang, Kyu-Young;Kim, Sang-Wook;Shim, Kyu-Seok
    • Journal of KIISE:Databases / v.28 no.4 / pp.702-714 / 2001
  • As Internet technology evolves, there is a growing need for Internet queries involving multiple information sources. Efficient processing of such queries requires integrated summary data that compactly represents the data distribution of the entire database scattered over many information sources. This paper presents an efficient method of managing the integrated summary data based on the wavelet transform and addresses Internet query processing using the integrated summary data. The simplest method for creating the integrated summary data would be to summarize the integrated data distribution obtained by merging the data distributions of multiple information sources. However, this method suffers from the high cost of transmitting, storing, and merging a large amount of data distribution. To overcome these drawbacks, we propose a new wavelet transform-based method that creates the integrated summary data by merging multiple summary data, together with an effective method for optimizing Internet queries using it. A wavelet-transformed summary is converted to satisfy the conditions for merging. Moreover, the merging process is very simple owing to the properties of the wavelet transform. We formally derive the upper bound of the error of the wavelet-transformed integrated summary data. In experiments, the wavelet-transformed integrated summary data proved to be 1.6 to 5.5 times more accurate than the histogram-based integrated summary data when used for selectivity estimation. In processing Internet top-N queries involving 56 information sources, using the integrated summary data reduced the processing cost to 1/44 of the cost of not using it.
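
Because the Haar wavelet transform is linear, the transform of a summed distribution equals the sum of the individual transforms, which is one reason merging wavelet summaries can be simple. The sketch below assumes unnormalized Haar averaging and differencing, distributions over the same domain with a power-of-two number of buckets, and summaries that keep the `k` largest-magnitude coefficients; the paper's conversion step, normalization, and coefficient selection may differ, and all function names are hypothetical.

```python
def haar(values):
    """Unnormalized Haar decomposition; len(values) must be a power of two.
    Output: [overall average, coarsest detail, ..., finest details]."""
    coeffs, cur = [], list(values)
    while len(cur) > 1:
        avgs = [(cur[i] + cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        diffs = [(cur[i] - cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        coeffs = diffs + coeffs  # finer details go after coarser ones
        cur = avgs
    return cur + coeffs


def inverse_haar(coeffs):
    """Rebuild the distribution from a dense coefficient list."""
    cur, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(cur)]
        cur = [v for a, d in zip(cur, details) for v in (a + d, a - d)]
        pos += len(details)
    return cur


def summarize(values, k):
    """Keep the k largest-magnitude coefficients as a sparse {index: coefficient} map."""
    c = haar(values)
    top = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:k]
    return {i: c[i] for i in top}


def merge(*summaries):
    """By linearity, summing coefficients summarizes the summed distribution."""
    out = {}
    for s in summaries:
        for i, v in s.items():
            out[i] = out.get(i, 0.0) + v
    return out
```

Merging this way is exact only when no coefficients were dropped; with truncation, the reconstruction error of the merged summary is the sum of the per-source reconstruction errors, which is why an error bound for the integrated summary matters.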

Non Duplicated Extract Method of Heterogeneous Data Sources for Efficient Spatial Data Load in Spatial Data Warehouse (공간 데이터웨어하우스에서 효율적인 공간 데이터 적재를 위한 이기종 데이터 소스의 비중복 추출기법)

  • Lee, Dong-Wook;Baek, Sung-Ha;Kim, Gyoung-Bae;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.11 no.2 / pp.143-150 / 2009
  • A spatial data warehouse is a system that manages data processed through the ETL (extract, transform, load) steps, starting from spatial data extracted from spatial DBMSs or various other data sources. During loading, unlike aspatial data, duplicated spatial data on the same subject are of no use among the extracted spatial data, and because of the characteristics of spatial data they waste storage space. In addition, when source data are extracted from heterogeneous systems, a spatial extraction method is required because those systems have different spatial types and schemas. Existing methods load a formal data set by performing an address-matching step on the extracted spatial data using a standard Geocoding DB. However, these methods require comparison operations between the extracted data and the Geocoding DB, and, when spatial data are integrated by subject, they do not consider data duplicated across heterogeneous spatial DBMSs. This paper proposes an efficient extraction method that integrates the update queries extracted from heterogeneous source systems in the data warehouse constructor. The method eliminates unnecessary extraction costs by choosing only the relevant update queries, such as insertions or deletions, among the queries generated from the last load up to the current point. We also eliminate duplicates and integrate the extracted spatial data using the update queries in the source spatial DBMSs. The proposed method reduces the storage space wasted by duplicate storage and supports rapid analysis of spatial data by loading integrated data at each loading point.
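
The central idea, extracting only the net effect of the insert and delete operations recorded since the last load and collapsing duplicates that arrive from different sources, can be sketched as a log-compaction pass. Everything concrete here is an assumption: the log entry layout `(timestamp, op, object_id, geometry)`, an object key shared across sources, WKT strings for geometries, and the hypothetical function names `net_changes` and `extract_for_load`.

```python
import heapq


def net_changes(log):
    """Collapse (op, object_id, geometry) entries, in commit order, into at most
    one pending action per object since the last load (assumed log format)."""
    state = {}
    for op, oid, geom in log:
        prev = state.get(oid)
        if op == "insert":
            # delete-then-insert means the loaded copy must be replaced;
            # repeated inserts (e.g. the same object from two sources) collapse to one
            kind = "replace" if prev and prev[0] in ("delete", "replace") else "insert"
            state[oid] = (kind, geom)
        elif op == "delete":
            if prev and prev[0] == "insert":
                del state[oid]  # inserted and deleted since the last load: nothing to extract
            else:
                state[oid] = ("delete", None)
    return state


def extract_for_load(*source_logs):
    """Merge per-source logs of (timestamp, op, object_id, geometry), each already
    sorted by timestamp, then collapse them into net changes."""
    merged = heapq.merge(*source_logs, key=lambda e: e[0])
    return net_changes((op, oid, geom) for _, op, oid, geom in merged)
```

An object inserted and then deleted before the next load is never extracted at all, and the same object inserted by two sources is loaded once.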

An Energy-Efficient and Destination-Sequenced Routing Algorithm by a Sink Node in Wireless Sensor Networks (무선 센서 네트워크에서의 싱크 노드에 의한 에너지 효율적인 목적지-순서적 라우팅 알고리즘)

  • Jung, Sang-Joon;Chung, Youn-Ky
    • Journal of Korea Multimedia Society / v.10 no.10 / pp.1347-1355 / 2007
  • A sensor network is composed of a large number of tiny devices scattered and deployed in a specified region. Each sensing device has processing and wireless communication capabilities, which enable it to gather information from the sensing area and to transfer report messages to a base station. Because each node has characteristics such as low power, constrained energy, and limited capacity, energy-efficient routing paths are established when the base station issues a query. When the paths are broken, they are recovered while minimizing the total transmission energy and maximizing the network lifetime. In this paper, we propose a routing algorithm in which each sensor node reports its adjacent link information to the sink node when the sink node broadcasts a query. The sink node manages the entire topology and establishes the routing paths. This algorithm has the benefit of finding an alternative path with fewer negotiation messages when the established paths are broken. To reduce the overhead of collecting information, each node already has its link information before reporting to the sink, because it knows which nodes are adjacent to it. The proposed algorithm reduces the number of required messages because sensor nodes receive and report routing-establishment messages at the beginning of configuring the routing paths, and each node keeps the topology information needed to establish a routing path, which is useful for reporting sensing tasks in monitoring environments.
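
The sink-side part of the scheme can be sketched as follows: nodes report their adjacency lists, the sink stores the whole topology, computes each node's next hop by breadth-first search, and recomputes routes locally when a link breaks instead of renegotiating across the network. Hop count stands in for the paper's energy-aware path cost, which the abstract does not specify, and all function names are hypothetical.

```python
from collections import deque


def build_topology(reports):
    """reports: {node: [neighbors]} as sent to the sink; links are stored symmetrically."""
    graph = {}
    for node, nbrs in reports.items():
        for n in nbrs:
            graph.setdefault(node, set()).add(n)
            graph.setdefault(n, set()).add(node)
    return graph


def routes_from_sink(graph, sink):
    """BFS from the sink: parent[v] is v's next hop toward the sink (fewest hops)."""
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in sorted(graph.get(u, ())):  # sorted for deterministic tie-breaking
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to_sink(parent, node):
    """Follow next hops to the sink (raises KeyError if node is unreachable)."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def repair_link(graph, sink, a, b):
    """Sink-side recovery: drop the broken link and recompute from the stored topology."""
    graph[a].discard(b)
    graph[b].discard(a)
    return routes_from_sink(graph, sink)
```

Because the sink already holds the full topology, a broken link costs one local recomputation plus notifying the affected nodes, rather than a fresh round of path negotiation.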

An Efficient Query Transformation for Multidimensional Data Views on Relational Databases (관계형 데이타베이스에서 다차원 데이타의 뷰를 위한 효율적인 질의 변환)

  • Shin, Sung-Hyun;Kim, Jin-Ho;Moon, Yang-Sae
    • Journal of KIISE:Databases / v.34 no.1 / pp.18-34 / 2007
  • To provide various business analysis methods, OLAP (On-Line Analytical Processing) systems represent their data with multidimensional structures. These multidimensional data are often delivered to users in the horizontal format of tables whose columns correspond to values of dimension attributes. Since the horizontal tables may have a large number of columns, they cannot be stored directly in relational database systems. Furthermore, the tables are likely to have many null values (i.e., they are sparse tables). To manage the horizontal tables efficiently, we can store them in a vertical table format that has dimension attribute names as columns, thereby transforming the columns of the horizontal tables into rows. In this case, every query on the horizontal tables has to be transformed into a query on the vertical tables. This paper proposes a technique for transforming horizontal table queries into vertical table queries by utilizing not only traditional relational algebraic operators but also the PIVOT operator that recent DBMS versions provide. To achieve this goal, we designed a relational algebraic expression equivalent to the PIVOT operator and formally proved their equivalence. We then developed a transformation technique for horizontal table queries using the PIVOT operator. We also performed experiments to analyze the performance of the proposed method. The experimental results show that the proposed method performs better than existing methods.
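
SQLite has no PIVOT operator, but the effect of PIVOT over a vertical `(id, attr, val)` table can be expressed with grouping and conditional aggregation, which gives a concrete picture of rewriting a horizontal-table query into a vertical-table one. This is a sketch with a hypothetical schema and hypothetical function names; it is not the relational algebraic expression the paper derives, and attribute names are pasted into the SQL text, so they must come from a trusted source.

```python
import sqlite3


def pivot_select(columns):
    """SQL over vertical(id, attr, val) that reproduces the horizontal table with
    one output column per attribute value in `columns` (trusted names only)."""
    picks = ", ".join(
        f"MAX(CASE WHEN attr = '{c}' THEN val END) AS \"{c}\"" for c in columns)
    return f"SELECT id, {picks} FROM vertical GROUP BY id"


def rewrite_horizontal_query(columns, where):
    """Rewrite 'SELECT * FROM horizontal WHERE <where>' against the vertical table."""
    return f"SELECT * FROM ({pivot_select(columns)}) WHERE {where}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vertical (id TEXT, attr TEXT, val REAL)")
conn.executemany("INSERT INTO vertical VALUES (?, ?, ?)", [
    ("p1", "2006", 120), ("p1", "2007", 150),
    ("p2", "2007", 90),  # p2 has no 2006 row: it becomes NULL in the horizontal view
])
rows = conn.execute(pivot_select(["2006", "2007"]) + " ORDER BY id").fetchall()
print(rows)  # [('p1', 120.0, 150.0), ('p2', None, 90.0)]
```

Missing (id, attribute) pairs simply have no row in the vertical table, which is how this layout avoids storing the nulls of a sparse horizontal table.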

An Index-Based Approach for Subsequence Matching Under Time Warping in Sequence Databases (시퀀스 데이터베이스에서 타임 워핑을 지원하는 효과적인 인덱스 기반 서브시퀀스 매칭)

  • Park, Sang-Hyeon;Kim, Sang-Uk;Jo, Jun-Seo;Lee, Heon-Gil
    • The KIPS Transactions:PartD / v.9D no.2 / pp.173-184 / 2002
  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. In earlier work, Kim et al. suggested an efficient method for whole matching under time warping. This method constructs a multidimensional index on a set of feature vectors, which are invariant to time warping, extracted from data sequences. For filtering in the feature space, it also applies a lower-bound function, which consistently underestimates the time warping distance and satisfies the triangle inequality. In this paper, we incorporate the prefix-querying approach based on sliding windows into the earlier approach. For indexing, we extract a feature vector from every subsequence inside a sliding window and construct a multidimensional index using the feature vectors as indexing attributes. For query processing, we perform a series of index searches using the feature vectors of qualifying query prefixes. Our approach provides effective and scalable subsequence matching even for a large database. We also prove that our approach does not incur false dismissals. To verify the superiority of our approach, we performed extensive experiments. The results reveal that our approach achieves significant speedup on real-world S&P 500 stock data and on very large synthetic data.
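
A sketch of the filter-and-refine idea, assuming a four-element feature vector (first, last, greatest, and smallest value) whose maximum coordinate difference lower-bounds the time warping distance, with that distance defined as the minimum sum of absolute differences along a warping path. The bound holds because every warping path pairs the two first elements and the two last elements, and matches each sequence's maximum and minimum with some element of the other, so each feature difference is at most one term of the path cost. A linear scan over sliding windows stands in for the multidimensional index, the prefix-querying step is not reproduced, and the function names are hypothetical.

```python
def dtw(a, b):
    """Time warping distance: minimum sum of |a_i - b_j| over a warping path."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def features(s):
    """Warping-invariant feature vector: first, last, greatest, smallest element."""
    return (s[0], s[-1], max(s), min(s))


def lower_bound(fa, fb):
    """Largest feature difference; never exceeds dtw of the underlying sequences."""
    return max(abs(x - y) for x, y in zip(fa, fb))


def subsequence_search(data, query, window, eps):
    """Start offsets of windows within warping distance eps of the query.
    Pruned windows have lower_bound > eps, hence dtw > eps: no false dismissals."""
    fq = features(query)
    hits = []
    for start in range(len(data) - window + 1):
        sub = data[start:start + window]
        if lower_bound(features(sub), fq) <= eps and dtw(sub, query) <= eps:
            hits.append(start)
    return hits
```

Because a window is discarded only when its lower bound already exceeds `eps`, every window whose true warping distance is within `eps` survives the filter, which is the no-false-dismissal property in miniature.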