• Title/Summary/Keyword: efficient query processing

A Space-Efficient Inverted Index Technique using Data Rearrangement for String Similarity Searches (유사도 검색을 위한 데이터 재배열을 이용한 공간 효율적인 역 색인 기법)

  • Im, Manu;Kim, Jongik
    • Journal of KIISE / v.42 no.10 / pp.1247-1253 / 2015
  • Inverted indexes are widely used for efficient string similarity search. Because similarity search requires a fast response time, most techniques use an in-memory index structure. However, an inverted index is usually very large, so it is not practical to assume that it will fit into main memory. To alleviate this problem, we propose a novel technique that reduces the size of an inverted index. The proposed technique rearranges the data strings so that strings containing the same q-grams are placed close to one another, and then encodes those multiple strings as a range. Through an experimental study using real data sets, we show that our technique significantly reduces the size of an inverted index without sacrificing query processing time.
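
The idea of rearranging strings and encoding posting lists as ranges can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names are hypothetical, and sorting by reversed string is only an assumed stand-in for the paper's rearrangement step.

```python
from collections import defaultdict

def qgrams(s, q=2):
    """Return the set of q-grams (length-q substrings) of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def build_index(strings, q=2):
    """Map each q-gram to the ascending list of ids of strings containing it."""
    index = defaultdict(list)
    for sid, s in enumerate(strings):
        for g in qgrams(s, q):
            index[g].append(sid)
    return dict(index)

def to_ranges(ids):
    """Collapse an ascending id list into inclusive (start, end) ranges."""
    ranges = []
    for sid in ids:
        if ranges and sid == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], sid)  # extend the current run
        else:
            ranges.append((sid, sid))          # start a new run
    return ranges

def index_size(strings, q=2):
    """Total number of ranges stored across all posting lists."""
    return sum(len(to_ranges(ids)) for ids in build_index(strings, q).values())

data = ["night", "apple", "light", "ample", "sight", "maple"]
unordered = index_size(data)
# Assumed stand-in for the rearrangement: sorting by reversed string puts
# strings with shared suffixes (and thus shared q-grams) next to each other.
rearranged = index_size(sorted(data, key=lambda s: s[::-1]))
```

On this toy data set, `rearranged` is smaller than `unordered` because strings that share q-grams receive consecutive ids, so each posting list collapses into fewer ranges.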

An Efficient Database Design Method for Mobile Multimedia Services on Home Network Systems (홈 네트워크 시스템 상에서 모바일 멀티미디어 서비스를 위한 효과적인 데이타베이스 설계 방안)

  • Song, Hye-Ju;Park, Young-Ho;Kim, Jung-Tae;Paik, Eui-Hyun
    • The KIPS Transactions:PartD / v.14D no.6 / pp.615-622 / 2007
  • Recently, driven by the popularity of multimedia content, a growing number of users want such content delivered to mobile devices, such as PDPs, PMPs, and IPTVs, connected to the wireless Internet. In this paper, we propose an efficient database design method for managing mobile multimedia services on home network systems. To this end, we build relations from the attributes required to provide multimedia services, and then design a database. In particular, we propose a database design method based on normalization theory to eliminate the redundancies and update anomalies caused by nontrivial multivalued dependencies in the relations. In the experiments, we compare and analyze how frequently data redundancies and update anomalies occur by executing queries on the relations decomposed into normal forms. The results reveal that our database design is fairly effective.

Parallel Spatial Join Method Using Efficient Spatial Relation Partition In Distributed Spatial Database Systems (분산 공간 DBMS에서의 효율적인 공간 릴레이션 분할 기법을 이용한 병렬 공간 죠인 기법)

  • Ko, Ju-Il;Lee, Hwan-Jae;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.4 no.1 s.7 / pp.39-46 / 2002
  • In distributed spatial database systems, users may issue a query that joins two relations stored at different sites. The sheer volume and complexity of spatial data incur expensive CPU and I/O costs during spatial join processing. This paper presents a new spatial join method that joins two spatial relations in parallel. First, the initial join operation is divided into two distinct operations by partitioning one of the two participating relations by region. These two join operations are each assigned to a site and executed simultaneously. Finally, the intermediate result sets from the two join operations are merged into the final result set. This method reduces the number of spatial objects participating in the spatial operations, as well as the scope and number of spatial index scans. It also avoids materializing temporary results by implementing the join algebra operators as iterators. The performance test shows that this join method uses buffers and disk efficiently by narrowing down the join region and decreasing the number of spatial objects.
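
The partition, parallel join, and merge steps described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: objects are reduced to bounding boxes `(xmin, ymin, xmax, ymax)`, two worker threads stand in for the two sites, a nested-loop join replaces index scans, and all names (`parallel_spatial_join`, `split_x`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def intersects(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def nested_loop_join(r_part, s):
    """Join one partition of R against S on box intersection."""
    return [(rid, sid) for rid, rbox in r_part
            for sid, sbox in s if intersects(rbox, sbox)]

def parallel_spatial_join(r, s, split_x):
    """Partition R by region at x = split_x, join both parts in parallel, merge."""
    # Each R object goes to exactly one partition (by box centre), so the two
    # partial results are disjoint and merging is a simple concatenation.
    left = [(rid, box) for rid, box in r if (box[0] + box[2]) / 2 < split_x]
    right = [(rid, box) for rid, box in r if (box[0] + box[2]) / 2 >= split_x]
    with ThreadPoolExecutor(max_workers=2) as pool:
        parts = pool.map(nested_loop_join, [left, right], [s, s])
        return sorted(pair for part in parts for pair in part)

r = [("r1", (0, 0, 2, 2)), ("r2", (8, 8, 10, 10))]
s = [("s1", (1, 1, 3, 3)), ("s2", (9, 0, 11, 1)), ("s3", (9, 9, 12, 12))]
result = parallel_spatial_join(r, s, split_x=5)
```

For simplicity, the sketch sends all of `s` to both workers; restricting `s` to each partition's region as well would further reduce the number of objects each join has to examine.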

Physical Database Design for DFT-Based Multidimensional Indexes in Time-Series Databases (시계열 데이터베이스에서 DFT-기반 다차원 인덱스를 위한 물리적 데이터베이스 설계)

  • Kim, Sang-Wook;Kim, Jin-Ho;Han, Byung-Il
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1505-1514 / 2004
  • Sequence matching in time-series databases is an operation that finds the data sequences whose changing patterns are similar to that of a query sequence. Typically, sequence matching employs a multi-dimensional index for efficient processing. To alleviate the dimensionality curse of the multi-dimensional index in high-dimensional cases, previous methods for sequence matching apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as the organizing attributes of the multi-dimensional index. This paper first points out the problems with such simple methods that take the first two or three coefficients, and proposes a novel solution for constructing the optimal multi-dimensional index. The proposed method analyzes the characteristics of a target database and, based on the analysis, identifies the organizing attributes with the best discrimination power. It also determines the optimal number of organizing attributes for efficient sequence matching by using a cost model. To show the effectiveness of the proposed method, we perform a series of experiments. The results show that the proposed method significantly outperforms the previous ones.
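
To see why the first few DFT coefficients are not always the most useful ones, consider the following minimal sketch. It ranks coefficient positions by the variance of their magnitudes across the database, which is only an assumed proxy for the "discrimination power" mentioned in the abstract; the paper's actual analysis and cost model are not reproduced, and the function names are hypothetical.

```python
import cmath
import math

def dft(seq):
    """Naive O(n^2) discrete Fourier transform with 1/sqrt(n) scaling."""
    n = len(seq)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n)
            for t, x in enumerate(seq)) / n ** 0.5
        for f in range(n)
    ]

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def pick_coefficients(database, k):
    """Return the k coefficient positions whose magnitudes vary most.

    Variance is an assumed stand-in for discrimination power. Only positions
    0..n/2 are scored, since real input gives a mirrored spectrum.
    """
    spectra = [[abs(c) for c in dft(seq)] for seq in database]
    n = len(database[0])
    scores = [variance([s[f] for s in spectra]) for f in range(n // 2 + 1)]
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:k]

# Toy database: sequences share the same mean and differ only in the
# amplitude of a frequency-3 component.
database = [
    [1 + a * math.cos(2 * math.pi * 3 * t / 8) for t in range(8)]
    for a in range(4)
]
best = pick_coefficients(database, 1)
```

In this toy database, the coefficient at position 0 is identical for every sequence, so an index built on the first coefficients could not separate them, whereas position 3 can.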

A Cyclic Sliced Partitioning Method for Packing High-dimensional Data (고차원 데이타 패킹을 위한 주기적 편중 분할 방법)

  • 김태완;이기준
    • Journal of KIISE:Databases / v.31 no.2 / pp.122-131 / 2004
  • Traditional indexing methods have been proposed for low-dimensional data in dynamic environments. Recent database applications, however, require efficient processing of huge volumes of high-dimensional data in static environments. Many existing indexing strategies, especially partitioning-based ones, therefore do not adapt well to these new environments. In our study, we point out these facts and propose a new partitioning strategy that meets the requirements of the new applications and is derived from analysis. As a preliminary step, we apply a packing technique and exploit observations on the Minkowski-sum cost model under a uniform data distribution. These observations predict that an unbalanced partitioning strategy may be more query-efficient than a balanced one for high-dimensional data. We therefore propose our method, called CSP (Cyclic Sliced Partitioning). Analysis of this method yields explicit metrics for how to partition high-dimensional data. Using the cost model, simulations, and experiments, we show that our method performs much better than the balanced strategy. Experimental comparisons with other indices and packing methods also show the superiority of our method.

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 페이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases / v.28 no.4 / pp.732-741 / 2001
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce the execution time, and the multiple hash join algorithm using an allocation tree is one of the most efficient. However, it may suffer delays when processing each node of the allocation tree; these delays occur in the tuple-probing phase because of the difference between the time to read one page of the outer relation and the time to process a page that has already been read. We previously solved this delay problem using the concept of synchronization of page execution time. In this paper, we extend the performance improvement obtained at each node of the allocation tree to the whole allocation tree and evaluate the resulting performance. In addition, we propose an efficient algorithm for multiple hash joins in environments with a limited number of processors, based on the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze the performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.

Location Management System using CDMA Communications of Telematics Terminals (텔레매틱스 단말기의 CDMA 통신을 이용한 위치 관리 시스템)

  • Kim Jin-Deog;Choi Jin-Oh;Moon Sang-Ho;Lee Sang-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1843-1850 / 2004
  • If the location information of the large number of cars kept for business and equipped with telematics terminals is acquired and managed efficiently, it forms the foundation for controlling cars and traffic flows. Studies on pure spatial indices have focused on efficient retrieval. In moving object databases, however, acquiring and managing the terminal locations of moving objects is more important than the efficiency of query processing. Therefore, a parallel processing system needs to be adopted for moving object databases, which should maintain each object's current location as precisely as possible. This paper proposes a location management system using CDMA communications of telematics terminals. More precisely, we propose an architecture for spatially indexing mobile objects using multiple processors, and a new method of splitting buckets that uses the properties of moving objects to minimize the number of database updates. We also propose an acquisition method that gathers the location information of moving objects and passes the bucket extent information so as to reduce the number of messages passed between processors.

Efficient Processing method of OLAP Range-Sum Queries in a dynamic warehouse environment (다이나믹 데이터 웨어하우스 환경에서 OLAP 영역-합 질의의 효율적인 처리 방법)

  • Chun, Seok-Ju;Lee, Ju-Hong
    • The KIPS Transactions:PartD / v.10D no.3 / pp.427-438 / 2003
  • In a data warehouse, users typically search for trends, patterns, or unusual data behaviors by issuing queries interactively. The OLAP range-sum query is widely used for finding trends and discovering relationships among attributes in the data warehouse. In recent enterprise environments, data elements in a data cube change frequently. The problem is that the cost of updating a prefix sum cube is very high. In this paper, we propose a novel algorithm that significantly reduces the update cost by using an index structure called the Δ-tree. We also propose a hybrid method that provides either approximate or precise results to reduce the overall cost of queries. It is highly beneficial for applications that need quick approximate answers rather than time-consuming accurate ones, such as decision support systems. Extensive experiments show that our method performs very efficiently across diverse dimensionalities compared to other methods.
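
The update problem that motivates this paper can be seen in a minimal two-dimensional sketch of a prefix sum cube. It shows only the baseline structure and its update cost; the Δ-tree itself is not reproduced, and the function names are hypothetical.

```python
def build_prefix(cube):
    """Build a 2-D prefix sum array P with a zero border.

    P[i][j] holds the sum of cube[0..i-1][0..j-1].
    """
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P

def range_sum(P, r1, c1, r2, c2):
    """Sum over the inclusive range [r1..r2] x [c1..c2] in four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]

def update(P, r, c, delta):
    """Add delta to cell (r, c) and return how many prefix cells changed."""
    touched = 0
    for i in range(r + 1, len(P)):
        for j in range(c + 1, len(P[0])):
            P[i][j] += delta
            touched += 1
    return touched

cube = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
P = build_prefix(cube)
before = range_sum(P, 1, 1, 2, 2)
touched = update(P, 0, 0, 10)   # changing the first cell rewrites every prefix cell
after_total = range_sum(P, 0, 0, 2, 2)
```

A range-sum query needs only four lookups regardless of the range size, but a single change to the cell at `(0, 0)` touches every entry of the prefix array, which is the update cost the abstract refers to.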

A Bitmap Index for Chunk-Based MOLAP Cubes (청크 기반 MOLAP 큐브를 위한 비트맵 인덱스)

  • Lim, Yoon-Sun;Kim, Myung
    • Journal of KIISE:Databases / v.30 no.3 / pp.225-236 / 2003
  • MOLAP systems store data in a multidimensional array called a 'cube' and access them using array indexes. When a cube is placed on disk, it can be partitioned into a set of chunks of the same side length. Such a cube storage scheme is called the chunk-based MOLAP cube storage scheme. It provides a data clustering effect so that all dimensions are guaranteed a fair chance in terms of query processing speed. To achieve high space utilization, sparse chunks are further compressed. Because of this compression, the relative position of chunks cannot be obtained in constant time without using indexes. In this paper, we propose a bitmap index for chunk-based MOLAP cubes. The index can be constructed while the corresponding cube is generated. The relative position of chunks is retained in the index so that chunk retrieval can be done in constant time. We place as many chunks as possible in an index block so that the number of index searches is minimized for OLAP operations such as range queries. We show that the proposed index is efficient by comparing it with multidimensional indexes such as the UB-tree and the grid file in terms of time and space.
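
One generic way a bitmap can retain the relative position of chunks is a rank lookup, sketched below. This is an assumption-laden illustration rather than the index proposed in the paper: the class `ChunkBitmap`, the block size, and the row-major chunk numbering are all hypothetical choices.

```python
def chunk_number(coords, chunks_per_dim):
    """Row-major chunk number for chunk coordinates (assumed numbering)."""
    n = 0
    for c, size in zip(coords, chunks_per_dim):
        n = n * size + c
    return n

class ChunkBitmap:
    """Bitmap with one bit per chunk: 1 if the chunk is stored, 0 if empty."""

    BLOCK = 8  # bits per block; a hypothetical, fixed block size

    def __init__(self, stored_flags):
        self.bits = list(stored_flags)
        self.block_rank = []  # number of 1-bits before each block
        count = 0
        for i, bit in enumerate(self.bits):
            if i % self.BLOCK == 0:
                self.block_rank.append(count)
            count += bit

    def position(self, chunk_no):
        """Index of the chunk among stored chunks, or None if it is empty.

        The cost is bounded by BLOCK, so it is constant for a fixed block size.
        """
        if not self.bits[chunk_no]:
            return None
        block = chunk_no // self.BLOCK
        start = block * self.BLOCK
        return self.block_rank[block] + sum(self.bits[start:chunk_no])

# A 3 x 4 grid of chunks, of which six are non-empty.
bitmap = ChunkBitmap([1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0])
pos = bitmap.position(chunk_number((2, 1), (3, 4)))
```

Because empty chunks are not stored, the position of a chunk in the compressed file equals the number of stored chunks that precede it, which the per-block counts make available without scanning the whole bitmap.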

The XP-table: Runtime-efficient Region-based Structure for Collective Evaluation of Multiple Continuous XPath Queries (The XP-table: 다중 연속 XPath 질의의 집단 처리를 위한 실행시간 효율적인 영역 기반 구조체)

  • Lee, Hyun-Ho;Lee, Won-Suk
    • Journal of KIISE:Databases / v.35 no.4 / pp.307-318 / 2008
  • One of the primary issues confronting XML message brokers is the difficulty of processing a large set of continuous XPath queries over incoming XML streams. This paper proposes a novel system designed to provide an effective solution to this problem. Before run time, the proposed system transforms multiple XPath queries into a new region-based data structure, called an XP-table, by sharing their common constraints. An XP-table is matched against a stream relation (SR) that a SAX parser produces from a target XML stream. This arrangement is intended to minimize the runtime workload of continuous query processing. System performance is estimated and verified through a variety of experiments, including comparisons with previous approaches such as YFilter and LazyDFA. The proposed system is practically linearly scalable and stable for evaluating a set of XPath queries in a continuous and timely fashion.