Search | Korea Science

A Data Mining Approach for Selecting Bitmap Join Indices

Bellatreche, Ladjel;Missaoui, Rokia;Necir, Hamid;Drias, Habiba
- Journal of Computing Science and Engineering / v.1 no.2 / pp.177-194 / 2007
Index selection is one of the most important decisions in the physical design of relational data warehouses. Indices significantly reduce the cost of processing complex OLAP queries, but they consume storage and incur maintenance overhead. Two main types of indices are available: mono-attribute indices (e.g., B-tree, bitmap, hash) and multi-attribute indices (join indices, bitmap join indices). Bitmap join indices are well suited to optimizing star join queries, which are characterized by joins between a large fact table and multiple dimension tables and by selections on the dimension tables. They require less storage due to their binary representation. However, selecting these indices is a difficult task due to the exponential number of candidate attributes to be indexed. Most approaches to index selection follow two main steps: (1) pruning the search space (i.e., reducing the number of candidate attributes) and (2) selecting indices using the pruned search space. In this paper, we first propose a data mining driven approach to prune the search space of the bitmap join index selection problem. As opposed to an existing technique that uses only the frequency of attributes in queries as a pruning metric, our technique also uses other parameters such as the size of the dimension tables involved in the indexing process, the size of each dimension tuple, and the page size on disk. We then define a greedy algorithm to select bitmap join indices that minimize processing cost and satisfy the storage constraint. Finally, in order to evaluate the efficiency of our approach, we compare it with some existing techniques.
https://doi.org/10.5626/JCSE.2007.1.2.177
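
The greedy selection step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm, whose cost model is not given here. It is a generic greedy heuristic that ranks candidate indices by estimated cost saving per page of storage and keeps those that fit a storage budget. The `Candidate` fields, the attribute names, and all numbers are hypothetical, and the savings of different indices are assumed to be independent.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    attribute: str      # dimension attribute to index (hypothetical name)
    size_pages: int     # estimated storage cost of the index, must be > 0
    cost_saving: float  # estimated reduction in query processing cost


def greedy_select(candidates: List[Candidate], storage_budget: int) -> List[str]:
    """Pick indices by saving-per-page until the storage budget is exhausted."""
    chosen = []
    remaining = storage_budget
    # Rank by benefit density; this ratio is an assumption, not the paper's metric.
    ranked = sorted(candidates,
                    key=lambda c: c.cost_saving / c.size_pages,
                    reverse=True)
    for cand in ranked:
        if cand.size_pages <= remaining:
            chosen.append(cand.attribute)
            remaining -= cand.size_pages
    return chosen


candidates = [
    Candidate("Customer.city", 40, 400.0),
    Candidate("Product.category", 10, 300.0),
    Candidate("Time.month", 30, 150.0),
]
selected = greedy_select(candidates, storage_budget=50)
print(selected)
```

With a budget of 50 pages, the sketch keeps the two candidates with the highest saving per page and skips the third because no storage remains.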

Adaptive Image Content-Based Retrieval Techniques for Multiple Queries (다중 질의를 위한 적응적 영상 내용 기반 검색 기법)

Hong Jong-Sun;Kang Dae-Seong
- Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.3 s.303 / pp.73-80 / 2005
Recently there have been many efforts to support searching and browsing based on the visual content of image and multimedia data. Most existing approaches to content-based image retrieval rely on query by example or on user-based low-level features such as color, shape, and texture. However, these query methods are restrictive and not easy to use. In this paper we propose a method for automatic color object extraction and labelling to support multiple queries in a content-based image retrieval system. The method simplifies the regions within images using a single colorizing algorithm and extracts color objects using the proposed Color and Spatial based Binary tree map (CSB tree map). By searching over a large number of processed regions, an index for the database is created using the proposed labelling method. This allows very fast indexing of images by their color content and spatial attributes. Furthermore, information about the labelled regions, such as the color set, size, and location, enables a variety of multiple queries that combine both color content and spatial relationships of regions. Experiments on the 'Washington' image database, comparing the proposed system with another algorithm, show that it achieves high performance.

Use of Graph Database for the Integration of Heterogeneous Biological Data

Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
- Genomics & Informatics / v.15 no.1 / pp.19-27 / 2017
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
https://doi.org/10.5808/GI.2017.15.1.19

Multi-query Indexing Technique for Efficient Query Processing on Stream Data in Sensor Networks (센서 네트워크에서 스트림 데이터 질의의 효율적인 처리를 위한 다중 질의 색인 기법)

Lee, Min-Soo;Kim, Yearn-Jeong;Yoon, Hye-Jung
- Journal of Korea Multimedia Society / v.10 no.11 / pp.1367-1383 / 2007
A sensor network consists of sensors that can perform computation and communicate with each other wirelessly. Two important characteristics of sensor networks are that the network should be self-administered and that power efficiency must be carefully considered, because the sensors run on battery power. When large amounts of varied stream data are produced and multiple queries need to be processed simultaneously, power efficiency should be maximized. This work proposes a technique for creating an index on multiple monitoring queries so that multi-query processing performance is increased and memory and power are used efficiently. The proposed SMILE tree modifies and combines the ideas of spatial indexing techniques such as k-d trees and R+-trees. The k-d tree can divide the dimensions at each level, while the R+-tree improves on the R-tree by dividing the space in a hierarchical manner and reducing the overlapping areas. When the SMILE tree is applied to multiple queries on stream data in sensor networks, finding an indexed query takes, in some cases, 50% of the time taken by a linear search.

MD-TIX: Multidimensional Type Inheritance Indexing for Efficient Execution of XML Queries (MD-TIX: XML 질의의 효율적 처리를 위한 다차원 타입상속 색인기법)

Lee, Jong-Hak
- Journal of Korea Multimedia Society / v.10 no.9 / pp.1093-1105 / 2007
This paper presents a multidimensional type inheritance indexing technique (MD-TIX) for XML databases. We use a multidimensional file organization as the index structure. Conventional XML database indexing techniques that use one-dimensional index structures do not efficiently handle complex queries involving both nested elements and type inheritance hierarchies. We extend a two-dimensional type hierarchy indexing technique (2D-THI) to index the nested elements of XML databases. 2D-THI is an indexing scheme for a simple element in a type hierarchy; it deals with the problem of clustering elements in a two-dimensional domain space consisting of the key value domain and the type identifier domain. In our extended scheme, we handle the clustering of the index entries in a multidimensional domain space consisting of a key value domain and multiple type identifier domains, with one type identifier domain per type hierarchy on a path expression. This scheme efficiently supports queries that involve search conditions on the nested element represented by an extended path expression. An extended path expression is a path expression in which every type hierarchy on a path can be substituted by an individual type or a subtype hierarchy.

An Interdependent Data Allocation Scheme Using Square Root Rule of Data Access Probability (데이터 액세스 확률의 제곱근 법칙을 이용한 상호 관련 데이터 할당 기법)

Kwon, Hyeokmin
- The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.5 / pp.75-84 / 2015
A data allocation technique is essential to improve the performance of data broadcast systems. This paper explores the issues in allocating data items on broadcast channels to process multiple-data queries in an environment where query profiles and query request rates are given, and proposes a new data allocation scheme named IDAS. The proposed scheme determines the broadcast frequency of each data item from the square root of its relative access probability. IDAS can improve query response time because it processes queries with high request rates quickly and shows a reasonable degree of query data adjacency. Simulation is performed to evaluate the performance of the proposed scheme. The simulation results show that IDAS outperforms other schemes in terms of the average response time.
https://doi.org/10.7236/JIIBC.2015.15.5.75
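
The square root rule named in the abstract above can be sketched in a few lines. This only illustrates the rule itself, under the assumption that each item's share of broadcast slots is proportional to the square root of its access probability. It does not reproduce IDAS, which also accounts for interdependent data and query data adjacency. The function name and the sample probabilities are hypothetical.

```python
import math
from typing import Dict


def broadcast_shares(access_prob: Dict[str, float]) -> Dict[str, float]:
    """Return each item's share of broadcast slots, proportional to sqrt(p)."""
    roots = {item: math.sqrt(p) for item, p in access_prob.items()}
    total = sum(roots.values())
    return {item: r / total for item, r in roots.items()}


# Hypothetical access probabilities: item "a" is requested 4x as often as "b".
shares = broadcast_shares({"a": 0.8, "b": 0.2})
print(shares)
```

Although item "a" is requested four times as often as item "b", the rule broadcasts it only about twice as often, since sqrt(0.8) / sqrt(0.2) = 2.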

Efficiently Processing Skyline Query on Multi-Instance Data

Chiu, Shu-I;Hsu, Kuo-Wei
- Journal of Information Processing Systems / v.13 no.5 / pp.1277-1298 / 2017
Related to the maximum vector problem, a skyline query discovers the dominating tuples in a set of tuples, where each tuple describes an object (such as a hotel) in several dimensions (such as the price and the distance to the beach). A tuple, an instance of an object, dominates another tuple if it is equally good or better in all dimensions and better in at least one dimension. Traditionally, skyline queries are defined upon single-instance data, that is, upon objects each of which is associated with one instance. However, in some cases, an object is associated not with a single instance but with multiple instances. For example, on a review website, many users assign scores to a product or a service, and a user's score is an instance of the object representing the product or the service. Such data is an example of multi-instance data. Unlike most (if not all) other work, which considers the traditional setting, we consider skyline queries defined upon multi-instance data. We define the dominance calculation and propose an algorithm to reduce its computational cost. We use synthetic and real data to evaluate the proposed methods, and the results demonstrate their utility.
https://doi.org/10.3745/JIPS.04.0049
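
The dominance definition in the abstract above translates directly into code. The sketch below covers only the traditional single-instance setting and assumes that smaller values are better in every dimension. The multi-instance dominance calculation proposed in the paper is not reproduced. The function names and the hotel data are hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumes smaller is better in every dimension.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach).
hotels = [(50, 8), (60, 2), (70, 9), (40, 10)]
result = skyline(hotels)
print(result)
```

Here the hotel (70, 9) is dropped because (50, 8) is both cheaper and closer, while the other three each win on at least one dimension against every rival.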

Efficient Processing of Continuous Join Queries between a Data Stream and Multiple Relations for Real-Time Analysis of E-Commerce Data (전자상거래 데이터의 실시간 분석을 위한 데이터 스트림과 다수 릴레이션 간의 효율적인 연속 조인 처리 기법)

Kim, Haeri;Lee, Ki Yong
- The Journal of Society for e-Business Studies / v.18 no.3 / pp.159-175 / 2013
Recently, as e-commerce data has become available in real time, the demand for real-time analysis of e-commerce data has increased significantly. In such analysis, it is very important to efficiently process continuous join queries between an e-commerce data stream and large disk-based relations. In this paper, we propose an efficient method for processing a continuous join query between an e-commerce data stream and multiple disk-based relations. The proposed method improves the service rate significantly, while reducing the amount of required memory substantially. Through analysis and various experiments, we show the efficiency of the proposed method compared with the previous one in terms of service rate and memory usage.
https://doi.org/10.7838/jsebs.2013.18.3.159

A Query-Based Data Allocation Scheme for Multiple Broadcast-Channel Environments (다중 방송 채널 환경을 위한 질의 기반 데이터 할당 기법)

Kwon, Hyeokmin
- The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.6 / pp.165-175 / 2016
A data allocation technique is essential to improve the performance of data broadcast systems. This paper explores the issues in allocating data items on broadcast channels to process multiple-data queries in an environment where query profiles and query request rates are given, and proposes a new data allocation scheme named QBDA. The proposed scheme gives queries with higher request rates higher priority when scheduling their data items and introduces the concept of marking to reduce data conflicts. Simulation is performed to evaluate the performance of QBDA. The simulation results show that the proposed scheme outperforms other schemes in terms of the average response time, since it processes queries with high request rates quickly and shows very desirable characteristics with respect to query data adjacency and data conflict probability.
https://doi.org/10.7236/JIIBC.2016.16.6.165

Distributed Continuous Query Processing Scheme for RFID Data Stream (RFID 데이터 스트림에 대한 분산 연속질의 처리 기법)

Ahn, Sung-Woo;Hong, Bong-Hee;Jung, Dong-Gyu
- Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.4 / pp.1-12 / 2009
As enterprises that apply RFID become more global, an RFID application needs to efficiently collect product information scattered over the RFID network. In particular, to be promptly informed of the stock status of products in the supply chain network, it is necessary to support queries that retrieve statistical information on tagged products. However, since existing RFID networks do not provide these kinds of queries, an application must send a query to several RFID middleware systems and analyze the collected data itself. This process forces the application to perform heavy computation to retrieve statistical information. To solve this problem, we define a new Distributed Continuous Query that finds information on tagged products from the global RFID network and provides statistical information to RFID applications. We also propose a Distributed Continuous Query System to process the distributed continuous query efficiently. To track the movement of products across multiple RFID systems in real time, our proposed system uses Pedigree, which represents trade information of items. By using Pedigree, our system can also reduce the query processing cost of removing duplicated data from multiple middleware systems.
