Search | Korea Science

Efficient Processing of Multiple Group-by Queries in MapReduce for Big Data Analysis (맵리듀스에서 빅데이터 분석을 위한 다중 Group-by 질의의 효율적인 처리 기법)

Park, Eunju;Park, Sojeong;Oh, Sohyun;Choi, Hyejin;Lee, Ki Yong;Shim, Junho
- KIISE Transactions on Computing Practices / v.21 no.5 / pp.387-392 / 2015
MapReduce is a framework used to process large data sets in parallel on a large cluster. A group-by query is a query that partitions the input data into groups based on the values of the specified attributes, and then evaluates the value of the specified aggregate function for each group. In this paper, we propose an efficient method for processing multiple group-by queries using MapReduce. Instead of computing each group-by query independently, the proposed method computes multiple group-by queries in stages with one or more MapReduce jobs in order to reduce the total execution cost. We compared the performance of this method with the performance of a less sophisticated method that computes each group-by query independently. This comparison showed that the proposed method offers better performance in terms of execution time.
https://doi.org/10.5626/KTCP.2015.21.5.387
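
As a rough illustration of the group-by definition in the abstract above, the following single-process Python sketch mimics the map, shuffle, and reduce steps for a SUM aggregate, then derives a coarser group-by from a finer result instead of rescanning the input. The reuse step is only one plausible reading of "computing in stages"; it is an assumption for illustration, not the paper's algorithm, and the names `group_by_sum`, `roll_up`, and the `sales` records are hypothetical.

```python
from collections import defaultdict

def group_by_sum(records, key_attrs, value_attr):
    """Partition records by the key attributes, then SUM the value per group."""
    groups = defaultdict(list)
    for rec in records:
        # "Map" emits (key, value); appending to groups[key] plays the shuffle.
        key = tuple(rec[a] for a in key_attrs)
        groups[key].append(rec[value_attr])
    # "Reduce" applies the aggregate function to each group.
    return {key: sum(vals) for key, vals in groups.items()}

def roll_up(fine_result, fine_attrs, coarse_attrs):
    """Derive a coarser SUM group-by from a finer one (assumed staging step)."""
    idx = [fine_attrs.index(a) for a in coarse_attrs]
    out = defaultdict(int)
    for key, total in fine_result.items():
        out[tuple(key[i] for i in idx)] += total
    return dict(out)

# Hypothetical input data.
sales = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
    {"region": "east", "product": "a", "amount": 3},
]

by_region_product = group_by_sum(sales, ["region", "product"], "amount")
by_region = roll_up(by_region_product, ["region", "product"], ["region"])
```

The roll-up works here because SUM is distributive; an aggregate such as a median could not be derived from the finer result this way.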

Development of a CUBRID-Based Distributed Parallel Query Processing System

Kim, Hyeong-Il;Yang, HyeonSik;Yoon, Min;Chang, Jae-Woo
- Journal of Information Processing Systems / v.13 no.3 / pp.518-532 / 2017
Due to the rapid growth in the amount of data, research on big data processing has drawn attention. For big data processing, CUBRID Shard supports parallel query processing by dividing the database across a number of CUBRID servers. However, CUBRID Shard can answer a user's query only when the query needs to access a single CUBRID server rather than multiple ones. To solve this problem, we propose a CUBRID-based distributed parallel query processing system that can answer a user's query in a parallel and distributed manner. Through performance evaluation, we show that the proposed system achieves 2-3 times better query processing time than the existing CUBRID Shard.
https://doi.org/10.3745/JIPS.01.0016

Processing of Multiple Regular Path Expressions using PID (경로 식별자를 이용한 다중 정규경로 처리기법)

Kim, Jong-Ik;Jeong, Tae-Seon;Kim, Hyeong-Ju
- Journal of KIISE:Databases / v.29 no.4 / pp.274-284 / 2002
Queries on XML are based on paths in the data graph, which is represented as an edge-labeled graph model. All proposed query languages for XML express queries using regular expressions to traverse arbitrary paths in the data graph. A meaningful query usually contains several regular path expressions, but much recent research is more concerned with optimizing a single path expression. In this paper, we present an efficient technique to process multiple path expressions in a query. We developed a data structure called the path identifier (PID) to determine whether two given nodes lie on the same path in the data graph, and used the PID for efficient processing of multiple path expressions. We implement our technique and present preliminary performance results.

Shredding XML Documents into Relations using Structural Redundancy (구조적 중복을 사용한 XML 문서의 릴레이션으로의 분할저장)

Kim Jaehoon;Park Seog
- Journal of KIISE:Databases / v.32 no.2 / pp.177-192 / 2005
In this paper, we introduce a structural redundancy method. It reduces the query processing cost incurred when reconstructing an XML document from data that was divided when the document was shredded into relations. The fundamental idea is that query performance can be enhanced by analyzing query patterns and replicating the data essential to query performance. To make structural redundancy practical and effective, we analyzed three types of replication: ID, VALUE, and SUBTREE. In addition, if the given XML data and queries are very large and complex, it can be very difficult to search for the optimal redundancy set. Therefore, a heuristic search method is introduced in this paper. Finally, the XML query processing cost that arises from employing structural redundancy and the efficiency of the proposed search method are analyzed experimentally. XML read queries run more quickly, but XML update queries run more slowly because of the additional cost of keeping replicas consistent. However, experimental results showed that in-place ID replication is useful even when the update cost is high. It was also observed that multiple-place SUBTREE replication can enhance read query performance remarkably as long as the update cost is not excessive.

An R-tree Index Scheduling Method for kNN Query Processing in Multiple Wireless Broadcast Channels (다중 무선 방송채널에서 kNN 질의 처리를 위한 R-tree 인덱스 스케줄링 기법)

Jung, Eui-Jun;Jung, Sung-Won
- Journal of KIISE:Databases / v.37 no.2 / pp.121-126 / 2010
This paper proposes an efficient index scheduling technique for kNN query processing in a multiple wireless broadcast channel environment. Previous approaches must wait for the next cycle if the required child nodes of the same parent node are allocated to the same time slot on multiple channels. The proposed method computes the access frequency of each R-tree node at the server before generating the R-tree index broadcast schedule. Nodes with high access frequencies are allocated serially on a single channel, and nodes with low access frequencies are allocated in parallel across multiple channels. As a result, both index node access conflicts and the length of the broadcast cycle are reduced. The performance evaluation shows that our scheme performs better than existing schemes.

CONTINUOUS QUERY PROCESSING IN A DATA STREAM ENVIRONMENT

Lee, Dong-Gyu;Lee, Bong-Jae;Ryu, Keun-Ho
- Proceedings of the KSRS Conference / 2007.10a / pp.3-5 / 2007
Processing many continuous queries efficiently is important in a data stream environment. To process many continuous queries, a query index technique is applied whose performance is linear irrespective of the number and width of the intervals. Previous approaches cannot support dynamic insertion and deletion, because the intervals must be arranged in advance to construct the index. Insertion and search performance is also slowed by the number and width of the inserted intervals. In a data stream environment, many intervals have to be inserted and searched linearly. Therefore, we propose Hashed Multiple Lists to process continuous queries linearly. The proposed technique shows fast linear search performance. It can be used in systems that apply sensor networks and as a preprocessing technique for spatiotemporal data mining.

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
- Proceedings of the KSRS Conference / 2007.10a / pp.11-13 / 2007
There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently. Previous proposals support rapid responses to data stream queries. However, the volume of data streams is increasing, and data stream query processing needs to be faster than before. In this paper, we propose a novel query processing technique for streams with a large number of incoming documents. We propose the Dissection Technique for efficient query processing in the data stream environment, focusing on its use in join query processing. Our technique operates efficiently compared with other proposals for data streams. The proposed technique is applicable to sensor network systems and XML databases.

A Query Pruning Technique for Optimizing Regular Path Expressions in Semistructured Databases (준구조적 데이타베이스에서의 정규경로표현 최적화를 위한 질의전지 기법)

Park, Chang-Won;Jeong, Jin-Wan
- Journal of KIISE:Databases / v.29 no.3 / pp.217-229 / 2002
Regular path expressions are primary elements for formulating queries over semistructured data, which does not assume a conventional schema. Query pruning is an important optimization technique that avoids useless traversals when evaluating regular path expressions. However, existing query pruning often fails to fully optimize multiple regular path expressions, and previous methods that post-process the result of existing query pruning must check exponential combinations of sub-results. In this paper, we present a new query pruning technique that consists of a preprocessing phase and a pruning phase. Our two-phase query pruning is effective in optimizing multiple regular path expressions, and is more scalable than the previous methods in that it never checks exponential combinations of sub-results.

Database Segment Distributing Algorithm using Graph Theory (그래프이론에 의한 데이터베이스 세그먼트 분산 알고리즘)

Kim, Joong Soo
- Journal of Korea Multimedia Society / v.22 no.2 / pp.225-230 / 2019
There are several methods for improving the efficiency of a database. One well-known method is to rapidly access and process the database segments that satisfy a query. If the multiple database segment types that satisfy a query can be searched completely in parallel, the response time of the query is reduced. The problem of obtaining a CPS (Completely Parallel Searchable) distribution without redundancy can be viewed as a graph-theoretic problem, and the ring sum operation on the graph is used to obtain CPS. In this paper, a parallel algorithm is proposed.
https://doi.org/10.9717/kmms.2019.22.2.225

Spatial Selectivity Estimation for Intersection region Information Using Cumulative Density Histogram

Kim byung Cheol;Moon Kyung Do;Ryu Keun Ho
- Proceedings of the KSRS Conference / 2004.10a / pp.721-725 / 2004
The multiple-count problem occurs when rectangle objects span several buckets. The Cumulative Density (CD) histogram is a technique that solves the multiple-count problem by keeping four sub-histograms corresponding to the four corner points of a rectangle. Although it provides exact results with constant response time, a considerable issue remains. Because it is based on a query window that aligns with a given grid, errors may occur when it is applied to real applications. In this paper, we propose selectivity estimation techniques using the generalized cumulative density histogram based on two probabilistic models: (1) a probabilistic model that considers the query window area ratio, and (2) a probabilistic model that considers the intersection area between a given grid and objects. To evaluate the proposed methods, we experimented with a real dataset, and the results showed that the proposed technique was superior to existing selectivity estimation techniques. The proposed techniques can be used to accurately quantify the selectivity of spatial range queries on rectangle objects.
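
To make the four-sub-histogram idea concrete, the Python sketch below builds one cumulative grid per rectangle corner and combines them by inclusion-exclusion to count the rectangles that intersect a grid-aligned query window. It is a sketch under assumptions: rectangles and queries are given as inclusive cell indices `(x1, y1, x2, y2)` on an `n` by `n` grid, the function names are hypothetical, and the combination formula is derived here rather than taken from the paper, so the paper's exact formulation may differ.

```python
from itertools import accumulate

def cumulative_grid(points, n):
    """Return c where c[x][y] counts points (px, py) with px <= x and py <= y."""
    grid = [[0] * n for _ in range(n)]
    for px, py in points:
        grid[px][py] += 1
    for x in range(n):                      # prefix sums along y
        grid[x] = list(accumulate(grid[x]))
    for x in range(1, n):                   # prefix sums along x
        for y in range(n):
            grid[x][y] += grid[x - 1][y]
    return grid

def lookup(c, x, y):
    """Cumulative count at (x, y); zero when the index falls off the grid."""
    return c[x][y] if x >= 0 and y >= 0 else 0

def build_cd(rects, n):
    """One cumulative sub-histogram per rectangle corner."""
    return {
        "ll": cumulative_grid([(x1, y1) for x1, y1, x2, y2 in rects], n),
        "lr": cumulative_grid([(x2, y1) for x1, y1, x2, y2 in rects], n),
        "ul": cumulative_grid([(x1, y2) for x1, y1, x2, y2 in rects], n),
        "ur": cumulative_grid([(x2, y2) for x1, y1, x2, y2 in rects], n),
    }

def count_intersecting(cd, query):
    """Count rectangles intersecting the query window using four lookups."""
    qx1, qy1, qx2, qy2 = query
    return (lookup(cd["ll"], qx2, qy2)            # not right of, not above
            - lookup(cd["lr"], qx1 - 1, qy2)      # entirely left of the window
            - lookup(cd["ul"], qx2, qy1 - 1)      # entirely below the window
            + lookup(cd["ur"], qx1 - 1, qy1 - 1)) # subtracted twice above

# Hypothetical rectangles on an 8 x 8 grid.
rects = [(0, 0, 2, 2), (1, 1, 5, 5), (6, 6, 7, 7), (3, 0, 3, 7)]
cd = build_cd(rects, 8)
hits = count_intersecting(cd, (2, 2, 4, 4))
```

Each rectangle contributes exactly once to each sub-histogram no matter how many buckets it spans, which is why a rectangle is never counted more than once, and each query needs only four lookups.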
