Search | Korea Science

A Clustered Dwarf Structure to Speed up Queries on Data Cubes

Bao, Yubin;Leng, Fangling;Wang, Daling;Yu, Ge
- Journal of Computing Science and Engineering
- /
- v.1 no.2
- /
- pp.195-210
- /
- 2007
Dwarf is a highly compressed structure that compresses a data cube by eliminating semantic redundancies while the cube is computed. Although it achieves a high compression ratio, Dwarf is slower to query and harder to update because of its structural characteristics. Since the original purpose of a data cube is to speed up queries, we propose two novel clustering methods for query optimization: the recursion clustering method, which clusters the nodes in a recursive manner to speed up point queries, and the hierarchical clustering method, which clusters the nodes of the same dimension to speed up range queries. To facilitate the implementation, we design a partition strategy and a logical clustering mechanism. Experimental results show that our methods effectively improve query performance on data cubes, and that the recursion clustering method is suitable for both point queries and range queries.
https://doi.org/10.5626/JCSE.2007.1.2.195
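
For readers unfamiliar with the terms in the abstract above, the following is a minimal sketch of a fully materialized data cube with a point query and a range query. It is not the Dwarf structure or either clustering method from the paper; the `ALL` marker, the `build_cube` helper, and the sample facts are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"

def build_cube(facts):
    """Sum the measure into every cell formed by keeping or rolling up each dimension."""
    cube = defaultdict(int)
    for dims, measure in facts:
        # Each dimension either keeps its value or is replaced by ALL.
        for cell in product(*[(d, ALL) for d in dims]):
            cube[cell] += measure
    return dict(cube)

# Illustrative facts: ((store, product), quantity)
facts = [(("s1", "p1"), 10), (("s1", "p2"), 20), (("s2", "p1"), 5)]
cube = build_cube(facts)

# Point query: one cell, here the total for store s1 over all products.
point_result = cube[("s1", ALL)]

# Range query: several cells along one dimension, here product p1 over stores s1..s2.
range_result = sum(cube[(store, "p1")] for store in ("s1", "s2"))
```

A point query touches a single cell, while a range query visits many cells along a dimension, which is why the two query types can benefit from different physical layouts.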

Matrix-based Filtering and Load-balancing Algorithm for Efficient Similarity Join Query Processing in Distributed Computing Environment (분산 컴퓨팅 환경에서 효율적인 유사 조인 질의 처리를 위한 행렬 기반 필터링 및 부하 분산 알고리즘)

Yang, Hyeon-Sik;Jang, Miyoung;Chang, Jae-Woo
- The Journal of the Korea Contents Association
- /
- v.16 no.7
- /
- pp.667-680
- /
- 2016
As distributed computing platforms such as Hadoop MapReduce have been developed, conventional query processing techniques, which were designed to run on a single machine, need to be executed efficiently in distributed computing environments. In particular, similarity join query processing in distributed computing environments has been studied, where a similarity join retrieves all data pairs with high similarity between two given data sets. However, existing similarity join schemes for distributed computing environments suffer from skewed computing load across clusters because they consider only the data transmission cost. In this paper, we propose a matrix-based load-balancing algorithm for efficient similarity join query processing in distributed computing environments. To balance the load uniformly across clusters, the proposed algorithm estimates the expected computing cost using a matrix and generates partitions based on the estimated cost. In addition, it reduces the computing load by filtering out data that are not used in query processing in the clusters. Finally, our performance evaluation shows that the proposed algorithm outperforms the existing one in query processing performance.
https://doi.org/10.5392/JKCA.2016.16.07.667
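
The definition of a similarity join given in the abstract above can be sketched as a naive single-machine nested loop. This is only an illustration of the query itself, not the matrix-based filtering or load-balancing algorithm the paper proposes; the choice of Jaccard similarity, the threshold, and the sample sets are assumptions.

```python
from itertools import product

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; treated as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity_join(r, s, threshold):
    """Return every index pair (i, j) with jaccard(r[i], s[j]) >= threshold."""
    return [
        (i, j)
        for (i, x), (j, y) in product(enumerate(r), enumerate(s))
        if jaccard(x, y) >= threshold
    ]

# Illustrative data sets: each record is a set of tokens.
R = [{"a", "b", "c"}, {"x", "y"}]
S = [{"a", "b", "d"}, {"x", "y"}, {"q"}]
pairs = similarity_join(R, S, threshold=0.5)
```

The nested loop compares every record of `R` with every record of `S`, so its cost grows with the product of the two input sizes; that quadratic cost is what makes partitioning and filtering attractive in a distributed setting.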

Study on Continuous Nearest Neighbor Query on Trajectory of Moving Objects (이동객체의 궤적에 대한 연속 최근접 질의에 관한 연구)

Jeong, Ji-Mun
- Korea Digital Policy Society: Conference Proceedings
- /
- 2005.06a
- /
- pp.517-530
- /
- 2005
Research on nearest neighbor (NN) queries, which are often used in LBS systems, has been ongoing. However, conventional NN query processing techniques are usually meaningless in a moving object management system for LBS, since their results may be invalidated as soon as the query and data objects move. To solve these problems, in this paper we propose a new nearest neighbor query processing technique, called CTNN, which supports continuous trajectory nearest neighbor query processing. The proposed technique consists of the Approximate CTNN technique, which has a quick response time, and the Exact CTNN technique, which makes it possible to search for nearest neighbor objects accurately. Experimental results using GSTD datasets showed that the Exact CTNN technique has high accuracy but a somewhat slower response time. They also showed that the Approximate CTNN technique has lower accuracy than the Exact CTNN technique but a faster response time.

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets (SVD를 기반으로 한 고차원 데이터 및 질의 집합의 생성)

김상욱
- The Journal of Information Technology and Database
- /
- v.8 no.2
- /
- pp.91-101
- /
- 2001
Previous research efforts on the performance evaluation of multidimensional indexes have typically used synthetic data sets distributed uniformly or normally over multidimensional space. However, recent research results have shown that these kinds of data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features of data and query sets that are appropriate for fairly evaluating the performance of multidimensional indexes, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies such features. HDDQ_Gen supports the following features: (1) clustered distributions, (2) various object distributions in each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions depending on data distributions. Using these features, users are able to control the distribution characteristics of data and query sets. Our contribution is important in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.

DISSECTION TECHNIQUE FOR EFFICIENT JOIN OPERATION ON SEMI-STRUCTURED DOCUMENT STREAM

Seo, Dong-Hyeok;Lee, Dong-Gyu;Ryu, Keun-Ho
- Proceedings of the KSRS Conference
- /
- 2007.10a
- /
- pp.11-13
- /
- 2007
There has been much interest in stream query processing. Various index techniques and advanced join techniques have been proposed to process data stream queries efficiently. Previous proposals support rapid responses to data stream queries. However, the volume of data streams is increasing, and data stream query processing needs to be faster than before. In this paper, we propose a novel query processing technique for large numbers of incoming document streams: the Dissection Technique for efficient query processing in data stream environments. We focus on the dissection technique in join query processing. Our technique shows efficient performance compared with other proposals for data streams. The proposed technique is applied to sensor network systems and XML databases.

Study on Continuous Nearest Neighbor Query on Trajectory of Moving Objects (이동객체의 궤적에 대한 연속 최근접 질의에 관한 연구)

Chung, Ji-Moon
- Journal of Digital Convergence
- /
- v.3 no.1
- /
- pp.149-163
- /
- 2005
Research on nearest neighbor (NN) queries, which are often used in LBS systems, has been ongoing. However, conventional NN query processing techniques are usually meaningless in a moving object management system for LBS, since their results may be invalidated as soon as the query and data objects move. To solve these problems, in this paper we propose a new nearest neighbor query processing technique, called CTNN, which supports continuous trajectory nearest neighbor query processing. The proposed technique consists of the Approximate CTNN technique, which has a quick response time, and the Exact CTNN technique, which makes it possible to search for nearest neighbor objects accurately. Experimental results using GSTD datasets show that the Exact CTNN technique has high accuracy but a somewhat slower response time. They also show that the Approximate CTNN technique has lower accuracy than the Exact CTNN technique but a faster response time.

Evaluating the Performance Quality of Open Source Database Management Systems (오픈소스 DBMS의 성능 품질 평가)

Min, Meekyung
- Journal of Korean Society for Quality Management
- /
- v.45 no.4
- /
- pp.933-942
- /
- 2017
Purpose: The purpose of this paper is to evaluate the performance quality of open source DBMSs. Performance quality is defined as the processing time for Join queries. Query processing time is measured and compared across the most widely used open source DBMSs and a commercial DBMS. Methods: By varying the number of tuples in the two relations to be joined, the average processing time (in seconds) of a Join query in each DBMS was obtained experimentally. ANOVA and the Tukey HSD test were used to compare the performance quality of the DBMSs. Results: There was a significant difference between the performance qualities of the three DBMSs at all experimental levels, where the number of tuples was 100, 1,000, 2,000, 10,000, and 50,000. According to the Tukey HSD test, the two open source DBMSs (MariaDB, MySQL) were classified in the same group only at the tuple level of 100, while the commercial DBMS (MS-SQL Server) belonged to another group. At levels of 1,000 or more tuples, all three DBMSs belonged to different groups. Conclusion: Within the open source DBMS group, MariaDB showed better performance quality except for a small number of tuples. Thus the results show that MariaDB can be an alternative to MySQL, which is currently the most widely used. Between the open source and commercial DBMS groups, MS-SQL Server always shows the best performance quality, but the smaller the number of tuples, the smaller the difference.
https://doi.org/10.7469/JKSQM.2017.45.4.933

Query Processing System for Multi-Dimensional Data in Sensor Networks (센서 네트워크에서 다차원 데이타를 위한 쿼리 처리 시스템)

Kim, Jang-Soo;Kim, Jeong-Joon;Kim, Young-Gon;Lee, Chang-Hoon
- The Journal of the Institute of Internet, Broadcasting and Communication
- /
- v.17 no.1
- /
- pp.139-144
- /
- 2017
As technologies related to sensor networks emerge and the use of GeoSensor increases along with the development of IoT technology, spatial query processing systems that efficiently process spatial sensor data are being actively studied. However, existing spatial query processing systems do not support spatial-temporal data types or spatial-temporal operators for processing spatial-temporal sensor data. Therefore, they are inadequate for processing spatial-temporal sensor data such as that of GeoSensor. Accordingly, this paper develops a spatial-temporal query processing system for efficient spatial-temporal query processing of spatial-temporal sensor data in a sensor network. Lastly, this paper verifies the utility of the system through a scenario and shows, through a performance assessment of processing time and memory usage, that the system performs better than existing systems.
https://doi.org/10.7236/JIIBC.2017.17.1.139

A Study on the Efficient Feature Vector Extraction for Music Information Retrieval System (음악 정보검색 시스템을 위한 효율적인 특징 벡터 추출에 관한 연구)

윤원중;이강규;박규식
- The Journal of the Acoustical Society of Korea
- /
- v.23 no.7
- /
- pp.532-539
- /
- 2004
In this paper, we propose a content-based music information retrieval (MIR) system based on the query-by-example (QBE) method. The proposed system is implemented to retrieve queried music from a dataset in which 60 music samples were collected for each of four genres (Classical, Hiphop, Jazz, and Rock), resulting in 240 music files in the database. From each query music signal, the system extracts 60-dimensional feature vectors, including spectral centroid, rolloff, and flux based on the STFT, as well as LPC, MFCC, and beat information, and retrieves the queried music from a trained database set using the Euclidean distance measure. To choose optimum features from the 60-dimensional feature vectors, the SFS method is applied to select 10-dimensional optimum features, which are used in the proposed system. The experimental results verify the superior performance of the proposed system, which achieves a success rate of 84% in Hit Rate and 0.63 in MRR, an improvement of nearly 10% over previous methods. Additional experiments on system performance with respect to random query patterns (or portions) and query lengths were conducted, and a serious instability problem in system performance is pointed out.
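
The abstract above mentions retrieval by Euclidean distance and evaluation by MRR (mean reciprocal rank). The sketch below shows how those two pieces are commonly computed; it is not the authors' implementation, and the two-dimensional feature vectors, the track names, and both helper functions are hypothetical (the paper uses 10-dimensional features selected by SFS).

```python
import math

def rank_by_distance(query, database):
    """Return database keys ordered by Euclidean distance to the query vector."""
    return sorted(database, key=lambda name: math.dist(query, database[name]))

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries.

    Each entry is the 1-based rank of the first correct result for a query,
    or None when no correct result was returned (contributing 0).
    """
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)

# Hypothetical 2-dimensional feature vectors for three tracks.
database = {
    "jazz_1": (0.0, 1.0),
    "rock_1": (1.0, 0.0),
    "jazz_2": (0.1, 0.9),
}
ranking = rank_by_distance((0.0, 0.8), database)

# Three hypothetical queries whose first correct hit appeared at rank 1, rank 2, and never.
mrr = mean_reciprocal_rank([1, 2, None])
```

An MRR of 1.0 would mean the correct item is always ranked first, so values closer to 1.0 indicate that correct results tend to appear near the top of the ranking.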

Efficient Processing of Multiple Group-by Queries in MapReduce for Big Data Analysis (맵리듀스에서 빅데이터 분석을 위한 다중 Group-by 질의의 효율적인 처리 기법)

Park, Eunju;Park, Sojeong;Oh, Sohyun;Choi, Hyejin;Lee, Ki Yong;Shim, Junho
- KIISE Transactions on Computing Practices
- /
- v.21 no.5
- /
- pp.387-392
- /
- 2015
MapReduce is a framework used to process large data sets in parallel on a large cluster. A group-by query is a query that partitions the input data into groups based on the values of the specified attributes, and then evaluates the value of the specified aggregate function for each group. In this paper, we propose an efficient method for processing multiple group-by queries using MapReduce. Instead of computing each group-by query independently, the proposed method computes multiple group-by queries in stages with one or more MapReduce jobs in order to reduce the total execution cost. We compared the performance of this method with the performance of a less sophisticated method that computes each group-by query independently. This comparison showed that the proposed method offers better performance in terms of execution time.
https://doi.org/10.5626/KTCP.2015.21.5.387
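
The group-by query defined in the abstract above can be illustrated with a small in-memory sketch. It corresponds to the baseline of computing each group-by query independently, not to the staged MapReduce method the paper proposes; the `group_by_sum` helper, the column names, and the rows are assumptions.

```python
from collections import defaultdict

def group_by_sum(rows, keys, value):
    """Partition rows by the given key attributes and sum `value` within each group."""
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row[value]
    return dict(totals)

# Illustrative input relation.
rows = [
    {"region": "east", "product": "pen", "qty": 3},
    {"region": "east", "product": "ink", "qty": 2},
    {"region": "west", "product": "pen", "qty": 5},
    {"region": "east", "product": "pen", "qty": 1},
]

# Two group-by queries, each computed with its own full scan of the input.
by_region = group_by_sum(rows, ["region"], "qty")
by_region_product = group_by_sum(rows, ["region", "product"], "qty")
```

Each call scans the whole input once, so answering several group-by queries independently repeats that scan per query; this repeated work is the cost a shared, staged computation aims to reduce.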
