• Title/Summary/Keyword: Join

A Similarity Join Algorithm Using a Median as a Filter (중앙값을 필터로 이용한 유사도 조인 알고리즘)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.71-76 / 2015
  • Similarity join processing generally employs a generation-verification framework with two phases: the first generates a set of candidate pairs from a collection of records, and the second verifies each candidate pair by computing its actual similarity. To reduce the number of candidate pairs reaching the verification phase, this paper uses the median of one record in each candidate pair as a filter to test whether the other record can have the required number of overlapping tokens. We propose a similarity join algorithm with this median filter and show, through extensive experiments on real-world datasets, that it runs faster than recent algorithms that lack the filter.
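
The two-phase generation-verification framework described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not specify the median filter, so a simple size filter stands in for it, and the names `similarity_join` and `jaccard`, the choice of Jaccard similarity, and the inverted-index candidate generation are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Actual similarity, computed only in the verification phase."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def similarity_join(records, threshold):
    """Return (i, j, sim) for record pairs whose Jaccard similarity >= threshold."""
    sets = [frozenset(r) for r in records]

    # Phase 1 (generation): records that share at least one token
    # become a candidate pair, found through an inverted index.
    index = defaultdict(list)
    for rid, tokens in enumerate(sets):
        for tok in tokens:
            index[tok].append(rid)
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(rids, 2))

    # Phase 2 (verification): filter cheaply, then compute the real similarity.
    results = []
    for i, j in sorted(candidates):
        a, b = sets[i], sets[j]
        # Stand-in filter (NOT the paper's median filter): Jaccard is at most
        # min(|a|, |b|) / max(|a|, |b|), so very unequal sizes cannot qualify.
        if min(len(a), len(b)) < threshold * max(len(a), len(b)):
            continue
        sim = jaccard(a, b)
        if sim >= threshold:
            results.append((i, j, sim))
    return results
```

Any filter placed before the call to `jaccard` plays the same role as the median filter in the abstract: it discards candidate pairs that cannot reach the required overlap before the more expensive verification runs.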

A Sampling-based Algorithm for Top-${\kappa}$ Similarity Joins (Top-${\kappa}$ 유사도 조인을 위한 샘플링 기반 알고리즘)

  • Park, Jong Soo
    • Journal of KIISE:Databases / v.41 no.4 / pp.256-261 / 2014
  • The top-${\kappa}$ set similarity join problem is to find the top-${\kappa}$ pairs of records, ranked by similarity, between two sets of input records. We propose an efficient algorithm that returns the top-${\kappa}$ similarity join pairs using a sampling technique. From a sample of the input records, we construct a histogram of set similarity joins and then use statistical inference to estimate, from the histogram, a similarity threshold for the top-${\kappa}$ join pairs within the error bound at a 95% confidence level. Finally, the estimated threshold is applied to the traditional similarity join algorithm, which uses a min-heap structure to obtain the top-${\kappa}$ similarity joins. The experimental results show that the proposed algorithm performs well on large real datasets.
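
The final step of this abstract, a min-heap top-${\kappa}$ join seeded with a threshold, might look roughly like the sketch below. It assumes Jaccard similarity and a brute-force scan over all pairs. The `initial_threshold` parameter is a hypothetical stand-in for the threshold the paper estimates from a sample, and the estimation itself is not reproduced here.

```python
import heapq


def jaccard(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def top_k_join(left, right, k, initial_threshold=0.0):
    """Return up to k (sim, i, j) triples with the highest Jaccard similarity."""
    if k <= 0:
        return []
    left_sets = [frozenset(r) for r in left]
    right_sets = [frozenset(r) for r in right]
    heap = []  # min-heap: heap[0] is the weakest pair currently kept
    threshold = initial_threshold
    for i, a in enumerate(left_sets):
        for j, b in enumerate(right_sets):
            sim = jaccard(a, b)
            if sim < threshold:
                continue  # cannot enter the top k
            if len(heap) < k:
                heapq.heappush(heap, (sim, i, j))
            elif sim > heap[0][0]:
                heapq.heapreplace(heap, (sim, i, j))
            if len(heap) == k:
                # Once k pairs are held, the weakest one raises the threshold.
                threshold = max(threshold, heap[0][0])
    return sorted(heap, reverse=True)
```

A good initial threshold lets the scan skip weak pairs from the start instead of waiting for the heap to fill. In this sketch, an overestimated threshold simply yields fewer than k pairs.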

Join Operation of Parallel Database System with Large Main Memory (대용량 메모리를 가진 병렬 데이터베이스 시스템의 조인 연산)

  • Park, Young-Kyu
    • Journal of the Korea Society of Computer and Information / v.12 no.3 / pp.51-58 / 2007
  • Because the shared-nothing multiprocessor architecture scales well, it has been adopted in many multiprocessor database systems. However, if the data are not uniformly distributed across the processors, the load becomes unbalanced and overall system performance deteriorates. This is the data skew problem, which commonly occurs in parallel hash joins. Balancing the load before performing the join resolves the problem efficiently and improves overall system performance. In this paper, we present an algorithm that exploits very large main memory to reduce disk access overhead during load balancing and to solve the data skew problem efficiently. We also present an analytical model of the new algorithm and the results of a performance study comparing it with other algorithms for handling data skew.

An Efficient Method of Document Store and Version Management for XML Repository System (XML 저장 관리 시스템에서 효율적인 버전 관리 및 문서 저장 방안)

  • Jung, Hyun-Joo;Kim, Kweon-Yang;Choi, Jae-Hyuk
    • The Journal of Korean Association of Computer Education / v.6 no.4 / pp.11-21 / 2003
  • In a rapidly changing, information-oriented society, it is essential to manage massive amounts of document information as electronic files. For such electronic documents, it is also important to keep and maintain all information without loss. By managing updates as versions, it should be possible to trace previous contents as well as recently updated contents. XML is well suited to this. In this paper, we aim to save document storage space by storing only the updated contents together with a version, rather than the whole document, whenever a document is updated. For version-based management of the update history, we designed the system to omit the JOIN operation if the document size is under a certain size. We thus implemented a new XML document repository system that supports quick search and efficient XML document storage by reducing the performance deterioration caused by the JOIN operation.

Device Security Bootstrapping Mechanism on the IEEE 802.15.4-Based LoWPAN (IEEE 802.15.4 기반 LoWPAN에서의 디바이스 보안 설정 메커니즘)

  • Lee, Jong-Hoon;Park, Chang-seop
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.6 / pp.1561-1569 / 2016
  • As the use of sensor devices increases in IoT environments, device security is becoming increasingly important. When a sensor device is deployed in an IEEE 802.15.4-based LoWPAN, it has to perform the join operation with the PAN coordinator and the binding operation with another device. In the join and binding process, authentication and key distribution for the device are performed using a pre-distributed network key or certificate. However, the network key used in the conventional method has two problems: its role is limited to group authentication, and individual identification is not applied in certificate issuing. In this paper, we propose a secure join and binding protocol for the LoWPAN environment that solves the problems of the pre-distributed network key.

Implementation and Evaluation of Time Interval Partitioning Algorithm in Temporal Databases (시간 데이타베이스에서 시간 간격 분할 알고리즘의 구현 및 평가)

  • Lee, Kwang-Kyu;Shin, Ye-Ho;Ryu, Keun-Ho;Kim, Hong-Gi
    • Journal of KIISE:Computing Practices and Letters / v.8 no.1 / pp.9-16 / 2002
  • Join operations have a great effect on system performance in temporal databases, as they do in relational databases. For temporal joins in particular, the optimization of interval partitioning determines query-processing performance. In this paper, to improve the efficiency of parallel join queries in temporal databases, we propose the Minimum Interval Partition (MIP) scheme for time interval partitioning. The validity of the MIP algorithm, which determines the minimum breakpoints of the partition, is demonstrated with an example scenario, and its improved efficiency compared with an existing partitioning algorithm is confirmed.

Performance Evaluation of Hash Join Algorithms Supporting Dynamic Load Balancing for a Database Sharing System (데이타베이스 공유 시스템에서 동적 부하분산을 지원하는 해쉬 조인 알고리즘들의 성능 평가)

  • Moon, Ae-Kyung;Cho, Haeng-Rae
    • The Transactions of the Korea Information Processing Society / v.6 no.12 / pp.3456-3468 / 1999
  • Most previous parallel join algorithms assume a database partition system (DPS), where each database partition is owned by a single processing node. While the DPS is novel in the sense that it can interconnect a large number of nodes and support a geographically distributed environment, it may offer poorer load balancing and system availability than the database sharing system (DSS). In this paper, we propose a dynamic load balancing strategy that exploits the characteristics of the DSS, and then extend the conventional hash join algorithms to the DSS using this strategy. Through simulation studies under a wide variety of system configurations and database workloads, we analyze the effects of the dynamic load balancing strategy and the performance differences among hash join algorithms in the DSS.

An Effective Multicasting using Pre-join Technique in Mobile Computing Environments (이동 컴퓨팅 환경에서의 예측 가입 기법을 이용한 효율적인 멀티캐스팅)

  • Ryu, Ki-Seon;Kim, Joong-Bae;Eom, Young-Ik
    • Journal of KIISE:Information Networking / v.27 no.1 / pp.88-97 / 2000
  • When multicast transmission techniques are applied in mobile computing environments, a mobile host experiences join and graft delay, which occurs when a host wants to join a multicast group in the fixed network, if the new cell it enters has no member of the same multicast group. Because of the low bandwidth and higher error rate, a large amount of additional traffic is generated. In this paper, we propose a pre-join technique in which the new mobile support station joins the multicast group in advance, based on a signal-strength hint in the current cell. We use a multiple-level acknowledgment strategy that performs acknowledgment separately for the fixed part and the wireless part of the transmission path. Our strategy is efficient when more cells have no multicast group members and mobile hosts move less often.

Grid-based Index Generation and k-nearest-neighbor Join Query-processing Algorithm using MapReduce (맵리듀스를 이용한 그리드 기반 인덱스 생성 및 k-NN 조인 질의 처리 알고리즘)

  • Jang, Miyoung;Chang, Jae Woo
    • Journal of KIISE / v.42 no.11 / pp.1303-1313 / 2015
  • MapReduce provides high levels of system scalability and fault tolerance for large-scale data processing. A MapReduce-based k-nearest-neighbor (k-NN) join algorithm seeks to produce the k nearest neighbors of each point of a dataset from another dataset. The algorithm has been considered important in big data analysis. However, the existing k-NN join query-processing algorithm suffers from a high index-construction cost that makes it unsuitable for processing big data. To solve this problem, we propose a new grid-based k-NN join query-processing algorithm. Our algorithm retrieves only the neighboring data of a query cell and sends them to each MapReduce task, which reduces the overhead of data transmission and computation. Our performance analysis shows that our algorithm outperforms the existing scheme by up to seven-fold in query-processing time, while also achieving high query-result accuracy.
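
The grid idea in this abstract, looking only at cells near the query cell, can be illustrated with a single-machine sketch. It does not reproduce the paper's MapReduce algorithm or its index. The function names, the 2-D points, the uniform cell size `cell`, and the ring-by-ring search are all assumptions made for the example.

```python
import heapq
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2-D points into square cells with side length `cell`."""
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid


def knn_join(r_points, s_points, k, cell=1.0):
    """For each point r in r_points, return its k nearest points from s_points."""
    if k <= 0:
        return {r: [] for r in r_points}
    grid = build_grid(s_points, cell)
    result = {}
    for r in r_points:
        cx, cy = math.floor(r[0] / cell), math.floor(r[1] / cell)
        # Farthest occupied cell, measured in rings around the query cell.
        max_ring = max((max(abs(gx - cx), abs(gy - cy)) for gx, gy in grid),
                       default=-1)
        best = []  # max-heap via negated distance: best[0] is the current k-th
        for ring in range(max_ring + 1):
            # Points in this ring are at least (ring - 1) * cell away, so stop
            # once the current k-th neighbor is already that close.
            if len(best) == k and -best[0][0] <= (ring - 1) * cell:
                break
            for dx in range(-ring, ring + 1):
                for dy in range(-ring, ring + 1):
                    if max(abs(dx), abs(dy)) != ring:
                        continue  # visit only the border cells of this ring
                    for s in grid.get((cx + dx, cy + dy), ()):
                        d = math.dist(r, s)
                        if len(best) < k:
                            heapq.heappush(best, (-d, s))
                        elif d < -best[0][0]:
                            heapq.heapreplace(best, (-d, s))
        result[r] = [s for _, s in sorted(best, reverse=True)]
    return result
```

In a distributed setting, each task would presumably receive only the cells that such a ring search can touch, which is where the reduction in transmitted data would come from.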

Effective Parallel Hash Join Algorithm Based on Histogram Equalization in the Presence of Data Skew (데이터 편재 하에서 히스토그램 변환기법에 기초한 효율적인 병렬 해쉬 결합 알고리즘)

  • Park, Ung-Gyu;Choe, Hwang-Gyu;Kim, Tak-Gon
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.338-348 / 1997
  • In this paper, we first propose a data distribution framework to resolve load imbalance and bucket overflow in parallel hash join. Using the histogram equalization technique, the framework transforms a histogram of skewed data into the desired uniform distribution that corresponds to the relative computing power of the node processors in the system. Next, we propose an efficient parallel hash join algorithm for handling skewed data based on the proposed data distribution methodology. To compare the performance of our algorithm with other hash join algorithms, we perform simulation experiments and actual execution on the COREDB database computer with an 8-node hypercube architecture. In these experiments, the skewed data distribution of the join attribute is modeled using a Zipf-like distribution. The performance studies indicate that our algorithm outperforms other algorithms in the skewed cases.
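
One way to picture the histogram-equalization step is to map each join-attribute value through the cumulative distribution of the data and cut that distribution in proportion to node capacity. The sketch below is an assumption-laden illustration, not the paper's framework: `equalized_assignment` and `node_power` are hypothetical names, and a single heavy value is never split across nodes here.

```python
from collections import Counter
from itertools import accumulate


def equalized_assignment(keys, node_power):
    """Map each distinct join-key value to a node index so that the
    cumulative load follows the nodes' relative computing power."""
    hist = Counter(keys)            # histogram of the join attribute
    total = sum(hist.values())
    power_total = sum(node_power)
    # Node i owns the slice of the cumulative distribution ending at bounds[i].
    bounds = [c / power_total for c in accumulate(node_power)]

    assignment = {}
    seen = 0    # tuples covered by the values processed so far
    node = 0
    for value in sorted(hist):
        # Place the value by the midpoint of its mass in the cumulative histogram.
        mid = (seen + hist[value] / 2) / total
        while node < len(bounds) - 1 and mid > bounds[node]:
            node += 1
        assignment[value] = node
        seen += hist[value]
    return assignment
```

With skewed input such as 50 tuples for value 1 and 50 spread over values 2 to 4, two equally powerful nodes each receive 50 tuples, whereas splitting the value range evenly would give one node 75.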
