• Title/Abstract/Keywords: Join Algorithm

138 search results; processing time 0.027 s

A Differential Data Replicator in Distributed Environments

  • Lee, Wookey;Park, Jooseok;Sukho Kang
    • The Journal of Information Technology and Database / Vol. 3, No. 2 / pp.3-24 / 1996
  • This paper proposes a data replicator scheme with a distributed join architecture, together with its cost functions and performance results. The scheme minimizes the number of base relation locks on distributed database tables and also substantially reduces the amount of remote transmission, which helps make distributed database systems practical. The differential files derived from the active log of the DBMS are the main means of reducing the number of base relation locks, and tuple reduction procedures curtail the amount of data transferred between the relevant sites. We then present a data replicator algorithm with its cost function and compare its performance with that of the semi-join scheme in distributed environments.
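
As a rough illustration of the differential-file idea in the abstract above, the sketch below folds a change log into one net change per key and applies only those changes to a replica, instead of rescanning the base relation. The log format (`op`, `key`, `row` tuples) and the names `net_changes` and `refresh` are assumptions made for this example, not the paper's design.

```python
def net_changes(log):
    """Collapse a change log into one net change per key.

    `log` is a list of (op, key, row) tuples with op in
    {"insert", "update", "delete"}; this format is a hypothetical
    stand-in for entries derived from a DBMS active log.
    """
    net = {}
    for op, key, row in log:
        if op == "delete":
            net[key] = ("delete", None)
        else:
            # Insert and update both mean "the replica should hold this row".
            net[key] = ("upsert", row)
    return net


def refresh(replica, log):
    """Apply only the net changes to the replica (a dict keyed by primary key)."""
    for key, (op, row) in net_changes(log).items():
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = row
    return replica


replica = {1: "a", 2: "b"}
log = [
    ("update", 1, "a2"),
    ("insert", 3, "c"),
    ("delete", 3, None),  # cancels the insert above, so key 3 never reaches the replica
    ("delete", 2, None),
]
refresh(replica, log)
print(replica)
```

Four log entries collapse to three net changes here, which is the kind of tuple reduction that keeps the transferred volume small.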


Rule extraction from trained neural network using NofM algorithm with improved clustering step (개선된 군집화 단계의 NofM 알고리즘을 이용한 훈련된 신경망으로부터의 규칙추출)

  • Lee, Han-Yul;Ra, Jong-Hei;Kim, Moon-Hyun
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2001 Fall Conference Proceedings (Part I) / pp.581-584 / 2001
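  • Information about the outputs a neural network produces is stored in the network in a numerically distributed form, which makes it hard for humans to interpret directly. In this paper, we improve the weight clustering step, the initial step among the six steps of the NofM algorithm, which is a link rule extraction (LRE) technique. By controlling the number of rule conditions that enter the antecedents of the extracted rules, we show experimentally that the extracted rules can avoid overgeneralizing or overspecializing the information about the input features. The NofM algorithm generally uses the Join algorithm when clustering weights. In this paper, we show that gradually widening the join condition of the Join algorithm from 0.05 to 0.25 in steps of 0.05 during clustering effectively clusters the weights that play an important role in the output of the neural network.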
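
The effect of widening the join condition can be sketched as follows. The abstract does not spell out the Join algorithm, so the rule used here (a weight joins the current cluster when it lies within the join condition of that cluster's mean) is an assumption, as are the function name and the sample weights.

```python
def cluster_weights(weights, join_condition):
    """Group sorted weights; a weight joins the last cluster when it is
    within `join_condition` of that cluster's mean (assumed join rule)."""
    clusters = []
    for w in sorted(weights):
        if clusters and abs(w - sum(clusters[-1]) / len(clusters[-1])) <= join_condition:
            clusters[-1].append(w)
        else:
            clusters.append([w])
    return clusters


# Hypothetical link weights feeding one unit of a trained network.
weights = [0.10, 0.12, 0.30, 0.33, 0.90]

# Widen the join condition from 0.05 to 0.25 in steps of 0.05.
for join_condition in (0.05, 0.10, 0.15, 0.20, 0.25):
    clusters = cluster_weights(weights, join_condition)
    print(join_condition, len(clusters), clusters)
```

A wider join condition yields fewer, larger clusters, and therefore fewer distinct conditions in the antecedent of an extracted rule.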


An Efficient Multiple Event Detection in Sensor Networks (센서 네트워크에서 효율적인 다중 이벤트 탐지)

  • Yang, Dong-Yun;Chung, Chin-Wan
    • Journal of KIISE:Databases / Vol. 36, No. 4 / pp.292-305 / 2009
  • Wireless sensor networks have many application areas, such as industrial process control, machine and resource management, and environment and habitat monitoring. One of the main objectives of using wireless sensor networks in these areas is event detection. To detect events at a user's request, we need join processing between sensor data and the predicates of the events. If there are too many event predicates relative to a node's capacity, it is impossible to store them in a node and perform an in-network join with the generated sensor data. This paper proposes a predicate-merge based in-network join approach that efficiently detects multiple events, considering the limited capacity of a sensor node and the large number of event predicates. It reduces the number of original event predicates by substituting merged predicates for some pairs of original predicates. We create an estimation model of the message transmission cost and apply it to the algorithm that selects targets for merged predicates. The experiments validate the cost estimation model and show the superior performance of the proposed approach compared with existing approaches.
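
A minimal sketch of the predicate-merge idea, assuming that event predicates are closed ranges over a single sensor attribute and that a merged predicate is the smallest range covering a pair. The `Predicate` class, the `merge` function, and the sample events are hypothetical; the abstract does not give the paper's actual merge rule or cost model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    attr: str
    low: float
    high: float

    def matches(self, reading):
        return self.low <= reading[self.attr] <= self.high


def merge(p, q):
    """Return the smallest single range predicate covering both p and q."""
    if p.attr != q.attr:
        raise ValueError("this sketch only merges predicates on the same attribute")
    return Predicate(p.attr, min(p.low, q.low), max(p.high, q.high))


fire = Predicate("temp", 60.0, 120.0)
overheat = Predicate("temp", 45.0, 70.0)
merged = merge(fire, overheat)  # one predicate stored on the node instead of two

reading = {"temp": 50.0}
# In-network filter: the node checks only the merged predicate ...
forwarded = merged.matches(reading)
# ... and the original predicates are re-checked wherever they can be stored.
events = [name for name, p in (("fire", fire), ("overheat", overheat))
          if forwarded and p.matches(reading)]
print(merged, forwarded, events)
```

The node stores fewer predicates, at the price of forwarding some readings that match the merged range but neither original predicate. A message-cost model is what would decide which pairs are worth merging.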

Multiple Pipelined Hash Joins using Synchronization of Page Execution Time (페이지 실행시간 동기화를 이용한 다중 파이프라인 해쉬 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Computer Systems and Theory / Vol. 27, No. 7 / pp.639-649 / 2000
  • In relational database systems, the join is one of the most time-consuming query operations, and many parallel join algorithms have been developed to reduce its execution time. The multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree may be delayed in the tuple-probing phase, because the time to read one page of the outer relation differs from the time to process a page that has already been read. To solve the performance degradation caused by this delay, we develop a join algorithm for multiple hash joins based on the concept of 'synchronization of page execution time'. We reduce the processing time of each node in the allocation tree and improve overall system performance. In addition, we build an analytical cost model and verify its validity through various performance comparisons with the previous method.
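
For readers unfamiliar with pipelined hash joins, the sketch below chains two build/probe hash joins so that each output tuple of the first probe is fed straight into the second. It is a plain single-threaded illustration with hypothetical relations and does not model the allocation tree or the page-time synchronization that the paper proposes.

```python
from collections import defaultdict


def hash_join(inner, outer, key):
    """Build a hash table on `inner`, then probe it with each `outer` row."""
    table = defaultdict(list)
    for row in inner:              # build phase
        table[row[key]].append(row)
    for row in outer:              # probe phase, one tuple at a time
        for match in table.get(row[key], []):
            yield {**match, **row}


# Hypothetical relations.
cities = [{"city_id": 1, "city": "Seoul"}]
depts = [{"dept_id": 10, "dept": "DB"}]
employees = [
    {"name": "Kim", "city_id": 1, "dept_id": 10},
    {"name": "Lee", "city_id": 2, "dept_id": 10},  # no matching city
]

# The generator returned by the inner join is the outer input of the next
# join, so tuples flow through both probes without materializing the
# intermediate result.
result = list(hash_join(depts, hash_join(cities, employees, "city_id"), "dept_id"))
print(result)
```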


Effective Load Shedding for Multi-Way windowed Joins Based on the Arrival Order of Tuples on Data Streams (다중 윈도우 조인을 위한 튜플의 도착 순서에 기반한 효과적인 부하 감소 기법)

  • Kwon, Tae-Hyung;Lee, Ki-Yong;Son, Jin-Hyun;Kim, Myoung-Ho
    • Journal of KIISE:Databases / Vol. 37, No. 1 / pp.1-11 / 2010
  • Recently, there has been growing interest in processing continuous queries over multiple data streams. When the arrival rates of tuples exceed the memory capacity of the system, a load shedding technique drops some subset of the input tuples to keep the system from becoming overloaded. In this paper, we propose an effective load shedding algorithm for multi-way windowed joins over multiple data streams. Most previous load shedding algorithms estimate the productivity of each tuple, i.e., the number of join output tuples produced by the tuple, based on its "join attribute value" and drop the tuples with the lowest productivity. However, the productivity of a tuple cannot be accurately estimated from its join attribute value when the join attribute values are unique and do not repeat, or when the distribution of the join attribute values changes over time. For these cases, we estimate the productivity of a tuple based on its "arrival order" on the data streams rather than its join attribute value, so productivity can still be estimated effectively. Through extensive experiments and analysis, we show that the proposed method outperforms previous methods in terms of effectiveness and efficiency.

Performance Evaluation of Hash Join Algorithm on Flash Memory SSDs (플래쉬 메모리 SSD 기반 해쉬 조인 알고리즘의 성능 평가)

  • Park, Jang-Woo;Park, Sang-Shin;Lee, Sang-Won;Park, Chan-Ik
    • Journal of KIISE:Computing Practices and Letters / Vol. 16, No. 11 / pp.1031-1040 / 2010
  • Hash join is one of the core algorithms in database management systems. If a hash join cannot complete in one pass because the available memory is insufficient (i.e., hash table overflow), however, it may incur a few sequential writes and excessive random reads. With a hard disk as the temporary storage for hash joins, the I/O time is dominated by slow random reads in the probing phase. Meanwhile, flash memory based SSDs (flash SSDs) are becoming popular, and flash SSDs are expected to replace hard disks in enterprise databases in the foreseeable future. In contrast to a hard disk, a flash SSD has no mechanical components and therefore low latency for random reads, so it can boost hash join performance. In this paper, we investigate several important and practical issues that arise when a flash SSD is used as temporary storage for hash join. First, we reveal the I/O patterns of hash join in detail and explain why a flash SSD can outperform a hard disk by more than an order of magnitude. Second, we present and analyze the impact of cluster size (i.e., the I/O unit in hash join) on performance. Finally, we empirically demonstrate that, while a commercial query optimizer is error-prone in predicting the execution time with a hard disk as temporary storage, it can precisely estimate the execution time with a flash SSD. In summary, we show that, when used as temporary storage for hash join, a flash SSD provides more reliable cost estimation as well as fast performance.
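
The overflow case described above is commonly handled by partitioning both inputs and joining them one partition pair at a time. The sketch below shows that two-pass structure, with in-memory lists standing in for the partitions that a real system would write to temporary storage and read back. The function names, the partition count, and the sample relations are assumptions, and the code does not reproduce the I/O patterns measured in the paper.

```python
def partition(rows, key, n_partitions):
    """Split rows into partitions by hash of the join key.
    In a real system each partition would be spilled to temporary storage."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts


def partitioned_hash_join(build, probe, key, n_partitions=4):
    """Join partition i of `build` with partition i of `probe`, one pair at a
    time, so each hash table only has to hold a single partition."""
    out = []
    build_parts = partition(build, key, n_partitions)
    probe_parts = partition(probe, key, n_partitions)
    for b_part, p_part in zip(build_parts, probe_parts):
        table = {}
        for row in b_part:                      # build on one partition
            table.setdefault(row[key], []).append(row)
        for row in p_part:                      # probe with the matching partition
            for match in table.get(row[key], ()):
                out.append((match, row))
    return out


# Hypothetical relations.
customers = [{"id": i, "name": f"c{i}"} for i in range(8)]
orders = [{"id": i % 8, "amount": i} for i in range(20)]
result = partitioned_hash_join(customers, orders, "id")
print(len(result))
```

Rows with equal keys always land in the same partition number on both sides, which is why joining only matching partition pairs loses no results.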

Efficient Similarity Joins by Adaptive Prefix Filtering (맞춤 접두 필터링을 이용한 효율적인 유사도 조인)

  • Park, Jong Soo
    • KIPS Transactions on Software and Data Engineering / Vol. 2, No. 4 / pp.267-272 / 2013
  • The similarity join, which finds all pairs of records in a dataset whose similarities are above a given threshold, is an important operation with many applications such as data cleaning and duplicate detection, and computing it efficiently is challenging. We propose a new algorithm that uses the prefix filtering principle as a strong constraint on the generation of candidate pairs for fast similarity joins. A candidate pair is generated only when the current prefix token of a probing record matches a prefix token of an indexing record within the prefix tokens constrained by the principle. This generation method does not need to compute an upper bound on the overlap between two records, which reduces execution time. Experimental results show that our algorithm significantly outperforms previous prefix filtering-based algorithms on real datasets.
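
The prefix filtering principle that the abstract builds on can be sketched as below for Jaccard similarity: tokens are sorted by a global order, and two records can only reach the threshold if their short prefixes share a token. This is the classic baseline with an inverted index over prefix tokens, not the adaptive variant proposed in the paper; the function names and sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    return len(a & b) / len(a | b)


def similarity_join(records, threshold):
    """Self-join: return index pairs (j, i), j < i, with Jaccard >= threshold."""
    freq = Counter(tok for rec in records for tok in set(rec))
    # Global token order: rare tokens first, ties broken alphabetically.
    ordered = [sorted(set(rec), key=lambda t: (freq[t], t)) for rec in records]
    index = defaultdict(set)  # prefix token -> ids of records already indexed
    results = []
    for i, tokens in enumerate(ordered):
        # Records meeting the threshold must share a token within this prefix.
        # The small epsilon guards against float round-up inside ceil().
        min_overlap = math.ceil(threshold * len(tokens) - 1e-9)
        prefix = tokens[: len(tokens) - min_overlap + 1]
        candidates = set()
        for tok in prefix:
            candidates |= index[tok]
        for j in sorted(candidates):  # verify each candidate exactly
            if jaccard(set(tokens), set(ordered[j])) >= threshold:
                results.append((j, i))
        for tok in prefix:
            index[tok].add(i)
    return results


records = [["a", "b", "c", "d"], ["a", "b", "c", "e"], ["x", "y", "z"]]
pairs = similarity_join(records, 0.5)
print(pairs)
```

Only records that collide on a prefix token are ever verified, so the third record above is never compared with the first two.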

A Study on the Efficiency of Join Operation On Stream Data Using Sliding Windows (스트림 데이터에서 슬라이딩 윈도우를 사용한 조인 연산의 효율에 관한 연구)

  • Yang, Young-Hyoo
    • Journal of the Korea Society of Computer and Information / Vol. 17, No. 2 / pp.149-157 / 2012
  • This thesis considers the problem of computing approximate answers to continuous sliding-window joins over data streams when the available memory may be insufficient to keep the entire join state. One approximation scenario is to provide a maximum subset of the result, with the objective of losing as few result tuples as possible. An alternative scenario is to provide a random sample of the join result, e.g., if the output of the join is being aggregated. It is shown formally that neither approximation can be addressed effectively for a sliding-window join of arbitrary input streams. Previous work has addressed only the maximum-subset problem and has implicitly used a frequency-based model of stream arrival. There exists a sampling problem for this model. More importantly, it is shown that there is a broad class of applications for which an age-based model of stream arrival is more appropriate, and both approximation scenarios are addressed under this new model. Finally, for the case of multiple joins being executed under an overall memory constraint, an algorithm is provided for allocating memory across the joins so as to optimize a combined measure of approximation in all scenarios considered.
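
A memory-limited sliding-window join can be sketched as follows: each stream keeps at most `capacity` tuples from the last `window` time units, and when a buffer is full the oldest tuple is shed. The eviction policy (drop the oldest), the event format, and all names are assumptions chosen for brevity; they are not the approximation strategies analyzed in the thesis.

```python
from collections import deque


def window_join(events, window, capacity):
    """Symmetric equi-join of streams "R" and "S".

    `events` is an iterable of (timestamp, stream, key) in timestamp order.
    Returns (timestamp, key, matched_timestamp) for every result produced.
    """
    state = {"R": deque(), "S": deque()}
    out = []
    for ts, stream, key in events:
        other = "S" if stream == "R" else "R"
        # Expire tuples that have fallen out of the time window.
        for side in state.values():
            while side and side[0][0] <= ts - window:
                side.popleft()
        # Probe the other stream's window with the new tuple.
        out.extend((ts, key, old_ts) for old_ts, old_key in state[other] if old_key == key)
        # Store the new tuple; if memory is exceeded, shed the oldest one.
        state[stream].append((ts, key))
        if len(state[stream]) > capacity:
            state[stream].popleft()
    return out


events = [(1, "R", "a"), (2, "R", "b"), (3, "S", "a")]
full = window_join(events, window=5, capacity=2)
shed = window_join(events, window=5, capacity=1)
print(full, shed)
```

With `capacity=1` the tuple `(1, "a")` is shed before its partner arrives, so one result is lost. Minimizing that kind of loss is what the maximum-subset scenario refers to.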

Personal Broadcasting System Using mOBCP-based Overlay Multicast Tree Construction Method (개인 방송 시스템을 위한 mOBCP 기반의 오버레이 멀티캐스트 트리 구성 방안)

  • Nam, Ji-Seung;Kang, Mi-Young;Jeon, Jin-Han;Son, Seung-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 32, No. 8B / pp.539-546 / 2007
  • When more than one client wants to join a Personal Broadcasting System service concurrently, joining clients experience waiting periods and time-outs that degrade performance and annoy members, so the concurrent member joining mechanism needs improvement. To make joining more efficient, this paper applies the overlay-multicast-based mini-Overlay Broadcasting Control Protocol (mOBCP) algorithm to the Personal Broadcasting System. The proposed mOBCP is a performance-effective mechanism because it considers how quickly children can concurrently find and join new parents when the paths to their existing parents fail. Performance is compared through simulations in terms of tree construction time variation and latency, and the results favour the proposed mOBCP.

Optimization Methods of Adaptive Multi-Stage Distance Joins (적응적 다단계 거리 조인의 최적화 기법)

  • Shin, Hyo-Seop;Moon, Bong-Ki;Lee, Suk-Ho
    • Journal of KIISE:Databases / Vol. 28, No. 3 / pp.373-383 / 2001
  • The distance join is a spatial join that finds data pairs in order of distance when associating two spatial data sets. This paper proposes several methods to optimize the adaptive multi-stage distance join presented in [1]. First, we optimize the sweeping index formula, which is used to select the sweeping axis during plane sweeping. Second, to improve the performance of the priority queue used for maintaining node pairs, we propose using the maximum distance of a node pair as the second priority of the queue. Moreover, we compare the trade-offs in estimating the cut-off distance under a uniformity assumption and under a non-uniformity assumption about the data distribution. The experiments show that the proposed methods greatly improve the performance of the algorithm in CPU cost as well as in I/O cost.
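
The second-priority idea can be sketched with a heap keyed by the tuple (minimum distance, maximum distance) of a pair of bounding rectangles. The rectangle layout `(xmin, ymin, xmax, ymax)`, the sample rectangles, and the choice to pop the smaller maximum distance first on ties are all assumptions; the abstract only states that the maximum distance serves as the second priority.

```python
import heapq
import math


def min_dist(a, b):
    """Smallest distance between two axis-aligned rectangles (0 if they overlap)."""
    dx = max(0.0, a[0] - b[2], b[0] - a[2])
    dy = max(0.0, a[1] - b[3], b[1] - a[3])
    return math.hypot(dx, dy)


def max_dist(a, b):
    """Largest distance between any point of a and any point of b."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)


# Hypothetical node rectangles as (xmin, ymin, xmax, ymax).
left = {"A": (0, 0, 1, 1)}
right = {"C": (3, 0, 6, 5), "B": (3, 0, 4, 1)}

queue = []
for l_name, l_rect in left.items():
    for r_name, r_rect in right.items():
        # First priority: minimum distance. Second priority: maximum distance.
        heapq.heappush(queue, (min_dist(l_rect, r_rect), max_dist(l_rect, r_rect), l_name, r_name))

order = []
while queue:
    _, _, l_name, r_name = heapq.heappop(queue)
    order.append((l_name, r_name))
print(order)
```

Both pairs have the same minimum distance of 2.0, so the second priority decides the order: the tighter pair (A, B) is expanded before (A, C).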
