• Title/Summary/Keyword: distributed parallel processing

Performance Evaluation of Hash Join Algorithms Supporting Dynamic Load Balancing for a Database Sharing System (데이타베이스 공유 시스템에서 동적 부하분산을 지원하는 해쉬 조인 알고리즘들의 성능 평가)

  • Moon, Ae-Kyung;Cho, Haeng-Rae
    • The Transactions of the Korea Information Processing Society / v.6 no.12 / pp.3456-3468 / 1999
  • Most previous parallel join algorithms assume a database partition system (DPS), in which each database partition is owned by a single processing node. While the DPS is attractive in that it can interconnect a large number of nodes and support a geographically distributed environment, it may offer poorer support for load balancing and system availability than a database sharing system (DSS). In this paper, we propose a dynamic load balancing strategy that exploits the characteristics of the DSS, and then extend conventional hash join algorithms to the DSS using this strategy. Through simulation studies under a wide variety of system configurations and database workloads, we analyze the effects of the dynamic load balancing strategy and the performance differences among the hash join algorithms in the DSS.
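
To make the bucket-level balancing concrete, below is a minimal hypothetical Java sketch: tuples are hashed into buckets, and buckets are assigned greedily to the least-loaded node. The bucket count and the greedy rule are illustrative assumptions, not the algorithms evaluated in the paper.

```java
import java.util.*;
import java.util.stream.*;

// Sketch only: hash-partition join keys into buckets, then hand each bucket
// to the currently least-loaded node (largest buckets first).
public class DynamicHashJoinSketch {
    static final int NUM_BUCKETS = 64; // illustrative bucket count

    static int bucketOf(Object joinKey) {
        return Math.floorMod(joinKey.hashCode(), NUM_BUCKETS);
    }

    // owner[b] = node assigned to process bucket b
    static int[] assignBuckets(long[] bucketSizes, int numNodes) {
        int[] owner = new int[bucketSizes.length];
        long[] load = new long[numNodes];
        Integer[] order = IntStream.range(0, bucketSizes.length)
                .boxed().toArray(Integer[]::new);
        // place big buckets first so small ones can even out the tail
        Arrays.sort(order, (a, b) -> Long.compare(bucketSizes[b], bucketSizes[a]));
        for (int b : order) {
            int best = 0;
            for (int n = 1; n < numNodes; n++)
                if (load[n] < load[best]) best = n;
            owner[b] = best;
            load[best] += bucketSizes[b];
        }
        return owner;
    }
}
```

A DSS makes this kind of late bucket-to-node binding cheap, since any node can read any partition from the shared database; in a DPS the data would first have to be shipped between owning nodes.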

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

  • Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.9 / pp.4063-4086 / 2016
  • Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large datasets in a distributed computing environment, and HDFS, the Hadoop file system, distributes data blocks to the cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all Datanodes have equal computing capacity to process data blocks, including the availability of the same storage media and the same processing performance per node. As a result, Hadoop cluster performance suffers from unbalanced workloads, inefficient storage-tier usage, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme, which systematically resolves unbalanced workloads, reduces network congestion to an optimal state, utilizes the storage tiers effectively, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue by 72%. Moreover, it resolved the storage-tier compatibility problem by 81% by predicting storage for block jobs, and improved overall data block placement by 78% through pre-calculated computing capacity allocations and execution of map files over the respective Namenode and Datanodes.
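
The tier-aware choice can be pictured as a scoring rule over candidate Datanodes, as in the hypothetical sketch below; the `DataNode` fields, the two-tier model, and the tie-breaking by queue length are our assumptions, not the exact RDP model.

```java
import java.util.*;

// Sketch: prefer a datanode whose storage tier matches the block's profile,
// breaking ties by the shortest pending-block queue.
public class TierAwarePlacementSketch {
    enum Tier { SSD, HDD }

    static class DataNode {
        final String id; final Tier tier; int queuedBlocks;
        DataNode(String id, Tier tier) { this.id = id; this.tier = tier; }
    }

    static DataNode choose(List<DataNode> nodes, Tier wanted) {
        return nodes.stream()
                .min(Comparator
                        .comparing((DataNode n) -> n.tier == wanted ? 0 : 1)
                        .thenComparingInt(n -> n.queuedBlocks))
                .orElseThrow();
    }
}
```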

A Hybrid Mechanism of Particle Swarm Optimization and Differential Evolution Algorithms based on Spark

  • Fan, Debin;Lee, Jaewan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.5972-5989 / 2019
  • With the onset of the big data age, data is growing exponentially, and how to optimize large-scale data processing has become an especially significant issue. Large-scale global optimization (LSGO) is a research topic of great interest in academia and industry. Spark is a popular cloud computing framework for cluster processing of large-scale data, and it effectively supports iterative computation through resilient distributed datasets (RDDs). In this paper, we propose a hybrid mechanism of particle swarm optimization (PSO) and differential evolution (DE) algorithms based on Spark (SparkPSODE). SparkPSODE is a parallel algorithm that employs RDDs and the island model. The island model divides the global population into several subpopulations, which map onto RDD partitions to reduce computation time. To preserve population diversity and avoid premature convergence, the evolutionary strategy of DE is integrated into SparkPSODE. Finally, SparkPSODE is evaluated on a set of LSGO benchmark problems, and the experimental results show that it obtains better optimization performance than several comparison algorithms.
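
A minimal sketch of the island model on Spark's Java API follows: the population is split across RDD partitions (islands), and each island evolves locally in every generation. The PSO velocity/position update and DE mutation are reduced to a stub, and the population size, dimensionality, and sphere fitness are illustrative choices, not the SparkPSODE configuration.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.*;
import java.util.*;

// Island model on Spark (2.x Java API): one RDD partition per island.
public class IslandModelSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkPSODE-sketch").setMaster("local[4]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            int numIslands = 4, popSize = 400, dim = 100;
            Random rnd = new Random(42);
            List<double[]> pop = new ArrayList<>();
            for (int i = 0; i < popSize; i++) {
                double[] x = new double[dim];
                for (int d = 0; d < dim; d++) x[d] = rnd.nextDouble() * 10 - 5;
                pop.add(x);
            }
            JavaRDD<double[]> islands = sc.parallelize(pop, numIslands);
            for (int gen = 0; gen < 10; gen++) {
                islands = islands.mapPartitions(it -> {
                    List<double[]> island = new ArrayList<>();
                    it.forEachRemaining(island::add);
                    // ... PSO velocity/position update + DE mutation on `island` ...
                    return island.iterator();
                });
            }
            double best = islands.map(IslandModelSketch::sphere).reduce(Math::min);
            System.out.println("best fitness: " + best);
        }
    }

    // sphere function: a standard benchmark stand-in for an LSGO objective
    static double sphere(double[] x) { double s = 0; for (double v : x) s += v * v; return s; }
}
```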

MRSPAKE: A Web-Scale Spatial Knowledge Extractor Using Hadoop MapReduce (MRSPAKE: Hadoop MapReduce를 이용한 웹 규모의 공간 지식 추출기)

  • Lee, Seok-Jun;Kim, In-Cheol
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.569-584 / 2016
  • In this paper, we present a spatial knowledge extractor implemented in the Hadoop MapReduce parallel, distributed computing environment. From a large spatial dataset, the extractor automatically derives a qualitative spatial knowledge base consisting of both topological and directional relations over pairs of spatial objects. By using an R-tree index and range queries over a distributed spatial data file on HDFS, the MapReduce-enabled spatial knowledge extractor, MRSPAKE, can produce a web-scale spatial knowledge base in a highly efficient way. In experiments with the well-known open spatial dataset Open Street Map (OSM), MRSPAKE showed high performance and scalability.
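
One stage of such an extractor can be sketched as a Hadoop Mapper that keys each spatial object by a grid cell, so that nearby objects meet at the same reducer for pairwise relation tests. The input record format and cell size below are assumptions for illustration; MRSPAKE itself uses an R-tree index and range queries over HDFS.

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: co-locate spatially close objects under one reducer key so the
// reducer can test topological/directional relations pairwise.
public class SpatialPairMapper extends Mapper<Object, Text, Text, Text> {
    static final double CELL = 0.01; // grid cell size in degrees (assumed)

    @Override
    protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
        // assumed input line: "<objectId> <lon> <lat>"
        String[] f = value.toString().split("\\s+");
        long cx = (long) Math.floor(Double.parseDouble(f[1]) / CELL);
        long cy = (long) Math.floor(Double.parseDouble(f[2]) / CELL);
        ctx.write(new Text(cx + ":" + cy), value); // neighbors share a key
    }
}
```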

Mutual Authentication Protocol for Safe Data Transmission of Multi-distributed Web Cluster Model (다중 분산 웹 클러스터모델의 안전한 데이터 전송을 위한 상호 인증 프로토콜)

  • Lee, Kee-Jun;Kim, Chang-Won;Jeong, Chae-Yeong
    • The KIPS Transactions: Part C / v.8C no.6 / pp.731-740 / 2001
  • The multi-distributed web cluster model, which extends the conventional cluster system, processes large-scale jobs requested by users in parallel by organizing a number of system nodes on an open network into a single virtual network. Owing to this structure, internal system nodes may be exposed to an illegal third party, and intentional interference or attacks on the cooperative work among system nodes can make normal job processing impossible. This paper presents a mutual authentication protocol, based on a key division method, for authenticating the system nodes involved in registering, requesting, and cooperating on service code blocks and in collecting the results, and then designs SNKDC, which safely and effectively manages and divides the symmetric keys of all system nodes. SNKDC divides the symmetric keys required for the nodes' work, and the system nodes transmit encrypted packets based on the keys provided. An encrypted packet exchanged between system nodes cannot be decoded by a third party, which prevents the outflow of information through forged messages.
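
The transport step, after key distribution, might look like the following sketch: two nodes holding a shared symmetric key exchange packets that a third party cannot read or silently alter. AES-GCM is a stand-in cipher chosen for illustration; the abstract does not specify the paper's cipher or the internals of SNKDC's key division.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Sketch: encrypt a service code block under a shared key so only the
// intended peer node can decode it (GCM also authenticates the packet).
public class NodePacketCryptoSketch {
    public static void main(String[] args) throws Exception {
        SecretKey shared = KeyGenerator.getInstance("AES").generateKey(); // stand-in for an SNKDC-issued key
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv); // fresh nonce per packet

        Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
        enc.init(Cipher.ENCRYPT_MODE, shared, new GCMParameterSpec(128, iv));
        byte[] packet = enc.doFinal("service code block".getBytes(StandardCharsets.UTF_8));

        Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
        dec.init(Cipher.DECRYPT_MODE, shared, new GCMParameterSpec(128, iv));
        System.out.println(new String(dec.doFinal(packet), StandardCharsets.UTF_8));
    }
}
```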

Implementation and Performance Evaluation of Socket and RMI based Java Message Passing Systems (소켓 및 RMI 기반 자바 메시지 전달 시스템의 구현 및 성능평가)

  • Bang, Seung-Jun;Ahn, Jin-Ho
    • Journal of Internet Computing and Services / v.8 no.5 / pp.11-20 / 2007
  • This paper designs and implements a message passing library called JMPI (Java Message Passing Interface), which complies with MPJ (Message Passing in Java), the MPI standard specification for the Java language. The library provides graphical user interface tools that allow administrators to configure parallel computing environments very simply and users to execute JMPI applications very conveniently. We also implement two versions of the system using sockets and RMI, the two typical distributed-system communication mechanisms, and, with three benchmark applications, compare their performance with that of the existing JPVM system as the number of computers increases. Experimental results show that our systems outperform JPVM in various respects, and that the most efficient speedup is obtained by increasing the number of computers with network traffic taken into account. Finally, we observe that, as the number of computers increases, transmitting a message via RMI is more effective than transmitting it through object streams attached to sockets.
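
For the RMI variant, the basic shape is a remote interface whose method call replaces a socket write, as in the sketch below; the interface and names are illustrative, not JMPI's actual API.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Sketch: a node exposes receive() remotely; the sender invokes it instead
// of writing to an object stream attached to a socket.
public class RmiTransportSketch {
    public interface MessageEndpoint extends Remote {
        void receive(String message) throws RemoteException;
    }

    public static class Endpoint extends UnicastRemoteObject implements MessageEndpoint {
        protected Endpoint() throws RemoteException { super(); }
        public void receive(String message) { System.out.println("got: " + message); }
    }

    public static void main(String[] args) throws Exception {
        Registry reg = LocateRegistry.createRegistry(1099);
        reg.rebind("node0", new Endpoint());                          // receiver side
        MessageEndpoint peer = (MessageEndpoint) reg.lookup("node0"); // sender side
        peer.receive("hello from node1");                             // remote call replaces socket write
    }
}
```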

Real-time Hand Gesture Recognition System based on Vision for Intelligent Robot Control (지능로봇 제어를 위한 비전기반 실시간 수신호 인식 시스템)

  • Yang, Tae-Kyu;Seo, Yong-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.10 / pp.2180-2188 / 2009
  • This paper presents a real-time vision-based hand gesture recognition system for intelligent robot control. We propose a recognition system using the PCA and BP algorithms: recognition consists of a preprocessing step using the PCA algorithm and a classification step using the BP algorithm. PCA is a technique for reducing multidimensional data sets to lower dimensions for effective analysis; in our system it is applied to calculate feature projection vectors for the image of a given hand. The BP algorithm is capable of parallel distributed processing and expedites processing because of its parallel structure; it recognizes hand gestures in real time through self-learning of trained eigen hand gestures. The proposed combination of PCA and BP shows improved recognition compared to the PCA algorithm alone.
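
The two-stage pipeline can be sketched as a PCA projection followed by a feedforward pass of the BP-trained network. The eigenvectors and weight matrices are assumed to be learned offline, and all dimensions are illustrative.

```java
// Sketch: y = E^T (x - mean) reduces a flattened hand image to k PCA
// coefficients, which a small sigmoid MLP then classifies.
public class PcaBpPipelineSketch {
    static double[] project(double[] x, double[] mean, double[][] eig /* k x d */) {
        double[] y = new double[eig.length];
        for (int k = 0; k < eig.length; k++)
            for (int d = 0; d < x.length; d++)
                y[k] += eig[k][d] * (x[d] - mean[d]);
        return y;
    }

    // argmax of the output layer = recognized gesture class
    static int classify(double[] y, double[][] w1, double[][] w2) {
        double[] o = layer(layer(y, w1), w2);
        int best = 0;
        for (int i = 1; i < o.length; i++) if (o[i] > o[best]) best = i;
        return best;
    }

    static double[] layer(double[] in, double[][] w /* out x in */) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double s = 0;
            for (int j = 0; j < in.length; j++) s += w[i][j] * in[j];
            out[i] = 1.0 / (1.0 + Math.exp(-s)); // sigmoid unit
        }
        return out;
    }
}
```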

Web-based Distributed Parallel Computing Environment with Multi-Managing Method (멀티 매니징 기법을 이용한 웹기반 분산 병렬 컴퓨팅 환경)

  • Maeng, Hye-Seon;Han, Tak-Don;Kim, Sin-Deok
    • The Transactions of the Korea Information Processing Society / v.6 no.7 / pp.1777-1788 / 1999
  • The portability of the Java language makes it possible to use heterogeneous computers without recompiling application programs, and Java applets can be transported to other computers via a Web browser. This research proposes a Cooperative Web Computing Environment (CWCE) that uses idle computers on an intranet for cooperative parallel computing. The CWCE allows more than one manager computer, where a manager sends applets to, and manages communication among, the other computers. The number of manager computers can be chosen according to the characteristics of the computing environment and the given application program, which can reduce communication overhead, especially for applications with synchronized communication. For this purpose, a decision function that determines the managing level is provided. The CWCE turns out to be a useful computing environment for applications with a low computation-request ratio, and multi-managing helps reduce communication overhead, especially for applications with a high ratio of synchronization-purpose communications.
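
The abstract does not give the form of the decision function, so the following is a purely hypothetical cost-model sketch of how a manager count might be chosen: each added manager absorbs part of the workers' synchronization traffic but adds manager-to-manager coordination cost.

```java
// Hypothetical sketch of a multi-managing decision; the cost model is our
// assumption, not the paper's decision function.
public class ManagerCountSketch {
    static int choose(int workers, double syncMsgsPerWorker, double managerCoordCost) {
        int best = 1;
        double bestCost = Double.MAX_VALUE;
        for (int m = 1; m <= workers; m++) {
            // per-manager sync traffic shrinks with m; coordination grows with m
            double cost = syncMsgsPerWorker * workers / m + managerCoordCost * m;
            if (cost < bestCost) { bestCost = cost; best = m; }
        }
        return best;
    }
}
```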

Conversion of Large RDF Data using Hash-based ID Mapping Tables with MapReduce Jobs (맵리듀스 잡을 사용한 해시 ID 매핑 테이블 기반 대량 RDF 데이터 변환 방법)

  • Kim, InA;Lee, Kyu-Chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.236-239 / 2021
  • With the growth of AI technology, the scale of knowledge graphs continues to expand. Knowledge graphs are mainly expressed in the RDF representation, which consists of connected triples. Many RDF stores compress and transform RDF triples into condensed IDs; however, transforming a large volume of RDF triples incurs high processing time and memory overhead because a large ID mapping table must be searched. In this paper, we propose a method for converting RDF triples using hash-based ID mapping tables with MapReduce, a software framework for parallel, distributed processing. Our proposed method not only transforms RDF triples into integer-based IDs but also improves conversion speed and memory overhead. In our experiment on LUBM, the proposed method reduced the dataset size by about 3.8 times, and the conversion took about 106 seconds.
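
The core idea can be sketched as deriving a fixed-width integer ID directly from a resource's IRI by hashing, so the mapping table can be partitioned by hash value instead of searched globally. Truncating MD5 to 64 bits is an illustrative choice, not necessarily the paper's hash function.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Sketch: a stable 64-bit ID computed from the IRI itself, so every
// MapReduce task can derive IDs without a shared lookup service.
public class HashIdMapperSketch {
    static long idOf(String iri) throws Exception {
        byte[] h = MessageDigest.getInstance("MD5")
                .digest(iri.getBytes(StandardCharsets.UTF_8));
        long id = 0;
        for (int i = 0; i < 8; i++) id = (id << 8) | (h[i] & 0xffL);
        return id;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idOf("http://example.org/University0")); // stable across jobs
    }
}
```

Note that a truncated hash can collide, which is one reason a converter still keeps ID mapping tables rather than relying on the hash alone.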

A Design of the Preprocess Module for the Distributed Process of the ECG signals (ECG 신호의 분산처리를 위한 Preprocess Module에 관한 연구)

  • Song, H.B.;Lee, K.J.;Yoon, H.R.;Lee, M.H.
    • Proceedings of the KIEE Conference / 1987.07b / pp.1338-1340 / 1987
  • This paper describes the design of an ECG data preprocessing module for distributed processing of ECG signals. The module processes data obtained from two channels and is composed of an A/D converter, a QRS detector, a one-chip microcomputer, and memory. It performs the following functions: digital filtering, R-wave detection, and determination of the reference point for the ST segment. The measured points are transferred to the next data module by interrupt processing. This preprocessing module can serve as the basis for parallel data processing in real-time automatic diagnosis.
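
On the software side, the functions listed above can be sketched as a moving-average digital filter followed by threshold-based R-wave detection; the window and threshold are illustrative values only, not the module's actual parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: smooth one sampled ECG channel, then report upward threshold
// crossings as candidate R-wave locations.
public class EcgPreprocessSketch {
    static double[] movingAverage(double[] x, int w) {
        double[] y = new double[x.length];
        double sum = 0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i];
            if (i >= w) sum -= x[i - w]; // slide the window
            y[i] = sum / Math.min(i + 1, w);
        }
        return y;
    }

    static List<Integer> detectR(double[] y, double threshold) {
        List<Integer> peaks = new ArrayList<>();
        for (int i = 1; i < y.length; i++)
            if (y[i - 1] < threshold && y[i] >= threshold) peaks.add(i);
        return peaks;
    }
}
```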
