• Title/Summary/Keyword: distributed parallel processing (분산 병렬 처리)


An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As data volumes grow, analyzing big data to identify its properties becomes increasingly important. In this paper, we propose MCSK-Means (Multi Centroid Set k-Means), an efficient k-Means based clustering technique that uses the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on the initial centroids by using multiple sets, each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to derive the final k centroids from the centroids in the m centroid sets produced by the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
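
The two phases described in this abstract can be sketched in a single process as follows. This is a minimal illustration, not the authors' MapReduce implementation: the function names, the centroid-linkage merge rule, and the fixed iteration count are all assumptions made for the example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means starting from k randomly chosen points."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def merge_centroids(centroids, k):
    """Agglomerative merge (assumed centroid linkage) down to k centroids."""
    clusters = [[c] for c in centroids]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [mean(c) for c in clusters]

def mcsk_means_sketch(points, k, m, seed=0):
    """Run k-means m times, then merge the m*k centroids into k."""
    rng = random.Random(seed)
    all_centroids = []
    for _ in range(m):
        all_centroids.extend(kmeans(points, k, rng))
    return merge_centroids(all_centroids, k)
```

In a MapReduce setting, the m independent k-means runs are the natural unit of parallel work; how the paper distributes them is not stated in the abstract.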

Workflow-based Bio Data Analysis System for HPC (HPC 환경을 위한 워크플로우 기반의 바이오 데이터 분석 시스템)

  • Ahn, Shinyoung;Kim, ByoungSeob;Choi, Hyun-Hwa;Jeon, Seunghyub;Bae, Seungjo;Choi, Wan
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.97-106 / 2013
  • Since the Human Genome Project was completed, the cost of human genome analysis has decreased very rapidly, resulting in a sharp increase in the amount of human genome data to be analyzed. As the need for fast analysis of very large bio data such as the human genome grows, non-IT researchers such as biologists should be able to run many kinds of bio applications, which have a variety of characteristics, quickly and effectively in an HPC environment. To this end, a biologist needs to be able to define a sequence of bio applications as a workflow easily, because bio applications generally have to be combined and executed in a particular order. This bio workflow should be executed as distributed and parallel computation by allocating computing resources efficiently on an HPC cluster system. In this way, we can expect better performance and faster response times for very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications. Using this system, non-IT scientists and researchers can easily analyze very large bio data in an HPC environment.
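
The idea of a workflow whose steps run in dependency order, with independent steps eligible to run in parallel, can be sketched with the standard-library `graphlib` module. The step names below are hypothetical and the abstract does not describe the system's actual scheduler; this only shows how parallel batches fall out of a dependency graph.

```python
from graphlib import TopologicalSorter

def parallel_batches(deps):
    """Group workflow steps into batches whose members could run concurrently.

    deps maps each step name to the set of steps it depends on.
    A cyclic workflow raises graphlib.CycleError in prepare().
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are done
        batches.append(ready)
        sorter.done(*ready)                 # pretend the whole batch finished
    return batches

# Hypothetical bio workflow: one QC step, two independent alignments, a merge.
workflow = {
    "align_a": {"qc"},
    "align_b": {"qc"},
    "merge": {"align_a", "align_b"},
}
batches = parallel_batches(workflow)
```

Each batch is a set of steps a cluster scheduler could dispatch to separate nodes at the same time.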

HWbF(Hit and WLC based Firewall) Design using HIT technique for the parallel-processing and WLC(Weight Least Connection) technique for load balancing (병렬처리 HIT 기법과 로드밸런싱 WLC기법이 적용된 HWbF(Hit and WLC based Firewall) 설계)

  • Lee, Byung-Kwan;Kwon, Dong-Hyeok;Jeong, Eun-Hee
    • Journal of Internet Computing and Services / v.10 no.2 / pp.15-28 / 2009
  • This paper proposes the HWbF (Hit and WLC based Firewall) design, which consists of a PFS (Packet Filter Station) and an APS (Application Proxy Station). The PFS is designed to reduce bottlenecks and prevent packet transmission delay by distributing packets with a PLB (Packet Load Balancing) module, and the APS is designed to manage a proxy cache server using a PCSLB (Proxy Cache Server Load Balancing) module and to detect DoS attacks from the packet traffic quantity. The proposed HWbF therefore prevents the packet transmission delay that was a drawback of existing firewalls, reduces bottlenecks, and increases packet processing speed. In addition, by adjusting the critical value according to the packet traffic quantity with the proposed expression, HWbF reduces the DoS attack error detection rates (TCP) of 50% and 25%, obtained with the average value and the fixed critical value respectively, to 38% and 17%. It thus not only improves the detection of DoS attack traffic but also reduces the overload of the proxy cache server.
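
The WLC (weighted least connection) rule named in the title is commonly defined as picking the server with the smallest ratio of active connections to weight. The sketch below uses that common definition; the `Server` class and the example weights are assumptions, and the paper's PLB and PCSLB modules may differ in detail.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    weight: int       # relative capacity; higher means more connections
    active: int = 0   # currently open connections

def pick_wlc(servers):
    """Return the server with the lowest active/weight ratio (ties: first listed)."""
    candidates = [s for s in servers if s.weight > 0]
    if not candidates:
        raise ValueError("no server with positive weight")
    return min(candidates, key=lambda s: s.active / s.weight)

def dispatch(servers):
    """Assign one new connection and return the chosen server's name."""
    chosen = pick_wlc(servers)
    chosen.active += 1
    return chosen.name

# Hypothetical pool: "A" has three times the capacity of "B".
pool = [Server("A", weight=3), Server("B", weight=1)]
assigned = [dispatch(pool) for _ in range(4)]
```

With these weights the heavier server ends up with three of the four connections, which is the proportional behaviour the rule is meant to give.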


Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases / v.31 no.6 / pp.596-608 / 2004
  • Much work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned in a regular grid-like manner. Although grid-like partitioning achieves good performance for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because it partitions the entire data space equally, regardless of the distribution of the data objects. Most of the data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface. Several experimental results show that, with this unbalanced partitioning scheme, we can remarkably reduce the number of data blocks touched by a query as the dimension of the data and the query size grow. We also propose disk allocation schemes based on the layout of the data blocks that result from partitioning. To evaluate the proposed algorithm, we performed several experiments with data of different dimensions and for a wide range of numbers of disks. Our proposed disk allocation method performs within 10 additive disk accesses of the strictly optimal allocation scheme. We compared our algorithm with the Kronecker sequence based declustering algorithm, which is reported to be the best among the declustering algorithms based on grid partitioning and mapping functions. Our algorithm improves declustering performance by up to 14 times as the dimension of the data grows.
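
The claim that most high-dimensional data lie near the surface can be made concrete with a short calculation. Assuming, for illustration only, points distributed uniformly in the unit hypercube $[0,1]^d$ (the abstract does not state a distribution), the fraction $f$ of points within distance $\varepsilon$ of the boundary is one minus the volume of the inner cube of side $1 - 2\varepsilon$:

```latex
f(d, \varepsilon) = 1 - (1 - 2\varepsilon)^{d}, \qquad
f(2, 0.05) = 1 - 0.9^{2} = 0.19, \qquad
f(20, 0.05) = 1 - 0.9^{20} \approx 0.88
```

Under that assumption, a shell only 5% thick holds about 19% of the data in 2 dimensions but about 88% in 20 dimensions, which is why equal-width grid cells become badly unbalanced as the dimension grows.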

Performance Analysis of MVDR and RLS Beamforming Using Systolic Array Structure (시스토릭 어레이 구조를 갖는 최소분산 비왜곡응답 및 최소자승 회귀 빔형성기법 성능 분석)

  • 이호중;서상우;이원철
    • The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.1-6 / 2003
  • This paper analyzes the performance of the minimum variance distortionless response (MVDR) and recursive least squares (RLS) beamformers structured on a systolic array. A snapshot vector containing the desired user's signal, the interferences, and the noise is assumed to be received at the array antenna. To improve the quality of the received signal, the MVDR or RLS algorithm can be used to update the beamformer weights recursively. Furthermore, to increase channel capacity, these schemes provide a spatial filtering effect that constructively combines the multipath components of the desired user while nulling out the multiple access interference (MAI) in the spatial domain. This paper introduces MVDR and RLS beamformers structured on a systolic array that perform spatial filtering, and analyzes their performance under a multipath fading channel in the presence of multiple access interference. Computer simulations are carried out to show the superior spatial filtering performance of the proposed scheme employing the systolic array structured beamformer. The validity of practical deployment of the proposed scheme is confirmed by the BER behavior and the beam patterns.
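
For reference, the textbook forms of the two weight computations are given below. The symbols are assumptions introduced here, not taken from the paper: $\mathbf{x}(n)$ is the snapshot vector, $\mathbf{R}$ its covariance matrix, $\mathbf{a}$ the steering vector of the desired user, $\lambda$ the forgetting factor, and $\mathbf{P}(n)$ the recursive estimate of $\mathbf{R}^{-1}$. The MVDR weights minimize $\mathbf{w}^{H}\mathbf{R}\mathbf{w}$ subject to $\mathbf{w}^{H}\mathbf{a} = 1$, and the RLS recursion updates $\mathbf{P}(n)$ without an explicit matrix inversion:

```latex
\mathbf{w}_{\mathrm{MVDR}} = \frac{\mathbf{R}^{-1}\mathbf{a}}{\mathbf{a}^{H}\mathbf{R}^{-1}\mathbf{a}},
\qquad
\mathbf{k}(n) = \frac{\mathbf{P}(n-1)\,\mathbf{x}(n)}{\lambda + \mathbf{x}^{H}(n)\,\mathbf{P}(n-1)\,\mathbf{x}(n)},
\qquad
\mathbf{P}(n) = \lambda^{-1}\left[\mathbf{P}(n-1) - \mathbf{k}(n)\,\mathbf{x}^{H}(n)\,\mathbf{P}(n-1)\right]
```

How the paper maps these recursions onto the systolic array is not described in the abstract.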

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data and public data. First, a small platform server with a parallel structure for distributed big data processing was developed as the H/W technology. Next, programs for big data collection/storage, processing/analysis, and information visualization were developed as the S/W technology. The collection S/W was developed as a collection interface using Kafka, Flume, and Sqoop. The storage S/W was developed to use either the Hadoop distributed file system or Cassandra DB, depending on how the data are used. The processing S/W was developed for spatial unit matching and time interval interpolation/aggregation of the collected data by applying the grid index method. The analysis S/W was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating the developed algorithms. Finally, the information visualization S/W was developed as a Web GIS engine program for providing and visualizing various driving environment information. In the performance evaluation, the number of executors, the optimal memory capacity, and the number of cores for the development server were derived, and its computation performance was superior to that of other cloud computing environments.
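
The processing step, matching records to a spatial grid cell and aggregating them per time interval, might look like the sketch below. The record layout, the 0.01 degree cell size, and the 300 second interval are all assumptions chosen for illustration; the abstract gives none of these, and the interpolation part is not covered here.

```python
import math
from collections import defaultdict

def grid_cell(lon, lat, cell_deg=0.01):
    """Map a coordinate to an integer (column, row) grid index."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))

def aggregate(records, cell_deg=0.01, interval_s=300):
    """Average speed per (grid cell, time bucket).

    records: iterable of (lon, lat, unix_seconds, speed) tuples (assumed layout).
    """
    sums = defaultdict(lambda: [0.0, 0])   # key -> [speed total, sample count]
    for lon, lat, ts, speed in records:
        key = (grid_cell(lon, lat, cell_deg), ts // interval_s)
        sums[key][0] += speed
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical vehicle samples: two in the same cell and bucket, one later.
samples = [
    (127.053, 37.512, 10, 40.0),
    (127.057, 37.518, 200, 60.0),
    (127.053, 37.512, 310, 30.0),
]
result = aggregate(samples)
```

Because the key is a plain tuple, the same function body could serve as the map-side key extractor in a distributed job, with the averaging done on the reduce side.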

Efficient Neural Network Architecture for Fast Target Detection and Recognition (목표물의 고속 탐지 및 인식을 위한 효율적인 신경망 구조)

  • Weon, Yong-Kwan;Baek, Yong-Chang;Lee, Jeong-Su
    • The Transactions of the Korea Information Processing Society / v.4 no.10 / pp.2461-2469 / 1997
  • Target detection and recognition problems, in which neural networks are widely used, require translation invariance and real-time processing in addition to the requirements of general pattern recognition problems. This paper presents a novel architecture that meets these requirements and explains an effective methodology for training the network. The proposed neural network is an architectural extension of the shared-weight neural network, composed of a feature extraction stage followed by a pattern recognition stage. Its feature extraction stage performs a correlation operation on the input with a weight kernel, and the entire neural network can be considered a nonlinear correlation filter. The output of the proposed neural network is therefore a correlation plane with peak values at the location of the target. The architecture is suitable for implementation on parallel or distributed computers, which allows it to be applied to problems that require real-time processing. A training methodology that overcomes the problem caused by the imbalance between the numbers of targets and non-targets is also introduced. To verify its performance, the proposed network is applied to the detection and recognition of a specific automobile driving around a parking lot. The results show no false alarms and processing fast enough to track a target moving at about 190 km per hour.
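
The notion of a correlation plane that peaks at the target location can be illustrated with a plain linear correlation. This sketch omits the nonlinearity and the learned weights of the proposed network; the image, kernel, and function names are made up for the example.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation of two lists of lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    plane = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        plane.append(row)
    return plane

def peak_location(plane):
    """Return (row, col) of the largest value in the correlation plane."""
    best = max((value, (r, c))
               for r, row in enumerate(plane)
               for c, value in enumerate(row))
    return best[1]

# Hypothetical 5x5 scene containing the 2x2 target pattern at row 2, column 1.
target = [[1, 2], [3, 4]]
scene = [[0] * 5 for _ in range(5)]
for i in range(2):
    for j in range(2):
        scene[2 + i][1 + j] = target[i][j]

location = peak_location(correlate2d(scene, target))
```

Moving the pattern elsewhere in the scene moves the peak by the same offset, which is the translation invariance the abstract refers to. Each output cell is independent of the others, so the plane can be computed in parallel.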


Multi-threaded Web Crawling Design using Queues (큐를 이용한 다중스레드 방식의 웹 크롤링 설계)

  • Kim, Hyo-Jong;Lee, Jun-Yun;Shin, Seung-Soo
    • Journal of Convergence for Information Technology / v.7 no.2 / pp.43-51 / 2017
  • Background/Objectives: The purpose of this study is to design and implement a multi-threaded web crawler using queues, utilizing multiple bots connected over a wide area network, that can solve the time delay of the single processing method, the increased cost of the parallel processing method, and the waste of manpower. Methods/Statistical analysis: This study designs and analyzes applications that run on independent systems, based on a multi-threaded system configuration using queues. Findings: We propose a multi-threaded web crawler design using queues. In addition, the throughput of web documents can be analyzed per client and per thread according to the formula, and the efficiency and the optimal number of clients can be determined by checking the efficiency of each thread. The proposed system is based on distributed processing. Clients in independent environments provide web documents quickly and reliably using queues and threads. Application/Improvements: There is a need for a system that quickly and efficiently navigates and collects from a variety of web sites by applying queues and multiple threads to a general-purpose web crawler, rather than a web crawler designed to target a particular site.
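
A queue-fed pool of worker threads, the general pattern this abstract describes, can be sketched as follows. To stay self-contained the example crawls a hypothetical in-memory link graph instead of the network; `FAKE_WEB` and `fetch_links` are stand-ins, and nothing here reflects the paper's actual client and bot architecture.

```python
import queue
import threading

# Hypothetical in-memory "web": URL -> outgoing links.
FAKE_WEB = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

def fetch_links(url):
    """Stand-in for an HTTP fetch followed by link extraction."""
    return FAKE_WEB.get(url, [])

def crawl(seed, num_threads=4):
    """Visit every URL reachable from seed using a shared work queue."""
    frontier = queue.Queue()
    seen = {seed}
    seen_lock = threading.Lock()
    frontier.put(seed)

    def worker():
        while True:
            url = frontier.get()
            if url is None:              # sentinel: no more work
                frontier.task_done()
                return
            try:
                for link in fetch_links(url):
                    with seen_lock:      # check-and-add must be atomic
                        if link in seen:
                            continue
                        seen.add(link)
                    frontier.put(link)
            finally:
                frontier.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    frontier.join()                      # all queued URLs have been processed
    for _ in threads:
        frontier.put(None)               # one sentinel per worker
    for t in threads:
        t.join()
    return seen
```

New links are queued before the current URL is marked done, so `frontier.join()` cannot return while work is still being discovered.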

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In the semiconductor process, a simulation is performed to detect defects by analyzing the behavior of impurities through calculation of the physical quantities of the internal elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform the simulation. As semiconductors, which are composed of nanoscale elements, advance, the size of the simulation keeps growing. As a result, a single processor such as a CPU or GPU may be unable to perform the simulation because of the massive matrix size, or a computer consisting of multiple processors may be unable to handle a massive FDTD simulation. To address these problems, studies have applied parallel/distributed computing. In the past, however, only a single type of processor was used. A GPU is fast but has limited memory, whereas a CPU is slower than a GPU. To solve this problem, we implemented a computing model that can handle an FDTD simulation of any size on a cluster consisting of heterogeneous processors. We tested the simulation on the processors using MPI libraries based on 'point to point' communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed the performance by measuring the total execution time and specific times for the simulation in each test.
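
Why point-to-point communication suffices for a distributed FDTD grid can be seen in one dimension: each subdomain needs only one boundary value from its neighbour per half step. The sketch below simulates two "ranks" inside one process and checks them against an undivided grid. It uses normalized units with a Courant number of 0.5, does not call MPI, and is not the paper's model.

```python
def step_global(E, H, c=0.5):
    """One leapfrog step on the whole 1D grid; E[0] and E[-1] are never updated."""
    n = len(E)
    for i in range(n - 1):
        H[i] += c * (E[i + 1] - E[i])
    for i in range(1, n - 1):
        E[i] += c * (H[i] - H[i - 1])

def step_split(EL, HL, ER, HR, c=0.5):
    """The same step with the grid split between a left and a right rank."""
    halo_E = ER[0]                        # message 1: right rank -> left rank
    for i in range(len(HL)):
        nxt = EL[i + 1] if i + 1 < len(EL) else halo_E
        HL[i] += c * (nxt - EL[i])
    for i in range(len(HR)):
        HR[i] += c * (ER[i + 1] - ER[i])
    halo_H = HL[-1]                       # message 2: left rank -> right rank
    for i in range(1, len(EL)):
        EL[i] += c * (HL[i] - HL[i - 1])
    for j in range(len(HR)):
        prev = HR[j - 1] if j > 0 else halo_H
        ER[j] += c * (HR[j] - prev)

def run(n=16, split=8, steps=10):
    """Advance an initial pulse both ways and return (global E, rejoined split E)."""
    E = [0.0] * n
    E[4] = 1.0
    H = [0.0] * (n - 1)
    EL, ER = E[:split], E[split:]
    HL, HR = H[:split], H[split:]
    for _ in range(steps):
        step_global(E, H)
        step_split(EL, HL, ER, HR)
    return E, EL + ER
```

In an MPI program the two halo values would be exchanged as a pair of point-to-point messages per time step, and each rank could run on a different processor type as long as it applies the same update.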

Analysis of Ultimate Bearing Capacity of Piles Using Artificial Neural Networks Theory (I) -Theory (인공 신경망 이론을 이용한 말뚝의 극한지지력 해석(I)-이론)

  • 이정학;이인모
    • Geotechnical Engineering / v.10 no.4 / pp.17-28 / 1994
  • It is well known that the human brain has the advantage of handling dispersed and parallel distributed data efficiently. On the basis of this fact, artificial neural network theory was developed and has been applied successfully to various fields of science. In this study, the error back-propagation algorithm, one of the training techniques for artificial neural networks, is applied to predict the ultimate bearing capacity of pile foundations. To verify the applicability of this system, a total of 28 model pile test results are used. Of the 28 data, 9, 14, and 21 test data respectively are used for training the networks, and the rest are used to compare the predicted and measured values. The results show that the developed system matches the model pile test results well when trained with more than 14 data. These limited results show the possibility of using neural networks for pile capacity prediction problems.
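
Error back-propagation itself can be shown on a toy regression problem. The network size, learning rate, and synthetic training samples below are assumptions chosen for the example and have no connection to the paper's pile test data or network layout.

```python
import math
import random

def train(samples, hidden=3, lr=0.05, epochs=200, seed=0):
    """Train a 1-input, tanh-hidden, linear-output network by back-propagation.

    Returns the mean squared error before training and after each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    history = [mse()]
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            err = y - t                                  # dL/dy for L = err^2 / 2
            for j in range(hidden):
                grad_h = err * w2[j] * (1 - h[j] ** 2)   # error pushed back through tanh
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err
        history.append(mse())
    return history

# Synthetic samples of y = 2x on [-1, 1]; purely illustrative.
data = [(i / 4, 2 * i / 4) for i in range(-4, 5)]
losses = train(data)
```

Holding some samples out of `data` and comparing predictions on them against their targets mirrors the train-then-compare procedure the abstract describes.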
