• 제목/요약/키워드: distributed computing cluster

검색결과 89건 처리시간 0.031초

Apache Spark를 활용한 대용량 데이터의 처리 (Processing large-scale data with Apache Spark)

  • 고세윤;원중호
    • 응용통계연구
    • /
    • 제29권6호
    • /
    • pp.1077-1094
    • /
    • 2016
  • 아파치 스파크는 빠르고 범용성이 뛰어난 클러스터 컴퓨팅 패키지로, 복구 가능한 분산 데이터셋이라는 새로운 추상화를 통해 데이터를 인메모리에 유지하면서도 결함 감내성을 얻을 수 있는 방법을 제공한다. 이러한 추상화는 하드디스크에 직접 데이터를 읽고 쓰는 방식으로 결함 감내성을 제공하는 기존의 대표적인 대용량 데이터 분석 기술인 맵 리듀스 프레임워크에 비해 상당한 속도 향상을 거두었다. 특히 로지스틱 회귀 분석이나 K-평균 군집화와 같은 반복적인 기계 학습 알고리즘이나 사용자가 실시간으로 데이터에 관한 질의를 하는 대화형 자료 분석에서 스파크는 매우 효율적인 성능을 보인다. 뿐만 아니라, 높은 범용성을 바탕으로 하여 기계 학습, 스트리밍 자료 처리, SQL, 그래프 자료 처리와 같은 다양한 고수준 라이브러리를 제공한다. 이 논문에서는 스파크의 개념과 프로그래밍 모형에 대해 소개하고, 이를 통해 몇 가지 통계 분석 알고리즘을 구현하는 방법에 대해 소개한다. 아울러, 스파크에서 제공하는 기계 학습 라이브러리인 MLlib과 R 언어 인터페이스인 SparkR에 대해 다룬다.

UltraSPARC(64bit-RISC processor)을 위한 고성능 컴퓨터 리눅스 클러스터링 (HPC(High Performance Computer) Linux Clustering for UltraSPARC(64bit-RISC processor))

  • 김기영;조영록;장종권
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2003년도 컴퓨터소사이어티 추계학술대회논문집
    • /
    • pp.45-48
    • /
    • 2003
  • We can easily buy network system for high performance micro-processor, progress computer architecture is caused of high bandwidth and low delay time. Coupling PC-based commodity technology with distributed computing methodologies provides an important advance in the development of single-user dedicated systems. Lately Network is joined PC or workstation by computers of high performance and low cost. Than it make intensive that Cluster system is resembled supercomputer. Unix, Linux, BSD, NT(Windows series) can use Cluster system OS(operating system). I'm chosen linux gain low cost, high performance and open technical documentation. This paper is benchmark performance of Beowulf clustering by UltraSPARC-1K(64bit-RISC processor). Benchmark tools use MPI(Message Passing Interface) and NetPIPE. Beowulf is a class of experimental parallel workstations developed to evaluate and characterize the design space of this new operating point in price-performance.

  • PDF

가상 공장 시뮬레이션을 위한 PC 클러스터 기반의 멀티채널 가시화 모듈의 설계와 구현 (Design and Implementation of Multichannel Visualization Module on PC Cluster for Virtual Manufacturing)

  • 김용식;한순흥;양정삼
    • 대한기계학회논문집A
    • /
    • 제30권3호
    • /
    • pp.231-240
    • /
    • 2006
  • Immersive virtual reality (VR) for the manufacturing planning helps to shorten the planning times as well as to improve the quality of planning results. However, VR equipment is expensive, both in terms of development efforts and device. Engineers also spend time to manually repair erroneous 3-D shape because of imperfect translation between 3-D engineering CAD model and VR system format. In this paper a method is proposed to link 3-D engineering CAD model to a multichannel visualization system with PC clusters. The multichannel visualization module enables distributed computing for PC clusters, which can reduce the cost of VR experience while offering high performance. Each PC in a cluster renders a particular viewpoint of a scene. Scenes are synchronized by reading parameters from the master scene control module and passing them to client scenes.

MPI를 이용한 PSC 프레임 비선형해석 프로그램의 병렬화 (Parallel Implementation of Nonlinear Analysis Program of PSC Frame Using MPI)

  • 이재석;최규천
    • 한국전산구조공학회:학술대회논문집
    • /
    • 한국전산구조공학회 2001년도 봄 학술발표회 논문집
    • /
    • pp.61-68
    • /
    • 2001
  • A parallel nonlinear analysis program of prestressed concrete frame is migrated on a PC cluster system and a massively parallel processing system, CRAY T3E system, using MPI. The PC cluster system is configured with Pentium Ⅲ class PCs and fast ethernet. The CRAY T3E system is composed of a set of nodes each containing one Processing Element (PE), a memory subsystem and its distributed memory interconnect network. Parallel computing algorithms are implemented on element-wise processing parts including the calculation of stiffness matrix, element stresses and determination of material states, check of material failure and calculation of unbalanced loads. Parallel performance of the migrated program is evaluated through typical numerical examples.

  • PDF

A Hybrid Mechanism of Particle Swarm Optimization and Differential Evolution Algorithms based on Spark

  • Fan, Debin;Lee, Jaewan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제13권12호
    • /
    • pp.5972-5989
    • /
    • 2019
  • With the onset of the big data age, data is growing exponentially, and the issue of how to optimize large-scale data processing is especially significant. Large-scale global optimization (LSGO) is a research topic with great interest in academia and industry. Spark is a popular cloud computing framework that can cluster large-scale data, and it can effectively support the functions of iterative calculation through resilient distributed datasets (RDD). In this paper, we propose a hybrid mechanism of particle swarm optimization (PSO) and differential evolution (DE) algorithms based on Spark (SparkPSODE). The SparkPSODE algorithm is a parallel algorithm, in which the RDD and island models are employed. The island model is used to divide the global population into several subpopulations, which are applied to reduce the computational time by corresponding to RDD's partitions. To preserve population diversity and avoid premature convergence, the evolutionary strategy of DE is integrated into SparkPSODE. Finally, SparkPSODE is conducted on a set of benchmark problems on LSGO and show that, in comparison with several algorithms, the proposed SparkPSODE algorithm obtains better optimization performance through experimental results.

Grid-Enabled Parallel Simulation Based on Parallel Equation Formulation

  • Andjelkovic, Bojan;Litovski, Vanco B.;Zerbe, Volker
    • ETRI Journal
    • /
    • 제32권4호
    • /
    • pp.555-565
    • /
    • 2010
  • Parallel simulation is an efficient way to cope with long runtimes and high computational requirements in simulations of modern complex integrated electronic circuits and systems. This paper presents an algorithm for parallel simulation based on parallelization in equation formulation and simultaneous calculation of matrix contributions for nonlinear analog elements. In addition, the paper describes the development of a grid interface for a parallel simulator that enables a designer to perform simulations on distant computer clusters. Performances of the developed parallel simulation algorithm are evaluated by simulation of a microelectromechanical system.

Dynamic Fog-Cloud Task Allocation Strategy for Smart City Applications

  • Salim, Mikail Mohammed;Kang, Jungho;Park, Jong Hyuk
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2021년도 추계학술발표대회
    • /
    • pp.128-130
    • /
    • 2021
  • Smart cities collect data from thousands of IoT-based sensor devices for intelligent application-based services. Centralized cloud servers support application tasks with higher computation resources but introduce network latency. Fog layer-based data centers bring data processing at the edge, but fewer available computation resources and poor task allocation strategy prevent real-time data analysis. In this paper, tasks generated from devices are distributed as high resource and low resource intensity tasks. The novelty of this research lies in deploying a virtual node assigned to each cluster of IoT sensor machines serving a joint application. The node allocates tasks based on the task intensity to either cloud-computing or fog computing resources. The proposed Task Allocation Strategy provides seamless allocation of jobs based on process requirements.

An Efficient Implementation of Mobile Raspberry Pi Hadoop Clusters for Robust and Augmented Computing Performance

  • Srinivasan, Kathiravan;Chang, Chuan-Yu;Huang, Chao-Hsi;Chang, Min-Hao;Sharma, Anant;Ankur, Avinash
    • Journal of Information Processing Systems
    • /
    • 제14권4호
    • /
    • pp.989-1009
    • /
    • 2018
  • Rapid advances in science and technology with exponential development of smart mobile devices, workstations, supercomputers, smart gadgets and network servers has been witnessed over the past few years. The sudden increase in the Internet population and manifold growth in internet speeds has occasioned the generation of an enormous amount of data, now termed 'big data'. Given this scenario, storage of data on local servers or a personal computer is an issue, which can be resolved by utilizing cloud computing. At present, there are several cloud computing service providers available to resolve the big data issues. This paper establishes a framework that builds Hadoop clusters on the new single-board computer (SBC) Mobile Raspberry Pi. Moreover, these clusters offer facilities for storage as well as computing. Besides the fact that the regular data centers require large amounts of energy for operation, they also need cooling equipment and occupy prime real estate. However, this energy consumption scenario and the physical space constraints can be solved by employing a Mobile Raspberry Pi with Hadoop clusters that provides a cost-effective, low-power, high-speed solution along with micro-data center support for big data. Hadoop provides the required modules for the distributed processing of big data by deploying map-reduce programming approaches. In this work, the performance of SBC clusters and a single computer were compared. It can be observed from the experimental data that the SBC clusters exemplify superior performance to a single computer, by around 20%. Furthermore, the cluster processing speed for large volumes of data can be enhanced by escalating the number of SBC nodes. Data storage is accomplished by using a Hadoop Distributed File System (HDFS), which offers more flexibility and greater scalability than a single computer system.

분산 컴퓨팅 환경에서의 제어/감시 시스템 개발을 위한 기반 구조 설계 (A Design of Infrastructure for Control/Monitoring System in the Distributed Computing Environment)

  • 이원구;박재현
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 제어로봇시스템학회 2000년도 제15차 학술회의논문집
    • /
    • pp.547-547
    • /
    • 2000
  • Recently, due to the advance of computer, network and Internet technology, control/monitoring systems are required to process the massive data, At the same time, the software development environment uses more and more component-based methodology. This paper proposes the services for the control /monitoring domain. Especially we define domain-specific interfaces and categories to acquire compatibility between products, and implement architecture for lightweight event service. As it is very important to support compatibility between heterogeneous systems, the proposed system provides modules for the web service and communication protocols based on the XML. And as proposed architecture consists of cluster of servers and Windows 2000's NLB service, it can guarantee more stable operation,

  • PDF

SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법 (Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter)

  • 윤다빈;김덕환
    • 한국차세대컴퓨팅학회논문지
    • /
    • 제14권5호
    • /
    • pp.60-70
    • /
    • 2018
  • 클라우드 환경에서 다수의 사용자가 물리적 서버를 가상화하여 사용할 수 있도록 편의성을 제공하는 Software Defined Storage(SDS)를 적용하고 있지만 한정된 물리적 자원을 고려하여 공간 효율성을 최적화하는 솔루션이 필요하다. 기존의 데이터 중복제거 시스템에서는 서로 다른 스토리지에 업로드 된 중복 데이터가 중복제거되기 어렵다는 단점이 있다. 본 논문에서는 유사도기반 클러스터링과 다중 계층 블룸 필터를 적용한 분산 중복제거 기법을 제안한다. 라빈 해시를 이용하여 가상 머신 서버들 간의 유사도를 판단하고 유사도가 높은 가상머신들을 클러스터 함으로써 개별 스토리지 노드별 중복제거 효율에 비하여 성능을 향상시킨다. 또한 중복제거 프로세스에 다중 계층 블룸 필터를 접목하여 처리 시간을 단축하고 긍정오류를 감소시킬 수 있다. 실험결과 제안한 방법은 IP주소 기반 클러스터를 이용한 중복제거 기법에 비해 처리 시간의 차이가 없으면서, 중복제거율이 9% 높아짐을 확인하였다.