• Title/Summary/Keyword: Cluster computing environment

Search Result 109, Processing Time 0.03 seconds

Design and Verification of Connected Data Architecture Concept employing DataLake Framework over Abyss Storage Cluster (Abyss Storage Cluster 기반 DataLake Framework의 Connected Data Architecture 개념 설계 및 검증)

  • Cha, ByungRae;Cha, Yun-Seok;Park, Sun;Shin, Byeong-Chun;Kim, JongWon
    • Smart Media Journal
    • /
    • v.7 no.3
    • /
    • pp.57-63
    • /
    • 2018
  • With many types of data generated in the shift of business environment as a result of growth of an organization or enterprise, there is a need to improve the data-processing efficiency in smarter means with a single domain model such as Data Lake. In particular, creating a logical single domain model from physical partitioned multi-site data by the finite resources of nature and shared economy is very important in terms of efficient operation of computing resources. Based on the advantages of the existing Data Lake framework, we define the CDA-Concept (connected data architecture concept) and functions of Data Lake Framework over Abyss Storage for integrating multiple sites in various application domains and managing the data lifecycle. Also, it performs the interface design and validation verification for Interface #2 & #3 of the connected data architecture-concept.

Grid-based Energy Efficient Routing Protocol for Sensor Networks (센서 네트워크를 위한 그리드 기반의 에너지 효율절인 라우팅 프로토콜)

  • Jung, Sung-Young;Lee, Dong-Wook;Kim, Jai-Hoon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.14 no.2
    • /
    • pp.216-220
    • /
    • 2008
  • Sensor nodes in wireless network have several limitations such as lack of energy resource and network bandwidth. There are many researches to extend lifetime of sensor network and enhance availability. However, most of the previous researches didn't consider the mobile sink node. Those researches aren't suitable in the environment having mobile sinks. In this paper. we propose a scheme that reduces communication overheads and energy consumptions and improves reliability in routing path setup. Proposed scheme has excellent scalability without degrading performance in environment where many sink nodes exist and/or the network size is huge. Proposed scheme saves the energy consumption up to 70% in comparison with the previous grid-based and cluster-based protocol. As a result, proposed scheme increases the lifetime of sensor network and enhances availability of wireless sensor network.

Rhipe Platform for Big Data Processing and Analysis (빅데이터 처리 및 분석을 위한 Rhipe 플랫폼)

  • Jung, Byung Ho;Shin, Ji Eun;Lim, Dong Hoon
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.7
    • /
    • pp.1171-1185
    • /
    • 2014
  • Rhipe that integrates R and Hadoop environment, made it possible to process and analyze massive amounts of data using a distributed processing environment. In this paper, we implemented multiple regression analysis using Rhipe with various data sizes of actual data and simulated data. Experimental results for comparing the computing speeds of pseudo-distributed and fully-distributed modes for configuring Hadoop cluster, showed fully-distributed mode was more fast than pseudo-distributed mode and computing speeds of fully-distributed mode were faster as the number of data nodes increases. We also compared the performance of our Rhipe with stats and biglm packages available on bigmemory. The results showed that our Rhipe was more fast than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Workflow-based Bio Data Analysis System for HPC (HPC 환경을 위한 워크플로우 기반의 바이오 데이터 분석 시스템)

  • Ahn, Shinyoung;Kim, ByoungSeob;Choi, Hyun-Hwa;Jeon, Seunghyub;Bae, Seungjo;Choi, Wan
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.97-106
    • /
    • 2013
  • Since human genome project finished, the cost for human genome analysis has decreased very rapidly. This results in the sharp increase of human genome data to be analyzed. As the need for fast analysis of very large bio data such as human genome increases, non IT researchers such as biologists should be able to execute fast and effectively many kinds of bio applications, which have a variety of characteristics, under HPC environment. To accomplish this purpose, a biologist need to define a sequence of bio applications as workflow easily because generally bio applications should be combined and executed in some order. This bio workflow should be executed in the form of distributed and parallel computing by allocating computing resources efficiently under HPC cluster system. Through this kind of job, we can expect better performance and fast response time of very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications. Using this system, non-IT scientists and researchers can analyze very large bio data easily under HPC environment.

Distributed Assumption-Based Truth Maintenance System for Scalable Reasoning (대용량 추론을 위한 분산환경에서의 가정기반진리관리시스템)

  • Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1115-1123
    • /
    • 2016
  • Assumption-based truth maintenance system (ATMS) is a tool that maintains the reasoning process of inference engine. It also supports non-monotonic reasoning based on dependency-directed backtracking. Bookkeeping all the reasoning processes allows it to quickly check and retract beliefs and efficiently provide solutions for problems with large search space. However, the amount of data has been exponentially grown recently, making it impossible to use a single machine for solving large-scale problems. The maintaining process for solving such problems can lead to high computation cost due to large memory overhead. To overcome this drawback, this paper presents an approach towards incrementally maintaining the reasoning process of inference engine on cluster using Spark. It maintains data dependencies such as assumption, label, environment and justification on a cluster of machines in parallel and efficiently updates changes in a large amount of inferred datasets. We deployed the proposed ATMS on a cluster with 5 machines, conducted OWL/RDFS reasoning over University benchmark data (LUBM) and evaluated our system in terms of its performance and functionalities such as assertion, explanation and retraction. In our experiments, the proposed system performed the operations in a reasonably short period of time for over 80GB inferred LUBM2000 dataset.

A Reconfigurable Load and Performance Balancing Scheme for Parallel Loops in a Clustered Computing Environment (클러스터 컴퓨팅 환경에서 병렬루프 처리를 위한 재구성 가능한 부하 및 성능 균형 방법)

  • 김태형
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.1
    • /
    • pp.49-56
    • /
    • 2004
  • Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately manage to balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck for the relatively small number of processors in a network of workstations because of order-of-magniture differences in communication overheads. Moreover, improvements of basis loops scheduling methods have not effectively dealt with irregularly distributed workloads in parallel loops, which commonly occur in applications for a network of workstation. In this paper, we present a new reconfigurable and decentralized balancing method for parallel loops on a network of workstations. Since our method supplements performance balancing with those tranditional load balancing methods, it minimizes the overall execution time.

A Hadoop-based Multimedia Transcoding System for Processing Social Media in the PaaS Platform of SMCCSE

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku;Jeong, Changsung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.11
    • /
    • pp.2827-2848
    • /
    • 2012
  • Previously, we described a social media cloud computing service environment (SMCCSE). This SMCCSE supports the development of social networking services (SNSs) that include audio, image, and video formats. A social media cloud computing PaaS platform, a core component in a SMCCSE, processes large amounts of social media in a parallel and distributed manner for supporting a reliable SNS. Here, we propose a Hadoop-based multimedia system for image and video transcoding processing, necessary functions of our PaaS platform. Our system consists of two modules, including an image transcoding module and a video transcoding module. We also design and implement the system by using a MapReduce framework running on a Hadoop Distributed File System (HDFS) and the media processing libraries Xuggler and JAI. In this way, our system exponentially reduces the encoding time for transcoding large amounts of image and video files into specific formats depending on user-requested options (such as resolution, bit rate, and frame rate). In order to evaluate system performance, we measure the total image and video transcoding time for image and video data sets, respectively, under various experimental conditions. In addition, we compare the video transcoding performance of our cloud-based approach with that of the traditional frame-level parallel processing-based approach. Based on experiments performed on a 28-node cluster, the proposed Hadoop-based multimedia transcoding system delivers excellent speed and quality.

Implementation of Data processing of the High Availability for Software Architecture of the Cloud Computing (클라우드 서비스를 위한 고가용성 대용량 데이터 처리 아키텍쳐)

  • Lee, Byoung-Yup;Park, Junho;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.32-43
    • /
    • 2013
  • These days, there are more and more IT research institutions which foresee cloud services as the predominant IT service in the near future and there, in fact, are actual cloud services provided by some IT leading vendors. Regardless of physical location of the service and environment of the system, cloud service can provide users with storage services, usage of data and software. On the other hand, cloud service has challenges as well. Even though cloud service has its edge in terms of the extent to which the IT resource can be freely utilized regardless of the confinement of hardware, the availability is another problem to be solved. Hence, this paper is dedicated to tackle the aforementioned issues; prerequisites of cloud computing for distributed file system, open source based Hadoop distributed file system, in-memory database technology and high availability database system. Also the author tries to body out the high availability mass distributed data management architecture in cloud service's perspective using currently used distributed file system in cloud computing market.

Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark (Spark 기반 공간 분석에서 공간 분할의 성능 비교)

  • Yang, Pyoung Woo;Yoo, Ki Hyun;Nam, Kwang Woo
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.25 no.1
    • /
    • pp.29-36
    • /
    • 2017
  • In this paper, we implement a spatial big data analysis prototype based on Spark which is an in-memory system and compares the performance by the spatial split algorithm on this basis. In cluster computing environments, big data is divided into blocks of a certain size order to balance the computing load of big data. Existing research showed that in the case of the Hadoop based spatial big data system, the split method by spatial is more effective than the general sequential split method. Hadoop based spatial data system stores raw data as it is in spatial-divided blocks. However, in the proposed Spark-based spatial analysis system, there is a difference that spatial data is converted into a memory data structure and stored in a spatial block for search efficiency. Therefore, in this paper, we propose an in-memory spatial big data prototype and a spatial split block storage method. Also, we compare the performance of existing spatial split algorithms in the proposed prototype. We presented an appropriate spatial split strategy with the Spark based big data system. In the experiment, we compared the query execution time of the spatial split algorithm, and confirmed that the BSP algorithm shows the best performance.

Design of MAHA Supercomputing System for Human Genome Analysis (대용량 유전체 분석을 위한 고성능 컴퓨팅 시스템 MAHA)

  • Kim, Young Woo;Kim, Hong-Yeon;Bae, Seungjo;Kim, Hag-Young;Woo, Young-Choon;Park, Soo-Jun;Choi, Wan
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.81-90
    • /
    • 2013
  • During the past decade, many changes and attempts have been tried and are continued developing new technologies in the computing area. The brick wall in computing area, especially power wall, changes computing paradigm from computing hardwares including processor and system architecture to programming environment and application usage. The high performance computing (HPC) area, especially, has been experienced catastrophic changes, and it is now considered as a key to the national competitiveness. In the late 2000's, many leading countries rushed to develop Exascale supercomputing systems, and as a results tens of PetaFLOPS system are prevalent now. In Korea, ICT is well developed and Korea is considered as a one of leading countries in the world, but not for supercomputing area. In this paper, we describe architecture design of MAHA supercomputing system which is aimed to develop 300 TeraFLOPS system for bio-informatics applications like human genome analysis and protein-protein docking. MAHA supercomputing system is consists of four major parts - computing hardware, file system, system software and bio-applications. MAHA supercomputing system is designed to utilize heterogeneous computing accelerators (co-processors like GPGPUs and MICs) to get more performance/$, performance/area, and performance/power. To provide high speed data movement and large capacity, MAHA file system is designed to have asymmetric cluster architecture, and consists of metadata server, data server, and client file system on top of SSD and MAID storage servers. MAHA system softwares are designed to provide user-friendliness and easy-to-use based on integrated system management component - like Bio Workflow management, Integrated Cluster management and Heterogeneous Resource management. MAHA supercomputing system was first installed in Dec., 2011. The theoretical performance of MAHA system was 50 TeraFLOPS and measured performance of 30.3 TeraFLOPS with 32 computing nodes. MAHA system will be upgraded to have 100 TeraFLOPS performance at Jan., 2013.