• Title/Summary/Keyword: 분산 인-메모리 프레임워크

Search Result 8, Processing Time 0.024 seconds

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

  • Dong-Hyeon Youn;Seokil Song
    • Journal of Platform Technology
    • /
    • v.11 no.4
    • /
    • pp.71-79
    • /
    • 2023
  • In this paper, we analyze the characteristics of machine learning workloads and, based on them, propose a distributed in-memory caching technique to improve the performance of machine learning workloads. The core of machine learning workload is model training, and model training is a computationally intensive task. Performing machine learning workloads in a Kubernetes-based cloud environment in which the computing framework and storage are separated can effectively allocate resources, but delays can occur because IO must be performed through network communication. In this paper, we propose a distributed in-memory caching technique to improve the performance of machine learning workloads performed in such an environment. In particular, we propose a new method of precaching data required for machine learning workloads into the distributed in-memory cache by considering Kubflow pipelines, a Kubernetes-based machine learning pipeline management tool.

  • PDF

SWAT: A Study on the Efficient Integration of SWRL and ATMS based on a Distributed In-Memory System (SWAT: 분산 인-메모리 시스템 기반 SWRL과 ATMS의 효율적 결합 연구)

  • Jeon, Myung-Joong;Lee, Wan-Gon;Jagvaral, Batselem;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.45 no.2
    • /
    • pp.113-125
    • /
    • 2018
  • Recently, with the advent of the Big Data era, we have gained the capability of acquiring vast amounts of knowledge from various fields. The collected knowledge is expressed by well-formed formula and in particular, OWL, a standard language of ontology, is a typical form of well-formed formula. The symbolic reasoning is actively being studied using large amounts of ontology data for extracting intrinsic information. However, most studies of this reasoning support the restricted rule expression based on Description Logic and they have limited applicability to the real world. Moreover, knowledge management for inaccurate information is required, since knowledge inferred from the wrong information will also generate more incorrect information based on the dependencies between the inference rules. Therefore, this paper suggests that the SWAT, knowledge management system should be combined with the SWRL (Semantic Web Rule Language) reasoning based on ATMS (Assumption-based Truth Maintenance System). Moreover, this system was constructed by combining with SWRL reasoning and ATMS for managing large ontology data based on the distributed In-memory framework. Based on this, the ATMS monitoring system allows users to easily detect and correct wrong knowledge. We used the LUBM (Lehigh University Benchmark) dataset for evaluating the suggested method which is managing the knowledge through the retraction of the wrong SWRL inference data on large data.

Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE
    • /
    • v.43 no.1
    • /
    • pp.61-70
    • /
    • 2016
  • In recent years, there has been a need for techniques for large-scale ontology inference in order to infer new knowledge from existing knowledge at a high speed, and for a diversity of semantic services. With the recent advances in distributed computing, developments of ontology inference engines have mostly been studied based on Hadoop or Spark frameworks on large clusters. Parallel programming techniques using GPGPU, which utilizes many cores when compared with CPU, is also used for ontology inference. In this paper, by combining the advantages of both techniques, we propose a new method for reasoning large RDFS ontology data using a Spark in-memory framework and inferencing distributed data at a high speed using GPGPU. Using GPGPU, ontology reasoning over high-capacity data can be performed as a low cost with higher efficiency over conventional inference methods. In addition, we show that GPGPU can reduce the data workload on each node through the Spark cluster. In order to evaluate our approach, we used LUBM ranging from 10 to 120. Our experimental results showed that our proposed reasoning engine performs 7 times faster than a conventional approach which uses a Spark in-memory inference engine.

Image Vector Extraction Method using Spark Framework for Image Retrieval System (이미지 검색 시스템을 위한 Spark 기반의 이미지 벡터 추출 기법)

  • Kim, Tae Yeon;Seo, HoJin;Lee, Young-Koo
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.726-729
    • /
    • 2015
  • 최근 네트워크 및 카메라 모듈의 발전으로 인해 생성되는 이미지 데이터의 양이 대용량화 되고 있으며, 이미지 데이터를 이용한 이미지 검색 서비스가 제공되고 있다. 이미지 검색 서비스를 제공하기 위해 이미지 데이터베이스 구축이 요구된다. 효율적인 데이터베이스 구축을 위해 Bow 기법을 이용하여 데이터의 차수를 낮춘 후 이미지 벡터를 저장하는 방식을 사용한다. 그러나 이미지 데이터의 수가 급격히 증가하여 오랜 수행 시간을 요구한다. 본 논문에서 인-메모리 기반 분산 프레임워크인 스파크를 이용한 이미지 벡터 생성 과정을 분산 설계하였다. 실험을 통해 제안하는 분산 처리 기법이 기존방법에 비해 효율적임을 보인다.

Design of a Large-Scale Qualitative Spatial Reasoner Based on Hadoop Clusters (하둡 클러스터 기반의 대용량 정성 공간 추론기의 설계)

  • Kim, Jonghwan;Kim, Jonghoon;Kim, Incheol
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.1316-1319
    • /
    • 2015
  • 본 논문에서는 대규모 분산 병렬 컴퓨팅 환경인 하둡 클러스터 시스템을 이용하여, 공간 객체들 간의 위상 관계를 효율적으로 추론하는 대용량 정성 공간 추론기를 제안한다. 본 논문에서 제안하는 공간 추론기는 추론 작업의 순차성과 반복성을 고려하여, 작업들 간의 디스크 입출력을 최소화할 수 있는 인-메모리 기반의 아파치 스파크 프레임워크를 이용하여 개발하였다. 따라서 본 추론기에서는 추론의 대상이 되는 대용량 공간 지식들을 아파치 스파크의 분산 데이터 집합 형태인 PairRDD와 RDD로 변환하고, 이들에 대한 데이터 오퍼레이션들로 추론 작업들을 구현하였다. 또한, 본 추론기에서는 추론 시간의 많은 부분을 차지하는 이행 관계 추론에 필요한 조합표를 효과적으로 축소함으로써, 공간 추론 작업의 성능을 크게 향상시켰다. 대용량의 공간 지식 베이스를 이용한 성능 분석 실험을 통해, 본 논문에서 제안한 정성 공간 추론기의 높은 성능을 확인할 수 있었다.

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.3
    • /
    • pp.105-112
    • /
    • 2018
  • As data size increase, a single database is not enough to serve current volume of tasks. Since data is partitioned and stored into multiple databases, analysis should also support parallelism in order to increase efficiency. However, traditional analysis requires data to be transferred out of database into nodes where analytic service is performed and user is required to know both database and analytic framework. In this paper, we propose an efficient way to perform K-means clustering algorithm inside the distributed column-based database and relational database. We also suggest an efficient way to optimize K-means algorithm within relational database.

Ontology and Sequential Rule Based Streaming Media Event Recognition (온톨로지 및 순서 규칙 기반 대용량 스트리밍 미디어 이벤트 인지)

  • Soh, Chi-Seung;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.43 no.4
    • /
    • pp.470-479
    • /
    • 2016
  • As the number of various types of media data such as UCC (User Created Contents) increases, research is actively being carried out in many different fields so as to provide meaningful media services. Amidst these studies, a semantic web-based media classification approach has been proposed; however, it encounters some limitations in video classification because of its underlying ontology derived from meta-information such as video tag and title. In this paper, we define recognized objects in a video and activity that is composed of video objects in a shot, and introduce a reasoning approach based on description logic. We define sequential rules for a sequence of shots in a video and describe how to classify it. For processing the large amount of increasing media data, we utilize Spark streaming, and a distributed in-memory big data processing framework, and describe how to classify media data in parallel. To evaluate the efficiency of the proposed approach, we conducted an experiment using a large amount of media ontology extracted from Youtube videos.

Spatial Computation on Spark Using GPGPU (GPGPU를 활용한 스파크 기반 공간 연산)

  • Son, Chanseung;Kim, Daehee;Park, Neungsoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.8
    • /
    • pp.181-188
    • /
    • 2016
  • Recently, as the amount of spatial information increases, an interest in the study of spatial information processing has been increased. Spatial database systems extended from the traditional relational database systems are difficult to handle large data sets because of the scalability. SpatialHadoop extended from Hadoop system has a low performance, because spatial computations in SpationHadoop require a lot of write operations of intermediate results to the disk, resulting in the performance degradation. In this paper, Spatial Computation Spark(SC-Spark) is proposed, which is an in-memory based distributed processing framework. SC-Spark is extended from Spark in order to efficiently perform the spatial operation for large-scale data. In addition, SC-Spark based on the GPGPU is developed to improve the performance of the SC-Spark. SC-Spark uses the advantage of the Spark holding intermediate results in the memory. And GPGPU-based SC-Spark can perform spatial operations in parallel using a plurality of processing elements of an GPU. To verify the proposed work, experiments on a single AMD system were performed using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operation. The experimental results showed that the performance of SC-Spark and GPGPU-based SC-Spark were up-to 8 times faster than SpatialHadoop.