Draft Design of DataLake Framework based on Abyss Storage Cluster

Cha, ByungRae;Park, Sun;Shin, Byeong-Chun;Kim, JongWon;

doi:10.30693/SMJ.2018.7.1.9

Smart Media Journal (스마트미디어저널)

Volume 7 Issue 1 / Pages 9-15 / 2018 / 2287-1322 (pISSN) / 2288-9671 (eISSN)

THE KOREAN INSTITUTE OF SMART MEDIA (한국스마트미디어학회)

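ByungRae Cha (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology) ;
Sun Park (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology) ;
Byeong-Chun Shin (Department of Mathematics, Chonnam National University) ;
JongWon Kim (School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology)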

Received : 2018.01.08
Accepted : 2018.02.20
Published : 2018.03.31

https://doi.org/10.30693/SMJ.2018.7.1.9

Abstract

As an institution or organization grows, its business systems generate large volumes of varied data across many separate systems. Organizations therefore need a way to process data from these disparate systems more intelligently and so improve efficiency. One of the most basic approaches is to build, as a DataLake does, a single domain model that accurately describes the data and can represent the most important data for the entire business. To realize the benefits of a DataLake, it is important to define how it is expected to work and which architectural components are needed to build a fully functional one. These components give the data a life cycle that follows the data flow. As data flows into the DataLake from the point of acquisition, its metadata must be captured and managed together with data traceability, data lineage, and security measures based on data sensitivity across the whole life cycle. For these reasons, we designed a DataLake Framework based on the Abyss Storage Cluster.
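The abstract's point about capturing metadata at acquisition time, together with traceability, lineage, and sensitivity-based security, can be illustrated with a minimal sketch. This is not the Abyss Storage Cluster design or any API from the paper; the class names, field names, life-cycle stage names, and sensitivity levels below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Sequence

# Hypothetical sensitivity levels, ordered from least to most sensitive.
LEVELS = ["public", "internal", "restricted"]


@dataclass
class MetadataRecord:
    """Illustrative metadata captured when a dataset enters the lake."""
    dataset_id: str
    source_system: str                                  # point of acquisition
    sensitivity: str                                    # one of LEVELS
    parents: List[str] = field(default_factory=list)    # lineage: upstream datasets
    history: List[str] = field(default_factory=list)    # traceability: stages visited
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class MetadataCatalog:
    """In-memory catalog; a real system would persist these records."""

    def __init__(self) -> None:
        self._records: Dict[str, MetadataRecord] = {}

    def ingest(self, dataset_id: str, source_system: str, sensitivity: str,
               parents: Sequence[str] = ()) -> MetadataRecord:
        """Register metadata at acquisition time, before the data is used."""
        if sensitivity not in LEVELS:
            raise ValueError(f"unknown sensitivity: {sensitivity}")
        for parent in parents:
            if parent not in self._records:
                raise KeyError(f"unknown parent dataset: {parent}")
        record = MetadataRecord(dataset_id, source_system, sensitivity,
                                list(parents), ["acquired"])
        self._records[dataset_id] = record
        return record

    def get(self, dataset_id: str) -> MetadataRecord:
        return self._records[dataset_id]

    def advance(self, dataset_id: str, stage: str) -> None:
        """Record that the dataset moved to a new life-cycle stage."""
        self._records[dataset_id].history.append(stage)

    def lineage(self, dataset_id: str) -> List[str]:
        """Return every upstream dataset, nearest ancestors first."""
        seen: List[str] = []
        stack = list(self._records[dataset_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._records[current].parents)
        return seen

    def can_read(self, dataset_id: str, clearance: str) -> bool:
        """Allow access only if the clearance covers the data's sensitivity."""
        required = LEVELS.index(self._records[dataset_id].sensitivity)
        return LEVELS.index(clearance) >= required
```

In this sketch, a caller would register each dataset with `ingest` at acquisition, call `advance` as the data moves through later stages, and consult `lineage` and `can_read` when auditing or serving it. Keeping the sensitivity label on the metadata record, rather than on the storage location, is one way to let the same security rule follow the data across its life cycle.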

Keywords

DataLake Framework
