• Title/Summary/Keyword: DataLake Framework

Draft Design of DataLake Framework based on Abyss Storage Cluster (Abyss Storage Cluster 기반의 DataLake Framework의 설계)

  • Cha, ByungRae;Park, Sun;Shin, Byeong-Chun;Kim, JongWon
    • Smart Media Journal / v.7 no.1 / pp.9-15 / 2018
  • As an organization grows in size, many different types of data are generated in different systems, and there is a need to improve efficiency by processing that data more intelligently across those systems. As in a DataLake, we are creating a single domain model that accurately describes the data and can represent the most important data for the entire business. To realize the benefits of a DataLake, it is important to know how a DataLake may be expected to work and which architectural components may help to build a fully functional DataLake. DataLake components have a life cycle that follows the data flow. As the data flows into a DataLake from the point of acquisition, its metadata is captured and managed along with data traceability, data lineage, and security aspects based on data sensitivity across its life cycle. For this reason, we have designed the DataLake Framework based on Abyss Storage Cluster.

Design and Verification of Connected Data Architecture Concept employing DataLake Framework over Abyss Storage Cluster (Abyss Storage Cluster 기반 DataLake Framework의 Connected Data Architecture 개념 설계 및 검증)

  • Cha, ByungRae;Cha, Yun-Seok;Park, Sun;Shin, Byeong-Chun;Kim, JongWon
    • Smart Media Journal / v.7 no.3 / pp.57-63 / 2018
  • As an organization or enterprise grows and its business environment shifts, many types of data are generated, and there is a need to improve data-processing efficiency through a single domain model such as a Data Lake. In particular, creating a logical single domain model from physically partitioned multi-site data is very important for the efficient operation of computing resources, given finite natural resources and the shared economy. Based on the advantages of the existing Data Lake framework, we define the CDA-Concept (connected data architecture concept) and the functions of the Data Lake Framework over Abyss Storage for integrating multiple sites in various application domains and managing the data lifecycle. We also perform the interface design and validation for Interface #2 & #3 of the connected data architecture concept.

Draft Design of AI Services through Concept Extension of Connected Data Architecture (Connected Data Architecture 개념의 확장을 통한 AI 서비스 초안 설계)

  • Cha, ByungRae;Park, Sun;Oh, Su-Yeol;Kim, JongWon
    • Smart Media Journal / v.7 no.4 / pp.30-36 / 2018
  • A single domain model such as the DataLake framework is in the spotlight because it can improve data efficiency and process data more intelligently in big data environments, where large-scale business systems generate huge amounts of data. In particular, efficient operation of network, storage, and computing resources in a logical single domain model is very important for processing physically partitioned multi-site data. Based on the advantages of the Data Lake framework, we define and extend the concept of Connected Data Architecture and the functions of the DataLake framework for integrating multiple sites in various domains and managing the lifecycle of data. We also propose the design of a CDA-based AI service and utilization scenarios in various application domains.

Apache NiFi-based ETL Process for Building Data Lakes (데이터 레이크 구축을 위한 Apache NiFi기반 ETL 프로세스)

  • Lee, Kyoung Min;Lee, Kyung-Hee;Cho, Wan-Sup
    • The Journal of Bigdata / v.6 no.1 / pp.145-151 / 2021
  • In recent years, digital data has been generated in all areas of human activity, and there are many attempts to safely store and process the data to develop useful services. A data lake refers to a data repository that is independent of the source of the data and of the analytical framework that leverages the data. In this paper, we designed and implemented a tool that safely stores the various big data generated by smart cities in a data lake and applies ETL to it so that it can be used in services, together with a web-based tool needed to use it effectively. The series of processes (ETL) that quality-check and refine source data, store it safely in a data lake, and manage it according to data life cycle policies often requires costly infrastructure, development, and maintenance, and is labor-intensive. The proposed technology makes it possible to set up and run ETL work monitoring and data life cycle management visually and efficiently without specialized IT knowledge. Separately, a data quality checklist guide is needed to store and use reliable data in the data lake. In addition, data migration and deletion cycles need to be set and scheduled with the data life cycle management tool to reduce data management costs.

Application of a Decision Support System for Total Maximum Daily Loads (오염총량관리를 위한 의사결정 지원시스템 적용)

  • Lee, Hye-Young;Park, Seok-Soon
    • Journal of Korean Society on Water Environment / v.20 no.2 / pp.151-156 / 2004
  • A decision support system, the Watershed Analysis Risk Management Framework (WARMF), was applied to the Kyungan Stream watershed, a tributary of Lake Paldang, to calculate total maximum daily loads (TMDL). The WARMF system was developed by Systech Engineering, USA, and has been successfully used for TMDL studies in several watersheds. The study area was divided into 14 sub-basins based on a digital elevation model (DEM). The integrated watershed and stream model of WARMF was validated against flow and BOD data measured during 1999. There was reasonable agreement between model results and field data for both water flow and BOD. The validated Kyungan WARMF was used extensively to study the quantitative relationship between waste loads and receiving water quality. Based on the TMDL guideline for Paldang Lake and Kyungan Stream, the water quality criteria were set to 3.0 mg/L, 3.5 mg/L, and 4.0 mg/L at the watershed outlet. The allowable waste loads of BOD, from both point and non-point sources, were determined for each water quality criterion. This study concluded that WARMF provides several advantages over the conventional application of watershed and stream models for TMDL studies, such as time-variable simulations, multiple possible solutions, and reduction loads for the target water quality.

Three-dimensional Algal Dynamics Modeling Study in Lake Euiam Based on Limited Monitoring Data (제한된 측정 자료 기반 의암호 3차원 조류 예측 모델링 연구)

  • Choi, Jungkyu;Min, Joong-Hyuk;Kim, Deok-Woo
    • Journal of Korean Society on Water Environment / v.31 no.2 / pp.181-195 / 2015
  • Algal blooms in lakes are one of the major environmental issues in Korea. A three-dimensional hydrodynamic and water quality model was developed and tested in Lake Euiam to assess the performance and limitations of numerical modeling with multiple algal groups using field data commonly collected for algal management. In this study, EFDC was adopted as the basic model framework. Simulated vertical profiles of water temperature, dissolved oxygen, and nutrients, which are closely related to algal dynamics simulation, showed good agreement with the data observed at five water quality monitoring stations from March to October 2013. The overall spatio-temporal variations of three algal groups were reasonably simulated against the chlorophyll-a levels estimated from the limited monitoring data (chlorophyll-a level and cell numbers of algal species), with RMSEs ranging from 2.6 to 17.5 $mg/m^3$. Also, note that the $PO_4-P$ level in the water column was a key limiting factor controlling the growth of the three algal groups during most of the simulation period. However, the algal modeling results did not fully reach the observed levels during short periods that showed an abrupt increase in algae throughout the lake. In particular, the green algae/cyanobacteria and diatom simulations were underestimated in late June to early July and in early October, respectively. The results show that a better understanding of internal algal processes, neglected in most algal modeling studies, is necessary to predict sudden algal blooms more accurately, because the concentrations of external $PO_4-P$ and of the specific algal groups originating from the tributaries (mainly dam water releases) during those periods were too low to fully capture the sharp rise of internal algal levels. In this respect, this study suggests that future modeling efforts should focus on quantifying internal cycling processes, including the vertical movement of algal species in response to changes in environmental conditions, to improve modeling performance for complex algal dynamics.

Application of BASINS for the water quality prediction in rural watersheds - on HSPF model - (농촌유역의 수질예측을 위한 BASINS의 적용 - HSPF모형을 중심으로 -)

  • Ham, Jong-Hwa;Yoon, Chun-Gyeong
    • Proceedings of the Korean Society of Agricultural Engineers Conference / 2001.10a / pp.403-407 / 2001
  • For the water quality management of streams and lakes, it is important to estimate and control nonpoint source loading to meet the water quality standard, so integrated watershed management is required. BASINS is a multipurpose environmental analysis system for use by regional, state, and local agencies in performing watershed and water quality based studies. BASINS was developed by the USEPA to facilitate examination of environmental information, to support analysis of environmental systems, and to provide a framework for examining management alternatives. BASINS contains HSPF, which is one of the watershed runoff models. Using HSPF, nonpoint source loading from the upstream watershed was estimated. The simulated runoff was in good agreement with the observed data, indicating reasonable applicability to the whole watershed.

Development and Evaluation of a Finding Meaning in Life CD Program for Life-esteemed Education of Older School-age Children (초등학생의 생명존중교육을 위한 삶의 의미발견 CD 프로그램 개발 및 효과)

  • Kang, Kyung-Ah;Kim, Shin-Jeong;Song, Mi-Kyung
    • The Journal of Korean Academic Society of Nursing Education / v.17 no.3 / pp.487-500 / 2011
  • Purpose: The purpose of this study was to develop a "Finding Meaning in Life" CD program for life-esteemed education and to identify the effect of the program. Methods: The life-esteemed education philosophy and the concepts of logotherapy were applied as the theoretical framework for this program. The program was developed through a process of planning, designing, developing, and evaluating, with a content validity test. To identify the effect of the program, a one-group experimental design was applied to 54 students. Data were collected before the program started and one week and five weeks after the program finished. Results: The program was developed based on the students' needs and an evaluation of the CD's content, and consists of five periods: Dinosaur Park of Promise, Hill of Fragrance, Garden of Love, Forest of Acceptance, and My Lake. Each post-test score for knowledge, attitude, and practice on the meaning of life was significantly higher than the pretest score. Conclusion: This program can be effective for life-esteemed education of elementary school students. Moreover, wider use of the program in life-esteemed education for elementary school students is encouraged.