http://dx.doi.org/10.36498/kbigdt.2021.6.1.145

Apache NiFi-based ETL Process for Building Data Lakes  

Lee, Kyoung Min (Codewise Co., Ltd.)
Lee, Kyung-Hee (Bigdata Labs Co., Ltd.)
Cho, Wan-Sup (Department of Management Information Systems, Chungbuk National University)
Publication Information
The Journal of Bigdata / v.6, no.1, 2021, pp. 145-151
Abstract
In recent years, digital data has been generated in all areas of human activity, and many attempts have been made to store and process that data safely in order to develop useful services. A data lake is a data repository that is independent of both the source of the data and the analytical framework that uses it. In this paper, we design a tool that safely stores the various kinds of big data generated by smart cities in a data lake and performs ETL on it so that it can be used in services, and we implement a web-based tool needed to use it effectively. The series of processes (ETL) that quality-check and refine source data, store it safely in a data lake, and manage it according to data life cycle policies often requires costly infrastructure, development, and maintenance, and is labor-intensive. The implemented technology makes it possible to configure and run ETL job monitoring and data life cycle management visually and efficiently, without specialized IT knowledge. Separately, a data quality checklist guide is needed so that reliable data is stored and used in the data lake. In addition, data migration and deletion cycles should be set and scheduled with the data life cycle management tool to reduce data management costs.
Keywords
Smart City; Data Lake; ETL; NiFi; Bigdata;