http://dx.doi.org/10.3745/KTCCS.2020.9.9.207

Lambda Architecture Used Apache Kudu and Impala  

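Hwang, Yun-Young (Department of Computer Science, Soongsil University)
Lee, Pil-Won (Department of Computer Science, Soongsil University)
Shin, Yong-Tae (School of Computer Science, Soongsil University)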
Publication Information
KIPS Transactions on Computer and Communication Systems / v.9, no.9, 2020, pp. 207-212
Abstract
The volume of data has grown significantly with advances in technology, and various big data processing platforms have emerged to handle it. The most widely used of these is Hadoop, developed by the Apache Software Foundation, which is also used in the IoT field. However, existing Hadoop-based environments for collecting and analyzing IoT sensor data have two problems: small files in HDFS, a core Hadoop project, overload the NameNode, and imported data cannot be updated or deleted. This paper designs a Lambda Architecture using Apache Kudu and Impala. The proposed architecture classifies IoT sensor data into Cold-Data and Hot-Data, stores each in storage suited to its characteristics, and uses a Batch-View created through batch processing together with a Real-time View generated through Apache Kudu and Impala. This solves the problems of the existing Hadoop-based IoT sensor data collection and analysis environment and shortens the time it takes users to access the analyzed data.
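The Cold-Data/Hot-Data split and the combination of a Batch-View with a Real-time View can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how readings are classified, so the age-based rule (`hot_window`) is an assumption, and `ColdStore` and `HotStore` are hypothetical in-memory stand-ins for append-only batch storage and for a mutable, primary-keyed table of the kind Kudu provides. No Kudu or Impala API is called.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    ts: int        # epoch seconds
    value: float


def _aggregate(rows: Iterable[Reading]) -> Dict[str, Tuple[float, int]]:
    """Return {sensor_id: (sum of values, row count)}."""
    view: Dict[str, Tuple[float, int]] = {}
    for r in rows:
        total, count = view.get(r.sensor_id, (0.0, 0))
        view[r.sensor_id] = (total + r.value, count + 1)
    return view


class ColdStore:
    """Hypothetical stand-in for append-only batch storage."""

    def __init__(self) -> None:
        self._rows: List[Reading] = []

    def append(self, r: Reading) -> None:
        self._rows.append(r)

    def batch_view(self) -> Dict[str, Tuple[float, int]]:
        return _aggregate(self._rows)


class HotStore:
    """Hypothetical stand-in for a mutable table keyed by (sensor_id, ts)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, int], Reading] = {}

    def upsert(self, r: Reading) -> None:
        # Writing the same key again replaces the earlier row.
        self._rows[(r.sensor_id, r.ts)] = r

    def delete(self, sensor_id: str, ts: int) -> None:
        self._rows.pop((sensor_id, ts), None)

    def realtime_view(self) -> Dict[str, Tuple[float, int]]:
        return _aggregate(self._rows.values())


def route(r: Reading, now: int, hot_window: int,
          cold: ColdStore, hot: HotStore) -> None:
    """Assumed rule: readings older than hot_window seconds are Cold-Data."""
    if now - r.ts > hot_window:
        cold.append(r)
    else:
        hot.upsert(r)


def mean_per_sensor(cold: ColdStore, hot: HotStore) -> Dict[str, float]:
    """Answer a query by merging the Batch-View with the Real-time View."""
    merged: Dict[str, Tuple[float, int]] = {}
    for view in (cold.batch_view(), hot.realtime_view()):
        for sid, (total, count) in view.items():
            prev_total, prev_count = merged.get(sid, (0.0, 0))
            merged[sid] = (prev_total + total, prev_count + count)
    return {sid: total / count for sid, (total, count) in merged.items()}


if __name__ == "__main__":
    cold, hot = ColdStore(), HotStore()
    for reading in (Reading("a", 100, 10.0), Reading("a", 990, 20.0)):
        route(reading, now=1000, hot_window=60, cold=cold, hot=hot)
    print(mean_per_sensor(cold, hot))
```

In this sketch a corrected reading is applied by calling `route` again with the same `sensor_id` and `ts`, and a bad reading is removed with `HotStore.delete`. Both changes show up in the next call to `mean_per_sensor` without rewriting the cold store, which is the update and delete capability the abstract says the HDFS-only environment lacks.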
Keywords
Apache Hadoop; HDFS; Apache Kudu; Apache Impala; Lambda Architecture; IoT