• Title/Summary/Keyword: Hadoop project

A Novel Method of Improving Cache Hit-rate in Hadoop MapReduce using SSD Cache

  • Kim, Jong-Chan;An, Jae-Hoon;Kim, Young-Hwan;Jeon, Ki-Man
    • Journal of the Korea Society of Computer and Information / v.20 no.8 / pp.1-6 / 2015
  • A Hadoop MapReduce program may run on any node, because processing is distributed and parallel and HDFS blocks are replicated for data stability. This makes cache locality difficult to guarantee when a solid-state drive (SSD) is used as a cache in Hadoop, which lowers the cache hit rate. In this paper, we propose a method that improves the cache hit rate by pre-loading the MapReduce input data onto the SSD cache. To do so, we estimate the blocks that will be used on each node from the capacity scheduler and block metadata. Loading those blocks onto the SSD cache before the map tasks run increased the performance of the SSD cache.

Performance Comparison of DW System Tajo Based on Hadoop and Relational DBMS (하둡 기반 DW시스템 타조와 관계형 DBMS의 성능 비교)

  • Liu, Chen;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.349-354 / 2014
  • Since Hadoop, the big-data processing platform, was announced, SQL-on-Hadoop has been in the spotlight as a technique for analyzing data on Hadoop using SQL. Tajo, created by Korean programmers, was promoted to Top-Level Project status by Apache in April and has attracted attention around the world. Despite the noticeable change that Hadoop's appearance caused in the DW market, research on the performance of these systems is insufficient. This study therefore compares an RDBMS with Tajo to help in choosing a DW solution based on SQL-on-Hadoop. The results show that Hadoop-based Tajo outperforms the RDBMS when it is used with an appropriate strategy. In addition, the open-source project Tajo is expected not only to improve technically through the active participation of many developers but also to play an important DW role in the field of data analysis.

Lambda Architecture Used Apache Kudu and Impala (Apache Kudu와 Impala를 활용한 Lambda Architecture 설계)

  • Hwang, Yun-Young;Lee, Pil-Won;Shin, Yong-Tae
    • KIPS Transactions on Computer and Communication Systems / v.9 no.9 / pp.207-212 / 2020
  • The amount of data has increased significantly with advances in technology, and various big data processing platforms are emerging to handle it. The most widely used of these is Hadoop, developed by the Apache Software Foundation, which is also used in the IoT field. However, the existing Hadoop-based environment for collecting and analyzing IoT sensor data overloads the NameNode because of the small-files problem in HDFS, a core Hadoop project, and it cannot update or delete imported data. This paper uses Apache Kudu and Impala to design a Lambda Architecture. The proposed architecture classifies IoT sensor data into Cold-Data and Hot-Data and stores each in storage suited to its characteristics. It uses a Batch-View created through batch processing and a Real-time View generated through Apache Kudu and Impala to solve the problems of the existing environment and to shorten the time users need to access the analyzed data.

Addressing Big Data solution enabled Connected Vehicle services using Hadoop (Hadoop을 이용한 스마트 자동차 서비스용 빅 데이터 솔루션 개발)

  • Nkenyereye, Lionel;Jang, Jong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.3 / pp.607-612 / 2015
  • As the amount of vehicle diagnostics data increases, the actors in the automotive ecosystem will find it difficult to perform real-time analysis in order to simulate or design new services based on the data gathered from connected cars. In this paper, we study a Big Data solution that provides the deep analytics needed to process and analyze the vast quantities of on-board diagnostics data generated by cars. Hadoop and its ecosystem were deployed to process large volumes of data and delivered useful outcomes that actors in the automotive ecosystem may use to offer new services to car owners. Because intelligent transport systems aim to guarantee safety and to reduce the rate of crashes and of speed-related injuries, a big data solution based on vehicle diagnostics data is emerging as a way to monitor outcomes in real time, collect data from many connected cars, and make the processing of collected data more reliable and its storage easier.

Search for a user-centered system design and implementation (사용자 중심 검색 시스템 설계 및 구현)

  • Kim, A-Yong;Park, Man-Seub;Kim, Jong-Moon;Jeong, Dae-Jin;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.619-621 / 2014
  • In addition to advances in information technology, the latest IT technologies have become an issue. Many web users struggle to sift through search data to find the information they need. In this paper, we propose a user-centered search system. The system is designed and implemented on the Lucene search system, using Hadoop's MapReduce together with the Apache projects Nutch and Solr and with HDFS. It collects and indexes data according to the intent of the web search user and can be applied in the search field.

An Implementation of Web-Enabled OLAP Server in Korean HealthCare BigData Platform (한국 보건의료 빅데이터 플랫폼에서 웹 기반 OLAP 서버 구현)

  • Ly, Pichponreay;Kim, Jin-hyuk;Jung, Seung-hyun;Lee, Kyung-hee;Cho, Wan-sup
    • Proceedings of the Korea Contents Association Conference / 2017.05a / pp.33-34 / 2017
  • In 2015, the Ministry of Health and Welfare of Korea announced a research and development plan to use Korean healthcare data to support decision making, reduce costs, and improve treatment. This project relies on the adoption of BigData technologies such as Apache Hadoop and Apache Spark to store and process healthcare data from various institutions. Here we present the design and implementation of an OLAP server on the Korean HealthCare BigData platform. This approach establishes a basis for promoting personalized healthcare research for decision making, disease forecasting, and the development of customized diagnosis and treatment.

Big data-based piping material analysis framework in offshore structure for contract design

  • Oh, Min-Jae;Roh, Myung-Il;Park, Sung-Woo;Chun, Do-Hyun;Myung, Sehyun
    • Ocean Systems Engineering / v.9 no.1 / pp.79-95 / 2019
  • The material analysis of an offshore structure is generally conducted in the contract design phase to prepare the price quotation for a new offshore project. This analysis is conducted manually by an engineer, which is time-consuming and can lead to inaccurate results, because the volume of data from previous projects is very large and there are many materials to consider. In this study, the piping materials in an offshore structure are analyzed for contract design using a big data framework. The big data technologies used include HDFS (Hadoop Distributed File System) for data storage, Hive and HBase as the databases that handle the stored data, Spark and Kylin for data processing, and Zeppelin for the user interface and visualization. The results show that the proposed big data framework can reduce the effort required to estimate the piping material cost in contract design.