• Title/Summary/Keyword: Tajo

Performance Comparison of DW System Tajo Based on Hadoop and Relational DBMS (하둡 기반 DW시스템 타조와 관계형 DBMS의 성능 비교)

  • Liu, Chen;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering, v.3 no.9, pp.349-354, 2014
  • Since the announcement of Hadoop, a big-data processing platform, SQL-on-Hadoop has been in the spotlight as a technique for analyzing data on Hadoop using SQL. Tajo, created by Korean programmers, was promoted to Top-Level Project status by Apache in April and has attracted attention around the world. Despite the marked change that Hadoop's appearance brought to the DW market, research on the performance of these systems is insufficient. This study therefore runs comparative tests of an RDBMS and Tajo to help in choosing a DW solution based on SQL-on-Hadoop. The results show that Hadoop-based Tajo outperforms the RDBMS when it is used with an appropriate strategy. In addition, the open-source Tajo project is expected not only to improve technically through the active participation of many developers but also to play an important DW role in the field of data analysis.

External Merge Sorting in Tajo with Variable Server Configuration (매개변수 환경설정에 따른 타조의 외부합병정렬 성능 연구)

  • Lee, Jongbaeg;Kang, Woon-hak;Lee, Sang-won
    • Journal of KIISE, v.43 no.7, pp.820-826, 2016
  • There is a growing need for big data processing, which extracts valuable information from large amounts of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop systems, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo's query processing. Its performance in Tajo is influenced by two parameters: sort buffer size and fanout. In this paper, we analyze the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. We find two major causes of the performance differences: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
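
The role of the two parameters named in this abstract can be illustrated with a minimal in-memory sketch. It is not Tajo's implementation: the function name `external_merge_sort` and its parameters `buffer_size` and `fanout` are hypothetical, runs are kept in Python lists instead of on disk, and CPU cache effects are not modeled. It only shows how the buffer size fixes the number of initial sorted runs and how the fanout fixes the number of merge passes.

```python
import heapq
from typing import List, Tuple


def external_merge_sort(data: List[int], buffer_size: int, fanout: int) -> Tuple[List[int], int]:
    """Simulate external merge sort; return (sorted data, number of merge passes).

    buffer_size: how many items fit in the sort buffer (hypothetical parameter).
    fanout: how many runs are merged at once in each pass (hypothetical parameter).
    """
    if buffer_size < 1 or fanout < 2:
        raise ValueError("buffer_size must be >= 1 and fanout must be >= 2")

    # Run generation: sort one buffer-sized chunk at a time.
    runs = [sorted(data[i:i + buffer_size]) for i in range(0, len(data), buffer_size)]

    # Merge phase: each pass merges groups of up to `fanout` runs into longer runs.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fanout])) for i in range(0, len(runs), fanout)]
        passes += 1

    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    data = list(range(100, 0, -1))  # 100 items in reverse order
    for f in (2, 4, 10):
        _, n = external_merge_sort(data, buffer_size=10, fanout=f)
        print(f"fanout={f}: {n} merge passes")
```

With 100 items and a buffer of 10 items there are 10 initial runs, so a fanout of 2 needs 4 passes, a fanout of 4 needs 2, and a fanout of 10 needs 1. In a disk-based sort each pass rereads and rewrites the whole data set, which is why the pass count matters.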

Efficient Multimedia Data File Management and Retrieval Strategy on Big Data Processing System

  • Lee, Jae-Kyung;Shin, Su-Mi;Kim, Kyung-Chang
    • Journal of the Korea Society of Computer and Information, v.20 no.8, pp.77-83, 2015
  • The storage and retrieval of multimedia data is becoming increasingly important in many application areas, including record management, video (CCTV) management, and the Internet of Things (IoT). In these applications, the number of multimedia files that need to be stored and managed is tremendous and constantly growing. In this paper, we propose a technique for retrieving a very large number of multimedia files using the Hadoop framework. Our strategy is based on managing metadata that describes the characteristics of the files stored in the Hadoop Distributed File System (HDFS). The metadata schema is represented in HBase and looked up using SQL-on-Hadoop (Hive, Tajo). HBase, Hive, and Tajo are all part of the Hadoop ecosystem. A preliminary experiment on multimedia data files stored in HDFS shows the viability of the proposed strategy.

Research on the Analysis System based on the Big Data for Matlab (Matlab을 활용한 빅데이터 기반 분석 시스템 연구)

  • Joo, Moon-il;Kim, Hee-cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2016.10a, pp.96-98, 2016
  • Big data technology has developed recently in response to rapid data generation, and tools for analyzing big data have been developed along with it. Typical big data tools include R, Hive, and Tajo. However, data analysis based on Matlab is still common, and Matlab is still used in big data analysis. This paper studies a Matlab-based big data analysis system for analyzing vital signals.

Analysis of Foundation Procedure for Chosun Dynasty Based on Network (네트워크 기반 조선왕조 건국과정 분석)

  • Kim, Hak Yong
    • The Journal of the Korea Contents Association, v.15 no.5, pp.582-591, 2015
  • Networks of people in late Koryeo were constructed from four history books written from various historical perspectives and covering the period from King Kongmin to Kongyang, the final king of Koryeo. All networks constructed in this study show scale-free properties, as most social networks do. The Tajo-sillok preface is a subjectively written history book that describes the personal history of Lee Seong-gye and his ancestors. The network study confirms that it is one of the most biased of these history books. Both the network study and various historical documents indicate that Jeong Do-jeon, known as the designer of the Chosun dynasty, did not contribute greatly to its founding. In this network study, we provide objective historical information on the situation in late Koryeo and on the establishment of the Chosun dynasty. A hub node is a highly linked node in a network, and its number of links is called its degree. Stress centrality measures the positional importance of a node in the network. Using both degree and stress centrality to determine hub nodes captures high connectivity as well as importance. By comparing degree and stress centrality values, we elucidate more objective historical facts about late Koryeo. A new algorithm that considers both degree and stress centrality would be a very useful tool for determining hub nodes.
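
The two measures named in this abstract can be computed as in the sketch below. It assumes the standard definitions (degree is the number of neighbours; stress centrality is the number of shortest paths between other node pairs that pass through a node) and an undirected, unweighted graph given as a symmetric adjacency dictionary. The function names and the toy graph are hypothetical and are not taken from the paper, which does not specify how the two values are combined.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # symmetric adjacency sets, every node is a key


def degree(graph: Graph) -> Dict[Hashable, int]:
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def stress_centrality(graph: Graph) -> Dict[Hashable, int]:
    """Count, for each node, the shortest paths that pass through it as an interior node."""
    stress = {v: 0 for v in graph}
    for s in graph:
        # BFS from s: sigma[v] counts shortest s-v paths, preds[v] lists BFS parents.
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        dist = {s: 0}
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # tails[v]: number of shortest-path continuations from v to nodes farther from s.
        tails = {v: 0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                tails[v] += 1 + tails[w]
            if w != s:
                stress[w] += sigma[w] * tails[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c // 2 for v, c in stress.items()}


if __name__ == "__main__":
    # Toy star graph: "H" is linked to three otherwise unconnected nodes.
    star: Graph = {"H": {"A", "B", "C"}, "A": {"H"}, "B": {"H"}, "C": {"H"}}
    print(degree(star))
    print(stress_centrality(star))
```

In the toy star graph the centre has degree 3 and stress centrality 3, since all three shortest paths between the outer nodes pass through it, while the outer nodes have degree 1 and stress centrality 0.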