• Title/Summary/Keyword: Hive

Research on the Analysis System based on the Big Data for Matlab (Matlab을 활용한 빅데이터 기반 분석 시스템 연구)

  • Joo, Moon-il;Kim, Hee-cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.96-98 / 2016
  • Recently, big data technology has developed owing to rapid data generation. Accordingly, tools for analyzing big data have been developed. Typical big data tools include R, Hive, and Tajo. However, Matlab-based data analysis is still commonly used, including in big data analysis. This paper studies a Matlab-based big data analysis system for analyzing vital signals.

An Empirical Performance Analysis on Hadoop via Optimizing the Network Heartbeat Period

  • Lee, Jaehwan;Choi, June;Roh, Hongchan;Shin, Ji Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5252-5268 / 2018
  • To support large-scale Hadoop clusters, Hadoop heartbeat messages are designed to carry important messages, including task scheduling and completion messages, via piggybacking, which reduces the number of messages received by the NameNode. Although Hadoop is designed and optimized for high-throughput batch processing, real-time processing of large amounts of data in Hadoop is increasingly important. This paper evaluates Hadoop's performance and costs when the heartbeat period is controlled to support latency-sensitive applications. Through an empirical study based on the Hadoop 2.0 (YARN) architecture, we improve Hadoop's I/O performance as well as application performance by up to 13 percent compared to the default configuration. We offer a guideline, based on simple equations, that predicts the performance, costs, and limitations of the total system as the heartbeat period is controlled. Through extensive experiments, we show that Hive performance can be improved by tuning Hadoop's heartbeat period.

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices / v.23 no.2 / pp.128-133 / 2017
  • In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies in rapidly changing log data collected from multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem. In addition, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. All three techniques were confirmed to detect the abnormal intervals correctly, and the weighted anomaly-detection technique was more precise than the basic techniques. These results show that simple techniques are an effective approach for detecting log-data anomalies in the Hadoop ecosystem.

A Study on Possible Construction of Big Data Analysis System Applied to the Offline Market (오프라인 마켓에 적용 가능한 빅데이터 분석 시스템 구축 방안에 관한 연구)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of Digital Convergence / v.14 no.9 / pp.317-323 / 2016
  • Big Data is now seen as a major asset for a company's competitiveness, and its influence is expected to grow. Companies that recognize its importance are already actively using Big Data in product development and marketing, and it is increasingly applied across sectors of society, including politics and sports. However, lack of knowledge about system implementation and high costs are still big obstacles to the introduction of Big Data systems. The objective of this study is to build a Big Data system, based on open-source Hadoop and Hive, that uses the POS sales data of small and medium-sized offline markets. This convergence approach is expected to improve existing sales systems, which have focused simply on profit-and-loss analysis. It can also serve as a basis for executive decisions by enabling advance prediction of consumption patterns reflecting customer preferences and demand.

Anomaly Detection of Hadoop Log Data Using Moving Average and 3-Sigma (이동 평균과 3-시그마를 이용한 하둡 로그 데이터의 이상 탐지)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae;Won, Hee-Sun
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.283-288 / 2016
  • In recent years, there have been many research efforts on Big Data, and many companies have developed a variety of relevant products. Accordingly, we are now able to store and analyze large volumes of log data, which were difficult to handle in traditional computing environments. To handle the large volume of log data generated rapidly on multiple servers, in this paper we design a new data storage architecture that efficiently analyzes such big log data through Apache Hive. We then design and implement anomaly detection methods, based on moving average and 3-sigma techniques, that identify the abnormal status of servers from log data. We also show the effectiveness of the proposed methods by demonstrating that they identify anomalies correctly. These results show that our anomaly detection is an excellent approach for properly detecting anomalies in Hadoop log data.
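
  The moving-average and 3-sigma idea named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `detect_anomalies`, the window size, the sample values, and the choice to compare each point against the statistics of the preceding window are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices whose value deviates from the moving average of the
    preceding `window` values by more than k standard deviations.

    Hypothetical helper: window and k are illustrative defaults.
    """
    anomalies = []
    for i in range(window, len(values)):
        recent = values[i - window:i]      # the preceding window only
        mu = mean(recent)                  # moving average
        sigma = pstdev(recent)             # population standard deviation
        # A perfectly flat window (sigma == 0) is skipped in this sketch.
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


# Made-up metric series with one spike at index 7.
cpu_load = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(detect_anomalies(cpu_load))  # [7]
```

  Comparing each point against the preceding window keeps a spike from inflating its own threshold. The points right after the spike are judged against a window that contains it, so a real system might exclude flagged values from later windows.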

Calibration of Apis Mellifera Hives for Pollination of Brassica Crop at Rawalpindi

  • ABBASI, Khalida Hamid;RAZZAQ, Asif;JAMAL, Muhammad;KHANUM, Saeeda;JAWAD, Khawer;ULLAH, Muhammad Arshad
    • The Korean Journal of Food & Health Convergence / v.6 no.2 / pp.17-21 / 2020
  • To determine the most suitable number of honeybee (Apis mellifera L.) hives per unit area of canola needed to meet optimum pollination needs and achieve better economic yields, an experiment comparing the number of hives and yield components was conducted at the Beekeeping and Hill Fruit Pests Research Station, Rawalpindi, during 2017-18 in a randomized complete block design with two sets of four treatments: 1, 2, 3, and 0 hives per acre. The hives were kept inside the experimental area. The parameters assessed were pollination density, pollinator diversity, and agronomic and economic yield. For pollination density, the cumulative mean abundance of bee species revealed that at 1200 hours Apis mellifera was the most abundant and frequent visitor, with a mean population of 8.69 bees/plant, followed by A. dorsata (0.72), Syrphid fly (0.2), and other pollinators. The minimum bee population was observed at 1400 hours, mainly due to the closure of flowers and partially due to high temperature (>35℃). Pollinator diversity revealed that A. mellifera was the most dominant pollinator of the Brassica crop, with the highest abundance (71%). A. dorsata ranked second (16%), followed by A. florea (6%).

Big Data-based Medical Clinical Results Analysis (빅데이터 기반 의료 임상 결과 분석)

  • Hwang, Seung-Yeon;Park, Ji-Hun;Youn, Ha-Young;Kwak, Kwang-Jin;Park, Jeong-Min;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.187-195 / 2019
  • Recently, the development of big data technology has made it possible to collect, store, process, and analyze data generated in various fields. These big data technologies are used for clinical results analysis, and optimizing clinical trial design will reduce the costs associated with health care. Therefore, in this paper, we analyze clinical results and present guidelines that can reduce the period and cost of clinical trials. First, we use Sqoop to collect clinical results data from relational databases and store it in HDFS, and use Hive, a Hadoop-based processing tool, to process the data. Finally, we use R, a big data analysis tool widely used in fields such as the public sector and business, to analyze associations.

Development of Rapid Detection System for Small Hive Beetle (Aethina tumida) by using Ultra-Rapid PCR (초고속 유전자 증폭법을 이용한 벌집꼬마밑빠진벌레 (Aethina tumida)의 신속한 검출 기법 개발)

  • Kim, Jung-Min;Lim, Su-Jin;Tai, Truong A;Hong, Ki-Jeong;Yoon, Byoung-Su
    • Journal of Apiculture / v.32 no.2 / pp.119-131 / 2017
  • For the rapid detection of the small hive beetle (SHB; Aethina tumida) and for mass surveys against SHB invasion, an SHB-specific ultra-rapid PCR system was developed. Three different pairs of Aethina tumida-specific primers were derived from the cytochrome oxidase subunit I (COI) gene in the mitochondrial DNA of SHB. Using the optimized SHB-specific ultra-rapid PCR, $2.1 \times 10^{1}$ molecules of the SHB COI gene could be detected specifically and quantitatively within 18 minutes 40 seconds. For application in the apiary, a DNA extraction method from bee debris was developed separately. When $10^{5}$ SHB-specific COI molecules (1/1000 of the body of an SHB larva) were present in 1 g of bee debris, they could be verified qualitatively and quantitatively within 10 minutes. The proposed SHB-specific ultra-rapid PCR is expected to be widely applicable, in both the apiary and the laboratory, for the rapid detection and control of SHB invasion.

IoT-based Smart Photo Frame Containing Widget and Security Functions(BeeHiveFrame) (위젯과 보안기능을 탑재한 IoT기반 스마트액자(BeeHiveFrame))

  • Kwon, Yong-Jin;Kim, Pan-Gyeom;Kim, Woo-Cheol;Park, Yea-Un;Kim, Bong-Jae;Hwang, Young-Sup
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.880-881 / 2016
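  • Digital photo frames offer the charm of a classic frame while also letting the user change the displayed photo, yet they have not become a new trend. The reasons are their high price and the inconvenience of transferring photos. We studied how to make photo transfer to a digital frame easy and, in addition, how to add widget and security functions. For photo transfer we use an AWS (Amazon Web Service) server, which lets users send photos over WiFi whenever and wherever they want. This is far more convenient than transferring digital photos with the USB drives or SD cards used today. With our digital frame, exchanging photos with others is easy, and closeness among family, friends, and colleagues can therefore be readily strengthened.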