• Title/Summary/Keyword: HBASE

GPS Trajectory Big Data Map Matching System using HBase (HBase를 이용한 GPS궤적 빅데이터 맵매칭 시스템)

  • Cho, Wonhee;Choi, Eunmi
    • Proceedings of the Korea Information Processing Society Conference / 2015.04a / pp.125-128 / 2015
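  • Since smartphones with built-in GPS became widespread, demand has grown for matching large volumes of GPS trajectory data to digital maps for analysis. However, existing map-matching techniques are mainly navigation-oriented algorithms, and analyzing large volumes of GPS trajectories on a server raises speed and system-performance issues. This study presents a map-matching system that uses HBase, a representative distributed NoSQL database in the Hadoop ecosystem. We define a table specification for loading the digital map into HBase, present a map-matching algorithm that works with HBase, and implement and analyze it in Java. With this NoSQL-based approach, large volumes of GPS trajectories were analyzed efficiently.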
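
    The abstract does not give the table specification or the matching algorithm, so the following is only a minimal sketch of the core geometric step that map matching generally relies on: snapping a GPS point to the nearest road segment. It is written in Python for brevity (the paper's implementation is in Java). The `ROADS` dictionary is a hypothetical stand-in for road segments that would be read from an HBase table, and planar coordinates are assumed instead of latitude and longitude.

```python
import math

def project_to_segment(p, a, b):
    """Return (distance, closest_point) from point p to segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Parameter of the orthogonal projection, clamped to stay on the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), (cx, cy)

def match_point(p, segments):
    """Return (segment_id, snapped_point, distance) for the segment nearest to p."""
    seg_id, (a, b) = min(
        segments.items(),
        key=lambda kv: project_to_segment(p, kv[1][0], kv[1][1])[0],
    )
    dist, snapped = project_to_segment(p, a, b)
    return seg_id, snapped, dist

# Hypothetical road network: segment id -> (start point, end point).
ROADS = {
    "road-1": ((0.0, 0.0), (10.0, 0.0)),
    "road-2": ((0.0, 5.0), (10.0, 5.0)),
}
```

    Calling `match_point((3.0, 1.0), ROADS)` selects `road-1`. A production system would also need to limit the candidate segments per point (for example, by a spatial key) and to use trajectory continuity, neither of which is shown here.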

HBase-based Automatic Summary System using Twitter Trending Topics (트위터 트랜딩 토픽을 이용한 HBase 기반 자동 요약 시스템)

  • Lee, Sanghoon;Moon, Seung-Jin
    • Journal of Internet Computing and Services / v.15 no.5 / pp.63-72 / 2014
  • Twitter is a popular social media platform where people post short messages of 140 characters or fewer via the web. A hashtag is a word or acronym that Twitter users create to open a discussion about topics and issues that trend heavily. Because hashtag posts are sorted by time rather than relevance, people who are new to Twitter have difficulty understanding their context. In this paper, we propose an HBase-based automatic summary system to reduce this difficulty. The proposed system stores the streaming data provided by the Twitter API in HBase and then combines an automatic summarization method with a fuzzy system. Through this procedure, we eliminate duplicate content in the hashtag posts and compute scores between posts so that users can access trending topics by relevance.

Digital Forensic Investigation of HBase (HBase에 대한 디지털 포렌식 조사 기법 연구)

  • Park, Aran;Jeong, Doowon;Lee, Sang Jin
    • KIPS Transactions on Computer and Communication Systems / v.6 no.2 / pp.95-104 / 2017
  • As smart device technology advances and social network services (SNS) become more common, the volume of data that is difficult to process with existing RDBMSs is increasing. As a result, NoSQL databases are gaining popularity as an alternative for processing massive, unstructured data generated in real time. Demand for digital investigation techniques for NoSQL databases is growing as more businesses introduce NoSQL databases into their systems, yet research on database forensics has centered on RDBMSs. New digital forensic investigation techniques are needed because NoSQL databases have no schema to normalize and their storage methods differ depending on the database type and operating environment. Research has been done on document-based NoSQL databases, but it is not directly applicable to other types of NoSQL databases. Therefore, this paper presents the operation and data model of HBase, a column-based NoSQL database, along with how to grasp its operating environment, collect and analyze artifacts, and recover deleted data. The proposed digital forensic investigation technique for HBase is also verified with an experimental scenario.

An Analysis of Utilization on Virtualized Computing Resource for Hadoop and HBase based Big Data Processing Applications (Hadoop과 HBase 기반의 빅 데이터 처리 응용을 위한 가상 컴퓨팅 자원 이용률 분석)

  • Cho, Nayun;Ku, Mino;Kim, Baul;Xuhua, Rui;Min, Dugki
    • Journal of Information Technology and Architecture / v.11 no.4 / pp.449-462 / 2014
  • In the big data era, processing systems for capturing, storing, and analyzing stored or streaming data involve many components that need consideration. Unlike traditional data handling systems, a big data processing system must take into account the characteristics (format, velocity, and volume) of the data it handles. In this situation, virtualized computing platforms are emerging as a way to handle big data effectively, since virtualization technology makes it possible to manage computing resources dynamically and elastically with minimal effort. In this paper, we analyze the utilization of virtualized computing resources to discover suitable deployment models for an Apache Hadoop and HBase-based big data processing environment. The results show that the Task Tracker service exhibits high CPU utilization and high disk I/O overhead during MapReduce phases. Moreover, the HRegion service consumes considerable network resources transferring the traffic data from the DataNode to the Task Tracker. The DataNode shows high memory utilization and disk I/O overhead for reading stored data.

The suggestion of new big data platform for the strengthening of privacy and enabled of big data (개인정보 보안강화 및 빅데이터 활성화를 위한 새로운 빅데이터 플랫폼 제시)

  • Song, Min-Gu
    • Journal of Digital Convergence / v.14 no.12 / pp.155-164 / 2016
  • In this paper, we investigate and analyze big data platforms published domestically and abroad. The analysis found personal information security problems on each platform. In particular, there was a vulnerability in the encryption of personal information stored in HBase, a representative NoSQL database commonly used in big data platforms. However, data encryption and decryption add load to the system. We therefore propose an encryption method for HBase, encryption and decryption systems, and a way to apply a personal information management system (PIMS) at each step, together with a method for reducing the network communication load of the big data platform. We then propose a new big data platform that reflects these proposals. The proposed platform is expected to contribute greatly to the wider use of big data by providing both personal information security and efficient system performance.

A Study on Distributed Semantic Web Data Repository Using HBase (HBase를 이용한 분산 시맨틱 웹 데이터 저장소에 대한 연구)

  • Jo, Daewoong;Kim, Myung Ho
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.111-114 / 2012
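  • Research on efficiently storing large volumes of data generated in real time is under way through big data processing technologies related to Hadoop and NoSQL for distributed and parallel processing. However, models for processing the large volumes of data generated in the Semantic Web field are not currently being studied. This paper proposes mapping rules for distributing and storing the large volumes of ontology data generated in a Semantic Web environment in HBase, a NoSQL database capable of big data processing. With these mapping rules, data that can be generated in large volumes in a Semantic Web environment can also be stored efficiently in a distributed manner.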
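
    The abstract does not state the mapping rules themselves. As an illustration only, one common way to lay out RDF-style triples in a wide-column store is to use the subject as the row key and the predicate as the column qualifier. The sketch below models that layout with plain Python dictionaries; the function name and the sample prefixes are hypothetical, and no HBase client is involved.

```python
from collections import defaultdict

def triples_to_rows(triples):
    """Group (subject, predicate, object) triples into rows keyed by subject.

    Each row maps a column qualifier (the predicate) to the list of objects,
    since one subject can have several values for the same predicate.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        rows[subject][predicate].append(obj)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {subject: dict(cols) for subject, cols in rows.items()}

# Hypothetical sample ontology data.
SAMPLE_TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]
```

    With this layout, all statements about one subject sit in a single row, which suits subject-oriented lookups; queries by predicate or object would need additional tables or indexes.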

Video Big Data Processing Scheme for Spatio-Temporal Analysis of Moving Objects (움직이는 물체의 시공간 분석을 위한 동영상 빅 데이터 처리 방안)

  • Jung, Seungwon;Kim, Yongsung;Jung, Sangwon;Kim, Yoonki;Hwang, Eenjun
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.833-836 / 2017
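  • As video recording devices such as dashboard cameras and CCTV have become commonplace, vast amounts of video data are being generated in real time. If vehicle information could be extracted from this large-scale data, it could be used for applications such as tracking vehicles involved in crimes and measuring traffic congestion. Realizing this requires a system that can process video data generated in real time by numerous vehicles, but such systems are hard to find. To address this, this paper proposes a video big data processing system using Apache Kafka and HBase. Apache Kafka handles lossless video transmission within the system and the scheduling of video processing nodes, while HBase stores the processed data in tables and handles queries sent by users. In addition, indexes are built on HBase to enable fast query processing. Experiments confirmed that the proposed system delivers better query processing performance than the same system without indexes.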
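
    The abstract does not describe how the index is organized. The sketch below only illustrates the general idea of a secondary index over a key-value table: a second lookup structure maps a column value to the row keys that contain it, so a query avoids scanning every row. The table contents, the `plate` column, and the function names are all hypothetical, and plain dictionaries stand in for HBase tables.

```python
def build_index(table, column):
    """Build a secondary index: column value -> list of row keys holding that value."""
    index = {}
    for row_key, cols in table.items():
        if column in cols:
            index.setdefault(cols[column], []).append(row_key)
    return index

def query_scan(table, column, value):
    """Answer the query by checking every row (no index)."""
    return sorted(k for k, cols in table.items() if cols.get(column) == value)

def query_indexed(index, value):
    """Answer the same query with one lookup in the secondary index."""
    return sorted(index.get(value, []))

# Hypothetical detection records: row key -> columns.
DETECTIONS = {
    "cam1#0001": {"plate": "12A3456", "time": "09:00:01"},
    "cam2#0007": {"plate": "98B7654", "time": "09:00:05"},
    "cam3#0002": {"plate": "12A3456", "time": "09:03:40"},
}
```

    Both query functions return the same row keys; the indexed version trades extra storage and index maintenance on every write for not having to touch all rows at query time.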

A Design of SNS and Web Data Analysis System for Company Marketing Strategy (기업 마케팅 전략을 위한 SNS 및 Web 데이터 분석 시스템 설계)

  • Lee, ByungKwan;Jeong, EunHee;Jung, YiNa
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.6 no.4 / pp.195-200 / 2013
  • This paper proposes an SNS and Web Data Analytics System that supports business marketing strategy by analyzing negative SNS and web data that can seriously damage a business's image. It consists of the Data Collection Module, which collects SNS and web data; the HBase Module, which stores the collected data; the Data Analysis Module, which estimates and classifies the meaning of the data after a semantic analysis of the collected data; and the PHS Module, which performs an optimized MapReduce using SNS and web data related to a business. By managing SNS and web data efficiently with these modules, the analysis results can be used for business marketing strategy.

Design and Implementation of Cloud-based Sensor Data Management System (클라우드 기반 센서 데이터 관리 시스템 설계 및 구현)

  • Park, Kyoung-Wook;Kim, Kyong-Og;Ban, Kyeong-Jin;Kim, Eung-Kon
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.5 no.6 / pp.672-677 / 2010
  • Recently, the growing deployment of large-scale sensor networks has created a need for efficient management systems for large-scale sensor data. In this paper, we propose a cloud-based sensor data management system with low cost, high scalability, and efficiency. Sensor data from sensor networks are transmitted to the cloud through a cloud gateway. At this point, outlier detection and event processing are performed. The transmitted sensor data are stored in Hadoop HBase, a distributed column-oriented database, and processed in parallel by a query processing module designed on the MapReduce model. Because the processed results are provided through a REST-based web service, the proposed system can work with applications on a variety of platforms.

Transaction Processing Method for NoSQL Based Column

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems / v.13 no.6 / pp.1575-1584 / 2017
  • As interest in big data has increased recently, NoSQL, a solution for storing and processing big data, is getting attention. NoSQL supports high speed, high availability, and high scalability, but its use is limited in areas where data integrity is important because it does not support multi-row transactions. To overcome this drawback, many studies are under way to support multi-row transactions in NoSQL. However, existing studies have the disadvantage that the number of transactions processed per unit of time is low and performance is degraded. Therefore, in this paper, we design and implement a multi-row transaction system for data integrity in a big data environment based on HBase, a widely used column-based NoSQL database. The multi-row transaction system performs multi-row transactions efficiently by adding columns that manage transaction information to every user table. In addition, it controls the execution, collision, and recovery of multi-row transactions through the transaction manager, and it communicates with HBase through the communication manager to exchange the information needed for multi-row transactions. Finally, we performed a comparative performance evaluation against HAcid and Haeinsa and verified the superiority of the multi-row transaction system developed in this paper.
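
    The abstract says that transaction information is kept in extra columns added to each user table, but it does not give the protocol. The sketch below is therefore not the paper's method. It is a simplified, single-process illustration of the general idea: a hypothetical `_tx` column marks a row as held by a transaction, and a commit touches all rows or none. A plain dictionary stands in for an HBase table, and a real system would need an atomic per-row check-and-set plus crash recovery.

```python
class Conflict(Exception):
    """Raised when a row is already held by another transaction."""

def commit(table, tx_id, writes):
    """Apply writes {row_key: {column: value}} to all rows, or to none.

    The hypothetical "_tx" column on each row stores the id of the
    transaction currently holding that row (None when the row is free).
    """
    locked = []
    try:
        # Phase 1: claim every row, in a fixed order to avoid lock cycles.
        for key in sorted(writes):
            row = table.setdefault(key, {})
            if row.get("_tx") not in (None, tx_id):
                raise Conflict(key)
            row["_tx"] = tx_id
            locked.append(key)
        # Phase 2: all rows are held, so the writes can be applied together.
        for key, cols in writes.items():
            table[key].update(cols)
    finally:
        # Release only the rows this transaction claimed.
        for key in locked:
            table[key]["_tx"] = None
```

    If any row is held by another transaction, `commit` raises `Conflict` before any value is written, so the caller can retry or abort without partial updates.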