• Title/Summary/Keyword: NoSQL Database System

Design and Implementation of a Benchmarking System Based on ArangoDB (ArangoDB기반 벤치마킹 시스템 설계 및 구현)

  • Choi, Do-Jin;Baek, Yeon-Hee;Lee, So-Min;Kim, Yun-A;Kim, Nam-Young;Choi, Jae-Young;Lee, Hyeon-Byeong;Lim, Jong-Tae;Bok, Kyoung-Soo;Song, Seok-Il;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • v.21 no.9
    • pp.198-208
    • 2021
  • ArangoDB is a NoSQL database system that is widely used in applications that store large amounts of data. To apply a new NoSQL database system such as ArangoDB to real work environments, we need a benchmarking system that can evaluate its performance. In this paper, we design and implement an ArangoDB-based benchmarking system that measures kernel-level performance as well as application-level performance. We partially modify YCSB to measure the performance of a NoSQL database system in a cluster environment. We also define three real-world workload types by analyzing existing materials. We demonstrate the feasibility of the proposed system by benchmarking the three workload types: we derive the workloads available in ArangoDB and show that performance at both the kernel layer and the application layer can be visualized. The system is expected to support applicability and risk reviews in environments that need to migrate data from an existing database engine to ArangoDB.

A Content-based Audio Retrieval System Supporting Efficient Expansion of Audio Database (음원 데이터베이스의 효율적 확장을 지원하는 내용 기반 음원 검색 시스템)

  • Park, Ji Hun;Kang, Hyunchul
    • Journal of Digital Contents Society
    • v.18 no.5
    • pp.811-820
    • 2017
  • For content-based audio retrieval, one of the main functions of an audio service, techniques that extract fingerprints from audio sources and store and index them in a database are widely used. However, if the fingerprints of new audio sources are continually inserted into the database, both space efficiency and retrieval performance gradually deteriorate. There is therefore a need for techniques that support efficient expansion of the audio database without periodic reorganization, which would increase the system's operating cost. In this paper, we design a content-based audio retrieval system that solves this problem by using MapReduce and a NoSQL database in a cluster computing environment, based on Shazam's fingerprinting algorithm, and we evaluate its performance through a detailed set of experiments using real-world audio data.

Performance Comparison and Analysis between Open-Source DBMS (오픈소스 DBMS 성능비교분석)

  • Jang, Rae-Young;Bae, Jung-Min;Jung, Sung-Jae;Soh, Woo-Young;Sung, Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • 2014.10a
    • pp.805-808
    • 2014
  • A DBMS is a software system through which users access and manage databases. DBMSs include open-source systems such as MySQL and commercial products such as Oracle. Since MySQL was acquired by Oracle, demand for MariaDB has increased. Interest in NoSQL is also growing, depending on the use case. A performance comparison between open-source DBMSs on the same type of large-scale data is therefore required, and this study compares the performance of MariaDB and MongoDB. This paper proposes a DBMS suited to processing big data.

Finding Frequent Route of Taxi Trip Events Based on MapReduce and MongoDB (택시 데이터에 대한 효율적인 Top-K 빈도 검색)

  • Putri, Fadhilah Kurnia;An, Seonga;Purnaningtyas, Magdalena Trie;Jeong, Han-You;Kwon, Joonho
    • KIPS Transactions on Software and Data Engineering
    • v.4 no.9
    • pp.347-356
    • 2015
  • Due to the rapid development of IoT (Internet of Things) technology, traditional taxis are connected through dispatchers and location systems. Modern taxis are typically equipped with GPS (Global Positioning System) devices that record route information. By analyzing the frequency of taxi trip events, we can find the frequent routes for a given query time. However, converting the raw location data of taxi trip events into frequency information raises a scalability problem because of the volume of location data. To address this problem, we propose a NoSQL-based top-K query system for taxi trip events. First, we analyze raw taxi trip events and extract the frequencies of all routes. Then, we store the frequency information in a hash-based index structure of MongoDB, a document-oriented NoSQL database. Efficient top-K query processing for frequent routes is performed on top of MongoDB. We validate the efficiency of our algorithms using real taxi trip events from New York City.
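
The two steps named in the abstract above (extract route frequencies, then answer top-K queries over them) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the trip format with `pickup` and `dropoff` keys, the function names, and the use of a Python `Counter` in place of MapReduce and MongoDB's hash-based index are all assumptions made for illustration.

```python
from collections import Counter
import heapq

def route_frequencies(trips):
    """Count how often each (pickup, dropoff) route occurs.

    Hypothetical stand-in for the frequency-extraction step; the paper
    uses MapReduce and stores the result in MongoDB.
    """
    return Counter((t["pickup"], t["dropoff"]) for t in trips)

def top_k_routes(freq, k):
    """Return the k most frequent routes as (route, count) pairs."""
    return heapq.nlargest(k, freq.items(), key=lambda item: item[1])

# Toy trip events; the cell labels are made up for the example.
trips = [
    {"pickup": "A", "dropoff": "B"},
    {"pickup": "A", "dropoff": "B"},
    {"pickup": "B", "dropoff": "C"},
    {"pickup": "A", "dropoff": "B"},
    {"pickup": "B", "dropoff": "C"},
    {"pickup": "C", "dropoff": "A"},
]

freq = route_frequencies(trips)
top2 = top_k_routes(freq, 2)
print(top2)
```

Using `heapq.nlargest` avoids sorting every route when only the k most frequent ones are needed, which matters once the number of distinct routes is large.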

Development of the Design Methodology for Large-scale Data Warehouse based on MongoDB

  • Lee, Junho;Joo, Kyungsoo
    • Journal of the Korea Society of Computer and Information
    • v.23 no.3
    • pp.49-54
    • 2018
  • A data warehouse is a system that collectively manages and integrates a company's data and provides the basis for decision making on management strategy. Analysis data volumes are now reaching a critical size that challenges traditional data warehousing approaches. Currently implemented solutions are mainly based on relational databases, which are no longer suited to these data volumes. NoSQL solutions allow us to consider new approaches to data warehousing, especially from the point of view of multidimensional data management. In this paper, we extend the relational, star-schema-based data warehouse design methodology and develop a consistent design methodology, from information requirement analysis to data warehouse construction, for large-scale data warehouses based on MongoDB, a NoSQL database.

Storage Benchmarking System Using NoSQL Database Engines (NoSQL 데이터베이스 엔진을 이용한 스토리지 벤치마킹 시스템)

  • Choi, Do-Jin;Park, Soo-Bin;Park, Song-Hee;Baek, Yeon-Hee;Shin, Bo-Kyoung;Choi, Jae-Yong;Park, Jae-Yeol;Lim, Jong-Tae;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • Proceedings of the Korea Contents Association Conference
    • 2019.05a
    • pp.445-446
    • 2019
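  • With the advent of the big data era, a variety of NoSQL database engines are in use. A storage benchmarking tool is needed to evaluate storage performance when various applications based on NoSQL database engines are running. In this paper, we design a storage benchmarking system that uses NoSQL databases. The proposed system measures storage performance through an IO tracer and lets users create user-defined workloads, run benchmarks, and check results through a web UI.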

NoSQL-based User Behavior Detection System in Cloud Computing Environment (NoSQL 기반 클라우드 사용자 행동 탐지 시스템 설계)

  • Ahn, Kwang-Min;Lee, Bong-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • 2012.10a
    • pp.804-807
    • 2012
  • A cloud service provider has to protect its clients' information securely, since all resources are offered by the provider and shared among a large number of users. In this paper, a NoSQL-based anomaly detection system is proposed to enhance the security of mobile cloud services. Existing integrated security management systems that use a relational database cannot process data in real time, because security logs from a variety of security equipment and data from cloud nodes come in different, unstructured formats. The proposed system can resolve this emerging security problem because it provides real-time processing and scalability in a distributed processing environment.

Spatial Operator for Spatial MongoDB (Spatial MongoDB를 위한 공간 연산자)

  • Kwak, Kwang-Jin;Yoon, Ha-Young;Shin, Dong-Yoon;Shin, Dong-Jin;Park, Jeong-Min;Kim, Jeong-Joon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • 2018
  • Recently, the volume of media data has been increasing with the development of the Internet and SNS. Since photographs and videos often carry geo-tags, many techniques have been developed to analyze them. NoSQL has attracted attention as a way to process such varied data, including SNS data. However, most NoSQL systems lack sufficient computation and query support for spatial data. Therefore, in this paper, we design and implement a system that adds spatial operators to MongoDB, a representative NoSQL database. This study confirms that various operators can be used, and a range of services is expected to be built on them.

Design and Implementation of Sensor Information Management System based on Celery-MongoDB (Celery-MongoDB 를 활용한 센서정보 관리시스템 설계 및 구현)

  • Kang, Yun-Hee
    • Journal of Platform Technology
    • 2021
  • Managing sensor information requires functions for rapidly registering, modifying, and deleting information about a large number of varied sensors. In this research, Celery and MongoDB are used to develop a sensor data management system. Celery provides a queue structure based on asynchronous communication in Python. It is a simple yet reliable distributed job queue suited to processing large volumes of messages. MongoDB is a NoSQL database capable of managing various kinds of unstructured information. In our experiment, we confirmed that a variety of sensor information can be processed with this system in an IoT environment. To improve performance when handling messages that carry sensor data, the system will be deployed at the edge of a cloud infrastructure.

Graph Database Benchmarking Systems Supporting Diversity (다양성을 지원하는 그래프 데이터베이스 벤치마킹 시스템)

  • Choi, Do-Jin;Baek, Yeon-Hee;Lee, So-Min;Kim, Yun-A;Kim, Nam-Young;Choi, Jae-Young;Lee, Hyeon-Byeong;Lim, Jong-Tae;Bok, Kyoung-Soo;Song, Seok-Il;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association
    • v.21 no.12
    • pp.84-94
    • 2021
  • Graph databases have been developed to efficiently store and query graph data, which is composed of vertices and edges that express relationships between objects. Since the query types of graph databases differ markedly from those of traditional NoSQL databases, benchmarking tools suited to verifying graph database performance are needed. In this paper, we propose an efficient graph database benchmarking system that supports diversity in graph inputs and queries. The proposed system uses OrientDB to conduct benchmarking for graph databases. To support diverse input graphs and query graphs, we use LDBC, an existing graph data generation tool. We demonstrate the feasibility and effectiveness of the proposed scheme through an analysis of the benchmarking results. The performance evaluation shows that the proposed system can generate customizable synthetic graph data and that benchmarking can be performed on the generated data.