• Title/Summary/Keyword: NoSQL Database


An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment

  • Kim, Myoungjin; Cui, Yun; Lee, Hanku
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.8, pp.3182-3202, 2015
  • Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system that categorizes and analyzes unstructured log data is extremely difficult. To overcome these limitations, we propose and design a MongoDB-based unstructured log processing system (MdbULPS) for collecting, categorizing, and analyzing log data generated from banks. The proposed system includes a Hadoop-based analysis module for reliable parallel-distributed processing of massive log data. Furthermore, because the Hadoop distributed file system (HDFS) stores collected log data by generating replicas in block units, the proposed system offers automatic recovery against system failures and data loss. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. To evaluate the proposed system, we conducted three performance tests on a local twelve-node test bed: comparing our system with a MySQL-based approach, comparing it with an HBase-based approach, and varying the chunk-size option. The experiments showed that our system performs better at processing unstructured log data.
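
To make the chunk-size test concrete, here is a minimal pymongo sketch (not the authors' code) of the kind of setup the paper evaluates: enabling sharding for a log collection and adjusting the chunk size that governs how MongoDB splits data across shards. The names bank_logs, events, and log_id are illustrative assumptions.

```python
# Hedged sketch: shard a log collection and vary the chunk size.
# Assumes a sharded cluster reached through a mongos router; the
# database/collection/key names below are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # mongos router

# The chunk size (in MB) lives in the config database's settings collection.
client.config.settings.update_one(
    {"_id": "chunksize"},
    {"$set": {"value": 64}},  # vary this value to reproduce the chunk-size test
    upsert=True,
)

# Shard the log collection on a hashed key so inserts spread across shards.
client.admin.command("enableSharding", "bank_logs")
client.admin.command("shardCollection", "bank_logs.events",
                     key={"log_id": "hashed"})
```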

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services, v.14 no.6, pp.71-84, 2013
  • Log data, which record the multitude of information created when operating computer systems, are used in many processes, from system inspection and process optimization to customized user services. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive log data of banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log processing system is needed to gather, store, categorize, and analyze the log data generated while processing clients' business. However, in existing computing environments it is difficult to realize flexible storage expansion for a massive amount of unstructured log data and to execute the many functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log processing system for unstructured log data that are difficult to handle with the existing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment and can flexibly expand computing resources such as storage space and memory under conditions such as storage extension or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the aggregated log data in block units, the system can automatically restore itself and continue operating after a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have strict schemas that are inappropriate for unstructured log data and make it hard to distribute stored data across additional nodes when the amount of data increases rapidly. NoSQL databases do not provide the complex computations that relational databases offer, but they can easily expand through node dispersion as data volume grows rapidly; their non-relational structure suits unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, the proposed system uses MongoDB, the representative document-oriented model with a free schema structure. MongoDB was chosen because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when the amount of data increases rapidly, and it provides an auto-sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes it to the MongoDB and MySQL modules. The log graph generator module produces the log analysis results of the MongoDB, Hadoop-based analysis, and MySQL modules per analysis time and log type, and provides them to the user through a web interface. Log data requiring real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. Log data aggregated per unit time are stored in the MongoDB module and plotted according to the user's analysis conditions; they are also parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation against a log processing system that uses only MySQL, measuring log insert and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through MongoDB log-insert performance evaluations over various chunk sizes.
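
As a rough illustration of the routing performed by the log collector module described above (real-time logs to the MySQL module, bulk unstructured logs to the MongoDB module), the following Python sketch shows one plausible shape of that logic. The connection details, table and field names, and the realtime flag are assumptions, not the paper's implementation.

```python
# Hedged sketch of the log collector's routing logic; names are hypothetical.
from pymongo import MongoClient
import pymysql

mongo_logs = MongoClient("mongodb://localhost:27017")["logdb"]["raw_logs"]
mysql_conn = pymysql.connect(host="localhost", user="log",
                             password="pw", db="logdb")

def route_log(record: dict) -> None:
    """Send a classified log record to the appropriate storage module."""
    if record.get("realtime"):  # logs that need real-time analysis -> MySQL
        with mysql_conn.cursor() as cur:
            cur.execute(
                "INSERT INTO rt_logs (ts, log_type, msg) VALUES (%s, %s, %s)",
                (record["ts"], record["type"], record["msg"]),
            )
        mysql_conn.commit()
    else:                       # bulk unstructured logs -> MongoDB
        mongo_logs.insert_one(record)  # schema-free insert
```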

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho; Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information, v.22 no.11, pp.57-63, 2017
  • The recent sudden increase of big data has characteristics such as continuous generation, large volume, and unstructured format. Existing relational database technologies are inadequate for such big data because of limited processing speed and the significant cost of storage expansion. Thus, big data processing technologies, normally based on distributed file systems, distributed database management, and parallel processing, have arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB, extending the information engineering methodology based on the E-R data model.
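
The central decision such a methodology must formalize is how an E-R relationship maps onto a document structure. The toy comparison below illustrates that idea under assumed entity names; it is not the paper's methodology itself.

```python
# Toy illustration: a 1:N relationship from an E-R model can become an
# embedded array in MongoDB instead of a child table plus foreign key.
# Entity and field names are hypothetical.
relational_style = {
    "orders":      [{"order_id": 1, "customer": "Kim"}],
    "order_items": [{"order_id": 1, "sku": "A-100", "qty": 2}],  # joined by FK
}

document_style = {  # one MongoDB document; no join needed at read time
    "_id": 1,
    "customer": "Kim",
    "items": [{"sku": "A-100", "qty": 2}],
}
```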

NVST DATA ARCHIVING SYSTEM BASED ON FASTBIT NOSQL DATABASE

  • Liu, Ying-Bo; Wang, Feng; Ji, Kai-Fan; Deng, Hui; Dai, Wei; Liang, Bo
    • Journal of The Korean Astronomical Society, v.47 no.3, pp.115-122, 2014
  • The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high-resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand files per day. Given the large number of files, effective archiving and retrieval has become a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the FastBit Not Only Structured Query Language (NoSQL) database. Compared to a relational database (MySQL; My Structured Query Language), FastBit shows distinct advantages in indexing and querying performance. On a large-scale database of 40 million records, FastBit's multi-field combined query response time is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and should contribute to the design of data management systems for other astronomical telescopes.
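
FastBit's query speed comes from bitmap indexes, which answer multi-field predicates with bitwise operations over precomputed bit vectors. The numpy sketch below is a toy illustration of that principle only; it does not use FastBit's actual API, and the metadata fields are hypothetical.

```python
# Toy bitmap-index illustration of a multi-field combined query.
import numpy as np

# Per-record metadata (stand-ins for FITS header fields).
wavelength = np.array([6563, 5324, 6563, 7000])
exposure = np.array([0.02, 0.10, 0.02, 0.02])

bm_halpha = wavelength == 6563  # bitmap for predicate: wavelength = 6563
bm_short = exposure < 0.05      # bitmap for predicate: exposure < 0.05

# A combined query reduces to one bitwise AND over the precomputed bitmaps.
hits = np.nonzero(bm_halpha & bm_short)[0]
print(hits)  # indices of records satisfying both predicates -> [0 2]
```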

Development of educational programs for managing medical information utilizing medical data generation and analysis techniques (의료 데이터 발생과 분석기술을 활용한 의료정보관리 교육용 프로그램 개발)

  • Choi, Joonyoung
    • Journal of Digital Convergence, v.15 no.10, pp.377-386, 2017
  • This study developed an educational medical information management program that can improve learners' ability to manage medical information. The program was developed over eight months using Visual Basic. It uses a Microsoft Access database, which lets learners easily understand the structure of the data. Learners analyze the medical records and then enter data into the discharge analysis, cancer registration, and incomplete-record programs. After entering and saving the data, learners can use the program to understand and analyze the structure of the database and generate medical information. The program improves learners' medical information management skills by having them extract the necessary data from the database directly through SQL and create various kinds of medical information. However, although the program is educational, it provides no evaluation system for learners' operation of the program. Accordingly, future studies should develop an assessment system for the program so that learners can be evaluated.
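
A minimal sketch of the kind of SQL extraction exercise the program has learners perform, with sqlite3 standing in for the Access database; the table and column names are hypothetical.

```python
# Hedged sketch: pull summary medical information out of the database via SQL.
import sqlite3

con = sqlite3.connect("discharge.db")  # stand-in for the Access database
cur = con.cursor()

# e.g., count discharges per principal diagnosis over a year
cur.execute(
    "SELECT dx_code, COUNT(*) FROM discharge_analysis "
    "WHERE discharge_date BETWEEN ? AND ? GROUP BY dx_code",
    ("2017-01-01", "2017-12-31"),
)
for dx_code, n in cur.fetchall():
    print(dx_code, n)
```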

An Implementation of Web-based Client/Server Architecture using Distributed Objects (분산 객체를 이용한 웹기반 클라이언트 / 서버 구조의 구현)

  • 박희창; 이태공
    • Journal of the Military Operations Research Society of Korea, v.23 no.2, pp.25-44, 1997
  • The number of Internet users has increased rapidly thanks to the convenient GUI environment. The current Web-based HTTP/CGI client/server architecture has several problems, such as the CGI bottleneck, no maintenance of state, and no load balancing. However, with the Java and CORBA technologies called the "Object Web," we can solve them, because Java is both mobile and platform-independent code, and CORBA can build distributed objects with a language-independent object model. The goal of Object Web technology is to create multivendor, multi-OS, multilanguage "legoware" using objects. This paper implements a "Book Search System," a Web-based client/server architecture using distributed objects. The implementation environment consists of a Hangul Windows NT server (with IIS), Hangul Windows 95 clients, Visigenic's VisiBroker for Java 1.2 (a CORBA 2.0 product), HTTP over TCP/IP, a Sybase SQL Anywhere 5.0 database server, and JDBC-ODBC bridge middleware between the application server and the database.


No-load Database Security using Network Packet Control (패킷제어를 이용한 무부하 데이터베이스 접근보안)

  • Sin, Sung Chul; Lee, Kyeong Seok; Ryu, Keun Ho
    • Proceedings of the Korea Information Processing Society Conference, 2014.11a, pp.540-543, 2014
  • Companies and institutions that handle vast amounts of data build dedicated databases to store and manage all of their information. Although these databases hold sensitive information, including personal data, in most cases little or no security is applied to the database itself. Given that most security incidents involve the leakage of important information stored in databases, building a security system for the database itself is the most essential way to prevent such damage. In this study, we propose an approach that uses sniffing to monitor network packets and perform security functions without affecting database performance. The proposed security system runs on separate hardware, so it can be deployed without interrupting service on a database already in operation. We designed an algorithm that performs access control within the sniffing scheme, and by applying the MD5 hash function to the SQL statements that account for most of the audit log, we greatly reduced the size of the log data. Because the system provides a high level of audit performance and allows various security policies to be applied simply, without affecting the operating database environment, it can prevent the leakage of important information stored in databases and serve as tracking and evidential material when a security incident occurs.
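
The MD5-based reduction of the audit log can be sketched in a few lines: each sniffed SQL statement is stored once, and subsequent audit records reference its fixed-size digest. This is an illustration under assumptions, not the proposed system's code.

```python
# Hedged sketch: log sniffed SQL statements by MD5 digest to shrink audit logs.
import hashlib

seen: dict[str, int] = {}  # digest -> occurrence count (stand-in lookup table)

def log_sql(sql: str) -> str:
    """Return the 128-bit digest recorded in place of the full SQL text."""
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    seen[digest] = seen.get(digest, 0) + 1  # full text stored once, elsewhere
    return digest

log_sql("SELECT * FROM accounts WHERE id = 42")  # repeated queries dedupe
```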

Development of an Agricultural Data Middleware to Integrate Multiple Sensor Networks for a Farm Environment Monitoring System

  • Kim, Joonyong; Lee, Chungu; Kwon, Tae-Hyung; Park, Geonhwan; Rhee, Joong-Yong
    • Journal of Biosystems Engineering, v.38 no.1, pp.25-32, 2013
  • Purpose: The objective of this study is to develop a data middleware for u-IT convergence in agricultural environment monitoring that can support non-standard data interfaces and solve the compatibility problems of heterogeneous sensor networks. Methods: Six factors with three different interfaces were chosen as target data among the environmental monitoring factors for crop cultivation. PostgreSQL and PostGIS were used for the database, and the data middleware was implemented in the Python programming language, based on a hierarchical model design and a key-value type table design. For evaluation, 2,000 records were prepared for each data access interface. Results: The execution times of the File I/O, SQL, and HTTP interfaces were 0.00951 s/record, 0.01967 s/record, and 0.0401 s/record, respectively, with no data loss. Conclusions: The data middleware integrated three heterogeneous sensor networks with different data access interfaces.
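
A key-value type table keeps the schema fixed while new monitoring factors arrive as rows rather than columns, which is what lets the middleware absorb heterogeneous sensor networks. The psycopg2 sketch below shows one plausible version of such a table; the table and column names are assumptions, not the paper's schema.

```python
# Hedged sketch of a key-value sensor table in PostgreSQL.
import psycopg2

con = psycopg2.connect("dbname=agri user=postgres")
cur = con.cursor()
cur.execute(
    """CREATE TABLE IF NOT EXISTS sensor_data (
           node_id text,
           ts      timestamptz,
           factor  text,              -- e.g. 'air_temp', 'humidity', 'co2'
           value   double precision   -- new factors need no schema change
       )"""
)
cur.execute(
    "INSERT INTO sensor_data VALUES (%s, %s, %s, %s)",
    ("greenhouse-01", "2013-01-15 09:00:00+09", "air_temp", 21.4),
)
con.commit()
```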

Efficient Query Retrieval from Social Data in Neo4j using LIndex

  • Mathew, Anita Brigit
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.5, pp.2211-2232, 2018
  • The unstructured and semi-structured big data in social networks pose new challenges in query retrieval, which must be met by quality retrieval-time measures like indexing. Because of the huge volume of stored data, efficient index algorithms are needed to support query processing. However, conventional algorithms fail to index the huge amount of frequently arriving information in real time and fall short of providing scalable indexing services. In this paper, a new LIndex algorithm, a heuristic built on Lucene, is constructed on a Neo4jHA architecture that holds the social network big data. LIndex is a flexible, simplified adaptive indexing scheme that uses decomposed shortest paths around term neighbors as its basic indexing unit. This new index proves effective in pruning the query space of the Neo4j graph database and scalable in index construction and deployment. A graph query is processed and optimized beyond traditional time-based Lucene retrieval to a more efficient path-based method in LIndex. The algorithm significantly reduces query fetch time without compromising the quality of results. Experiments confirm the efficiency of the proposed query retrieval in the Neo4j graph NoSQL database.
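
For context, the sketch below shows the kind of shortest-path query over social data that a path-based index such as LIndex is meant to accelerate, issued through the official neo4j Python driver. The labels, properties, hop bound, and credentials are assumptions.

```python
# Hedged sketch: a shortest-path social query against Neo4j.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "pw"))

query = (
    "MATCH p = shortestPath((a:Person {name: $src})-[:KNOWS*..6]-"
    "(b:Person {name: $dst})) "
    "RETURN length(p) AS hops"
)

with driver.session() as session:
    record = session.run(query, src="alice", dst="bob").single()
    print(record["hops"] if record else "no path")
driver.close()
```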

OHDSI OMOP-CDM Database Security Weakness and Countermeasures (OHDSI OMOP-CDM 데이터베이스 보안 취약점 및 대응방안)

  • Lee, Kyung-Hwan; Jang, Seong-Yong
    • Journal of Information Technology Services, v.21 no.4, pp.63-74, 2022
  • Globally, researchers at medical institutions are actively sharing patients' cohort data to develop vaccines and treatments to overcome the COVID-19 crisis. OMOP-CDM, a common data model for efficiently sharing medical research data, is operated independently by individual medical institutions and contains patients' personal information (e.g., PII and PHI). Although PII and PHI are managed and shared after de-identification or anonymization at medical institutions, complete de-identification and anonymization cannot be guaranteed 100%. For this reason, the security of the OMOP-CDM database is important, but no detailed, specific OMOP-CDM security inspection tool exists, so risk mitigation measures are currently taken with general security inspection tools. This study presents a model for implementing a tool to check the security vulnerabilities of OMOP-CDM by analyzing US database security guidelines and the personal-information security controls of NIST, and verifies its implementation feasibility through a field demonstration in an actual environment of three hospitals. Checking the security status of the test server and the operating CDM databases of the three hospitals revealed that most database audit and encryption functions were insufficient. These inspection results were applied to the optimization study of the complex and time-consuming CDM CSF developed in the "Development of Security Framework Required for CDM-based Distributed Research" task of the Korea Health Industry Promotion Agency. According to several recent newspaper articles, ransomware attacks on financially large hospitals are intensifying. Organizations that currently operate or will operate CDM databases need to install the database audit (proofing) and encryption (data protection) functions that the OMOP-CDM database template does not provide, to prevent attackers from compromising the data.
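
A minimal sketch of an automated check along these lines, assuming the CDM is hosted on PostgreSQL: it verifies that TLS is enabled and that the pgAudit extension is installed. The connection string is an assumption, and OMOP-CDM also runs on other DBMSs, so this is an illustration rather than a general inspection tool.

```python
# Hedged sketch: check encryption-in-transit and audit support on a
# PostgreSQL-hosted OMOP-CDM instance.
import psycopg2

con = psycopg2.connect("dbname=omop_cdm user=auditor")
cur = con.cursor()

cur.execute("SHOW ssl")                 # encryption in transit
ssl_on = cur.fetchone()[0] == "on"

cur.execute("SELECT 1 FROM pg_extension WHERE extname = 'pgaudit'")
audit_on = cur.fetchone() is not None   # database audit (proofing)

print(f"ssl={ssl_on}, pgaudit={audit_on}")
```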