• Title/Summary/Keyword: Storage Schema


A RDF based Ontology Management System (RDF 기반의 온톨로지 처리시스템)

  • Jung Jun-Won;Jung Ho-Young;Kim Jong-Nam;Lim Dong-Hyuk;Kim Hyoung-Joo
    • Journal of KIISE:Computing Practices and Letters / v.11 no.4 / pp.381-392 / 2005
  • The quantity of data handled by computing systems keeps growing, and as it grows it becomes harder to find appropriate information; retrieving meaningful information therefore matters more than raw processing speed. The Semantic Web enables intelligent processing by adding semantic information to data, and it is useful for building ontology systems. In this paper, we implement an ontology processing system that supports ontology functions and efficient processing for practical services. We propose a system design that is independent of the underlying storage, a technique for storing RDF in a relational database, a caching technique based on schema information, and a useful user interface. (The storage and caching idea is sketched after this entry.)
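The abstract does not give the actual table layout or cache design; the following is a minimal sketch, assuming a generic triple table in SQLite and an in-memory cache for schema-level predicates. All names are illustrative.

    # Hypothetical sketch: RDF triples in a relational table, with
    # schema-level statements cached in memory as the abstract describes.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")

    SCHEMA_PREDICATES = {"rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"}
    schema_cache = {}  # (subj, pred) -> [obj, ...]

    def add_triple(subj, pred, obj):
        conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subj, pred, obj))
        if pred in SCHEMA_PREDICATES:      # schema info is cached eagerly
            schema_cache.setdefault((subj, pred), []).append(obj)

    def objects(subj, pred):
        if pred in SCHEMA_PREDICATES:      # schema lookups skip the database
            return schema_cache.get((subj, pred), [])
        rows = conn.execute("SELECT obj FROM triples WHERE subj=? AND pred=?",
                            (subj, pred))
        return [r[0] for r in rows]

    add_triple("ex:Dog", "rdfs:subClassOf", "ex:Animal")
    add_triple("ex:rex", "rdf:type", "ex:Dog")
    print(objects("ex:Dog", "rdfs:subClassOf"))  # ['ex:Animal']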

Design of Efficient Storage Exploiting Structural Similarity in Microarray Data (마이크로어레이 데이터의 구조적 유사성을 이용한 효율적인 저장 구조의 설계)

  • Yun, Jong-Han;Shin, Dong-Kyu;Shin, Dong-Il
    • The KIPS Transactions:PartD / v.16D no.5 / pp.643-650 / 2009
  • As one of the typical techniques for acquiring bio-information, the microarray has contributed greatly to the development of bioinformatics. Although it is established as a core technology in bioinformatics, sharing and storing microarray data is difficult because experimental data are huge and structurally complex. In this paper, we propose a new method that exploits the fact that MAGE-ML, a standard format for exchanging microarray data, contains frequent, structurally similar patterns. The method constructs a compact database by simplifying the MAGE-ML schema, using inlining techniques together with a newly proposed classification technique based on the structural similarity of elements. With this method the database structure becomes simpler, the number of table joins is reduced, and performance is enhanced. (The inlining idea is sketched after this entry.)
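The paper's actual schema mapping is not reproduced here; the following is a minimal sketch of inlining under invented element names: a child element that occurs at most once per parent is folded into the parent's table, removing a table and a join.

    # Hypothetical: a BioAssay element with a single optional Description
    # child. Inlining stores the child's fields as extra columns instead
    # of a separate table joined on a foreign key.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE bioassay (
            id         INTEGER PRIMARY KEY,
            name       TEXT,
            descr_text TEXT,  -- inlined from the Description child
            descr_uri  TEXT   -- NULL when the child is absent
        )""")
    conn.execute("INSERT INTO bioassay VALUES (1, 'assay-1', 'control', NULL)")

    # One table scan, no join, to fetch an assay with its description:
    print(conn.execute(
        "SELECT name, descr_text FROM bioassay WHERE id = 1").fetchone())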

Design and Implementation of XQL Query Processing System Using XQL-SQL Query Translation (XQL-SQL 질의 변환을 통한 XQL 질의 처리 시스템의 설계 및 구현)

  • Kim, Chun-Sig;Kim, Kyung-Won;Lee, Ji-Hun;Jang, Bo-Sun;Sohn, Ki-Rack
    • The KIPS Transactions:PartD / v.9D no.5 / pp.789-800 / 2002
  • XML is a standard format for web data and is currently the prevailing language for data exchange, while most commercial data are stored in relational databases. It is therefore quite important either to convert such stored data into a form suitable for exchange, or to answer XQL queries effectively over XML data stored in a relational database. A proper query processing mechanism for XML data, and a way to maintain many XML documents, are thus required. Much research on the storage and retrieval of XML data has been carried out and is still under way, but an effective retrieval and storage system for path queries such as XQL has yet to be contrived. In this paper, we design a schema for storing XML data that uses a DFS-numbering method to store data effectively, and we design and implement an effective path query processing method on top of a traditional relational database engine. When a user issues an XQL query, an XQL processor converts it into SQL, the database system executes the SQL, and an XML generator turns the resulting records into an XML document. (DFS numbering and the translation are sketched below.)
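The paper's exact numbering scheme and translation rules are not given in the abstract; a common variant assigns each node a (start, stop) interval during a depth-first traversal, so "descendant" becomes interval containment and a path query becomes a self-join. A minimal sketch under that assumption:

    import sqlite3
    import xml.etree.ElementTree as ET

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE node (start INT, stop INT, tag TEXT, text TEXT)")

    def number(elem, counter):
        """DFS numbering: descendants fall inside the parent's interval."""
        counter[0] += 1
        start = counter[0]
        for child in elem:
            number(child, counter)
        counter[0] += 1
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                     (start, counter[0], elem.tag, (elem.text or "").strip()))

    number(ET.fromstring("<lib><book><title>XML</title></book><cd/></lib>"), [0])

    # A path query such as /lib//title translated to SQL (hand-written here;
    # the paper's XQL processor would emit it automatically):
    sql = """SELECT d.text FROM node a JOIN node d
             ON d.start > a.start AND d.stop < a.stop
             WHERE a.tag = 'lib' AND d.tag = 'title'"""
    print(conn.execute(sql).fetchall())  # [('XML',)]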

Storage and Retrieval of XML Documents Without Redundant Path Information (경로정보의 중복을 제거한 XML 문서의 저장 및 질의처리 기법)

  • Lee Hiye-Ja;Jeong Byeong-Soo;Kim Dae-Ho;Lee Young-Koo
    • The KIPS Transactions:PartD / v.12D no.5 s.101 / pp.663-672 / 2005
  • This paper proposes an approach that removes the redundancy of path information and uses an inverted index as an efficient way to store a large volume of XML documents and to retrieve the wanted information from them. An XML document is decomposed into nodes based on its tree structure and stored in relational tables according to node type, together with the path information from the root to each node. Existing methods using path information store data for all element paths, which causes retrieval performance to degrade as data volume grows. Our approach stores data only for leaf element paths, excluding internal element paths. As the inverted index is built from leaf element paths only, the number of posting lists per keyword becomes smaller than in the existing methods. For the storage and retrieval of XML data, our approach requires neither the XML schema information of the documents nor any extension of the relational database. We demonstrate the better performance of our approach over the existing approaches within the scope of our experiments. (A sketch of the leaf-path index follows.)
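A minimal sketch of the leaf-path idea, with invented table and index shapes (the paper's actual relational layout is not given in the abstract): only leaf element paths are stored, and the inverted index maps keywords to the rows of those leaf paths.

    import xml.etree.ElementTree as ET
    from collections import defaultdict

    doc = ET.fromstring(
        "<paper><title>XML storage</title><author><name>Lee</name></author></paper>")

    leaf_rows = []               # (leaf_path, text), as rows of a relational table
    inverted = defaultdict(set)  # keyword -> posting list of row ids

    def walk(elem, path):
        path = path + "/" + elem.tag
        children = list(elem)
        if not children:         # internal element paths are never stored
            row_id = len(leaf_rows)
            leaf_rows.append((path, elem.text or ""))
            for word in (elem.text or "").split():
                inverted[word.lower()].add(row_id)
        for child in children:
            walk(child, path)

    walk(doc, "")
    print(leaf_rows)        # [('/paper/title', 'XML storage'), ('/paper/author/name', 'Lee')]
    print(inverted["xml"])  # {0}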

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, realizing flexible storage expansion for a massive amount of unstructured log data, along with the considerable number of functions needed to categorize and analyze the stored data, is difficult in existing computing environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including resources such as storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that let the system continue operating after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Further, a strict schema like that of a relational database makes it hard to expand nodes when rapidly growing data must be distributed across various nodes. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified into key-value, column-oriented, and document-oriented types. Of these, the proposed system uses MongoDB, a representative document-oriented database with a free schema structure. MongoDB is introduced to the proposed system because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data grows rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies them according to the type of log data and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis of the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and served in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in parallel-distributed fashion by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance against a system that uses only MySQL demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a MongoDB insert performance evaluation over various chunk sizes. (A sketch of the MongoDB insertion path follows.)
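This is not the authors' code; it is a minimal sketch of storing schema-free log documents in MongoDB via pymongo, assuming a local mongod and invented database, collection, and field names:

    from datetime import datetime, timezone
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    logs = client["bank"]["logs"]   # hypothetical database/collection names

    # No fixed schema: each unstructured record may carry different fields.
    logs.insert_one({
        "ts": datetime.now(timezone.utc),
        "type": "transfer",
        "branch": "seoul-01",
        "raw": "09:12:03 TRANSFER acct=... amount=...",
    })

    # Per-type counts, the kind of summary a log graph generator could plot:
    for row in logs.aggregate([{"$group": {"_id": "$type", "n": {"$sum": 1}}}]):
        print(row["_id"], row["n"])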

An Adaptive Materialized Query Selection Method in a Mediator System (미디에이터 시스템의 적응적 구체화 질의 선택방법)

  • Joo, Kil-Hong;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.11D no.1 / pp.83-94 / 2004
  • Recent research aimed at integrating distributed information has concentrated on developing efficient mediator systems that not only provide a high degree of autonomy for local users but also support the flexible integration of required functions for global users. However, little attention has been paid to how a global query is evaluated in a mediator. A global query is transformed into a set of sub-queries, and each sub-query is the unit of evaluation in a remote server. It is therefore possible to speed up the execution of a global query if the previous results of frequently evaluated sub-queries are materialized in the mediator. Since the integration schema of a mediator can be modified incrementally and the evaluation frequency of a global query can vary continuously, query usage should be carefully monitored to determine the optimal set of materialized sub-queries. Furthermore, as the number of sub-queries increases, the optimization process itself may take so long that the optimized set it identifies becomes obsolete due to recent changes in query usage. This paper proposes an adaptive selection of materialized sub-queries such that the available storage in a mediator is highly utilized at any time. To differentiate the recent usage of a query from past usage, the accumulated usage frequency of a query decays as time goes by. (A sketch of decayed-frequency selection is shown below.)
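The paper's exact decay formula and selection procedure are not given in the abstract; the following is an illustrative sketch in which usage counts decay by a constant factor per period and sub-queries are greedily materialized by decayed benefit per unit of storage, until the mediator's storage budget is filled:

    DECAY = 0.9  # assumed per-period decay factor; recent usage outweighs old usage

    class SubQuery:
        def __init__(self, name, size):
            self.name, self.size, self.freq = name, size, 0.0

        def hit(self):
            self.freq += 1.0     # accumulate usage on each evaluation

        def age(self):
            self.freq *= DECAY   # called once per period for every sub-query

    def select_materialized(subqueries, budget):
        chosen, used = [], 0
        # Greedy: favor frequently used sub-queries per unit of storage.
        for q in sorted(subqueries, key=lambda q: q.freq / q.size, reverse=True):
            if used + q.size <= budget:
                chosen.append(q.name)
                used += q.size
        return chosen

    q1, q2, q3 = SubQuery("sales", 40), SubQuery("stock", 70), SubQuery("hr", 30)
    for _ in range(5): q1.hit()
    q2.hit(); q3.hit(); q3.hit()
    print(select_materialized([q1, q2, q3], budget=100))  # ['sales', 'hr']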

Digital Forensic Investigation of HBase (HBase에 대한 디지털 포렌식 조사 기법 연구)

  • Park, Aran;Jeong, Doowon;Lee, Sang Jin
    • KIPS Transactions on Computer and Communication Systems / v.6 no.2 / pp.95-104 / 2017
  • As smart device technology grows and Social Network Services (SNS) become more common, the amount of data that is difficult to process with existing RDBMSs is increasing. As a result, NoSQL databases are gaining popularity as an alternative for processing the massive, unstructured data generated in real time. Although techniques for the digital investigation of databases have been researched mainly for RDBMSs, the demand for digital investigation techniques for NoSQL databases is increasing as more businesses adopt them. New digital forensic investigation techniques are needed because a NoSQL database has no schema to normalize, and its storage method differs depending on the type of database and its operating environment. Research has been done on document-based NoSQL databases, but it is not directly applicable to other types of NoSQL database. Therefore, this paper presents the mode of operation and data model, the operating environment, the collection and analysis of artifacts, and a recovery technique for deleted data in HBase, a column-based NoSQL database. The proposed digital forensic investigation technique for HBase is verified with an experimental scenario. (An illustrative aside on cell versions follows.)
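The paper analyzes HBase's on-disk artifacts (HFiles, write-ahead logs) directly; as an illustrative aside only, the following happybase sketch shows the property that makes recovery possible at all: HBase cells are timestamped and, if the column family keeps multiple versions, overwritten values remain readable until compaction. Table and column names are invented.

    import happybase

    conn = happybase.Connection("localhost")  # assumes a running HBase Thrift server
    table = conn.table("accounts")

    table.put(b"row1", {b"cf:balance": b"100"})
    table.put(b"row1", {b"cf:balance": b"250"})  # overwrites the visible value

    # Older versions survive until compaction (given VERSIONS > 1 on the
    # column family); list them newest-first with timestamps.
    for value, ts in table.cells(b"row1", b"cf:balance",
                                 versions=5, include_timestamp=True):
        print(ts, value)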

A Linkage between IndoorGML and CityGML using External Reference (외부참조를 통한 IndoorGML과 CityGML의 결합)

  • Kim, Joon-Seok;Yoo, Sung-Jae;Li, Ki-Joune
    • Spatial Information Research / v.22 no.1 / pp.65-73 / 2014
  • Recently, indoor navigation services based on indoor maps, such as Indoor Google Maps, have begun to be offered, and such services require the construction of indoor data. CityGML and IFC are widely used standards for representing indoor data. These data models contain the spatial information needed for indoor visualization and analysis, but indoor navigation also requires semantic and topological information, such as graphs, in addition to geometry. For this reason, IndoorGML, a GML3 application schema and data model for the representation, storage, and exchange of indoor geoinformation, is under standardization by OGC. IndoorGML can describe geometric properties directly and can also refer to elements in external documents. Because a large amount of data in CityGML or IFC has already been constructed, the time and cost of constructing IndoorGML data would be greatly reduced if CityGML could help generate IndoorGML data. This paper therefore suggests practical uses of CityGML, including deriving IndoorGML data from CityGML and linking to CityGML. We analyze the relationships between IndoorGML and CityGML and address issues and solutions for linking the two. (A hedged sketch of such an external reference appears below.)
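A hedged sketch of such a link: an IndoorGML cell pointing back at the CityGML feature it was derived from, via an xlink reference to that feature's gml:id. The namespace URIs and element names here are illustrative assumptions, not verified against the actual schemas or the paper's linkage design.

    import xml.etree.ElementTree as ET

    CORE = "http://www.opengis.net/indoorgml/1.0/core"
    XLINK = "http://www.w3.org/1999/xlink"

    cell = ET.Element(f"{{{CORE}}}CellSpace")
    ext = ET.SubElement(cell, f"{{{CORE}}}externalReference")
    # Point at a Room element, by gml:id, inside an existing CityGML file:
    ext.set(f"{{{XLINK}}}href", "Building.gml#room_1203")

    print(ET.tostring(cell).decode())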

The Method for Real-time Complex Event Detection of Unstructured Big data (비정형 빅데이터의 실시간 복합 이벤트 탐지를 위한 기법)

  • Lee, Jun Heui;Baek, Sung Ha;Lee, Soon Jo;Bae, Hae Young
    • Spatial Information Research / v.20 no.5 / pp.99-109 / 2012
  • Recently, with the growth of social media and the spread of smartphones, the amount of data has increased considerably through heavy use of SNS (Social Network Services). Accordingly, the Big Data concept has emerged, and many researchers are seeking solutions to make the best use of big data. To maximize the creative value of the big data held by many companies, it must be combined with existing data, and because the physical and logical storage structures of the data sources differ greatly, a system that can integrate and manage them is needed. MapReduce was developed to process big data quickly through distributed processing, but it is difficult to build and store a system for all keywords, the store-then-search cycle makes real-time processing hard, and processing complex events over heterogeneous data without a suitable processing structure incurs extra cost. To solve this problem, an existing Complex Event Processing (CEP) system can be used: it takes data from different sources and combines them, making complex event processing possible, which is useful for real-time processing, especially over stream data. Nevertheless, unstructured data based on text from SNS and internet articles is managed as plain text, so strings must be compared every time a query is processed, which results in poor performance. We therefore make it possible to manage unstructured data and process queries quickly in a complex event processing system, extending the complex event processing functions to give strings a logical schema: string keywords are converted to integers by filtering against a keyword set. In addition, by using the CEP system to process stream data in memory in real time, we reduce the time spent reading data back for query processing after it has been stored on disk. (A sketch of the keyword-to-integer mapping follows.)
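A sketch of the keyword-to-integer idea under invented names (the paper's actual keyword set and event patterns are not given in the abstract): keywords of interest are mapped to integer codes once, so the stream engine compares integers instead of strings on every event.

    KEYWORD_SET = {"outage": 1, "login": 2, "error": 3}  # assumed dictionary

    def encode(event_text):
        """Filter an unstructured event down to integer keyword codes."""
        return [KEYWORD_SET[w] for w in event_text.lower().split()
                if w in KEYWORD_SET]

    def matches(codes, pattern):
        """A toy complex-event check: all pattern codes occur in the event."""
        return set(pattern) <= set(codes)

    stream = ["Login error at node 3", "scheduled maintenance",
              "power OUTAGE error"]
    ERROR_WITH_OUTAGE = [KEYWORD_SET["outage"], KEYWORD_SET["error"]]
    for text in stream:
        if matches(encode(text), ERROR_WITH_OUTAGE):
            print("complex event detected:", text)  # fires on the third event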