• Title/Summary/Keyword: Big data storage


A Study on Using Social Big Data for Expanding Analytical Knowledge - Domestic 'Big Data' Supply-Demand Prediction - (분석지의 확장을 위한 소셜 빅데이터 활용연구 - 국내 '빅데이터' 수요공급 예측 -)

  • Kim, Jung-Sun; Kwon, Eun-Ju; Song, Tae-Min
    • Knowledge Management Research / v.15 no.3 / pp.169-188 / 2014
  • Big data is likely to change the knowledge management systems and methods of enterprises to a large extent. Moreover, how unstructured data such as images, video, sensor data, and text are utilized may determine whether an enterprise or government expands its knowledge management. In this light, this paper builds a prediction model for supply and demand in the Korean big data market using a data-mining decision tree applied to text big data generated over three years on the web and SNS, with the aim of expanding the scope of knowledge management. The results indicate that the Korean big data market is led by the government and focused on H/W and storage. Demanders of big data were found to place importance on attributes such as interest, quickness, and economics, whereas suppliers placed importance on innovation and growth. These results show that the factors affecting the acceptance of big data technology differ between suppliers and demanders. This article may provide a basic method for studying how enterprises expand analytical knowledge and connect it with their management activities.
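
A rough sketch of the decision-tree approach described in the abstract is shown below, assuming hypothetical demander/supplier attribute scores; the feature names mirror the attribute factors mentioned above, but the data and labels are invented, not the paper's dataset.

```python
# Minimal decision-tree sketch with scikit-learn on hypothetical attribute data
# (not the paper's web/SNS text big data).
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

# Columns: interest, quickness, economics, innovation, growth (0-1 scaled scores)
X = np.array([
    [0.9, 0.8, 0.7, 0.2, 0.3],   # demander-like profile
    [0.8, 0.9, 0.6, 0.3, 0.2],
    [0.2, 0.3, 0.4, 0.9, 0.8],   # supplier-like profile
    [0.3, 0.2, 0.5, 0.8, 0.9],
])
y = ["demand", "demand", "supply", "supply"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["interest", "quickness", "economics",
                                      "innovation", "growth"]))
```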


Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho; Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.11 / pp.57-63 / 2017
  • Big data, which has recently increased rapidly, is characterized by continuous generation, large volume, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of their limited processing speed and the significant cost of expanding storage. Thus, big data processing technologies, typically based on distributed file systems, distributed database management, and parallel processing, have arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB by extending the information engineering methodology based on the E-R data model.
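
As a hedged illustration of the kind of E-R-to-document mapping such a methodology produces, the following sketch embeds a hypothetical 1:N relationship (order and order lines) into a single MongoDB document via pymongo; the collection, field names, and connection string are assumptions, not the paper's design.

```python
# Minimal sketch of mapping an E-R 1:N relationship to an embedded MongoDB document.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local instance
db = client["design_demo"]

# The "order -> order lines" relationship becomes an embedded array
# rather than two joined relational tables.
order = {
    "order_no": 1001,
    "customer": {"name": "Hong Gil-dong", "grade": "gold"},
    "lines": [
        {"item": "sensor", "qty": 3, "price": 12000},
        {"item": "storage", "qty": 1, "price": 85000},
    ],
}
db.orders.insert_one(order)

# A lookup that would require a join in the relational schema.
print(db.orders.find_one({"lines.item": "storage"}, {"order_no": 1, "_id": 0}))
```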

A Strategy Study on Sensitive Information Filtering for Personal Information Protection in Big Data Analysis

  • Koo, Gun-Seo
    • Journal of the Korea Society of Computer and Information / v.22 no.12 / pp.101-108 / 2017
  • This study proposes a system that filters the data entered when analyzing big data such as SNS and blog posts. Beyond ordinary personally identifiable information, there is information that is distinguished from it, such as religious affiliation, personal feelings, thoughts, or beliefs; we define such information as sensitive information. To protect it, Article 23 of the Privacy Act contains clauses restricting its collection and use. The proposed system is divided into two stages, a big data processing process and a sensitive information filtering process, and big data processing is analyzed and applied to big data collection in four steps. The big data processing process includes data collection and storage, vocabulary analysis, parsing, and semantic analysis. The sensitive information filtering process includes sensitive information questionnaires, building a sensitive information DB, qualifying information, filtering sensitive information, and reliability analysis. In the experiment, ontology generation was completed for 7,553 of 8,978 items (84.13%), and the sensitive information filtering phase achieved 98%, which is of considerable significance.
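
A minimal sketch of the filtering idea follows, assuming a hypothetical sensitive-term set standing in for the sensitive information DB described above; it is not the paper's questionnaire-driven pipeline.

```python
# Toy sensitive-information filter: match text against a term set and mask hits.
import re

SENSITIVE_TERMS = {"religion", "political party", "health condition"}  # stand-in for the sensitive information DB

def filter_sensitive(text: str) -> tuple[str, int]:
    """Mask sensitive terms and return the filtered text with a hit count."""
    hits = 0
    for term in SENSITIVE_TERMS:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        text, n = pattern.subn("*" * len(term), text)
        hits += n
    return text, hits

filtered, count = filter_sensitive("The user mentioned a health condition and a political party.")
print(filtered, count)
```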

Development of Climate & Environment Data System for Big Data from Climate Model Simulations (대용량 기후모델자료를 위한 통합관리시스템 구축)

  • Lee, Jae-Hee; Sung, Hyun Min; Won, Sangho; Lee, Johan; Byu, Young-Hwa
    • Atmosphere / v.29 no.1 / pp.75-86 / 2019
  • In this paper, we introduce a novel Climate & Environment Database System (CEDS). The CEDS was developed by the National Institute of Meteorological Sciences (NIMS) to provide easy, efficient user interfaces and storage management for climate model data, thereby improving work efficiency. When data/files are uploaded, the CEDS offers an option to automatically run the international standard data conversion (CMORization) and quality assurance (QA) processes required for submitting CMIP6 variable data. This option increases system performance, reduces user mistakes, and raises reliability, since it eliminates manual operation of the CMORization and QA processes. The uploaded raw files are saved in NAS storage, and a Cassandra database stores the metadata used for efficient data access and storage management. The metadata is generated automatically when a file is uploaded or from user input. With this metadata, the CEDS supports effective storage management by categorizing data/files, which allows even a novice to access data easily and quickly with simple search terms and with a higher level of data reliability. Moreover, the CEDS supports parallel and distributed computing to increase overall system performance and balance the load, providing high availability so that multiple users can work simultaneously with fast system response. Additionally, it deduplicates redundant data and reduces storage space.
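
A hedged sketch of the metadata side of such a design is shown below, using the DataStax Python driver to store file metadata in Cassandra while the raw files stay on NAS; the keyspace, table, and column names are hypothetical, not CEDS's actual schema.

```python
# Minimal sketch: raw files live on NAS, Cassandra holds searchable metadata.
from cassandra.cluster import Cluster
import uuid, datetime

cluster = Cluster(["127.0.0.1"])          # assumed local Cassandra node
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS ceds_demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS ceds_demo.file_metadata (
        file_id uuid PRIMARY KEY, nas_path text, experiment text,
        variable text, cmorized boolean, uploaded_at timestamp)
""")
session.execute(
    "INSERT INTO ceds_demo.file_metadata "
    "(file_id, nas_path, experiment, variable, cmorized, uploaded_at) "
    "VALUES (%s, %s, %s, %s, %s, %s)",
    (uuid.uuid4(), "/nas/cmip6/tas_day.nc", "historical", "tas", True,
     datetime.datetime.utcnow()),
)
```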

Big Data Management System for Biomedical Images to Improve Short-term and Long-term Storage

  • Qamar, Shamweel; Kim, Eun Sung; Park, Peom
    • Journal of the Korean Society of Systems Engineering / v.15 no.2 / pp.66-71 / 2019
  • In digital pathology, storage of files in an electronic biomedical system is a major constraint, and because all analysis and annotation take place manually at every user end, it becomes even harder to manage the data being shared inside an enterprise. Therefore, we need a storage system that is not only large enough to store all the data but also manages it and makes communication of that data much easier without losing its true form. A virtual server setup is one technique that can solve this issue. We set up a main server that serves as the main storage for all the virtual machines used at the user end, and that main server is controlled through a hypervisor so that changes to the overall storage or the main server itself can be made remotely from anywhere using the server's IP address. The server in our case exposes an XML-RPC based API whose calls are transmitted between computers over the HTTP protocol. A Java API connects to the HTTP/HTTPS protocol through the Java Runtime Environment and sits on top of other SDK web services to boost the productivity of the running application. To manage the server easily, we use the Tkinter library to develop the GUI, together with the Pmw megawidgets library, which is also used through Tkinter. For managing, monitoring, and performing operations on virtual machines, we use Python bindings to the XML-RPC based API. After these settings, we make the system user friendly by building a GUI for the main server. Using that GUI, users can perform administrative functions such as restarting, suspending, or resuming a virtual machine. They can also log on to the slave host of the pool in case of emergency and, if needed, filter virtual machines by host. Network monitoring can be performed on multiple virtual machines at the same time in order to detect any loss of network connectivity.
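
A minimal sketch of driving virtual machines over XML-RPC from Python, in the spirit of the setup described above; the server URL and dotted method names are hypothetical placeholders, not a specific hypervisor API.

```python
# Toy XML-RPC client for VM administration against an assumed main server.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://192.168.0.10:8000")  # assumed main server address

def restart_vm(vm_id: str) -> None:
    """Suspend and then resume a VM through the remote XML-RPC endpoint."""
    server.VM.suspend(vm_id)   # hypothetical method exposed by the server
    server.VM.resume(vm_id)    # hypothetical method exposed by the server

if __name__ == "__main__":
    restart_vm("vm-biomed-01")
```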

An Analysis of Big Video Data with Cloud Computing in Ubiquitous City (클라우드 컴퓨팅을 이용한 유시티 비디오 빅데이터 분석)

  • Lee, Hak Geon; Yun, Chang Ho; Park, Jong Won; Lee, Yong Woo
    • Journal of Internet Computing and Services / v.15 no.3 / pp.45-52 / 2014
  • The Ubiquitous City (U-City) is a smart, intelligent city that satisfies people's desire to enjoy IT services with any device, anytime, anywhere. It is a future city model based on the Internet of Everything or Things (IoE/IoT) and includes many networked video cameras. These networked video cameras support many U-City services as one of the main input sources, together with sensors, and they constantly generate a huge amount of video information, real big data for the U-City. The U-City is usually required to manipulate this big data in real time, which is not easy at all. It is also often necessary to analyze the accumulated video data to detect an event or find a figure in them, which requires a lot of computational power and usually takes a long time. Existing studies try to reduce the processing time of big video data, and cloud computing can be a good solution to this problem. Among the many cloud computing methodologies that can be applied, MapReduce is an interesting and attractive one; it has many advantages and is gaining popularity in many areas. Video cameras evolve day by day and their resolution improves sharply, leading to exponential growth of the data produced by networked video cameras, so we are dealing with real big data when handling the video images produced by high-quality cameras. Video surveillance systems were of limited use before cloud computing, but they are now spreading widely in U-Cities as useful methodologies have appeared. Because video data are unstructured, good research results on analyzing them with MapReduce are hard to find. This paper presents an analysis system for video surveillance: a cloud-computing-based video data management system that is easy to deploy, flexible, and reliable. It consists of the video manager, the video monitors, the storage for video images, the storage client, and the streaming-IN component. The "video monitor" consists of a "video translator" and a "protocol manager", and the "storage" contains the MapReduce analyzer. All components were designed according to the functional requirements of a video surveillance system. The "streaming-IN" component receives video data from the networked cameras and delivers them to the "storage client"; it also manages network bottlenecks to smooth the data stream. The "storage client" receives video data from the "streaming-IN" component, stores them in the storage, and helps other components access the storage. The "video monitor" component transfers the video data as a smooth stream and manages the protocol. The "video translator" sub-component lets users manage the resolution, codec, and frame rate of the video image, and the "protocol" sub-component manages the Real Time Streaming Protocol (RTSP) and the Real Time Messaging Protocol (RTMP). We use the Hadoop Distributed File System (HDFS) as the cloud storage; Hadoop stores the data in HDFS and provides a platform that can process the data with the simple MapReduce programming model. We suggest our own methodology for analyzing video images using MapReduce: the workflow of the video analysis is presented and explained in detail in this paper. The performance evaluation was carried out experimentally, and we found that our proposed system worked well; the results are presented with analysis. With our cluster system, we used compressed 1920×1080 (FHD) video data, the H.264 codec, and HDFS as video storage. We measured the processing time according to the number of frames per mapper, and by tracing the optimal splitting size of the input data and the processing time according to the number of nodes, we found the linearity of the system performance.
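
A Hadoop-streaming-style sketch of the MapReduce idea for video analysis is given below: a mapper emits (camera_id, 1) for frames where a hypothetical detector fires, and a reducer sums the counts per camera. The record format and the detect_event() stub are assumptions, not the paper's workflow.

```python
# Minimal Hadoop-streaming-style mapper/reducer for counting detected events per camera.
import sys
from itertools import groupby

def detect_event(camera_id, frame_no):
    return True  # stand-in; a real system would decode and analyze the frame

def mapper(lines):
    for line in lines:                           # one record per frame: "camera_id\tframe_no"
        camera_id, frame_no = line.rstrip("\n").split("\t")
        if detect_event(camera_id, frame_no):
            print(f"{camera_id}\t1")

def reducer(lines):
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for camera_id, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{camera_id}\t{sum(int(v) for _, v in group)}")

if __name__ == "__main__":
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    (mapper if stage == "map" else reducer)(sys.stdin)
```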

A Security-Enhanced Identity-Based Batch Provable Data Possession Scheme for Big Data Storage

  • Zhao, Jining; Xu, Chunxiang; Chen, Kefei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.9 / pp.4576-4598 / 2018
  • In the big data age, flexible and affordable cloud storage services greatly enhance productivity for enterprises and individuals, but at the same time they leave outsourced data susceptible to integrity breaches. Provable Data Possession (PDP) is a critical technology that enables data owners to efficiently verify the integrity of cloud data without downloading an entire copy. To address the challenging integrity problem on multiple clouds for multiple owners, an identity-based batch PDP scheme was presented at ProvSec 2016, which attempted to eliminate the public key certificate management issue and reduce computation overheads in a secure, batch manner. In this paper, we first demonstrate that this scheme is insecure: any cloud that has deleted or modified outsourced data can efficiently pass integrity verification simply by utilizing two arbitrary block-tag pairs of one data owner. Specifically, malicious clouds are able to fabricate integrity proofs by 1) universally forging valid tags and 2) recovering data owners' private keys. Second, to enhance the security, we propose an improved scheme to withstand these attacks and prove its security under the CDH assumption in the random oracle model. Finally, based on simulations and overhead analysis, our batch scheme demonstrates better efficiency compared to an identity-based multi-cloud PDP with single-owner effort.
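
For orientation only, the following toy sketch shows the generic challenge-response flow behind PDP (tag blocks, challenge random indices, verify the returned blocks). It is hash-based and has none of the identity-based, batch, or compact-proof properties of the schemes discussed above.

```python
# Toy challenge-response illustration of the PDP idea (not the ProvSec 2016 scheme).
import hashlib, hmac, secrets

KEY = secrets.token_bytes(32)                 # data owner's secret key (toy)

def tag(block: bytes, index: int) -> bytes:
    return hmac.new(KEY, index.to_bytes(8, "big") + block, hashlib.sha256).digest()

def challenge(num_blocks: int, sample: int = 2) -> list[int]:
    return [secrets.randbelow(num_blocks) for _ in range(sample)]

def prove(blocks: list[bytes], indices: list[int]) -> list[bytes]:
    return [blocks[i] for i in indices]       # a real PDP proof is far more compact

def verify(proof: list[bytes], indices: list[int], tags: list[bytes]) -> bool:
    return all(hmac.compare_digest(tag(b, i), tags[i]) for b, i in zip(proof, indices))

blocks = [b"block-0", b"block-1", b"block-2"]
tags = [tag(b, i) for i, b in enumerate(blocks)]
idx = challenge(len(blocks))
print(verify(prove(blocks, idx), idx, tags))   # True while the blocks are intact
```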

CPC: A File I/O Cache Management Policy for Compute-Bound Workloads

  • Bahn, Hyokyung
    • International journal of advanced smart convergence / v.11 no.2 / pp.1-6 / 2022
  • With the emergence of the 4th industrial revolution, compute-bound workloads with a large memory footprint, such as big data processing, are increasing dramatically. Even in such compute-bound workloads, however, we observe bulky I/O while loading big data from storage to memory. Although the file I/O cache accelerates storage I/O performance, we found that the cache hit rate in these environments does not improve even when the file I/O cache capacity is increased, because of certain special I/O references generated by compute-bound workloads. To cope with this situation, we propose a new file I/O cache management policy that significantly improves the cache hit rate for compute-bound workloads. Trace-driven simulations replaying file I/O reference logs of compute-bound workloads show that the proposed cache management policy improves the cache hit rate over the well-known CLOCK algorithm by a large margin.
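
A minimal trace-driven sketch of the CLOCK baseline mentioned above is given below; it replays an I/O reference log and reports the hit rate. The proposed CPC policy itself is not reproduced here, and the sample trace is hypothetical.

```python
# Trace-driven simulation of the CLOCK replacement policy (baseline only).
def clock_hit_rate(trace, capacity):
    frames, ref_bit, hand, hits = [], {}, 0, 0
    for block in trace:
        if block in ref_bit:                  # hit: set the reference bit
            ref_bit[block] = 1
            hits += 1
            continue
        if len(frames) < capacity:            # miss with free space
            frames.append(block)
        else:                                 # miss: sweep until a victim with ref bit 0
            while True:
                victim = frames[hand]
                if ref_bit[victim] == 0:
                    del ref_bit[victim]
                    frames[hand] = block
                    hand = (hand + 1) % capacity
                    break
                ref_bit[victim] = 0
                hand = (hand + 1) % capacity
        ref_bit[block] = 1
    return hits / len(trace)

print(clock_hit_rate([1, 2, 3, 1, 4, 1, 2, 5, 1, 2], capacity=3))
```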

Effective visualization methods for a manufacturing big data system (제조 빅데이터 시스템을 위한 효과적인 시각화 기법)

  • Yoo, Kwan-Hee
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1301-1311 / 2017
  • Manufacturing big data systems support decision making that can improve preemptive manufacturing activities through the collection, storage, management, and predictive analysis of related 4M data in pre-manufacturing processes. Effective visualization of data is crucial for the efficient management and operation of data in these systems. This paper presents visualization techniques that can effectively show data collection, analysis, and prediction results in manufacturing big data systems. Through the visualization techniques presented in this paper, we confirmed that it was not only easy to identify problems occurring at the manufacturing site, but also very useful for responding to these problems.
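
One hedged example of such a visualization, plotting a hypothetical per-equipment defect rate against an alert threshold with matplotlib; the data are invented, not the paper's 4M dataset.

```python
# Minimal line chart of defect rate per equipment with an alert threshold.
import matplotlib.pyplot as plt

hours = list(range(8))
defect_rate = {"press-01": [1.2, 1.1, 1.4, 2.8, 3.1, 1.3, 1.2, 1.1],
               "weld-02":  [0.8, 0.9, 0.7, 0.9, 1.0, 0.8, 0.9, 0.7]}

fig, ax = plt.subplots()
for machine, rates in defect_rate.items():
    ax.plot(hours, rates, marker="o", label=machine)
ax.axhline(2.0, linestyle="--", color="red", label="alert threshold")
ax.set(xlabel="hour", ylabel="defect rate (%)", title="Defect rate by equipment")
ax.legend()
plt.show()
```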

A Study on the Development of Phased Big Data Distribution Model Based on Big Data Distribution Ecology (빅데이터 유통 생태계에 기반한 단계별 빅데이터 유통 모델 개발에 관한 연구)

  • Kim, Shinkon; Lee, Sukjun; Kim, Jeonggon
    • Journal of Digital Convergence / v.14 no.5 / pp.95-106 / 2016
  • The major thrust of this research is the development of a phased big data distribution model based on the big data ecosystem. The model consists of three phases. In phase 1, data intermediaries participate in the model and transaction functions are provided; this system consists of general control systems, registration, and transaction management systems. In phase 2, trading support systems with data storage, analysis, supply, and customer relationship management functions are designed. In phase 3, transaction support systems and a linked big data distribution portal are developed. Recently, new data distribution models and systems have been emerging and substituting for past data management systems by using new technologies and the processes of data science. The proposed model may serve as a reference for establishing industrial standards for big data distribution and transaction models in the future.