• Title/Summary/Keyword: Large-Size Data Processing

Auto Regulated Data Provisioning Scheme with Adaptive Buffer Resilience Control on Federated Clouds

  • Kim, Byungsang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.11
    • /
    • pp.5271-5289
    • /
    • 2016
  • On large-scale data analysis platforms deployed on cloud infrastructures over the Internet, the instability of data transfer times and the dynamics of processing rates require a sophisticated data distribution scheme that maximizes parallel efficiency by balancing the load among the participating computing elements and by eliminating the idle time of each element. In particular, under real-time constraints and with a limited data buffer (in-memory storage), a more controllable mechanism is needed to prevent both overflow and underflow of the finite buffer. In this paper, we propose an auto-regulated data provisioning model based on a receiver-driven data pull model. In this model, we provide a synchronized data replenishment mechanism that implicitly avoids data buffer overflow and explicitly regulates data buffer underflow by adequately adjusting the buffer resilience. To estimate the optimal buffer resilience, we exploit an adaptive buffer resilience control scheme that minimizes both the data buffer space and the idle time of the processing elements, based on directly measured sample-path analysis. Simulation results show that the proposed scheme yields an acceptable approximation of the numerical results. It is also efficient in dynamic environments where the stochastic characteristics of the data transfer time and the data processing rate cannot be postulated, or where both fluctuate.
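
A minimal sketch can make the replenishment loop concrete: the receiver tracks its buffer fill level plus in-flight pull requests, never requests more than the finite buffer can hold (so overflow is avoided implicitly), and tops the level up whenever it sinks below a resilience threshold that grows whenever the consumer starves. The class and the adaptation rule below are illustrative assumptions, not the paper's sample-path estimator.

```python
import collections

class PullProvisioner:
    """Sketch of a receiver-driven data pull loop with a buffer
    resilience threshold; names and the adaptation rule are assumed."""

    def __init__(self, capacity, resilience):
        self.capacity = capacity            # finite in-memory buffer size
        self.resilience = resilience        # refill level (underflow guard)
        self.buffer = collections.deque()
        self.in_flight = 0                  # pulls issued but not yet arrived

    def pulls_to_issue(self):
        level = len(self.buffer) + self.in_flight
        if level >= self.resilience:
            return 0
        # overflow is avoided implicitly: never request more than fits
        n = min(self.resilience - level, self.capacity - level)
        self.in_flight += n
        return n

    def on_arrival(self, item):
        self.in_flight -= 1
        self.buffer.append(item)

    def on_consume(self):
        if not self.buffer:
            # starvation observed: enlarge the resilience level
            self.resilience = min(self.capacity, self.resilience + 1)
            return None
        return self.buffer.popleft()
```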

Speed-up of the Matrix Computation on the Ridge Regression

  • Lee, Woochan;Kim, Moonseong;Park, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3482-3497
    • /
    • 2021
  • Artificial intelligence has emerged as the core of the 4th industrial revolution, and large-scale data processing, such as big data technology and rapid data analysis, is inevitable. The most fundamental and universal data interpretation technique is the analysis of information through regression, which is also the basis of machine learning. Ridge regression is a regression technique that decreases sensitivity to unique or outlier information. However, the time-consuming portion of the matrix computation typically involves the introduction of an inverse matrix. As the size of the matrix grows, the matrix solution method becomes a major challenge. In this paper, a new algorithm is introduced to speed up the calculation of the ridge regression estimator through series expansion and computation recycling, without adopting an inverse matrix or other factorization methods in the calculation process. In addition, the performance of the proposed algorithm and the existing algorithm were compared across matrix sizes. Overall, the proposed algorithm demonstrated an excellent speed-up with good accuracy.
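
As a hedged illustration of the inverse-free idea, the sketch below approximates the ridge estimator (X'X + λI)^(-1) X'y with a truncated Neumann series: writing A = c(I − M) with c = trace(A) keeps the spectral radius of M below 1, so summing the terms M^k b/c converges to A^(-1) b. This shows only the series-expansion principle; the authors' computation-recycling scheme is not reproduced here.

```python
import numpy as np

def ridge_series(X, y, lam, n_terms=500, tol=1e-10):
    """Approximate (X'X + lam*I)^(-1) X'y by a truncated Neumann
    series, avoiding an explicit matrix inverse. Sketch only."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    b = X.T @ y
    c = np.trace(A)              # trace >= largest eigenvalue (A is PSD)
    M = np.eye(p) - A / c        # spectral radius of M is below 1
    beta = np.zeros(p)
    term = b / c
    for _ in range(n_terms):     # beta = sum_k M^k (b / c) -> A^{-1} b
        beta += term
        term = M @ term
        if np.linalg.norm(term) < tol:   # stop once terms are negligible
            break
    return beta

# Convergence is geometric at rate about 1 - lam/c, so a small ridge
# parameter needs many terms; np.linalg.solve(A, b) gives the
# reference answer for checking the truncation error.
```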

Generating Large Items Efficiently for Mining Quantitative Association Rules (수량적 연관규칙탐사를 위한 효율적인 고빈도항목열 생성기법)

  • Choe, Yeong-Hui;Jang, Su-Min;Yu, Jae-Su;O, Jae-Cheol
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.10
    • /
    • pp.2597-2607
    • /
    • 1999
  • In this paper, we propose an efficient large-item generation algorithm that overcomes the problems of the existing algorithm for deriving large items from quantitative attributes. The proposed algorithm splits the dataset into variable-sized intervals according to min_split_support and merges the intervals according to the support of each interval. It reflects the characteristics of the data in the generated large items and can generate finer-grained large items than the existing algorithm. Performance evaluation shows that the proposed algorithm outperforms the existing one.
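
A greedy split-then-merge pass conveys the interval-generation idea. Only min_split_support appears in the abstract; merge_support and the greedy policy below are assumptions made for this sketch, not the authors' exact algorithm.

```python
def build_intervals(values, min_split_support, merge_support):
    """Split one quantitative attribute into variable-sized intervals,
    then merge adjacent intervals by support. Illustrative sketch."""
    n = len(values)
    vals = sorted(values)
    # split: close an interval once it holds min_split_support of the data
    intervals, start, count = [], 0, 0
    for i, v in enumerate(vals):
        count += 1
        if count / n >= min_split_support:
            intervals.append((vals[start], v, count))
            start, count = i + 1, 0
    if count:
        intervals.append((vals[start], vals[-1], count))
    # merge: join neighbours while the combined support stays moderate
    merged = [intervals[0]]
    for lo, hi, c in intervals[1:]:
        plo, phi, pc = merged[-1]
        if (pc + c) / n <= merge_support:
            merged[-1] = (plo, hi, pc + c)
        else:
            merged.append((lo, hi, c))
    return merged   # list of (low, high, support_count) triples

# e.g. build_intervals([18, 19, 21, 22, 30, 31, 45, 60], 0.25, 0.5)
# -> [(18, 22, 4), (30, 60, 4)]
```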

Efficient Data Movement for Scientific Application Processing Large Size Data Stream (대용량 데이터 스트림을 처리하는 과학계산 응용을 위한 효율적인 데이터 이동 기법)

  • Byun, Eun-kyu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.10a
    • /
    • pp.170-173
    • /
    • 2015
  • To process the very large data generated by large-scale experimental equipment, the stages of collection and storage, long-distance transfer to computing facilities, and data analysis have traditionally been handled separately. The volume of data is growing explosively, and at the same time the demand for real-time processing is increasing. In this study, we introduce a technique that uses an abstracted I/O layer to move and process data streams generated at remote sites in real time, through an interface that makes them appear as if they were data in local storage. We also propose a technique for transferring large data efficiently by relocating the data preprocessing computation to the sending side.
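
The abstracted I/O layer can be pictured as a file-like wrapper over a remote stream, so analysis code written against local files consumes remote data unchanged. This is a hypothetical sketch: the host, port, and framing are assumptions, and the paper's sender-side preprocessing is not shown.

```python
import io
import socket

class RemoteStream(io.RawIOBase):
    """Expose a remote data stream through the standard file
    interface; downstream code sees an ordinary readable object."""
    def __init__(self, host, port):
        self._sock = socket.create_connection((host, port))

    def readable(self):
        return True

    def readinto(self, buf):
        return self._sock.recv_into(buf)   # 0 signals end of stream

    def close(self):
        self._sock.close()
        super().close()

# Analysis code needs no changes for remote data:
# stream = io.BufferedReader(RemoteStream("daq.example.org", 9000))
# header = stream.read(4096)
```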

Non-uniform Failure in Superplastic Ti-6Al-4V Alloy (초소성 Ti-6Al-4V 합금에서의 불균일 파손)

  • Kim, Tae-Won
    • Transactions of Materials Processing
    • /
    • v.9 no.6
    • /
    • pp.663-669
    • /
    • 2000
  • A material model has been presented, at the continuum level, for the representation of superplastic deformation coupled with microstructural evolution. The model enables the effects of the spatial variation of grain-size distributions to be predicted at the process level. It has been tested under conditions of both homogeneous and inhomogeneous stress and strain through detailed comparison of predicted grain-size distributions and their evolution with experimentally obtained data. Experimental measurements have shown the extent of the spatial variation of the grain-size distribution in the titanium alloy Ti-6Al-4V. It is shown that, whilst not large, the variations in grain-size distributions are sufficient to produce inhomogeneous deformation in test pieces, which ultimately results in localisation of strain and failure.

Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.1
    • /
    • pp.726-734
    • /
    • 2015
  • Relational databases, which structure data, are currently the most widely used tools for data management. However, in relational databases, service slows as the amount of data increases because of constraints on the read and write operations used to store or query data. Furthermore, when a new task is added, the database grows and consequently requires additional infrastructure, such as parallel configurations of hardware, CPU, memory, and network, to support smooth operation. In this paper, in order to improve web information services that slow down as the data in relational databases grows, we implemented a model that extracts large amounts of data quickly and safely for users by sending the data to the Hadoop Distributed File System (HDFS), unifying and reconstructing it, and then processing the HDFS files. We applied our model to a Web-based civil affairs system that stores image files, an irregular data processing workload. Our proposed system's data processing was found to be 0.4 s faster than that of a relational database system. Thus, a Hadoop-based big data processing technique can support Web information services that process large amounts of data, as conventional relational databases do. Furthermore, since Hadoop is open source, our model has the advantage of reducing software costs. The proposed system is expected to serve as a model for Web services that provide fast information processing for organizations that must process big data efficiently as their conventional relational databases grow.
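
The paper's system is not public, so as a stand-in the following Hadoop Streaming job shows the general shape of HDFS-backed batch processing from Python: the mapper and reducer read tab-separated records on stdin, and the invocation, paths, and record layout in the docstring are illustrative assumptions.

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch (not the paper's system): count records
per key in tab-separated HDFS files. Illustrative invocation:
  hadoop jar hadoop-streaming.jar \
      -input /civil/records -output /civil/counts \
      -mapper 'count.py map' -reducer 'count.py reduce'
"""
import sys

def map_phase():
    for line in sys.stdin:
        key = line.rstrip("\n").split("\t", 1)[0]   # first field as key
        print(f"{key}\t1")

def reduce_phase():
    current, total = None, 0
    for line in sys.stdin:            # streaming sorts by key for us
        key, val = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(val)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    map_phase() if sys.argv[1] == "map" else reduce_phase()
```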

Scaling down data/index page structure of the NVRAM based DBMS with the small size blocks (소형 블록 DBMS의 데이터/인덱스 페이지 구조 소형화를 통한 NVRAM 성능 개선)

  • Bae, Sang-Hee;Lee, Taehwa;Cha, Jaehyuk
    • Journal of Digital Contents Society
    • /
    • v.14 no.1
    • /
    • pp.15-23
    • /
    • 2013
  • In response to the demands of large-scale, low-power data processing and new applications, storage systems using SSDs (Solid State Disks/Drives), with their fast I/O performance, have appeared in place of hard disks. Methods to overcome SSD-specific problems, such as varying processing-data units, out-of-place updates, and limited erase counts, have been actively studied. However, degraded performance and stability remain unresolved when storing small-scale data that causes frequent random writes on a hard disk or SSD. This paper proposes a system structure that stores indexes, which require frequent random writes, in byte-addressable NVRAM. It exploits NVRAM's characteristics, such as fast byte-level reads and writes and non-volatility, together with the fact that the actual changed data in an index page is smaller than a block.
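
The page-layout idea can be sketched with a scaled-down page over a byte-addressable buffer: an insert touches only a 16-byte entry and a 4-byte header, instead of rewriting a whole block. The sizes and field layout below are assumptions, not the paper's format.

```python
import struct

PAGE_SIZE = 256                 # scaled-down page vs. a 4-8 KB disk block
HEADER = struct.Struct("<HH")   # (entry_count, free_offset)
ENTRY = struct.Struct("<qq")    # one (key, pointer) slot: 16 bytes

nvram = bytearray(PAGE_SIZE)    # stand-in for a byte-addressable region
HEADER.pack_into(nvram, 0, 0, HEADER.size)

def page_insert(key, ptr):
    count, free = HEADER.unpack_from(nvram, 0)
    if free + ENTRY.size > PAGE_SIZE:
        raise MemoryError("page full: split instead of rewriting a block")
    ENTRY.pack_into(nvram, free, key, ptr)                    # 16 bytes
    HEADER.pack_into(nvram, 0, count + 1, free + ENTRY.size)  # plus 4
```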

Design and Implementation of 3D Terrain Rendering System on Mobile Environment Using High Resolution Satellite Imagery

  • Kim, Seung-Yub;Lee, Ki-Won
    • Proceedings of the KSRS Conference
    • /
    • v.1
    • /
    • pp.417-420
    • /
    • 2006
  • These days, mobile applications dealing with information content on mobile or handheld devices, such as mobile communicators, PDAs, or WAP devices, address important industrial needs. The motivation of this study is the design and implementation of a mobile application using high-resolution satellite imagery, a large-sized image dataset. Although the major advantages of mobile devices are portability and mobility, limited system resources, such as small memory, slow CPUs, low power, and small screens, are the main obstacles for developers who must handle a large volume of geo-based 3D models. Previous work has concentrated on GIS-based location-awareness services on mobile devices; the mobile 3D terrain model targeted by this study, with DEM (Digital Elevation Model) data and high-resolution satellite imagery as source data, has not yet been considered in other mobile systems. The main 3D graphics processing functions, or pixel pipeline, in this prototype are implemented with the OpenGL|ES (Embedded System) standard API (Application Programming Interface) released by the Khronos Group. During development, experiments were carried out to investigate the optimal operating environment and performance: TIN-based vertex generation from regular elevation data, image tiling, image-vertex texturing, and text processing for Unicode and ASCII.
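
The vertex-generation step can be approximated by triangulating a regular DEM tile into the vertex and index arrays that an OpenGL|ES glDrawElements call consumes. The sketch below is a stand-in under assumed tile sizes and types, not the prototype's code.

```python
import numpy as np

def dem_to_mesh(dem, cell=1.0):
    """Triangulate a regular DEM grid into float32 vertices and
    uint16 triangle indices for an OpenGL|ES triangle list."""
    h, w = dem.shape
    ys, xs = np.mgrid[0:h, 0:w]
    verts = np.column_stack(
        [xs.ravel() * cell, ys.ravel() * cell, dem.ravel()]
    ).astype(np.float32)
    idx = []
    for r in range(h - 1):
        for c in range(w - 1):
            i = r * w + c
            idx += [i, i + 1, i + w, i + 1, i + w + 1, i + w]  # 2 triangles
    return verts, np.array(idx, dtype=np.uint16)  # uint16: tiles <= 256x256

# tile = np.random.rand(16, 16).astype(np.float32)
# vertices, indices = dem_to_mesh(tile, cell=30.0)  # e.g. 30 m DEM spacing
```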

An Optimized Iterative Semantic Compression Algorithm and Parallel Processing for Large-Scale Data

  • Jin, Ran;Chen, Gang;Tung, Anthony K.H.;Shou, Lidan;Ooi, Beng Chin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.6
    • /
    • pp.2761-2781
    • /
    • 2018
  • With the continuous growth of data sizes and the use of compression technology, data reduction has great research value and practical significance. Addressing the shortcomings of existing semantic compression algorithms, this paper builds on an analysis of the ItCompress algorithm and designs a bidirectional order-selection method based on interval partitioning, named the Optimized Iterative Semantic Compression Algorithm (Optimized ItCompress). To further improve speed, we propose a parallel optimized iterative semantic compression algorithm using GPUs (POICAG) and an optimized iterative semantic compression algorithm using Spark (DOICAS). Extensive experiments carried out on four kinds of datasets fully verify the efficiency of the proposed algorithms.
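
The baseline encoding that ItCompress iterates on can be shown in a few lines: each row stores the id of its most similar representative row plus only the disagreeing columns (outliers). This sketch covers the core idea only; the paper's bidirectional order selection and the GPU/Spark variants are not reproduced.

```python
def itcompress_encode(rows, reps):
    """Encode each row against its best representative: keep the
    representative id and only the outlier columns. Sketch only."""
    encoded = []
    for row in rows:
        best = min(range(len(reps)),
                   key=lambda r: sum(a != b for a, b in zip(row, reps[r])))
        outliers = {j: v for j, (v, rv)
                    in enumerate(zip(row, reps[best])) if v != rv}
        encoded.append((best, outliers))   # decode: patch reps[best]
    return encoded

# rows = [("seoul", "f", 21), ("seoul", "m", 21), ("busan", "m", 40)]
# reps = [("seoul", "f", 21), ("busan", "m", 40)]
# itcompress_encode(rows, reps)
# -> [(0, {}), (0, {1: 'm'}), (1, {})]
```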

Design of GlusterFS Based Big Data Distributed Processing System in Smart Factory (스마트 팩토리 환경에서의 GlusterFS 기반 빅데이터 분산 처리 시스템 설계)

  • Lee, Hyeop-Geon;Kim, Young-Woon;Kim, Ki-Young;Choi, Jong-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.1
    • /
    • pp.70-75
    • /
    • 2018
  • A smart factory is an intelligent factory that can enhance productivity, quality, and customer satisfaction by applying information and communications technology to the entire production process, including design and development, manufacturing, and distribution and logistics. The precise amount of data generated in a smart factory varies with the factory's size and the state of its facilities. Regardless, traditional production management systems are difficult to apply in a smart factory environment because it generates vast amounts of data. For this reason, the need has arisen for a distributed big data processing system that can handle large amounts of data. This article therefore designs a GlusterFS (Gluster File System)-based distributed big data processing system for smart factory environments. Compared to existing distributed processing systems, the proposed system reduces the system load and the risk of data loss through the distribution and management of network traffic.