• Title/Summary/Keyword: multidimensional data processing

A Study on The Grid File Construction Method based on MapReduce for Multidimensional Data Processing (다차원 데이터 처리를 위한 맵리듀스 기반의 그리드 파일 생성기법에 관한 연구)

  • Jung, Joo-Hyuk;Lee, Sang-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.77-80 / 2014
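  • Recently, the volume of data to be processed has been growing because of the spread of computer and Internet use, the proliferation of smart devices including smartphones, the expanding use of social networks, and the growth of various location-based services. Accordingly, processing large-scale data has emerged as a major issue. As a result, Hadoop, a framework that supports large-scale distributed computing environments for processing large-scale data, was developed, and many companies are adopting it. However, research on processing multidimensional data, such as image, medical, and sensor data, within large-scale data remains insufficient. Various multidimensional indexes have been proposed for multidimensional data processing, but processing large-scale multidimensional data on a single machine is inefficient. This paper proposes a method for constructing a grid file, a multidimensional indexing technique, based on MapReduce, Hadoop's distributed parallel processing model. It also proposes a MapReduce-based query processing method that uses the constructed grid file. By parallelizing grid file construction, which would otherwise run on a single machine, we expect to shorten construction time, and by parallelizing query processing with MapReduce, we expect to shorten query time.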
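
A minimal single-process sketch of the two MapReduce phases described above, assuming fixed linear scales (a real grid file refines its scales as buckets overflow, and the abstract does not give the paper's partitioning scheme). The mapper emits (cell, point) pairs and the reducer gathers each cell's points into a bucket; in Hadoop these functions would run on separate nodes, with the framework grouping pairs by cell key. `SCALES`, `cell_of`, `map_phase`, `reduce_phase`, and `range_query` are hypothetical names.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical fixed linear scales: split points per dimension.
# Dimension 0 is cut at 10 and 20 (3 intervals); dimension 1 at 5 (2 intervals).
SCALES = [[10.0, 20.0], [5.0]]

def cell_of(point, scales=SCALES):
    """Grid cell of a point: the interval index of each coordinate."""
    return tuple(bisect_right(s, x) for s, x in zip(scales, point))

def map_phase(points):
    """Mapper: emit (cell, point) pairs."""
    for p in points:
        yield cell_of(p), p

def reduce_phase(pairs):
    """Reducer: collect the points of each cell into one bucket."""
    buckets = defaultdict(list)
    for cell, p in pairs:
        buckets[cell].append(p)
    return dict(buckets)

def range_query(buckets, low, high, scales=SCALES):
    """Scan only the cells overlapping the query box, then filter their points."""
    lo_cell, hi_cell = cell_of(low, scales), cell_of(high, scales)
    result = []
    for cell, pts in buckets.items():
        if all(l <= c <= h for l, c, h in zip(lo_cell, cell, hi_cell)):
            result.extend(p for p in pts
                          if all(a <= x <= b for a, x, b in zip(low, p, high)))
    return result
```

For example, with `points = [(3.0, 1.0), (15.0, 7.0), (25.0, 2.0), (12.0, 4.0)]`, the query box from `(10.0, 0.0)` to `(20.0, 5.0)` visits three of the four buckets and returns `[(12.0, 4.0)]`. The per-bucket scan in `range_query` could itself be distributed as a map over buckets.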

Physical Database Design for DFT-Based Multidimensional Indexes in Time-Series Databases (시계열 데이터베이스에서 DFT-기반 다차원 인덱스를 위한 물리적 데이터베이스 설계)

  • Kim, Sang-Wook;Kim, Jin-Ho;Han, Byung-Il
    • Journal of Korea Multimedia Society / v.7 no.11 / pp.1505-1514 / 2004
  • Sequence matching in time-series databases is an operation that finds the data sequences whose changing patterns are similar to that of a query sequence. Typically, sequence matching employs a multidimensional index for efficient processing. To alleviate the curse of dimensionality that the multidimensional index suffers in high-dimensional cases, previous methods for sequence matching apply the Discrete Fourier Transform (DFT) to data sequences and take only the first two or three DFT coefficients as the organizing attributes of the index. This paper first points out the problems of such simple methods that take the first two or three coefficients, and then proposes a novel solution for constructing the optimal multidimensional index. The proposed method analyzes the characteristics of a target database and, based on this analysis, identifies the organizing attributes with the best discrimination power. It also determines the optimal number of organizing attributes for efficient sequence matching by using a cost model. To show the effectiveness of the proposed method, we perform a series of experiments. The results show that the proposed method significantly outperforms the previous ones.
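
Why keeping only some DFT coefficients is safe for sequence matching can be shown with a small sketch using an orthonormal DFT: by Parseval's theorem, the Euclidean distance over all coefficients equals the time-domain distance, so the distance over any subset of coefficients is a lower bound and causes no false dismissals. `select` can take the first few indices (the earlier approach) or any other subset, in the spirit of choosing attributes by discrimination power; the paper's attribute selection and cost model are not reproduced, and the function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Orthonormal DFT: X_k = (1/sqrt(n)) * sum_t x_t * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / math.sqrt(n)
            for k in range(n)]

def select(x, indices):
    """Feature vector: only the chosen DFT coefficients (the index's organizing attributes)."""
    coeffs = dft(x)
    return [coeffs[k] for k in indices]

def euclid(a, b):
    """Euclidean distance; abs() handles both real and complex components."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))

a = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 5.0, 7.0]
b = [2.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 6.0]
true_dist = euclid(a, b)
first_two = euclid(select(a, [0, 1]), select(b, [0, 1]))        # earlier approach
other_set = euclid(select(a, [1, 3, 4]), select(b, [1, 3, 4]))  # an arbitrary subset
```

Both `first_two` and `other_set` are at most `true_dist`; a better subset simply brings the bound closer to the true distance, which means fewer false candidates to verify.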

Replacement Condition Detection of Railway Point Machines Using Data Cube and SVM (데이터 큐브 모델과 SVM을 이용한 철도 선로전환기의 교체시기 탐지)

  • Choi, Yongju;Oh, Jeeyoung;Park, Daihee;Chung, Yongwha;Kim, Hee-Young
    • Smart Media Journal / v.6 no.2 / pp.33-41 / 2017
  • Railway point machines act as actuators that provide different routes to trains by driving switchblades from the current position to the opposite one. Since point failures caused by aging can significantly affect railway operations, with potentially disastrous consequences, detecting when a point machine needs replacement is critical. In this paper, we propose a method for detecting the replacement condition of point machines in railway condition monitoring systems using electrical current signals. We first analyze domestic in-field replacement data with OLAP (On-Line Analytical Processing) operations on a multidimensional data cube and relabel it as "does-not-need-to-be-replaced" and "needs-to-be-replaced" data. The system extracts suitable feature vectors from the incoming electrical current signals using the DWT (Discrete Wavelet Transform), reduces the feature dimensions using PCA (Principal Component Analysis), and employs an SVM (Support Vector Machine) for real-time replacement detection. Experimental results on in-field replacement data, including point anomalies, show that the system detects the replacement conditions of railway point machines with an accuracy exceeding 98%.
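
The abstract does not specify the wavelet family or decomposition depth, so the sketch below assumes a multi-level orthonormal Haar DWT and keeps the final approximation band as a compact feature vector for one current signal. The PCA and SVM stages of the described system are not shown; `haar_step` and `haar_features` are hypothetical names.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT; returns (approximation, detail).
    An odd trailing sample is dropped in this sketch."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def haar_features(signal, levels):
    """Keep only the approximation band after `levels` decompositions."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx
```

Each level halves the length, so a 16-sample trace with `levels=2` yields a 4-element feature vector; because the transform is orthonormal, one level preserves the signal's energy across the approximation and detail bands.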

A Method for Time Warping Based Similarity Search in Sequence Databases (시퀀스 데이터베이스를 위한 타임 워핑 기반 유사 검색)

  • Kim, Sang-Wook;Park, Sang-Hyun
    • Journal of Industrial Technology / v.20 no.B / pp.219-226 / 2000
  • In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently underestimates the time warping distance and also satisfies the triangle inequality. $D_{tw-lb}$ uses a 4-tuple feature vector extracted from each sequence and is invariant to time warping. For efficient processing, we employ a multidimensional index that uses the 4-tuple feature vector as indexing attributes and $D_{tw-lb}$ as its distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we perform extensive experiments. The results show that our method achieves speedups of up to 43 times on real-world S&P 500 stock data.
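
A minimal filter-and-refine sketch, assuming the 4-tuple is (first, last, greatest, smallest element) as in the closely related LB_Kim bound; the abstract itself does not list the four features. `d_tw_lb` takes the L-infinity distance between the 4-tuples, and `dtw` is the common sum-of-absolute-differences time warping distance, which may differ from the paper's exact definition. On any warping path the first elements are matched, the last elements are matched, and each sequence's maximum (and minimum) is matched to some element of the other sequence, so each of the four differences is bounded by a single path term; this makes `d_tw_lb` a lower bound. A linear scan stands in for the multidimensional index, and all names are hypothetical.

```python
def feature4(seq):
    """Assumed 4-tuple: (first, last, greatest, smallest); unchanged by repeating elements."""
    return (seq[0], seq[-1], max(seq), min(seq))

def d_tw_lb(s, q):
    """L-infinity distance between 4-tuples; a lower bound of dtw(s, q) below."""
    return max(abs(a - b) for a, b in zip(feature4(s), feature4(q)))

def dtw(s, q):
    """Time warping distance: minimum summed |s_i - q_j| over all warping paths."""
    n, m = len(s), len(q)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def search(db, q, eps):
    """Filter with the cheap bound, then refine survivors with the exact distance."""
    candidates = [s for s in db if d_tw_lb(s, q) <= eps]
    return [s for s in candidates if dtw(s, q) <= eps]
```

Because `feature4` ignores repeated elements, a time-stretched copy of a sequence has the same 4-tuple, which is the time-warping invariance the abstract relies on.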

Implementation of a Data Processing Method to Enhance the Quality and Support the What-If Analysis for Traffic History Data (교통이력 데이터의 품질 개선과 What-If 분석을 위한 자료처리 기법의 구현)

  • Lee, Min-Soo;Cheong, Su-Jeong;Choi, Ok-Ju;Meang, Bo-Yeon
    • The KIPS Transactions:PartD / v.17D no.2 / pp.87-102 / 2010
  • A vast amount of traffic data is produced every day by detection devices, but this data includes a considerable number of errors and missing values. Moreover, it is periodically deleted before it can be used as important analysis information. This paper therefore discusses the implementation of an integrated traffic history database system that continuously stores traffic data in a multidimensional model, increases the validity and completeness of the data through a flow of processing steps, and provides a what-if analysis function. The implemented system provides various techniques for correcting errors and missing-data patterns, together with a what-if analysis function that lets users analyze results under various conditions by flexibly defining process-related environment variables and combinations of processing flows. Such what-if analysis functions dramatically increase the usability of traffic data but are not provided by other traffic data systems. Experimental results from cleaning the traffic history data show that the system provides superior performance in terms of validity and completeness.
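
The abstract does not name the correction techniques or the exact shape of a processing flow. The sketch below assumes one simple design: each cleaning step is a function from a series to a series, a flow is a list of steps, and a what-if run re-executes a flow with different parameters. `range_check`, `linear_fill`, and `run` are hypothetical, and linear interpolation is only one common way to fill missing detector readings.

```python
from typing import Callable, List, Optional

Series = List[Optional[float]]
Step = Callable[[Series], Series]

def range_check(lo: float, hi: float) -> Step:
    """Treat readings outside [lo, hi] as errors by turning them into missing values."""
    def step(xs: Series) -> Series:
        return [x if x is not None and lo <= x <= hi else None for x in xs]
    return step

def linear_fill(xs: Series) -> Series:
    """Fill interior gaps by linear interpolation; leading/trailing gaps stay missing."""
    out = list(xs)
    known = [i for i, x in enumerate(out) if x is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            out[i] = out[a] + (out[b] - out[a]) * (i - a) / (b - a)
    return out

def run(flow: List[Step], xs: Series) -> Series:
    """Apply each step of a processing flow in order."""
    for step in flow:
        xs = step(xs)
    return xs
```

With speeds `[50.0, None, 70.0, 400.0, 60.0]`, the flow `[range_check(0, 200), linear_fill]` treats 400 as an error and yields `[50.0, 60.0, 70.0, 65.0, 60.0]`, while raising the upper limit to 500 keeps the 400 reading; comparing the two outputs is a small what-if analysis.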

A Study on the Field Data Applicability of Seismic Data Processing using Open-source Software (Madagascar) (오픈-소스 자료처리 기술개발 소프트웨어(Madagascar)를 이용한 탄성파 현장자료 전산처리 적용성 연구)

  • Son, Woohyun;Kim, Byoung-yeop
    • Geophysics and Geophysical Exploration / v.21 no.3 / pp.171-182 / 2018
  • We processed seismic field data using open-source software (Madagascar) to verify whether it is applicable to field data, which has a low signal-to-noise ratio and high velocity uncertainty. Madagascar, which is based on Python, is generally considered well suited to developing processing technologies because of its multidimensional data analysis capabilities and reproducibility. However, it has not been widely used for field data processing so far because of its complicated interfaces and data structure system. To verify the effectiveness of Madagascar on field data, we applied it to a typical seismic data processing flow including data loading, geometry build-up, F-K filtering, predictive deconvolution, velocity analysis, normal moveout correction, stacking, and migration. The test data were acquired in the Gunsan Basin, Yellow Sea, using a 480-channel streamer and 4 air-gun arrays. The results at every processing step were compared with those processed with Landmark's ProMAX (SeisSpace R5000), a commercial processing package. Madagascar shows relatively high efficiency in data I/O and management as well as reproducibility. It also performs some automated procedures, such as stacking velocity analysis, quickly and accurately. There were no remarkable differences between the two packages after their signal enhancement flows were applied. For the deeper part of the subsurface image, however, the commercial software gives better results than the open-source software, because it offers various de-multiple flows and an interactive processing environment for delicate processing work.
Considering that many researchers around the world are developing data processing algorithms for Madagascar, we expect that open-source software such as Madagascar can be widely used for commercial-level processing, given its expandability, cost effectiveness, and reproducibility.
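
Normal moveout correction, one step of the processing flow above, rests on the hyperbolic traveltime relation $t(x) = \sqrt{t_0^2 + x^2/v^2}$ for offset $x$ and velocity $v$. The sketch below applies it to a single trace with a constant velocity and nearest-sample lookup; production tools use a time-varying velocity from velocity analysis, interpolation, and stretch muting. It is not the implementation in either package compared above, and the function names are hypothetical.

```python
import math

def nmo_time(t0, offset, velocity):
    """Hyperbolic reflection traveltime: t(x) = sqrt(t0^2 + x^2 / v^2)."""
    return math.sqrt(t0 ** 2 + (offset / velocity) ** 2)

def nmo_correct(trace, dt, offset, velocity):
    """Move each reflection to its zero-offset time t0 (nearest-sample lookup).

    Output sample i (time t0 = i * dt) takes the input sample at t(x);
    samples whose t(x) falls beyond the trace end are set to zero.
    """
    out = []
    for i in range(len(trace)):
        t = nmo_time(i * dt, offset, velocity)
        j = int(round(t / dt))
        out.append(trace[j] if j < len(trace) else 0.0)
    return out
```

For example, with `dt = 0.1` s, offset 300 m, and velocity 1000 m/s, a spike recorded at 0.5 s is moved to sample 4 (t0 = 0.4 s), since sqrt(0.4^2 + 0.3^2) = 0.5.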

Learning Multidimensional Sequential Patterns Using Hellinger Entropy Function (Hellinger 엔트로피를 이용한 다차원 연속패턴의 생성방법)

  • Lee, Chang-Hwan
    • The KIPS Transactions:PartB / v.11B no.4 / pp.477-484 / 2004
  • Sequential pattern mining generates a set of inter-transaction patterns residing in time-dependent data. This paper proposes a new method for generating sequential patterns using the Hellinger measure. While existing methods generate single-dimensional sequential patterns within a single attribute, the proposed method can detect multidimensional patterns across different attributes. Several heuristics based on the characteristics of the Hellinger measure are proposed to reduce the computational complexity of the sequential pattern system. Some experimental results are presented.
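
The abstract does not give the exact Hellinger-based entropy function, so the sketch below only shows the standard Hellinger distance, $H(P, Q) = \sqrt{\tfrac{1}{2}\sum_k (\sqrt{p_k} - \sqrt{q_k})^2}$, applied to one plausible use: scoring how strongly a value of one attribute shifts the distribution of another attribute, which is the kind of cross-attribute dependency a multidimensional pattern captures. `rule_score` and the row format are hypothetical.

```python
import math
from collections import Counter

def hellinger(p, q):
    """Hellinger distance between two discrete distributions given as dicts (0 to 1)."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2)

def distribution(values):
    """Empirical distribution of a non-empty list of values."""
    counts = Counter(values)
    n = len(values)
    return {k: c / n for k, c in counts.items()}

def rule_score(rows, attr_x, x_value, attr_y):
    """How far does conditioning on attr_x == x_value shift the distribution of attr_y?"""
    matched = [r[attr_y] for r in rows if r[attr_x] == x_value]
    if not matched:
        return 0.0  # no evidence for this antecedent
    prior = distribution([r[attr_y] for r in rows])
    return hellinger(distribution(matched), prior)
```

Identical distributions score 0 and distributions with disjoint support score 1, so a higher score marks an antecedent value that is more informative about the other attribute.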

An Efficient Technique for Processing Frequent Updates in the R-tree (R-트리에서 빈번한 변경 질의 처리를 위한 효율적인 기법)

  • 권동섭;이상준;이석호
    • Journal of KIISE:Databases / v.31 no.3 / pp.261-273 / 2004
  • Advances in information and communication technologies have been creating new classes of database applications. For example, in moving object databases, which track the positions of many objects, and in stream databases, which process data streams from many sensors, the data usually change very rapidly and continuously. However, traditional database systems have difficulty processing such rapidly and continuously changing data because they assume that a data item stored in the database remains constant until it is explicitly modified. The problem becomes more serious in the R-tree, a typical index structure for multidimensional data, because modifying data in an R-tree can cause cascading node splits or merges. To process frequent updates more efficiently, we propose a novel update technique for the R-tree, which we call the leaf-update technique. If the new value of a data item lies within the MBR of the leaf to which the data item belongs, the leaf-update technique changes only that leaf node rather than the whole tree. Using this leaf-update approach together with a leaf-access hash table for direct access to leaf nodes, the proposed technique can greatly reduce update cost. In addition, because it is based on the R-tree and guarantees the R-tree's correctness, the leaf-update technique can be adopted in diverse R-tree variants and in various applications that use the R-tree. In this paper, we prove the effectiveness of the leaf-update technique theoretically and present experimental results showing that it outperforms the traditional technique.
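
A minimal sketch of the leaf-update idea described above, under simplifying assumptions: two-dimensional points, leaves that store their MBR and entries, and a hash table from object id to leaf. When the new position stays inside the current leaf's MBR, only that leaf is touched; otherwise the sketch falls back to a caller-supplied routine standing in for an ordinary R-tree insertion. `Leaf`, `LeafUpdateIndex`, and `fallback` are hypothetical names, and no real node splitting or condensing is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Point = Tuple[float, float]

@dataclass
class Leaf:
    mbr: Tuple[float, float, float, float]            # (xmin, ymin, xmax, ymax)
    entries: Dict[int, Point] = field(default_factory=dict)

    def contains(self, p: Point) -> bool:
        xmin, ymin, xmax, ymax = self.mbr
        return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax

class LeafUpdateIndex:
    def __init__(self, fallback: Callable[[int, Point], Leaf]):
        self.leaf_of: Dict[int, Leaf] = {}   # leaf-access hash table: object id -> leaf
        self.fallback = fallback             # stand-in for a normal R-tree insert

    def add(self, oid: int, p: Point, leaf: Leaf) -> None:
        leaf.entries[oid] = p
        self.leaf_of[oid] = leaf

    def update(self, oid: int, new_p: Point) -> str:
        leaf = self.leaf_of[oid]              # direct access, no root-to-leaf search
        if leaf.contains(new_p):
            leaf.entries[oid] = new_p         # MBR still covers all entries: nothing above changes
            return "leaf"
        del leaf.entries[oid]                 # a real R-tree would also condense this leaf
        self.add(oid, new_p, self.fallback(oid, new_p))
        return "tree"
```

Leaving the old leaf's MBR unshrunk after a move-out keeps the tree correct, since an MBR only has to cover its entries, not fit them tightly.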

Efficient Computation of Stream Cubes Using AVL Trees (AVL 트리를 사용한 효율적인 스트림 큐브 계산)

  • Kim, Ji-Hyun;Kim, Myung
    • The KIPS Transactions:PartD / v.14D no.6 / pp.597-604 / 2007
  • Stream data is a continuous flow of information that mostly arrives in the form of an infinite, rapid stream. Recently, researchers have shown great interest in analyzing such data to obtain value-added information. Here, we propose an efficient cube computation algorithm for multidimensional analysis of stream data. Because stream data arrives unsorted and aggregation results can be obtained only after the last data item has been read, cube computation requires a tremendous amount of memory. To resolve this difficulty, we compute only user-selected aggregation tables and use a combination of an array and AVL trees as temporary storage for the aggregation tables. The proposed algorithm works even when main memory is not large enough to hold all the aggregation tables during the computation. Theoretical analysis and performance evaluation show that the proposed algorithm is fast enough for practical use.
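
The AVL-tree part of the storage scheme can be sketched as follows, assuming each user-selected aggregation table is one AVL tree keyed by the tuple of group-by values and holding a running SUM. Each arriving tuple costs O(log g) for g groups, and an in-order walk emits the finished table sorted by key. The other storage component and the handling of aggregation tables that exceed main memory, both described in the abstract, are not shown; all names are hypothetical.

```python
class Node:
    __slots__ = ("key", "total", "left", "right", "height")

    def __init__(self, key, value):
        self.key, self.total = key, value
        self.left = self.right = None
        self.height = 1

def _h(n):
    return n.height if n else 0

def _fix(n):
    n.height = 1 + max(_h(n.left), _h(n.right))

def _rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix(y)
    _fix(x)
    return x

def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix(x)
    _fix(y)
    return y

def insert(node, key, value):
    """Add value to the aggregate for key (inserting key if new); returns the new subtree root."""
    if node is None:
        return Node(key, value)
    if key == node.key:
        node.total += value              # existing group: update its aggregate only
        return node
    if key < node.key:
        node.left = insert(node.left, key, value)
    else:
        node.right = insert(node.right, key, value)
    _fix(node)
    balance = _h(node.left) - _h(node.right)
    if balance > 1:                      # left-heavy
        if key > node.left.key:          # left-right case
            node.left = _rotate_left(node.left)
        return _rotate_right(node)
    if balance < -1:                     # right-heavy
        if key < node.right.key:         # right-left case
            node.right = _rotate_right(node.right)
        return _rotate_left(node)
    return node

def items(node):
    """In-order traversal: (group key, aggregate) pairs sorted by key."""
    if node:
        yield from items(node.left)
        yield (node.key, node.total)
        yield from items(node.right)
```

For a stream of `(region, product, amount)` tuples, calling `root = insert(root, (region, product), amount)` per tuple maintains the (region, product) SUM table; unsorted arrival order does not matter because the tree keeps groups ordered.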

Metamorphosis Hierarchical Motion Vector Estimation Algorithm for Multidimensional Image System (다차원 영상 시스템을 위한 변형계층 모션벡터 추정알고리즘)

  • Kim Jeong-Woong;Yang Hae-Sool
    • The KIPS Transactions:PartB / v.13B no.2 s.105 / pp.105-114 / 2006
  • In a ubiquitous environment, where various kinds of computers are embedded in people, objects, and the environment, are interconnected, and can be used anywhere as needed, different types of data must be exchanged between heterogeneous machines through the home network. In such an environment, efficient processing, transmission, and monitoring of image data are essential technologies. Research is needed not only on traditional image processing topics such as spatial and visual resolution, color expression, and image quality measurement, but also on transmission rates over home networks with limited bandwidth. This study proposes a new motion vector estimation algorithm for transmitting, processing, and controlling image data, the core content in a home network, and uses it to implement a real-time monitoring system for multidimensional images transmitted from multiple cameras. Image data from stereo cameras, captured under different conditions of angle, distance, and so on, are preprocessed by reduction, magnification, shifting, or correction, and then compressed and sent using the proposed metamorphosis hierarchical motion vector estimation algorithm for motion correction. The proposed algorithm adopts the advantages and compensates for the disadvantages of existing motion vector estimation algorithms such as full search, three-step search, and hierarchical search, and it efficiently estimates the motion of images with large brightness variations using an atypical small macroblock. The proposed metamorphosis hierarchical motion vector estimation algorithm and the implemented image system can be used in various ways in ubiquitous environments.
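
Three-step search, one of the existing algorithms whose strengths the proposed algorithm adopts, can be sketched as follows; this is the classic block-matching baseline, not the metamorphosis hierarchical algorithm itself. Frames are assumed to be lists of rows of grayscale integers, the matching cost is the sum of absolute differences (SAD), and candidates that would leave the reference frame are skipped. Function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def three_step_search(cur, ref, bx, by, size, step=4):
    """Return ((dx, dy), cost) for the block at (bx, by) using three-step search."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, size)
    while step >= 1:
        cx, cy = best                    # search centre is fixed for this step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the reference frame.
                if not (0 <= bx + mx and bx + mx + size <= w
                        and 0 <= by + my and by + my + size <= h):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, size)
                if cost < best_cost:
                    best, best_cost = (mx, my), cost
        step //= 2                       # 4 -> 2 -> 1: three steps
    return best, best_cost
```

Each step evaluates at most nine candidates around the current best and then halves the step size, so a ±7-pixel range costs about 25 SAD evaluations instead of the 225 a full search needs. The search is greedy and can settle on a local minimum, which is one of the weaknesses hybrid methods try to address.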