• Title/Summary/Keyword: Data Partition

Partition-based Big Data Analysis and Visualization Algorithm (빅데이터 분석을 위한 파티션 기반 시각화 알고리즘)

  • Hong, Jun-Ki
    • The Journal of Bigdata / v.5 no.1 / pp.147-154 / 2020
  • Research on deriving meaningful results from big data is actively under way. In this paper, we propose a partition-based big data analysis (PBDA) algorithm that analyzes the correlation between variables by dividing the data area of big data into partitions and calculating a representative value for each partition. Because the partition size of the PBDA algorithm is adjustable, we compare the visualization results obtained with different partition sizes. To verify the proposed algorithm, we analyze the big data of 'A' and obtain meaningful results on how product sales volume changes with temperature and sales price.
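
The partitioning idea in this abstract can be illustrated with a minimal sketch. It assumes that the representative value of a partition is the mean of the records falling inside it, and that partitions are rectangular cells of adjustable size; the abstract does not specify either, and the function name `partition_means` and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def partition_means(points, x_size, y_size):
    """Group (x, y, value) records into x_size-by-y_size cells and
    return the mean value of each non-empty cell.

    The mean is an assumed choice of representative value.
    """
    cells = defaultdict(list)
    for x, y, value in points:
        # Integer cell index along each axis
        key = (int(x // x_size), int(y // y_size))
        cells[key].append(value)
    return {key: mean(values) for key, values in cells.items()}


# Hypothetical records: (temperature, sales price, units sold)
records = [(21.0, 1000, 30), (23.5, 1200, 34), (29.0, 1000, 50), (28.0, 1900, 20)]
summary = partition_means(records, x_size=5, y_size=1000)
```

Changing `x_size` and `y_size` coarsens or refines the grid, which is the kind of partition-size comparison the abstract describes.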

Design and Implementation of a Simulation Framework for Wireless Data Broadcasting based on Data ID Space Partition

  • Im, Seokjin
    • International journal of advanced smart convergence / v.7 no.4 / pp.10-18 / 2018
  • Wireless data broadcasting is an effective way to provide information services that handle data item requests from a great number of mobile clients, because it can accommodate any number of clients. Various air indexing and data scheduling schemes have been developed for wireless data broadcasting so that clients can access their desired data items efficiently. Such a broadcasting system needs a method to simulate newly designed air indexing and scheduling schemes and to evaluate their performance parameters. In this paper, we design an expandable and efficient simulation framework for wireless data broadcasting based on partitioning the data ID space. The framework can adopt regular and irregular space partitions and evaluate various performance parameters of the broadcasting system. Using the framework, we implement a testbed of the broadcasting system that adopts IIP, GDI and EXP as its air indexing schemes. We simulate the system on the testbed and evaluate its performance parameters, thereby showing the efficiency and expandability of the framework.

Evaluation of Cooling Energy Saving through Applying Aisle Partition System on a Data Center Server Room (파티션 시스템 적용을 통한 기존 데이터센터 서버실의 냉방 에너지 절감 성능평가)

  • Park, Jong-Soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.7 / pp.726-733 / 2016
  • In this study, three types of air distribution systems (open type system, aisle partition system, and aisle containment system) were simulated by computer to evaluate the applicability of the aisle partition system to a data center server room. The simulation variables were the height and location of the partition fixed on top of the server rack. Energy efficiency was highest for the aisle containment system, followed by the aisle partition system and then the open type system. In the cold aisle partition system, the partition height that saves cooling energy by sufficiently obstructing air recirculation was found to be more than 0.9 m. In the hot aisle partition system, the corresponding partition height was found to be more than 0.8 m.

File Deduplication using Logical Partition of Storage System (저장 시스템의 논리 파티션을 이용한 파일 중복 제거)

  • Kong, Jin-San;Yoo, Chuck;Ko, Young-Woong
    • IEMEK Journal of Embedded Systems and Applications / v.7 no.6 / pp.345-351 / 2012
  • In a traditional target-based data deduplication system, every file must be chunked and compared to eliminate duplicated data blocks. A critical problem with this approach arises as the number of files increases: the system suffers computational delay from calculating hash values and processing metadata for each file. To overcome this problem, we propose a novel data deduplication system that uses the logical partitions of a storage system. The system applies the deduplication scheme to each logical partition rather than to each file. Experimental results show that, when the logical partition is 50% full of files, the proposed system is more efficient than the traditional deduplication scheme in terms of deduplication capacity and processing time.
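
As a rough illustration of deduplicating a whole logical partition instead of individual files, the sketch below treats the partition as a single byte stream. Fixed-size chunking, SHA-256 digests, and the in-memory `index` dictionary are all assumptions made for the example; the abstract does not say which chunking method, hash, or metadata store the authors use.

```python
import hashlib


def deduplicate(stream: bytes, chunk_size: int, index: dict) -> list:
    """Split a byte stream into fixed-size chunks, store each unseen
    chunk in index, and return the ordered list of chunk digests."""
    recipe = []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        index.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return recipe


def restore(recipe: list, index: dict) -> bytes:
    """Rebuild the original stream from its digest list."""
    return b"".join(index[digest] for digest in recipe)


# Hypothetical logical partition: the raw bytes of everything it holds
partition = b"AAAA" + b"BBBB" + b"AAAA" + b"CCCC" + b"BBBB"
index = {}
recipe = deduplicate(partition, chunk_size=4, index=index)
```

Here one recipe describes the entire partition, so per-file metadata handling is avoided; five chunks are referenced but only three distinct chunks are stored.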

A Study on the Construction of Stable Clustering by Minimizing the Order Bias (순서 바이어스 최소화에 의한 안정적 클러스터링 구축에 관한 연구)

  • Lee, Gye-Seong
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1571-1580 / 1999
  • When a conceptual clustering algorithm, one of the unsupervised learning paradigms, is used to derive a hierarchical structure from a data set for data mining and machine learning, it is not unusual for the outcome to differ depending on the order in which data objects are processed. To overcome this problem, a first classification pass is carried out to construct an initial partition. This partition is expected to indicate the possible range of the number of final classes. We apply center sorting to the data objects in the classes of the partition to obtain a new data ordering, and build a new partition using the ITERATE clustering procedure. We developed an algorithm, REIT, that leads to a final partition with a stable and best partition score. A number of experiments were performed to show that the algorithm minimizes order bias effects.

Adaptive Partitioning for Efficient Query Support

  • Yun, Hong-Won
    • Journal of information and communication convergence engineering / v.5 no.4 / pp.369-373 / 2007
  • RFID systems generate a large volume of data, which can lead to slower queries. To achieve better query performance, the data can be partitioned into active and non-active data. In this paper, we propose two partitioning approaches for efficient query support: the average period plus delta partition and the adaptive average period partition. We also present a system architecture for managing active and non-active data, together with a logical database schema. The data manager checks the active partition and moves all objects from the active store to an archive store associated with an average period plus delta and an adaptive average period. Our experiments show the performance of our partitioning methods.

Spatial Statistic Data Release Based on Differential Privacy

  • Cai, Sujin;Lyu, Xin;Ban, Duohan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.10 / pp.5244-5259 / 2019
  • With the continuous development of LBS (Location Based Service) applications, privacy protection has become an urgent problem. Differential privacy rests on a strict mathematical theory that provides strong privacy guarantees under the assumption that the attacker has worst-case background knowledge, and it has been applied to research directions such as data query, release, and mining. The difficulty lies in ensuring data availability while protecting privacy. Spatial multidimensional data are usually released by partitioning the domain into disjoint subsets and then generating a hierarchical index. Traditional data-dependent partition methods must allocate part of the privacy budget to the partitioning process and split that budget among all the steps, which is inefficient. To address these issues, a novel two-step partition algorithm is proposed. First, we partition the original dataset into fixed grids, inject noise, and synthesize a dataset according to the noisy counts. Second, we perform IH-Tree (Improved H-Tree) partitioning on the synthetic dataset and use the resulting partition keys to split the original dataset. The algorithm saves the privacy budget allocated to the partitioning process and obtains a more accurate release. It has been tested on three real-world datasets, and its accuracy has been compared with that of state-of-the-art algorithms. The experimental results show that the relative errors of range queries are considerably reduced, especially on the large-scale dataset.
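
The first step described in this abstract (fixed grid, then noise injection) might look roughly like the sketch below. It assumes the standard Laplace mechanism with sensitivity 1 for cell counts, so the noise scale is 1/epsilon; the abstract does not name the noise distribution. The synthesis step and the IH-Tree partition are not shown, and the function name, parameters, and sample points are hypothetical.

```python
import random
from collections import Counter


def noisy_grid_counts(points, cell_size, grid_dim, epsilon, rng):
    """Count 2-D points per fixed grid cell and add Laplace noise.

    Assumes sensitivity 1 (one record changes one count by 1), so the
    Laplace scale is 1 / epsilon. Points outside the grid_dim x grid_dim
    grid are ignored.
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    scale = 1.0 / epsilon
    noisy = {}
    for i in range(grid_dim):
        for j in range(grid_dim):
            # The difference of two exponential draws is Laplace(0, scale)
            noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            # Empty cells receive noise too, otherwise emptiness would leak
            noisy[(i, j)] = counts.get((i, j), 0) + noise
    return noisy


# Hypothetical locations in a 100 x 100 domain
points = [(10, 10), (20, 30), (60, 10), (70, 80), (90, 90)]
rng = random.Random(0)
noisy = noisy_grid_counts(points, cell_size=50, grid_dim=2, epsilon=1.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is the availability trade-off the abstract refers to.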

Neuro-Fuzzy System and Its Application by Input Space Partition Methods (입력 공간 분할에 따른 뉴로-퍼지 시스템과 응용)

  • 곽근창;유정웅
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.10a / pp.433-439 / 1998
  • In this paper, we present an approach to structure identification based on input space partition methods and to parameter identification by a hybrid learning method in a neuro-fuzzy system. Structure identification automatically estimates the number of membership functions and fuzzy rules from numerical input-output data using grid partition, tree partition, and scatter partition. Parameter identification is then carried out by a hybrid learning scheme that uses back-propagation and least squares estimation. Finally, we show its usefulness for neuro-fuzzy modeling by applying it to truck backer-upper control.
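
Of the three partition methods named here, grid partition is the simplest to illustrate: every combination of one membership function per input becomes a rule antecedent, so the rule count is the product of the per-input membership function counts. The sketch below only enumerates antecedents; the labels and the function name `grid_partition_rules` are hypothetical, and tree and scatter partition are not shown.

```python
from itertools import product


def grid_partition_rules(labels_per_input):
    """Return one rule antecedent per combination of membership-function
    labels, taking exactly one label from each input variable."""
    return list(product(*labels_per_input))


# Hypothetical membership-function labels for two inputs
rules = grid_partition_rules([["small", "large"], ["neg", "zero", "pos"]])
```

With 2 and 3 membership functions the grid yields 6 rules; the count grows multiplicatively with each added input.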

Particle Contamination Control in the Cleanroom Production Line using Partition Check Method (클린룸 제조공정에서 공정분할평가법을 이용한 입자오염제어)

  • Lee, Hyeon-Cheol;Park, Jung-Il;Lee, Seong-Hun;Noh, Kwang-Chul;Oh, Myung-Do
    • Proceedings of the KSME Conference / 2007.05b / pp.2338-2343 / 2007
  • Practical studies were carried out on a method of particle contamination control for yield enhancement in the cleanroom. The proposed contamination control method is composed of data collection, data analysis, improvement action, verification, and implement control. The partition check method was used for data collection and data analysis in cellular phone module production lines, and it was evaluated by the change in yield loss before and after the improvement action. When the partition check method was applied, the critical process step was identified and a reduction in yield loss through improvement actions was observed. From these results, it is concluded that the partition check method is an effective solution for particle contamination control in cleanroom production lines.

A Non-Uniform Network Split Method for Energy Efficiency in a Data Centric Sensor Network (데이타 중심 센서 네트워크에서 에너지 효율성을 고려한 비균등 네트워크 분할 기법)

  • Kang, Hong-Koo;Kim, Joung-Joon;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.9 no.3 / pp.35-50 / 2007
  • In a data centric sensor network, the sensor node that stores a piece of data is determined by the data value measured at each sensor node. Therefore, if the same data occur frequently, the energy of the sensor node storing that data is exhausted quickly because the load is concentrated on it. In addition, when the sensor network is extended, the communication cost of storing data and processing queries increases, since the routing paths they require become longer. Existing studies, which generally focus on efficient management of data storage, cannot solve these problems efficiently. In this paper, we propose a NUNS (Non-Uniform Network Split) method that distributes the load of sensor nodes and decreases the communication cost caused by sensor network extension. NUNS divides the sensor network into non-uniform partitions that have the minimum difference in the number of sensor nodes and in split area size, and stores data generated in a partition at the sensor nodes within that partition; in this way it distributes the load of sensor nodes and decreases the communication cost efficiently. In addition, NUNS divides each partition into as many non-uniform zones as there are sensor nodes in the partition, with the minimum difference in split area size, and allocates each zone as the processing area of one sensor node; in this way it protects a specific sensor node from load concentration and decreases unnecessary routing cost.
