• Title/Summary/Keyword: Data Partition Algorithm


An Incremental Multi Partition Averaging Algorithm Based on Memory Based Reasoning (메모리 기반 추론 기법에 기반한 점진적 다분할평균 알고리즘)

  • Yih, Hyeong-Il
    • Journal of IKEEE / v.12 no.1 / pp.65-74 / 2008
  • One of the popular methods used for pattern classification is the MBR (Memory-Based Reasoning) algorithm. Because it simply computes distances between a test pattern and the training patterns or hyperplanes stored in memory and then assigns the class of the nearest training pattern, it is notorious for its memory usage and cannot learn additional information from new data. To overcome these problems, we propose an incremental learning algorithm (iMPA). iMPA divides the entire pattern space into a fixed number of partitions and generates representatives from each partition. It can also learn additional information from new data without requiring access to the original training data. On benchmark data sets from the UCI Machine Learning Repository, the proposed methods show performance comparable to k-NN with far fewer patterns, and better results than the EACH system, which implements the NGE theory.

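The partition-and-average idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's iMPA: the equal-width grid, the per-class running mean, and every name (`PartitionAverager`, `learn`, `classify`, `bins`) are assumptions made for illustration. It only shows how a fixed number of partitions can each hold one representative that is updated from new data without revisiting the original patterns.

```python
import math
from collections import defaultdict


class PartitionAverager:
    """Hypothetical sketch: one running-mean representative per (grid cell, class)."""

    def __init__(self, low, high, bins):
        # low/high: per-feature bounds of the pattern space; bins: cells per feature.
        self.low, self.high, self.bins = low, high, bins
        # (cell, label) -> [pattern count, mean vector]
        self.reps = defaultdict(lambda: [0, None])

    def _cell(self, x):
        # Map a pattern to its equal-width grid cell, clamping to the grid.
        cell = []
        for v, lo, hi in zip(x, self.low, self.high):
            idx = int((v - lo) / (hi - lo) * self.bins)
            cell.append(min(max(idx, 0), self.bins - 1))
        return tuple(cell)

    def learn(self, x, label):
        # Incremental mean update: only the count and mean are stored,
        # so earlier training patterns are never needed again.
        entry = self.reps[(self._cell(x), label)]
        entry[0] += 1
        if entry[1] is None:
            entry[1] = list(x)
        else:
            n = entry[0]
            entry[1] = [m + (v - m) / n for m, v in zip(entry[1], x)]

    def classify(self, x):
        # Assign the class of the nearest stored representative.
        (_, label), _ = min(self.reps.items(),
                            key=lambda kv: math.dist(kv[1][1], x))
        return label
```

Memory use in this sketch is bounded by the number of occupied (cell, class) pairs rather than by the number of training patterns, which is the trade-off the abstract contrasts with plain nearest-neighbor storage.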

A Heuristic Polynomial Time Algorithm for Crew Scheduling Problem

  • Lee, Sang-Un
    • Journal of the Korea Society of Computer and Information / v.20 no.11 / pp.69-75 / 2015
  • This paper proposes a heuristic polynomial-time algorithm for the crew scheduling problem, a kind of optimization problem. The problem has been approached with linear programming, set cover, set partition, column generation, and similar methods, but these have not obtained the optimal solution. The algorithm sorts the transit costs $c_{ij}$ in ascending order and merges the crew paths of tasks $i$ and $j$ when the sum of operation times ${\Sigma}o$ is less than the daily working time $T$. As a result, we obtain the minimum number of crews $_{min}K$ and the minimum transit cost $z=_{min}c_{ij}$. For the transit cost of a specific number of crews $K(K>_{min}K)$, we delete the $K-_{min}K$ largest $c_{ij}$, which partitions the crew paths. On 5 benchmark data sets, the algorithm achieves a lower transit cost than state-of-the-art algorithms and obtains the minimum number of crews.
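The sort-and-merge step described above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes that `transit_cost` lists only feasible transitions (task `j` can follow task `i`), that a merge joins the end of one path to the start of another, and the names `merge_crew_paths`, `op_time`, `transit_cost`, and `day_limit` are all hypothetical.

```python
def merge_crew_paths(op_time, transit_cost, day_limit):
    """Greedy sketch: merge crew paths along the cheapest transits first.

    op_time:      {task: operation time}
    transit_cost: {(i, j): cost of a crew moving from task i to task j}
    day_limit:    daily working time T
    Returns (list of crew paths, total transit cost of the merges made).
    """
    path_of = {task: [task] for task in op_time}  # task -> path containing it
    total_cost = 0
    # Visit transits in ascending order of cost.
    for (i, j), cost in sorted(transit_cost.items(), key=lambda kv: kv[1]):
        left, right = path_of[i], path_of[j]
        if left is right:
            continue  # already on the same crew path
        if left[-1] != i or right[0] != j:
            continue  # i must end its path and j must start its path
        load = sum(op_time[t] for t in left) + sum(op_time[t] for t in right)
        if load < day_limit:  # "less than T", following the abstract's wording
            left.extend(right)
            for t in right:
                path_of[t] = left
            total_cost += cost
    # Collect each distinct path once, in first-seen order.
    paths = []
    for p in path_of.values():
        if not any(p is q for q in paths):
            paths.append(p)
    return paths, total_cost
```

The number of returned paths plays the role of the crew count. The second step in the abstract, splitting paths again to reach a larger crew count $K$, is not covered by this sketch.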

A study of Time Management System in Data Base (데이터베이스에서의 시간 시스템에 관한 연구)

  • 최진탁
    • Journal of Korean Society of Industrial and Systems Engineering / v.21 no.48 / pp.185-192 / 1998
  • This paper proposes a new algorithm that efficiently performs joins in a temporal database. The main idea is to sort the smaller relation and to partition the larger relation, which avoids the cost of sorting the larger relation. To show the usefulness of the algorithm, its cost is analyzed in terms of the number of accesses to secondary storage and compared with that of the Sort-Merge algorithm. Through these comparisons, we present and verify the conditions under which the proposed algorithm always outperforms the Sort-Merge algorithm. The comparisons show that the proposed algorithm achieves a 10-30% gain under those conditions.


Flexible Partitioning of CDFGs for Compact Asynchronous Controllers

  • Sretasereekul, Nattha;Okuyama, Yuichi;Saito, Hiroshi;Imai, Masashi;Kuroda, Kenichi;Nanya, Takashi
    • Proceedings of the IEEK Conference / 2002.07c / pp.1724-1727 / 2002
  • Asynchronous circuits have the potential to solve problems related to parameter variations, such as gate delays, in deep sub-micron technologies. However, current CAD tools for large-scale asynchronous circuits partition specifications inappropriately, because these tools cannot control the granularity of circuit decomposition. In this paper we propose a hierarchical Control/Data Flow Graph (CDFG) containing nodes that can be flexibly partitioned or merged into other nodes. We present a partitioning algorithm for such CDFGs that generates manageable Signal Transition Graphs (STGs) for asynchronous synthesis tools. The algorithm allows designers to assign the maximum number of signals of partitioned nodes with optimality in mind. An experiment shows that the algorithm partitions flexibly and yields more compact asynchronous controllers.


Clustering Algorithm Using Hashing in Classification of Multispectral Satellite Images

  • Park, Sung-Hee;Kim, Hwang-Soo;Kim, Young-Sup
    • Korean Journal of Remote Sensing / v.16 no.2 / pp.145-156 / 2000
  • Clustering is the process of partitioning a data set into meaningful clusters. As the amount of data to process increases, faster algorithms are needed more than ever. In this paper, we propose a clustering algorithm that partitions a multispectral remotely sensed image data set into several clusters using a hash search algorithm. The processing time of our algorithm is compared with that of clustering algorithms using other speed-up concepts. The experimental results are compared with respect to the number of bands, the number of clusters, and the size of the data. It is also shown that the processing time of our algorithm is shorter than that of clustering algorithms using other speed-up concepts when the data size is relatively large.

Fuzzy Nonlinear Regression Model (퍼지비선형회귀모형)

  • Hwang, Seung-Gook;Park, Young-Man;Seo, Yoo-Jin;Park, Kwang-Pak
    • Journal of the Korean Institute of Intelligent Systems / v.8 no.6 / pp.99-105 / 1998
  • This paper proposes a fuzzy nonlinear regression model, a fuzzy regression model built with a genetic algorithm. The genetic algorithm is used to classify the input data for better fuzzy regression analysis. From this partition, each data point obtains a grade of membership in each divided data group. The data groups, obtained from the optimal partition of the region of each variable, each have different fuzzy parameters in their fuzzy linear regression models. We combine the fuzzy outputs of the data groups to obtain the final fuzzy number for a data point. We demonstrate the efficiency of this method with a case study.


Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases / v.31 no.6 / pp.596-608 / 2004
  • Much work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned like a regular grid. Although grid-like partitioning performs well for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because it partitions the entire data space equally, regardless of the distribution of data objects. Most of the data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface. Several experimental results show that this unbalanced partitioning scheme remarkably reduces the number of data blocks touched by a query as the dimension of the data and the query size grow. We also propose disk allocation schemes based on the layout of the data blocks that result from partitioning. To evaluate the proposed algorithm, we performed several experiments with data of different dimensions and a wide range of numbers of disks. The proposed disk allocation method performs within 10 additive disk accesses of the strictly optimal allocation scheme. We compared our algorithm with the Kronecker-sequence-based declustering algorithm, which is reported to be the best among declustering algorithms based on grid partitioning and mapping functions. Declustering performance improves by up to 14 times as the dimension of the data grows.
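The observation that high-dimensional data concentrate near the surface can be checked with a small calculation. The sketch below assumes data uniformly distributed in the unit hypercube, which the abstract does not state; under that assumption the fraction of the volume within a distance `margin` of the boundary is 1 - (1 - 2·margin)^d. The function name `surface_fraction` is hypothetical.

```python
def surface_fraction(dim, margin):
    """Fraction of the unit hypercube [0, 1]^dim within `margin` of its boundary.

    Assumes uniformly distributed data (an illustrative assumption).
    The inner cube that stays clear of the boundary has side 1 - 2 * margin.
    """
    if not 0.0 <= margin <= 0.5:
        raise ValueError("margin must lie in [0, 0.5]")
    return 1.0 - (1.0 - 2.0 * margin) ** dim


if __name__ == "__main__":
    for dim in (2, 10, 50):
        print(dim, round(surface_fraction(dim, 0.05), 3))
```

With a margin of 0.05, the boundary shell holds about 19% of the volume in 2 dimensions, about 65% in 10 dimensions, and more than 99% in 50 dimensions, which is why equal-width grid cells in the interior end up nearly empty.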

An Optimized Iterative Semantic Compression Algorithm And Parallel Processing for Large Scale Data

  • Jin, Ran;Chen, Gang;Tung, Anthony K.H.;Shou, Lidan;Ooi, Beng Chin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2761-2781 / 2018
  • With the continuous growth of data size and the spread of compression technology, data reduction has great research value and practical significance. To address the shortcomings of existing semantic compression algorithms, this paper analyzes the ItCompress algorithm and designs a bidirectional order selection method based on interval partitioning, named the Optimized Iterative Semantic Compression Algorithm (Optimized ItCompress Algorithm). To further improve speed, we propose a parallel optimized iterative semantic compression algorithm using GPU (POICAG) and an optimized iterative semantic compression algorithm using Spark (DOICAS). Extensive experiments on four kinds of datasets verify the efficiency of the proposed algorithms.

A study on the color image segmentation using the fuzzy Clustering (퍼지 클러스터링을 이용한 칼라 영상 분할)

  • 이재덕;엄경배
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 1999.05a / pp.109-112 / 1999
  • Image segmentation is the critical first step in image information extraction for computer vision systems. Clustering methods have been used extensively in color image segmentation. Most analytic fuzzy clustering approaches are derived from the fuzzy c-means (FCM) algorithm. The FCM algorithm uses the probabilistic constraint that the memberships of a data point across classes sum to 1. However, the memberships resulting from FCM do not always correspond to the intuitive concept of degree of belonging or compatibility. Moreover, the FCM algorithm has considerable trouble in noisy environments in the feature space. Recently, a possibilistic approach to clustering (PCM) was proposed to solve these problems. In this paper, we use PCM for color image segmentation. This approach differs from existing fuzzy clustering methods for color image segmentation in that the resulting partition of the data can be interpreted as a possibilistic partition, so the problems of FCM can be solved by PCM. However, the clustering results of PCM are not smoothly bounded and often have holes. Region growing was therefore used as postprocessing after smoothing the noise points in the pixel seeds. Our experiments illustrate that PCM is more reasonable than FCM in noisy environments.

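The difference between the two membership definitions discussed above can be made concrete with the standard update formulas, shown here as a sketch for a single data point with fixed cluster centers. The function names, the fuzzifier default `m=2.0`, and the scale parameters `etas` are illustrative assumptions, not values taken from the paper.

```python
def fcm_memberships(sq_dists, m=2.0):
    """FCM memberships of one point, given squared distances to each center.

    The probabilistic constraint forces the memberships to sum to 1.
    All squared distances are assumed to be strictly positive.
    """
    inv = [d ** (-1.0 / (m - 1.0)) for d in sq_dists]
    total = sum(inv)
    return [v / total for v in inv]


def pcm_typicalities(sq_dists, etas, m=2.0):
    """PCM typicalities of one point; each depends only on its own cluster.

    etas holds one positive scale parameter per cluster.
    """
    return [1.0 / (1.0 + (d / eta) ** (1.0 / (m - 1.0)))
            for d, eta in zip(sq_dists, etas)]


if __name__ == "__main__":
    # A noise point equally far from two clusters (squared distance 100).
    noise = [100.0, 100.0]
    print(fcm_memberships(noise))               # forced to share a total of 1
    print(pcm_typicalities(noise, [1.0, 1.0]))  # both values stay small
```

For the distant noise point, FCM still assigns a membership of 0.5 to each cluster because the values must sum to 1, whereas the PCM typicalities both stay below 0.01. That is the behavior in noisy environments that the abstract refers to.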

EXTENSION OF FACTORING LIKELIHOOD APPROACH TO NON-MONOTONE MISSING DATA

  • Kim, Jae-Kwang
    • Journal of the Korean Statistical Society / v.33 no.4 / pp.401-410 / 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin (1974), is extended to a more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and the second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method.