• Title/Summary/Keyword: Data Partition Algorithm

Partition-based Big Data Analysis and Visualization Algorithm (빅데이터 분석을 위한 파티션 기반 시각화 알고리즘)

  • Hong, Jun-Ki
    • The Journal of Bigdata
    • /
    • v.5 no.1
    • /
    • pp.147-154
    • /
    • 2020
  • Research on deriving meaningful results from big data is actively under way. This paper proposes a partition-based big data analysis (PBDA) algorithm that analyzes the correlation between variables by dividing the data area of big data into partitions and calculating a representative value for each partition. Because the PBDA algorithm allows the partition size to be controlled, the visualization results are compared across partition sizes. To verify the proposed algorithm, the big data of 'A' is analyzed, and meaningful results are obtained by analyzing how product sales volume changes with temperature and sales price.
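
The idea of summarizing each partition by a representative value can be sketched as follows. This is a minimal illustration, not the PBDA algorithm itself: it assumes fixed-width partitions along one variable and uses the mean as the representative value, neither of which the abstract specifies. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def partition_representatives(points, partition_size):
    """Group (x, y) pairs into fixed-width partitions along x.

    Returns {partition_start: mean of y in that partition}. Using the mean
    as the representative value is an assumption made for illustration.
    """
    buckets = defaultdict(list)
    for x, y in points:
        start = (x // partition_size) * partition_size
        buckets[start].append(y)
    return {start: mean(ys) for start, ys in sorted(buckets.items())}


# Hypothetical (temperature, units sold) observations.
data = [(1, 10), (3, 14), (6, 20), (8, 24), (12, 30)]

coarse = partition_representatives(data, 10)  # two wide partitions
fine = partition_representatives(data, 5)     # three narrower partitions
```

Comparing `coarse` and `fine` shows how the partition size changes the level of detail that a plot of the representative values would carry.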

The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International journal of advanced smart convergence
    • /
    • v.8 no.3
    • /
    • pp.46-53
    • /
    • 2019
  • When a partitioned structure is derived from a data set using a clustering algorithm, it is not unusual to obtain different outcomes when the data are presented in a different order. This is known as the order bias problem. Many machine learning algorithms try to achieve an optimized result from the available training and test data. Optimization is determined by an evaluation function, which itself has a tendency toward a certain goal. Such a tendency is inevitable, both for efficiency and for consistency of the result, but the function's preference for a specific goal may sometimes lead to unfavorable consequences in the final clustering. To overcome these bias problems, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. We apply data-centric sorting to the data objects in the clusters of the partition to rearrange them in a new order. The same clustering procedure is then reapplied to the rearranged data set to build a new partition. We have developed an algorithm that reduces the bias resulting from the order in which data are fed into the algorithm. Experimental results show that the algorithm helps minimize order bias effects. We also show that the current evaluation measure used for the clustering algorithm is biased toward fewer and larger clusters.
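
Order bias is easy to reproduce with any incremental clustering method. The sketch below uses a simple leader-style clustering on one-dimensional values, chosen here only because it is short; it is not the conceptual clustering algorithm studied in the paper, and the function names and threshold are hypothetical.

```python
def leader_clustering(values, threshold):
    """Incremental clustering: each value joins the first cluster whose
    leader (first member) is within `threshold`, else starts a new cluster."""
    clusters = []
    for v in values:
        for cluster in clusters:
            if abs(v - cluster[0]) <= threshold:
                cluster.append(v)
                break
        else:
            # No existing leader was close enough.
            clusters.append([v])
    return clusters


def as_partition(clusters):
    """Order-independent view of a clustering, for comparing two runs."""
    return {frozenset(c) for c in clusters}


# Same four objects, two different presentation orders.
run_a = leader_clustering([0, 1, 2, 3], threshold=1.5)
run_b = leader_clustering([1, 0, 2, 3], threshold=1.5)
order_biased = as_partition(run_a) != as_partition(run_b)
```

Here `run_a` and `run_b` group the same objects differently, which is the kind of instability that re-sorting the data and reclustering is meant to reduce.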

A Study on the Construction of Stable Clustering by Minimizing the Order Bias (순서 바이어스 최소화에 의한 안정적 클러스터링 구축에 관한 연구)

  • Lee, Gye-Seong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.6
    • /
    • pp.1571-1580
    • /
    • 1999
  • When a hierarchical structure is derived from a data set for data mining and machine learning using a conceptual clustering algorithm, one of the unsupervised learning paradigms, it is not unusual to obtain different outcomes depending on the order in which data objects are processed. To overcome this problem, a first classification pass constructs an initial partition, which is expected to indicate the possible range of the number of final classes. We apply center sorting to the data objects in the classes of the partition to produce a new data ordering, and build a new partition using the ITERATE clustering procedure. We developed an algorithm, REIT, that leads to a final partition with a stable and best partition score. A number of experiments were performed to show that the algorithm minimizes order bias effects.

Spatial Statistic Data Release Based on Differential Privacy

  • Cai, Sujin;Lyu, Xin;Ban, Duohan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.10
    • /
    • pp.5244-5259
    • /
    • 2019
  • With the continuous development of LBS (Location Based Service) applications, privacy protection has become an urgent problem. Differential privacy rests on strict mathematical theory and provides strong privacy guarantees even under the assumption that the attacker has worst-case background knowledge; it has been applied to research directions such as data query, release, and mining. The difficulty lies in ensuring data availability while protecting privacy. Spatial multidimensional data are usually released by partitioning the domain into disjoint subsets and then generating a hierarchical index. Traditional data-dependent partition methods must allocate part of the privacy budget to the partitioning process and split that budget among all the steps, which is inefficient. To address these issues, a novel two-step partition algorithm is proposed. First, we partition the original dataset into fixed grids, inject noise, and synthesize a dataset according to the noisy counts. Second, we perform IH-Tree (Improved H-Tree) partitioning on the synthetic dataset and use the resulting partition keys to split the original dataset. The algorithm saves the privacy budget allocated to the partitioning process and obtains a more accurate release. It was tested on three real-world datasets, and its accuracy was compared with that of state-of-the-art algorithms. The experimental results show that the relative errors of range queries are considerably reduced, especially on the large-scale dataset.
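
The first step described above (fixed grid, then noisy counts) can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard Laplace mechanism with sensitivity 1 for cell counts, and it does not attempt the synthesis step or the IH-Tree partition, whose details the abstract does not give. The function names, grid dimensions, and sample points are hypothetical.

```python
import random


def grid_counts(points, cell, width, height):
    """Count points per fixed-size cell over the domain [0, width) x [0, height).

    Every cell is listed, including empty ones, so that noise can later be
    added to all of them.
    """
    cols = -(-width // cell)   # ceiling division
    rows = -(-height // cell)
    counts = {(r, c): 0 for r in range(rows) for c in range(cols)}
    for x, y in points:
        counts[(int(y // cell), int(x // cell))] += 1
    return counts


def add_laplace_noise(counts, epsilon, rng):
    """Add Laplace(0, 1/epsilon) noise to each cell count.

    Adding or removing one point changes exactly one cell by 1, so the
    sensitivity is 1. The difference of two independent exponential samples
    with rate epsilon follows a Laplace distribution with scale 1/epsilon.
    """
    return {
        key: n + rng.expovariate(epsilon) - rng.expovariate(epsilon)
        for key, n in counts.items()
    }


points = [(0.5, 0.5), (1.5, 0.2), (3.0, 3.9), (2.1, 0.1)]
true_counts = grid_counts(points, cell=2, width=4, height=4)
noisy_counts = add_laplace_noise(true_counts, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy but noisier counts, which is the trade-off between privacy and data availability that the abstract refers to.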

A Network Partition Approach for MFD-Based Urban Transportation Network Model

  • Xu, Haitao;Zhang, Weiguo;Zhuo, Zuozhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.11
    • /
    • pp.4483-4501
    • /
    • 2020
  • Recent findings show that the scatter and shape of the MFD (macroscopic fundamental diagram) are heavily influenced by the spatial distribution of link density in a road network. This implies that the concept of the MFD can be used to divide a heterogeneous road network with different degrees of congestion into multiple homogeneous subnetworks. Considering that actual traffic data are usually incomplete and inaccurate, while most traffic partition algorithms rely on complete data, we propose a three-step partition algorithm called Iso-MB (Isoperimetric algorithm - Merging - Boundary adjustment) that tolerates incomplete input data. The proposed algorithm was implemented and verified in a simulated urban transportation network. The existence of a well-defined MFD in each subnetwork was revealed and discussed, and the selection of the stop parameter in the isoperimetric algorithm was explained. The effectiveness of the approach under missing input data was also demonstrated.

Bin Packing-Exchange Algorithm for 3-Partition Problem (3-분할 문제의 상자 채우기-교환 알고리즘)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.4
    • /
    • pp.95-102
    • /
    • 2022
  • This paper proposes a linear-time algorithm for the three-partition problem (TPP), which is NP-complete and for which no polynomial-time algorithm is known. It improves on the MM method, a previously proposed polynomial-time approach that uses the sum of max-min values and third numbers and that sometimes fails to obtain a solution, and it also addresses the problems of MM with backtracking applied. The proposed algorithm partitions the descending-ordered set S into three parts and assigns them by forward, backward, and best-fit allocation with maximum margin; in this initial allocation phase it found an optimal solution for 5 of 10 data sets (50.00%). For the remaining five data sets, it found the optimal solution by exchanging numbers between surplus boxes and shortage boxes at least once and at most seven times. The proposed algorithm performs simple allocation and exchange optimization with O(k) linear time complexity, less than the m=n/3 of the three-partition data, and the paper argues that a polynomial-time algorithm may therefore exist for TPP, that is, that TPP may be a P problem rather than NP-complete.
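
For reference, what counts as a solution to the three-partition problem can be checked directly. The sketch below is a verifier only, written from the standard problem definition (3m numbers split into m triplets with equal sums); it is not the paper's allocation or exchange procedure. The function names and the sample instance are hypothetical; `box_margins` simply reports how far each box is above (surplus) or below (shortage) the target sum.

```python
def is_three_partition(numbers, triplets):
    """Return True if `triplets` splits `numbers` into groups of three
    that all have the same sum and use every number exactly once."""
    m, rem = divmod(len(numbers), 3)
    if rem or m == 0 or sum(numbers) % m:
        return False
    target = sum(numbers) // m
    used = sorted(x for t in triplets for x in t)
    if len(triplets) != m or used != sorted(numbers):
        return False
    return all(len(t) == 3 and sum(t) == target for t in triplets)


def box_margins(triplets, target):
    """Positive values are surplus boxes, negative values shortage boxes."""
    return [sum(t) - target for t in triplets]


numbers = [20, 23, 25, 30, 49, 45, 27, 30, 30, 40, 22, 19]  # sums to 360
good = [(20, 25, 45), (23, 27, 40), (49, 22, 19), (30, 30, 30)]
bad = [(20, 23, 25), (30, 49, 45), (27, 30, 30), (40, 22, 19)]

good_ok = is_three_partition(numbers, good)
bad_ok = is_three_partition(numbers, bad)
bad_margins = box_margins(bad, target=90)
```

Verifying a candidate takes time roughly linear in the input size (plus sorting), whereas finding one is the hard part of the problem.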

Designing a Distribution Network for Faster Delivery of Online Retailing : A Case Study in Bangkok, Thailand

  • Amchang, Chompoonut;Song, Sang-Hwa
    • The Journal of Industrial Distribution & Business
    • /
    • v.9 no.5
    • /
    • pp.25-35
    • /
    • 2018
  • Purpose - The purpose of this paper is to partition a last-mile delivery network into zones and to determine the locations of last-mile delivery centers (LMDCs) in Bangkok, Thailand. Research design, data, and methodology - As online shopping has become popular, parcel companies need to make their delivery services as fast as possible. Network partitioning with the METIS algorithm is applied to identify suitable service areas, and a facility location problem is solved to place LMDCs in each partitioned area. Clustering and mixed integer programming algorithms are applied to partition the network and to locate facilities in it. Results - Network partitioning improves last-mile delivery service. The METIS algorithm divided the area into 25 partitions by minimizing the inter-network links. To serve short-haul deliveries, this paper located 96 LMDCs in compact partitions to satisfy customer demand. Conclusions - The computational results from the case study showed that the proposed two-phase algorithm, combining network partitioning and facility location, can efficiently design a last-mile delivery network. It improves parcel delivery services and reduces the overall delivery time. The proposed two-phase approach is expected to help parcel delivery companies minimize investment while providing faster delivery services.

Fuzzy Partitioning with Fuzzy Equalization Given Two Points and Partition Cardinality (두 점과 분할 카디날리티가 주어진 퍼지 균등화조건을 갖는 퍼지분할)

  • Kim, Kyeong-Taek;Kim, Chong-Su;Kang, Sung-Yeol
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.31 no.4
    • /
    • pp.140-145
    • /
    • 2008
  • Fuzzy partition is a conceptual vehicle that encapsulates data into information granules. Fuzzy equalization concerns a process of building information granules that are semantically and experimentally meaningful. A few algorithms that generate fuzzy partitions with fuzzy equalization have been suggested. Simulations and experiments have shown that a fuzzy partition representing more characteristics of the given input distribution usually produces meaningful results. In this paper, given two points and the cardinality of the fuzzy partition, we prove that there does not always exist a fuzzy partition with fuzzy equalization in which two of the points having peaks fall on the given two points. We then establish an algorithm that minimizes the maximum distance between the given two points and the adjacent points having peaks in the partition. A numerical example is presented to show the validity of the suggested algorithm.

An Efficient Algorithm For Mining Association Rules In Main Memory Systems (대용량 주기억장치 시스템에서 효율적인 연관 규칙 탐사 알고리즘)

  • Lee, Jae-Mun
    • The KIPS Transactions:PartD
    • /
    • v.9D no.4
    • /
    • pp.579-586
    • /
    • 2002
  • This paper proposes an efficient algorithm for mining association rules in large main memory systems. To do this, the paper first extends conventional algorithms such as DHP and Partition to be compatible with large main memory systems, and second proposes an algorithm that improves the Partition algorithm by applying hash table and bitmap techniques. The proposed algorithm is compared with the extended DHP in an experimental environment, and the results show up to 65% performance improvement over the extended DHP.
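
As background, the conventional Partition approach to frequent itemset mining can be sketched in two phases: find itemsets that are frequent within each partition, then verify their union against the whole database. The sketch below is a simplified illustration and not the paper's improved algorithm: it counts itemsets by brute force rather than with hash tables or bitmaps, it limits itemsets to a hypothetical `max_size`, and it assumes a non-empty transaction list. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_ratio, max_size=2):
    """Brute-force itemsets (as sorted tuples) whose support ratio is at
    least `min_ratio` within `transactions`."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    threshold = min_ratio * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}


def partition_mine(transactions, n_parts, min_ratio, max_size=2):
    """Two-phase mining over `n_parts` contiguous partitions."""
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Phase 1: an itemset that is frequent overall must be frequent in at
    # least one partition, so the union of local results is a safe candidate set.
    candidates = set()
    for part in parts:
        candidates |= frequent_itemsets(part, min_ratio, max_size)

    # Phase 2: one scan of the whole database to count the candidates.
    counts = Counter()
    for t in transactions:
        items = set(t)
        for cand in candidates:
            if items.issuperset(cand):
                counts[cand] += 1
    threshold = min_ratio * len(transactions)
    return {cand for cand in candidates if counts[cand] >= threshold}


db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
result = partition_mine(db, n_parts=2, min_ratio=0.5)
```

In this example the pair `("b", "c")` is frequent in the second partition only, so it becomes a candidate in phase 1 and is rejected in phase 2.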

Partition Algorithm for Updating Discovered Association Rules in Data Mining (데이터마이닝에서 기존의 연관규칙을 갱신하는 분할 알고리즘)

  • 이종섭;황종원;강맹규
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.23 no.54
    • /
    • pp.1-11
    • /
    • 2000
  • This study suggests a partition algorithm for updating discovered association rules in a large database, because a database may receive frequent or occasional updates, and such updates may not only invalidate some existing strong association rules but also turn some weak rules into strong ones. The Partition algorithm updates strong association rules efficiently in the whole updated database by reusing the information of the old large itemsets. The Partition algorithm suggested in this study scans an incremental database, in view of the fact that it is difficult to find the new set of large itemsets in the whole updated database after an incremental database is added to the original database. This method of generating large itemsets differs from those of FUP (Fast Update) and KDP (Kim Dong Pil).
