• Title/Abstract/Keywords: Data Partition Algorithm

128 results (processing time 0.019 s)

Partition-based Big Data Analysis and Visualization Algorithm

  • 홍준기
    • 한국빅데이터학회지
    • /
    • Vol. 5, No. 1
    • /
    • pp.147-154
    • /
    • 2020
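  • Research on deriving meaningful results from big data is now actively under way. This paper proposes a partition-based big data analysis algorithm that sets regions of the data as partitions and computes a representative value for each partition, so that correlations between variables can be analyzed. The partition size is adjustable, and the paper compares the visualization results obtained as the partition size changes. To validate the proposed algorithm, big data from apparel company 'A' were analyzed: changes in product sales volume with temperature and selling price were analyzed and visualized, yielding meaningful results.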
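
The partitioning idea in this abstract can be illustrated with a minimal sketch. It assumes that a partition is a fixed-width interval of one variable and that the representative value is the mean of the other variable; the abstract does not specify either choice. The function name `partition_means` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def partition_means(points, width):
    """Group (x, y) pairs into x-intervals of the given width and
    return {left edge of interval: mean of y inside that interval}."""
    if width <= 0:
        raise ValueError("width must be positive")
    buckets = defaultdict(list)
    for x, y in points:
        # floor(x / width) is the partition index; multiply back for its left edge.
        start = (x // width) * width
        buckets[start].append(y)
    return {start: mean(ys) for start, ys in sorted(buckets.items())}

# Hypothetical (temperature, units sold) observations, not data from the paper.
sales = [(1, 10), (3, 14), (6, 20), (8, 24), (12, 9), (14, 11)]
coarse = partition_means(sales, 10)  # two wide partitions
fine = partition_means(sales, 5)     # three narrower partitions
```

Changing `width` changes how much detail survives: the coarse setting hides the rise between 0 and 10 that the fine setting shows.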

The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International journal of advanced smart convergence
    • /
    • Vol. 8, No. 3
    • /
    • pp.46-53
    • /
    • 2019
  • When a partitioned structure is derived from a data set using a clustering algorithm, it is not unusual to obtain a different outcome when the algorithm is run with the data in a different order. This is known as the order bias problem. Many machine learning algorithms try to achieve an optimized result from the available training and test data. Optimization is determined by an evaluation function, which itself has a tendency toward a certain goal. Some tendency in the evaluation function is inevitable, both for efficiency and for consistency of the result, but its preference for a specific goal may sometimes lead to unfavorable consequences in the final clustering. To overcome this bias problem, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. We apply data-centric sorting to the data objects in the clusters of the partition to rearrange them in a new order. The same clustering procedure is then reapplied to the rearranged data set to build a new partition. We have developed an algorithm that reduces the bias resulting from the order in which data are fed into the algorithm. Experimental results show that the algorithm helps minimize order bias effects. We also show that the evaluation measure currently used for the clustering algorithm is biased toward fewer and larger clusters.
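
Order bias is easy to reproduce with a toy incremental clusterer. The sketch below uses a simple leader-style clustering on one-dimensional values, not the conceptual clustering procedure of the paper, and a plain sort stands in for the paper's data-centric sorting, whose details the abstract does not give. All names are hypothetical.

```python
def leader_clustering(values, threshold):
    """Incremental clustering: each value joins the first cluster whose
    leader (its first member) is within threshold, else starts a new one."""
    clusters = []
    for v in values:
        for c in clusters:
            if abs(v - c[0]) <= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

def as_partition(clusters):
    """Order-independent view of a clustering, for comparison."""
    return sorted(sorted(c) for c in clusters)

# Same six values, two presentation orders.
order_a = leader_clustering([1, 2, 3, 4, 5, 6], threshold=2)
order_b = leader_clustering([3, 1, 2, 4, 5, 6], threshold=2)

# Assumption: re-sorting before a second pass removes the dependence on input order.
rerun_b = leader_clustering(sorted([3, 1, 2, 4, 5, 6]), threshold=2)
```

Here `order_a` and `order_b` disagree even though the data are identical, while the re-sorted second pass reproduces the partition of `order_a`.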

A Study on the Construction of Stable Clustering by Minimizing the Order Bias

  • 이계성
    • 한국정보처리학회논문지
    • /
    • Vol. 6, No. 6
    • /
    • pp.1571-1580
    • /
    • 1999
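  • When a hierarchical structure is derived with conceptual clustering, an unsupervised learning algorithm for data mining and machine learning, the result differs depending on the order in which the data are processed. To address this order bias problem, the given data set is first classified to form an initial classification. This classification is used to predict the likely number of classes in the final classification, and, based on that information, a new processing order is determined through data analysis and center sorting. The ITERATE classification procedure is then applied to the reordered data set to produce a new classification. This paper proposes REIT, an algorithm that repeats this process to reach a stable classification with an optimal classification score. The algorithm was applied to several data sets, and experiments compared and analyzed whether it minimizes the effect of order bias.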

Spatial Statistic Data Release Based on Differential Privacy

  • Cai, Sujin;Lyu, Xin;Ban, Duohan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 13, No. 10
    • /
    • pp.5244-5259
    • /
    • 2019
  • With the continuous development of LBS (Location Based Service) applications, privacy protection has become an urgent problem. Differential privacy rests on rigorous mathematical theory and provides strong privacy guarantees under the assumption that the attacker has worst-case background knowledge. It has been applied to research directions such as data query, release, and mining. The difficulty lies in preserving data availability while protecting privacy. Spatial multidimensional data are usually released by partitioning the domain into disjoint subsets and then generating a hierarchical index. Traditional data-dependent partition methods must allocate part of the privacy budget to the partitioning process and split that budget among all the steps, which is inefficient. To address these issues, a novel two-step partition algorithm is proposed. First, we partition the original dataset into fixed grids, inject noise, and synthesize a dataset according to the noisy counts. Second, we perform IH-Tree (Improved H-Tree) partitioning on the synthetic dataset and use the resulting partition keys to split the original dataset. The algorithm saves the privacy budget otherwise allocated to the partitioning process and obtains a more accurate release. It was tested on three real-world datasets and its accuracy was compared with that of state-of-the-art algorithms. The experimental results show that the relative errors of range queries are considerably reduced, especially on the large-scale dataset.
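
The first step described above (fixed grid, then noisy counts) can be sketched as follows. This is an illustration of the standard Laplace mechanism on a count histogram, not the authors' implementation: the square domain, the grid size `k`, and all function names are assumptions, and the IH-Tree step is not shown.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_grid_counts(points, size, k, epsilon, rng):
    """Count points in a k x k grid over [0, size) x [0, size) and add
    Laplace noise with scale 1/epsilon to every cell, empty ones included."""
    cell = size / k
    counts = Counter(
        (min(int(x // cell), k - 1), min(int(y // cell), k - 1))
        for x, y in points
    )
    # Assumption: adding or removing one point changes one count by 1,
    # so the sensitivity of the histogram is 1.
    scale = 1.0 / epsilon
    return {
        (i, j): counts[(i, j)] + laplace_noise(scale, rng)
        for i in range(k)
        for j in range(k)
    }

# Hypothetical locations in a 10 x 10 domain.
pts = [(1.0, 1.0), (2.0, 8.0), (7.5, 3.0), (9.0, 9.0), (6.0, 6.5)]
noisy = noisy_grid_counts(pts, size=10.0, k=2, epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` gives stronger privacy and noisier counts. Noise is added to empty cells as well, because publishing only the occupied cells would itself reveal where points are.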

A Network Partition Approach for MFD-Based Urban Transportation Network Model

  • Xu, Haitao;Zhang, Weiguo;Zhuo, Zuozhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 14, No. 11
    • /
    • pp.4483-4501
    • /
    • 2020
  • Recent findings indicate that the scatter and shape of the MFD (macroscopic fundamental diagram) are heavily influenced by the spatial distribution of link density in a road network. This implies that the MFD concept can be used to divide a heterogeneous road network with different degrees of congestion into multiple homogeneous subnetworks. Because actual traffic data are usually incomplete and inaccurate, while most traffic partition algorithms rely on complete data, we propose a three-step partition algorithm called Iso-MB (Isoperimetric algorithm - Merging - Boundary adjustment) that tolerates incomplete input data. The proposed algorithm was implemented and verified in a simulated urban transportation network. The existence of a well-defined MFD in each subnetwork was shown and discussed, and the selection of the stop parameter in the isoperimetric algorithm was explained. The effectiveness of the approach when input data are missing was also demonstrated.

Bin Packing-Exchange Algorithm for 3-Partition Problem

  • 이상운
    • 한국인터넷방송통신학회논문지
    • /
    • Vol. 22, No. 4
    • /
    • pp.95-102
    • /
    • 2022
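  • This paper proposes a linear-time algorithm for the 3-partition problem (TPP), which is NP-complete and has no known polynomial-time algorithm. It proposes a backtracking method that remedies the failure of the previously known polynomial-time MM method, which uses the sum of the maximum, the minimum, and a third number, to find a solution in some cases, and it also improves on the shortcomings of MM with backtracking. The proposed algorithm splits the set S, sorted in descending order, into three parts and assigns them by a best-fit rule in forward, backward, and largest-slack order. This found the optimal solution for 5 of 10 data sets (50.00%). For the remaining 5 data sets, the optimal solution was found after a minimum of 1 and a maximum of 7 exchanges of numbers between surplus bins and deficit bins. The algorithm performs simple assignment and exchange optimization with O(k) linear-time complexity, where k is smaller than m = n/3 for n items partitioned into triples, which the paper presents as evidence that TPP may be a P problem with a polynomial-time algorithm rather than NP-complete.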
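
For reference, the problem itself can be stated in code. The sketch below is a plain exhaustive backtracking search, not the bin packing-exchange algorithm of the paper, and it takes exponential time in the worst case. It only shows what a valid 3-partition is; the function name and the sample instance are illustrative.

```python
def three_partition(nums):
    """Split nums into triples with equal sums.
    Returns a list of triples, or None if no such split exists."""
    n = len(nums)
    if n == 0 or n % 3 != 0:
        return None
    m = n // 3
    total = sum(nums)
    if total % m != 0:
        return None
    target = total // m
    items = sorted(nums, reverse=True)
    used = [False] * n

    def solve():
        # The largest unused number must belong to some triple; fix it first.
        try:
            i = used.index(False)
        except ValueError:
            return []  # everything is placed
        used[i] = True
        for j in range(i + 1, n):
            if used[j]:
                continue
            used[j] = True
            need = target - items[i] - items[j]
            for k in range(j + 1, n):
                if not used[k] and items[k] == need:
                    used[k] = True
                    rest = solve()
                    if rest is not None:
                        return [(items[i], items[j], items[k])] + rest
                    used[k] = False
                    break  # another equal value leaves the same subproblem
            used[j] = False
        used[i] = False
        return None

    return solve()

# Twelve numbers summing to 360, so each of the four triples must sum to 90.
triples = three_partition([20, 23, 25, 30, 49, 45, 27, 30, 30, 40, 22, 19])
```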

Designing a Distribution Network for Faster Delivery of Online Retailing: A Case Study in Bangkok, Thailand

  • Amchang, Chompoonut;Song, Sang-Hwa
    • 산경연구논집
    • /
    • Vol. 9, No. 5
    • /
    • pp.25-35
    • /
    • 2018
  • Purpose - The purpose of this paper is to partition a last-mile delivery network into zones and to determine the locations of last-mile delivery centers (LMDCs) in Bangkok, Thailand. Research design, data, and methodology - As online shopping has become popular, parcel companies need to make their delivery services as fast as possible. Network partitioning with the METIS algorithm is applied to identify suitable service areas, and a facility location problem is used to place LMDCs within each partitioned area. Clustering and mixed integer programming algorithms are applied to partition the network and to locate facilities in it. Results - Network partitioning improves last-mile delivery service. The METIS algorithm divided the area into 25 partitions by minimizing the inter-network links. To serve short-haul deliveries, this paper located 96 LMDCs in compact partitions to satisfy customer demand. Conclusions - The computational results from the case study showed that the proposed two-phase algorithm, combining network partitioning and facility location, can efficiently design a last-mile delivery network. It improves parcel delivery to customers and reduces the overall delivery time. The proposed two-phase approach is expected to help parcel delivery companies minimize investment while providing faster delivery services.

Fuzzy Partitioning with Fuzzy Equalization Given Two Points and Partition Cardinality

  • 김경택;김종수;강성열
    • 산업경영시스템학회지
    • /
    • Vol. 31, No. 4
    • /
    • pp.140-145
    • /
    • 2008
  • A fuzzy partition is a conceptual vehicle that encapsulates data into information granules. Fuzzy equalization is a process of building information granules that are semantically and experimentally meaningful. A few algorithms that generate fuzzy partitions with fuzzy equalization have been suggested. Simulations and experiments have shown that a fuzzy partition representing more characteristics of the given input distribution usually produces meaningful results. In this paper, given two points and the cardinality of the fuzzy partition, we prove that a fuzzy partition with fuzzy equalization in which two of the peak points fall on the given two points does not always exist. We then establish an algorithm that minimizes the maximum distance between the given two points and the adjacent peak points in the partition. A numerical example is presented to show the validity of the suggested algorithm.

An Efficient Algorithm For Mining Association Rules In Main Memory Systems

  • 이재문
    • 정보처리학회논문지D
    • /
    • Vol. 9D, No. 4
    • /
    • pp.579-586
    • /
    • 2002
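  • This paper studies an association rule mining algorithm suited to systems with large main memory. The well-known DHP and Partition methods are first extended to run efficiently on such systems, and an improvement to the Partition method that applies hash tables and bitmap techniques is then proposed. The proposed algorithm was compared with DHP in an experimental environment and shows up to a 65% performance improvement over the extended DHP.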
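
The two-phase idea behind partition-based frequent itemset mining can be sketched as below. This is a simplified illustration, not the extended algorithm of the paper: it enumerates itemsets by brute force up to a small size and omits the hash table and bitmap techniques the abstract mentions. It assumes a non-empty transaction list and a positive `n_parts`; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute force: itemsets of up to max_size items that occur in at
    least a min_support fraction of the transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    threshold = min_support * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}

def partition_mining(transactions, n_parts, min_support, max_size=2):
    """Phase 1 collects locally frequent itemsets per partition;
    phase 2 counts those candidates once over the whole database."""
    step = -(-len(transactions) // n_parts)  # ceiling division
    candidates = set()
    for start in range(0, len(transactions), step):
        chunk = transactions[start:start + step]
        candidates |= frequent_itemsets(chunk, min_support, max_size)
    threshold = min_support * len(transactions)
    result = {}
    for cand in candidates:
        support = sum(1 for t in transactions if set(cand) <= set(t))
        if support >= threshold:
            result[cand] = support
    return result

# Hypothetical market-basket data.
db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["b"]]
found = partition_mining(db, n_parts=2, min_support=0.5)
```

The approach is sound because an itemset that is frequent in the whole database must be frequent in at least one partition at the same relative support, so phase 1 cannot miss it.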

Partition Algorithm for Updating Discovered Association Rules in Data Mining

  • 이종섭;황종원;강맹규
    • 산업경영시스템학회지
    • /
    • Vol. 23, No. 54
    • /
    • pp.1-11
    • /
    • 2000
  • This study proposes a partition algorithm for updating the association rules discovered in a large database. A database may receive frequent or occasional updates, and such updates may not only invalidate some existing strong association rules but also turn some weak rules into strong ones. The Partition algorithm updates strong association rules efficiently over the whole updated database by reusing the information of the old large itemsets. The Partition algorithm suggested in this study scans an incremental database, in view of the fact that it is difficult to find the new set of large itemsets in the whole updated database after an incremental database is added to the original database. This method of generating large itemsets differs from that of FUP (Fast Update) and KDP (Kim Dong Pil).