• Title/Summary/Keyword: data partition evaluation

Search results: 29

The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International journal of advanced smart convergence / v.8 no.3 / pp.46-53 / 2019
  • When a partitioned structure is derived from a data set using a clustering algorithm, it is not unusual to obtain different outcomes when the algorithm is run on the same data presented in a different order. This is known as the order bias problem. Many machine learning algorithms try to achieve an optimized result from the available training and test data. Optimization is guided by an evaluation function, which itself tends toward a certain goal. Some such tendency in the evaluation function is inevitable, both for efficiency and for consistency of the results. However, the evaluation function's preference for a specific goal may sometimes lead to unfavorable final clustering results. To overcome these bias problems, a first clustering pass constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. We then apply data-centric sorting to the data objects in each cluster of this partition to rearrange them in a new order, and reapply the same clustering procedure to the rearranged data set to build a new partition. The resulting algorithm reduces the bias effect that arises from the order in which data is fed into the algorithm, and experimental results show that it helps minimize order bias effects. We also show that the evaluation measure currently used by the clustering algorithm is biased toward fewer clusters and, as a result, larger clusters.
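The order bias described above can be reproduced with any incremental clustering procedure. The sketch below assumes a simple leader algorithm as a stand-in; the paper's conceptual clustering algorithm and its evaluation function are not reproduced. `data_centric_order` is a hypothetical reading of "data-centric sorting" that lists each cluster's objects from nearest to farthest from the cluster centroid before the second pass.

```python
import math


def leader_cluster(points, threshold):
    """Incremental leader clustering: each point joins the first cluster whose
    leader lies within `threshold`; otherwise it becomes a new leader.
    Because leaders are fixed on arrival, the result depends on input order."""
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:
            leaders.append(p)
            clusters.append([p])
    return clusters


def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def data_centric_order(clusters):
    """Hypothetical data-centric sorting: within each cluster, list objects from
    nearest to farthest from the centroid, then concatenate the clusters."""
    ordered = []
    for cluster in clusters:
        c = centroid(cluster)
        ordered.extend(sorted(cluster, key=lambda p: math.dist(p, c)))
    return ordered


def two_pass(points, threshold):
    """First pass builds an initial partition; the second pass re-clusters the
    data after reordering it with data_centric_order."""
    initial = leader_cluster(points, threshold)
    return leader_cluster(data_centric_order(initial), threshold)


# The same four 1-D points presented in two different orders.
pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(leader_cluster(pts, 1.5))
print(leader_cluster([pts[1], pts[0], pts[2], pts[3]], 1.5))
```

With `threshold=1.5`, the order 0, 1, 2, 3 yields the partition {0, 1}, {2, 3}, while the order 1, 0, 2, 3 yields {1, 0, 2}, {3}. Whether a second pass removes such differences depends on the clustering algorithm and the sorting rule; the sketch only shows where the reordering step fits.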

Evaluation of Cooling Energy Saving through Applying Aisle Partition System on a Data Center Server Room (파티션 시스템 적용을 통한 기존 데이터센터 서버실의 냉방 에너지 절감 성능평가)

  • Park, Jong-Soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.7 / pp.726-733 / 2016
  • In this study, three types of air distribution systems (an open-type system, an aisle partition system, and an aisle containment system) were simulated by computer to evaluate the applicability of the aisle partition system to a data center server room. The simulation variables were the height and location of the partition fixed on top of the server racks. The energy efficiency of the air distribution systems was found to rank, from best to worst, as follows: the aisle containment system, the aisle partition system, and the open-type system. In the cold aisle partition system, a partition height of more than 0.9 m was found to save cooling energy effectively by sufficiently obstructing air recirculation. In the hot aisle partition system, the required partition height was found to be more than 0.8 m.

Generation of Efficient Fuzzy Classification Rules Using Evolutionary Algorithm with Data Partition Evaluation (데이터 분할 평가 진화알고리즘을 이용한 효율적인 퍼지 분류규칙의 생성)

  • Ryu, Joung-Woo; Kim, Sung-Eun; Kim, Myung-Won
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.32-40 / 2008
  • Fuzzy rules are very useful and efficient for describing classification rules, especially when attribute values are continuous and fuzzy in nature. However, it is generally difficult to determine membership functions for generating efficient fuzzy classification rules. In this paper, we propose a method for automatically generating efficient fuzzy classification rules using an evolutionary algorithm. In our method, we generate a set of initial membership functions for the evolutionary algorithm by supervised clustering of the training data set, and we evolve this set of initial membership functions to generate fuzzy classification rules that take into consideration both classification accuracy and rule comprehensibility. To reduce the time needed to evaluate an individual, we also propose an evolutionary algorithm with data partition evaluation, in which the training data set is partitioned into a number of subsets and individuals are evaluated using a randomly selected subset of data at a time instead of the whole training data set. We tested our algorithm on the UCI machine learning data sets, and the results showed that our method was, on average, more efficient than existing algorithms. We tested the evolutionary algorithm with data partition evaluation on the KDD'99 Cup intrusion detection data and confirmed that evaluation time was reduced by about 70%. Compared with the KDD'99 Cup winner, accuracy increased by 1.54% while cost was reduced by 20.8%.
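The data partition evaluation idea can be sketched as follows. This is a minimal illustration under stated assumptions: individuals are modeled as plain classifier functions, fitness is accuracy only (the paper's fitness also accounts for rule comprehensibility), and one random subset is drawn per call. `partition_data` and `evaluate_population` are hypothetical names.

```python
import random


def partition_data(data, k, rng):
    """Shuffle the training set and split it into k disjoint, nearly equal subsets."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def accuracy(classifier, subset):
    """Fraction of (x, y) pairs in the subset that the classifier labels correctly."""
    return sum(classifier(x) == y for x, y in subset) / len(subset)


def evaluate_population(population, subsets, rng):
    """Data partition evaluation: score every individual on one randomly chosen
    subset instead of on the whole training set."""
    subset = rng.choice(subsets)
    return [accuracy(individual, subset) for individual in population]


rng = random.Random(0)
data = [(x, x % 2) for x in range(100)]      # toy labeled data: parity of x
subsets = partition_data(data, 5, rng)
population = [lambda x: x % 2, lambda x: 0]  # two toy "individuals"
print(evaluate_population(population, subsets, rng))
```

With k subsets, each fitness evaluation touches roughly 1/k of the training data, so per-evaluation cost drops by about a factor of k; the price is a noisier fitness estimate, which is why a fresh subset is drawn on each call.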

A Hybrid Index of Voronoi and Grid Partition for NN Search

  • Seokjin Im
    • International journal of advanced smart convergence / v.12 no.1 / pp.1-8 / 2023
  • Smart IoT over high-speed networks and high-performance smart devices has led to an explosion of ubiquitous services and applications. The nearest neighbor (NN) query is one of the important types of queries that must be supported for ubiquitous information services. To process NN queries efficiently in a wireless broadcast environment, it is important that clients quickly determine the search space and filter out the NN from the candidates within the search space. In this paper, we propose a hybrid index of Voronoi and grid partitions that provides quick search space determination and rapid filtering of the NN from the candidates. The grid partition enables quick search space determination, and the Voronoi partition provides rapid filtering. We show the effectiveness of the proposed index by comparing it with existing indexing schemes in terms of access time and tuning time. The evaluation shows that the proposed index scheme improves both performance parameters compared with the existing schemes.
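A simplified, single-machine sketch of the two-step lookup (grid cell first, then filtering among candidates) is shown below. It is an assumption-laden stand-in, not the paper's broadcast index: the grid is uniform over a square region, queries are assumed to lie inside that square, and instead of storing exact Voronoi cells each grid cell keeps a conservative candidate list. A site is dropped from a cell only when some other single site is strictly closer at all four cell corners; because the set of points closer to one site than to another is a half-plane, that site is then closer everywhere in the cell. The class and method names are hypothetical.

```python
import math
import random


class GridVoronoiIndex:
    """Hypothetical hybrid index over the square [0, size] x [0, size]."""

    def __init__(self, sites, size, cells_per_side):
        self.n = cells_per_side
        self.cell = size / cells_per_side
        self.candidates = {}
        for i in range(cells_per_side):
            for j in range(cells_per_side):
                x0, y0 = i * self.cell, j * self.cell
                x1, y1 = x0 + self.cell, y0 + self.cell
                corners = [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]
                # Keep a site unless another site is strictly closer at every corner;
                # such a site's Voronoi region cannot reach into this cell.
                self.candidates[(i, j)] = [
                    s for s in sites
                    if not any(
                        all(math.dist(c, t) < math.dist(c, s) for c in corners)
                        for t in sites if t != s
                    )
                ]

    def nearest(self, q):
        # Grid step: locate the cell containing q (clamping only handles q == size).
        i = min(int(q[0] // self.cell), self.n - 1)
        j = min(int(q[1] // self.cell), self.n - 1)
        # Filtering step: compare q only against that cell's candidate sites.
        return min(self.candidates[(i, j)], key=lambda s: math.dist(q, s))


rng = random.Random(1)
sites = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(20)]
index = GridVoronoiIndex(sites, 10.0, 4)
print(index.nearest((3.0, 7.5)))
```

Finer grids give shorter candidate lists per cell but more cells to store; in the sketch this trade-off is controlled by `cells_per_side`.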

Evaluation of Aisle Partition System's Thermal Performance in Large Data Centers for Superior Cooling Efficiency (데이터센터의 공조효율 향상을 위한 공조파티션시스템 성능평가에 관한 연구)

  • Cho, Jin-Kyun; Jeong, Cha-Su; Kim, Byung-Seon
    • Korean Journal of Air-Conditioning and Refrigeration Engineering / v.22 no.4 / pp.205-212 / 2010
  • In a typical data center, large numbers of IT server racks are arranged in multiple rows. In IT environments where extensive electronic hardware is air-cooled, cooling inefficiencies arise when heated exhaust air from the equipment prematurely mixes with chilled supply air before the chilled air is used for cooling; in many systems this mixing causes significant cooling inefficiency. Overheating may not only harm expensive electronic equipment but also interrupt critical, revenue-generating services. Cool shield is a cost-effective aisle partition system that contains the air in the cold aisles and hot aisles of an IT server room. This paper focuses on the use of performance metrics for analyzing aisle partition systems in data centers.
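The abstract does not name its performance metrics. As an assumption, the sketch below computes three widely used air-management indices that quantify the hot/cold air mixing described above: the Supply Heat Index (SHI) and Return Heat Index (RHI) of Sharma, Bash, and Patel, and the Return Temperature Index (RTI) of Herrlin. The SHI form used here assumes equal airflow through every rack, and all temperatures in the example are illustrative values, not data from the paper.

```python
def supply_heat_index(inlet_temps, outlet_temps, supply_temp):
    """SHI: share of rack-outlet heat already picked up by the air before it
    enters the racks. 0 means no hot-air recirculation. Assumes equal airflow
    through every rack so that temperatures can be summed without weights."""
    heat_before_racks = sum(t - supply_temp for t in inlet_temps)
    heat_at_outlets = sum(t - supply_temp for t in outlet_temps)
    return heat_before_racks / heat_at_outlets


def return_heat_index(inlet_temps, outlet_temps, supply_temp):
    """RHI = 1 - SHI."""
    return 1.0 - supply_heat_index(inlet_temps, outlet_temps, supply_temp)


def return_temperature_index(return_temp, supply_temp, equipment_delta_t):
    """RTI in percent: 100 means balanced airflow; below 100 indicates bypass
    of cold air, above 100 indicates recirculation of hot air."""
    return 100.0 * (return_temp - supply_temp) / equipment_delta_t


# Illustrative readings (deg C) for two racks fed by 18 deg C supply air.
inlets, outlets, supply = [20.0, 22.0], [32.0, 34.0], 18.0
print(supply_heat_index(inlets, outlets, supply))        # 0.2
print(return_heat_index(inlets, outlets, supply))        # 0.8
print(return_temperature_index(30.0, supply, 12.0))      # 100.0
```

A partition that blocks recirculation should lower rack inlet temperatures toward the supply temperature, which drives SHI toward 0 and RHI toward 1.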

Meta Analysis of Usability Experimental Research Using New Bi-Clustering Algorithm

  • Kim, Kyung-A; Hwang, Won-Il
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.1007-1014 / 2008
  • Usability evaluation (UE) experiments are conducted to provide UE practitioners with guidelines for better outcomes. In UE research, substantial empirical results have accumulated over the past decades. Although these results were expected to be integrated into generalized guidelines, traditional meta-analysis is limited in its ability to combine UE empirical results, which often show considerable heterogeneity. In this study, a new data mining method called weighted bi-clustering (WBC) is proposed to partition heterogeneous studies into homogeneous subsets. We applied WBC to UE empirical results and identified two homogeneous subsets, each of which can be meta-analyzed. In addition, interactions between experimental conditions and UE methods were hypothesized on the basis of the resulting partition, and some of these interactions were confirmed by statistical tests.
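Why partitioning into homogeneous subsets helps can be seen from standard inverse-variance (fixed-effect) meta-analysis, sketched below. This is not the WBC algorithm, which the abstract does not specify; it only computes, for one set of studies with effects y_i and variances v_i, the weights w_i = 1/v_i, the pooled effect sum(w_i * y_i) / sum(w_i), Cochran's Q = sum(w_i * (y_i - pooled)^2), and I^2 = max(0, (Q - df) / Q) with df = k - 1. The effect sizes and variances are hypothetical.

```python
def fixed_effect_meta(effects, variances):
    """Inverse-variance pooling with Cochran's Q and I^2 for one set of studies."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, i_squared


# Hypothetical effect sizes: two homogeneous subsets and their union.
subset_a = ([0.10, 0.12, 0.08], [0.01, 0.01, 0.01])
subset_b = ([0.90, 0.88, 0.92], [0.01, 0.01, 0.01])
combined = (subset_a[0] + subset_b[0], subset_a[1] + subset_b[1])
for name, (y, v) in [("A", subset_a), ("B", subset_b), ("A+B", combined)]:
    pooled, q, i2 = fixed_effect_meta(y, v)
    print(f"{name}: pooled={pooled:.3f} Q={q:.2f} I^2={i2:.2f}")
```

Each subset on its own shows no excess heterogeneity (I^2 = 0), while pooling both gives a large I^2 and a pooled effect near 0.5 that describes neither group, which is the situation a prior partition is meant to avoid.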

Generation of Efficient Fuzzy Classification Rules for Intrusion Detection (침입 탐지를 위한 효율적인 퍼지 분류 규칙 생성)

  • Kim, Sung-Eun; Khil, A-Ra; Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.34 no.6 / pp.519-529 / 2007
  • In this paper, we investigate the use of fuzzy rules for efficient intrusion detection. We use an evolutionary algorithm to optimize the set of fuzzy rules for intrusion detection by constructing fuzzy decision trees. For efficient execution of the evolutionary algorithm, we use supervised clustering to generate an initial set of membership functions for the fuzzy rules. In our method, both the performance and the complexity of the fuzzy rules (or fuzzy decision trees) are taken into account in fitness evaluation. We also use evaluation with data partitioning, membership-degree caching, and zero-pruning to reduce the time needed to construct and evaluate fuzzy decision trees. For performance evaluation, we tested our method on the KDD'99 Cup intrusion detection data and confirmed that it outperformed existing methods. Compared with the KDD'99 Cup winner, accuracy increased by 1.54% while cost was reduced by 20.8%.
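Membership-degree caching and zero-pruning can be illustrated on a tiny fuzzy decision tree. In this sketch (an assumption, not the authors' implementation) fuzzy sets are triangular, a sample's degree along a path is the product of membership degrees, memberships are memoized per (attribute, fuzzy set, value), and a subtree is skipped as soon as a membership degree is zero. The tree layout, fuzzy-set parameters, and class labels are hypothetical.

```python
from collections import defaultdict


def triangular(x, a, b, c):
    """Triangular membership degree with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


class FuzzyTreeEvaluator:
    def __init__(self, fuzzy_sets):
        self.fuzzy_sets = fuzzy_sets  # {(attribute, label): (a, b, c)}
        self.cache = {}               # (attribute, label, value) -> degree
        self.cache_hits = 0

    def membership(self, attr, label, value):
        """Membership degree, computed once per (attribute, label, value)."""
        key = (attr, label, value)
        if key in self.cache:
            self.cache_hits += 1
        else:
            self.cache[key] = triangular(value, *self.fuzzy_sets[(attr, label)])
        return self.cache[key]

    def classify(self, node, x, degree=1.0, scores=None):
        """Accumulate, per class, the product of membership degrees along each path."""
        if scores is None:
            scores = defaultdict(float)
        if "leaf" in node:
            scores[node["leaf"]] += degree
            return scores
        for label, child in node["branches"]:
            mu = self.membership(node["attr"], label, x[node["attr"]])
            if mu > 0.0:  # zero-pruning: a zero degree cannot contribute below here
                self.classify(child, x, degree * mu, scores)
        return scores


fuzzy_sets = {(0, "low"): (-10.0, 0.0, 10.0), (0, "high"): (0.0, 10.0, 20.0)}
tree = {"attr": 0, "branches": [("low", {"leaf": "normal"}),
                                ("high", {"leaf": "attack"})]}
evaluator = FuzzyTreeEvaluator(fuzzy_sets)
print(dict(evaluator.classify(tree, (2.0,))))   # normal 0.8, attack 0.2
print(dict(evaluator.classify(tree, (15.0,))))  # only attack: 0.5
```

In an evolutionary loop the same training records are evaluated against many candidate trees, which is where a cache keyed on record values would pay off; this sketch keeps the cache unbounded for simplicity.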

Development of a Set of Data for Verifying Partition Recovery Tool and Evaluation of Recovery Tool (파티션 복구 도구 검증용 데이터 세트 개발 및 도구 평가)

  • Park, Songyee; Hur, Gimin; Lee, Sang-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.27 no.6 / pp.1397-1404 / 2017
  • When a digital forensic investigation is conducted on a damaged storage medium, recovery is performed using a recovery tool. However, recovery results differ from tool to tool. Therefore, for an accurate investigation, it is necessary to identify the performance and limitations of each tool before using it. In this paper, to verify the performance of partition recovery tools, we propose scenarios that consider the disk recognition type (partitioning scheme), such as MBR and GPT, and the structural characteristics of the FAT32 and NTFS file systems. We then validate existing tools with a data set built on these scenarios.
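As a small illustration of how a verification data set can encode known partition layouts, the sketch below builds a synthetic MBR sector and parses its partition table. It follows the classic MBR layout (four 16-byte entries starting at offset 446, boot signature 0x55AA at offset 510, little-endian starting LBA and sector count at entry offsets 8 and 12). The helper names and partition values are hypothetical, and GPT, FAT32, and NTFS on-disk structures are not parsed.

```python
import struct

# Common MBR partition type codes (subset).
PARTITION_TYPES = {
    0x07: "NTFS/exFAT",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0xEE: "GPT protective",
}


def make_mbr(partitions):
    """Build a synthetic 512-byte MBR sector from (type, start_lba, sector_count) tuples."""
    if len(partitions) > 4:
        raise ValueError("an MBR holds at most four primary partition entries")
    sector = bytearray(512)
    for index, (ptype, start_lba, count) in enumerate(partitions):
        offset = 446 + 16 * index
        sector[offset + 4] = ptype
        struct.pack_into("<II", sector, offset + 8, start_lba, count)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


def parse_mbr(sector):
    """Return the non-empty primary partition entries of an MBR sector."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature 0x55AA")
    entries = []
    for index in range(4):
        offset = 446 + 16 * index
        ptype = sector[offset + 4]
        if ptype == 0:
            continue  # unused (or wiped) entry
        start_lba, count = struct.unpack_from("<II", sector, offset + 8)
        entries.append({"type": PARTITION_TYPES.get(ptype, hex(ptype)),
                        "lba_start": start_lba, "sectors": count})
    return entries


image = make_mbr([(0x07, 2048, 204800), (0x0C, 206848, 102400)])
for entry in parse_mbr(image):
    print(entry)
```

A recovery-tool test case could, for example, zero one partition entry in such an image and check whether a tool restores the original `lba_start` and `sectors` values.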

Near infrared spectroscopy for classification of apples using K-mean neural network algorism

  • Muramatsu, Masahiro; Takefuji, Yoshiyasu; Kawano, Sumio
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1131-1131 / 2001
  • To develop a nondestructive technique for evaluating fruit quality, a K-means algorithm is applied to near infrared (NIR) spectroscopy of apples. The K-means algorithm is one of the neural network partitioning methods; its goal is to partition a set of objects O into K disjoint clusters, where K is assumed to be known a priori. The algorithm, introduced by MacQueen, draws an initial partition of the objects at random. It then computes the cluster centroids, assigns each object to the closest centroid, and iterates until a local minimum is reached. The advantage of using a neural network is that the spectra at wavelengths corresponding to absorptions by chemical bonds, such as C-H and O-H bonds, can be selected directly as input data. In conventional multiple regression approaches, the first wavelength is selected manually near an absorbance wavelength that shows a high correlation coefficient between the NIR second-derivative spectrum and the Brix value in a single regression. The second and subsequent wavelengths are then selected statistically so that the calibration equation shows a high correlation. Therefore, these wavelengths are selected on statistical rather than NIR spectroscopic grounds. In this research, the spectra at six wavelengths, 900, 904, 914, 990, 1000, and 1016 nm, are selected as input data for the K-means analysis. The 904 nm wavelength is selected because it shows the highest correlation coefficient and is regarded as an absorbance wavelength. The others are selected because they show relatively high correlation coefficients and were identified by B. G. Osborne as absorbance wavelengths of the relevant chemical structures. The experiment was performed in two phases. In the first phase, reflectance was acquired using fiber optics. Reflectance was calculated by comparing the near infrared energy reflected from the sample with that reflected from a Teflon sphere used as a standard reference, and the second-derivative spectra were used for the K-means analysis. The samples were 67 intact Fuji apples cultivated in Aomori Prefecture, Japan. In the second phase, Brix values were measured with a commercially available refractometer to evaluate the result of the K-means approach. The result shows a partition of the spectral data of the 67 samples into eight clusters, and the apples are classified into samples with high Brix values and samples with low Brix values. Consequently, the K-means analysis achieved the classification of apples on the basis of their Brix values.
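The procedure described in the abstract (random initial partition, centroid computation, reassignment to the closest centroid, iteration to a local minimum) can be sketched as batch K-means. This is a generic illustration, not the authors' implementation: the toy 1-D points stand in for the six-wavelength second-derivative spectra, the initial partition is drawn so that every cluster starts non-empty, and letting a cluster that later empties keep its previous centroid is a design assumption.

```python
import math
import random


def kmeans_random_partition(points, k, seed=0, max_iter=100):
    """Batch K-means started from a random partition of the objects into k clusters."""
    rng = random.Random(seed)
    # Random initial partition with every cluster non-empty (requires len(points) >= k).
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centroids = [None] * k
    for _ in range(max_iter):
        # Update step: each cluster's centroid is the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        # Assignment step: move every object to its closest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no object moved: a local minimum has been reached
        labels = new_labels
    return labels, centroids


pts = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
labels, centroids = kmeans_random_partition(pts, 2, seed=3)
print(labels, centroids)
```

Because the result is only a local minimum, different seeds can give different partitions on less well-separated data; K itself must be supplied, as the abstract notes.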


A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.13 no.11 / pp.157-164 / 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for numerical data, they are likely to be limited when applied to categorical data, owing to the absence of ordering, high dimensionality, and the scarcity of frequencies. Hence, a conditional entropy measure is proposed to closely approximate the cohesion among attributes within each cluster. We propose a new objective function that reflects optimal clustering, so that within-cluster dispersion is minimized and between-cluster separation is enhanced. We performed experiments on five real-world data sets, comparing the performance of our algorithm with that of four other algorithms using three evaluation metrics: accuracy, F-measure, and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the compared algorithms on these metrics.
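A minimal sketch of the k-modes ingredients the abstract builds on is given below: simple matching dissimilarity, per-attribute mode updates, and a within-cluster conditional entropy H(A_i | A_j) between two attributes. The paper's actual objective function, which combines within-cluster dispersion and between-cluster separation, is not specified in the abstract and is not reproduced; the data rows and initial modes are hypothetical.

```python
import math
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple matching: number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(rows):
    """Attribute-wise most frequent value (ties go to the value seen first)."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def k_modes(rows, initial_modes, max_iter=20):
    """Basic k-modes: assign rows to the nearest mode, then recompute modes."""
    modes = list(initial_modes)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(modes)),
                          key=lambda j: matching_dissimilarity(row, modes[j]))
                      for row in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = mode_of(members)
    return labels, modes


def conditional_entropy(rows, i, j):
    """H(A_i | A_j) in bits over the given rows; 0 means A_j determines A_i."""
    n = len(rows)
    joint = Counter((r[i], r[j]) for r in rows)
    marginal = Counter(r[j] for r in rows)
    return -sum(c / n * math.log2(c / marginal[vj]) for (_, vj), c in joint.items())


rows = [("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
        ("b", "y", "r"), ("b", "y", "r"), ("b", "z", "r")]
labels, modes = k_modes(rows, [rows[0], rows[3]])
cluster0 = [r for r, lab in zip(rows, labels) if lab == 0]
print(labels, modes)
print(conditional_entropy(cluster0, 2, 1))
```

Lower within-cluster conditional entropy between attribute pairs indicates tighter cohesion among attributes, which is the property the abstract's measure is meant to capture.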