• Title/Summary/Keyword: Objective clustering

A Similar Price Zone Determination of Public Land Price Using a Hybrid Clustering Technique (평균연결법과 K-means 혼합클러스터링 기법을 이용한 공시지가 유사가격권역의 설정)

  • Yi Seong-Kyu;Park Soo-Hong;Hong Sung-Eon
    • Journal of the Korean Geographical Society / v.41 no.1 s.112 / pp.121-135 / 2006
  • Although the similar land price zone is a very important element of the public land appraisal procedure, the concept is described only implicitly when it is applied in the actual land appraisal system. This becomes a bigger problem in the automatic selection of a comparative standard land parcel. In addition, dividing similar land price zones requires an objective and reasonable process in order to improve ALPAS (Automatic Land Price Appraisal System), which has become a current issue. To solve the similar land price zone determination problem caused by the lack of an objective numerical standard, this study proposes a determination method that uses a hybrid clustering technique combining average linkage and K-means. Results show that the hybrid clustering method, applied to the test area, detects similar land price zones easily and with considerable accuracy, as verified with test statistics and with real comparative standard land parcels selected manually.
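
The hybrid idea named in the title (average linkage followed by K-means) can be sketched as below. This is a minimal illustration under assumptions, not the authors' procedure: the function names, the toy `parcels` feature vectors (x, y, unit price) and the choice to seed K-means with the centroids of the average-linkage clusters are all hypothetical.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def average_linkage(points, k):
    """Merge the two clusters with the smallest mean pairwise distance until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def avg_dist(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))
        i, j = min(combinations(range(len(clusters)), 2), key=avg_dist)
        clusters[i] += clusters[j]  # i < j, so deleting j does not shift i
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a group happens to be empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


def hybrid_clustering(points, k):
    """Stage 1: average linkage picks the seeds. Stage 2: K-means refines them."""
    seeds = [centroid(c) for c in average_linkage(points, k)]
    return kmeans(points, seeds)


# Hypothetical parcels: (x, y, unit price)
parcels = [(0, 0, 100), (0, 1, 102), (1, 0, 101), (5, 5, 200), (5, 6, 203), (6, 5, 198)]
centers, zones = hybrid_clustering(parcels, 2)
```

Seeding K-means from a hierarchical result removes the dependence on random initial centers, which is one common motivation for combining the two methods.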

Feature Weighting in Projected Clustering for High Dimensional Data (고차원 데이타에 대한 투영 클러스터링에서 특성 가중치 부여)

  • Park, Jong-Soo
    • Journal of KIISE:Databases / v.32 no.3 / pp.228-242 / 2005
  • Projected clustering seeks to find clusters in different subspaces within a high-dimensional dataset. We propose an algorithm that discovers near-optimal projected clusters without user-specified parameters such as the number of output clusters or the average cardinality of the subspaces of the projected clusters. The objective function of the algorithm computes the projected energy, the quality, and the number of outliers at each step of clustering. To minimize the projected energy and maximize the quality of the clustering, we first find the best subspace of each cluster from the density of the input points by comparing standard deviations over the full dimension. A weighting factor for each dimension of the subspace is used to get rid of probable error in measuring projected distances. Our extensive experiments show that the algorithm discovers projected clusters accurately and scales to large data sets.

Multi-Objective Genetic Algorithm based on Multi-Robot Positions for Scheduling Problems (스케줄링 문제를 위한 멀티로봇 위치 기반 다목적 유전 알고리즘)

  • Choi, Jong Hoon;Kim, Je Seok;Jeong, Jin Han;Kim, Jung Min;Park, Jahng Hyon
    • Journal of the Korean Society for Precision Engineering / v.31 no.8 / pp.689-696 / 2014
  • This paper addresses a scheduling problem for a high-density robotic workcell using a multi-objective genetic algorithm. We propose a new algorithm based on NSGA-II (Non-dominated Sorting Genetic Algorithm II), one of the most popular algorithms for solving multi-objective optimization problems. To solve the problem efficiently, the proposed algorithm divides it into two processes: clustering and scheduling. In the clustering process, we focus on the multi-robot positions because they are fixed in the manufacturing system and have a great effect on task distribution. We test the algorithm by changing the multi-robot positions and compare it with previous work. Test results show that the proposed algorithm is effective under various conditions.
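
The ranking step that gives NSGA-II its name, non-dominated sorting, can be illustrated with a short sketch. It assumes all objectives are minimized and uses a simple repeated-peeling loop rather than the bookkeeping of the fast variant. The `schedules` data, read here as (makespan, energy) pairs, are hypothetical and not taken from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objectives):
    """Return fronts as lists of indices; front 0 is the Pareto-optimal set."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        # A solution joins the current front if nothing still remaining dominates it.
        front = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical candidate schedules: (makespan, energy), both minimized
schedules = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6)]
fronts = non_dominated_sort(schedules)
print(fronts)  # [[0, 1, 2], [3, 4]]
```

Solutions in an earlier front are preferred during selection, so the population is pushed toward the trade-off curve rather than toward a single weighted objective.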

A Clustering Algorithm for Handling Missing Data (손실 데이터를 처리하기 위한 집락분석 알고리즘)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.8 no.11 / pp.103-108 / 2017
  • In ubiquitous environments, transmitting data from various sensors over long distances has been a problem. In particular, when integrating data arriving at different locations, data with differing property values or with partial loss must be processed. This paper presents a method for analyzing such data. The core of the method is to define an objective function suited to the problem and to develop an algorithm that can optimize it. The objective function is a modified version of the OCS function. MFA (Mean Field Annealing), which could process only binary data, is extended so that it applies to fields with continuous values. The extension is called CMFA and is used as the optimization algorithm.

Rule-Based Fuzzy-Neural Networks Using the Identification Algorithm of the GA Hybrid Scheme

  • Park, Ho-Sung;Oh, Sung-Kwun
    • International Journal of Control, Automation, and Systems / v.1 no.1 / pp.101-110 / 2003
  • This paper introduces an identification method for nonlinear models in the form of rule-based Fuzzy-Neural Networks (FNN). In this study, the development of the rule-based fuzzy-neural networks focuses on the technologies of Computational Intelligence (CI), namely fuzzy sets, neural networks, and genetic algorithms. The FNN modeling and identification environment realizes parameter identification through the synergistic use of clustering techniques, genetic optimization, and a complex search method. We use an HCM (Hard C-Means) clustering algorithm to determine the initial apexes of the membership functions of the information granules used in this fuzzy model. Parameters such as the apexes of the membership functions, the learning rates, and the momentum coefficients are then adjusted using the identification algorithm of a GA hybrid scheme. The proposed GA hybrid scheme combines the GA with the improved complex method to guarantee both global optimization and local convergence. An aggregate objective function (performance index) with a weighting factor is introduced to achieve a sound balance between the approximation and generalization abilities of the model. By selecting and adjusting the weighting factor of this objective function, we show how to design a model with sound approximation and generalization abilities. The proposed model is evaluated on several time series data sets (gas furnace, sewage treatment process, and NOx emission process data from gas turbine power plants).
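
One common way to write an aggregate objective with a single weighting factor is a convex combination of the training and testing errors. The abstract does not state the exact form, so the expression below is an assumption: PI stands for the performance index on the training data (approximation), E_PI for the index on the testing data (generalization), and θ for the weighting factor.

```latex
f(\mathrm{PI}, \mathrm{E\_PI}) = \theta \cdot \mathrm{PI} + (1 - \theta) \cdot \mathrm{E\_PI},
\qquad 0 \le \theta \le 1
```

Under this assumed form, θ = 1 optimizes approximation only, θ = 0 optimizes generalization only, and intermediate values trade one against the other.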

A Hill-Sliding Strategy for Initialization of Gaussian Clusters in the Multidimensional Space

  • Park, J.Kyoungyoon;Chen, Yung-H.;Simons, Daryl-B.;Miller, Lee-D.
    • Korean Journal of Remote Sensing / v.1 no.1 / pp.5-27 / 1985
  • A hill-sliding technique was devised to extract Gaussian clusters from the multivariate probability density estimates of sample data as the first step of iterative unsupervised classification. The underlying assumption of this approach was that each cluster possessed a unimodal normal distribution. The key idea was that the proposed clustering function could distinguish the elements of a cluster under formation from the rest of the feature space. Initial clusters were extracted one by one according to the hill-sliding tactics. A dimensionless cluster compactness parameter was proposed as a universal measure of cluster goodness and was used satisfactorily in test runs with Landsat multispectral scanner (MSS) data. The normalized divergence, defined as the cluster divergence divided by the entropy of the entire sample data, was used as a general measure of separability between clusters. An overall clustering objective function was set forth in terms of the cluster covariance matrices, from which the cluster compactness measure could be deduced. This objective function was used to evaluate the minimal improvement of the initial data partitioning obtained by eliminating scattered sparse data points. The hill-sliding clustering technique developed here is potentially applicable to the decomposition of any multivariate mixture distribution into a number of unimodal distributions, provided that a distribution function appropriate to the data set is employed.

Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
  • Data clustering determines groups of patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, which is popular and efficient, is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum rather than a local optimum, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method can balance intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository and compared with previous studies through experiment and analysis. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) outperform KSA (K-means+SA), SA, and K-means in our simulations. The method significantly improves the accuracy and efficiency of finding the global optimal clustering solution for complex, real-time, and costly data mining processes.
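
The SAK ordering (SA for diversification, then K-means for intensification) can be sketched roughly as follows. Everything specific here is an assumption for illustration: the single-point move, the geometric cooling schedule, the parameter values, the function names and the toy `data` are not taken from the paper.

```python
import math
import random


def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if members:
            center = tuple(sum(v) / len(members) for v in zip(*members))
            total += sum(math.dist(p, center) ** 2 for p in members)
    return total


def simulated_annealing(points, k, rng, t0=10.0, cooling=0.95, steps=300):
    """Diversified search: move one random point to a random cluster per step."""
    labels = [rng.randrange(k) for _ in points]
    cost, temp = sse(points, labels, k), t0
    best, best_cost = labels[:], cost
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = labels[i]
        labels[i] = rng.randrange(k)
        new_cost = sse(points, labels, k)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = labels[:], cost
        else:
            labels[i] = old  # reject the move
        temp *= cooling
    return best


def kmeans_refine(points, labels, k, max_iter=100):
    """Intensified search: Lloyd iterations starting from the SA labels."""
    for _ in range(max_iter):
        centers = {}
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
        new_labels = [min(centers, key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


rng = random.Random(0)  # fixed seed so the run is repeatable
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
sa_labels = simulated_annealing(data, 2, rng)
final_labels = kmeans_refine(data, sa_labels, 2)
```

Because each Lloyd iteration cannot increase the sum of squared errors, the K-means stage never makes the SA result worse; it only polishes it.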

Improved Density-Independent Fuzzy Clustering Using Regularization (레귤러라이제이션 기반 개선된 밀도 무관 퍼지 클러스터링)

  • Han, Soowhan;Heo, Gyeongyong
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.1 / pp.1-7 / 2020
  • Fuzzy clustering, represented by FCM (Fuzzy C-Means), is a simple and efficient clustering method. However, the objective function of FCM lets each cluster affect the clustering result in proportion to its density, so density differences between clusters can distort the result. One method that alleviates this density problem is EDI-FCM (Extended Density-Independent FCM), which adds terms to the objective function of FCM to compensate for the density difference. This paper proposes Regularized EDI-FCM, an enhanced EDI-FCM that uses regularization. Regularization is commonly used to make a solution space smooth and an algorithm insensitive to noise. In clustering, regularization can reduce the effect of a high-density cluster on the clustering result. Experimental results show that the proposed method converges to the real centers more quickly and accurately than FCM and EDI-FCM.
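
For reference, the baseline that EDI-FCM modifies is plain FCM, which alternates membership and center updates to reduce J = Σ_i Σ_k u_ik^m ‖x_k − v_i‖². The sketch below covers only this standard baseline. The additional EDI-FCM and regularization terms are not given in the abstract and are not reproduced. The function names, the fuzzifier m = 2 and the one-dimensional toy `samples` are assumptions.

```python
import math


def fcm(points, centers, m=2.0, iters=50, eps=1e-12):
    """Plain FCM: alternate membership and center updates from the given initial centers."""
    c = len(centers)
    for _ in range(iters):
        # Membership of each point in each cluster: inverse-distance weighting.
        u = []
        for x in points:
            d = [max(math.dist(x, v), eps) for v in centers]
            u.append([
                1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                for i in range(c)
            ])
        # Each center is the mean of all points weighted by u^m.
        centers = [
            tuple(
                sum(u[k][i] ** m * points[k][dim] for k in range(len(points)))
                / sum(u[k][i] ** m for k in range(len(points)))
                for dim in range(len(points[0]))
            )
            for i in range(c)
        ]
    return centers, u


def fcm_objective(points, centers, u, m=2.0):
    """J = sum over clusters i and points k of u_ik^m * ||x_k - v_i||^2."""
    return sum(
        u[k][i] ** m * math.dist(points[k], centers[i]) ** 2
        for k in range(len(points)) for i in range(len(centers))
    )


# Hypothetical 1-D data: a dense group near 0 and a sparse group near 10
samples = [(0.0,), (0.1,), (0.2,), (0.1,), (0.0,), (10.0,), (10.2,)]
centers, memberships = fcm(samples, [(1.0,), (9.0,)])
```

Every point contributes to every center through u^m, which is why a cluster with many more points can pull a neighboring center toward itself when the groups are close together.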

Fuzzy c-Logistic Regression Model in the Presence of Noise Cluster

  • Alanzado, Arnold C.;Miyamoto, Sadaaki
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.431-434 / 2003
  • In this paper we introduce a modified objective function for fuzzy c-means clustering with a logistic regression model in the presence of a noise cluster. The logistic regression model is commonly used to describe the effect of one or several explanatory variables on a binary response variable. In real applications there is very often no sharp boundary between clusters, so fuzzy clustering is often better suited to the data.

The Graph Partition Problem (그래프분할문제)

  • 명영수
    • Journal of the Korean Operations Research and Management Science Society / v.28 no.4 / pp.131-143 / 2003
  • In this paper, we survey various graph partition problems, including the clustering problem, the k-cut problem, the multiterminal cut problem, the multicut problem, the sparsest cut problem, the network attack problem, and the network disconnection problem. We compare these problems by their characteristics, such as the objective function and the conditions that the partitioned clusters must satisfy. We also introduce the mathematical programming formulations and the solution approaches developed for these problems.
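
The two ingredients the survey uses to compare these problems, an objective function and a condition on the clusters, can be made concrete with a small sketch. It assumes the objective is the total weight of cut edges and the condition is an upper bound on cluster size. Both are only one possible pairing, and the graph, names and numbers are hypothetical.

```python
def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints lie in different clusters."""
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


def is_feasible(assignment, max_size):
    """Check the assumed side condition: no cluster holds more than max_size nodes."""
    sizes = {}
    for cluster in assignment.values():
        sizes[cluster] = sizes.get(cluster, 0) + 1
    return all(size <= max_size for size in sizes.values())


# Hypothetical weighted graph: (node, node, weight)
edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 4), ("a", "d", 2)]
partition_1 = {"a": 0, "b": 0, "c": 1, "d": 1}
partition_2 = {"a": 0, "b": 1, "c": 1, "d": 0}

print(cut_weight(edges, partition_1))  # 3
print(cut_weight(edges, partition_2))  # 7
```

Both partitions satisfy a size limit of two nodes per cluster, so the objective alone decides between them; changing the condition or the objective yields the different problem variants listed in the abstract.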