• Title/Summary/Keyword: Data Clustering

Cluster Analysis with Balancing Weight on Mixed-type Data

  • Chae, Seong-San;Kim, Jong-Min;Yang, Wan-Youn
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.719-732 / 2006
  • We present a set of clustering algorithms that apply a proper weight in the distance formulation, extending it to data with mixed numeric and multiple binary values. Simple matching and Jaccard coefficients are used to measure similarity between objects for multiple binary attributes. Similarities are then converted to dissimilarities between the i-th and j-th objects. The performance of the clustering algorithms with balancing weight is demonstrated on different similarity measures. Our experiments show that clustering algorithms with a proper weight give a competitive recovery level when a data set with mixed numeric and multiple binary attributes is clustered.
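
    The similarity measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' formulation: the `weight` parameter, the convex combination with a Euclidean numeric distance, and the function names are all assumptions made for the example. Only the simple matching and Jaccard coefficients and the conversion `dissimilarity = 1 - similarity` follow standard definitions.

```python
from math import sqrt


def simple_matching(a, b):
    """Fraction of binary attributes on which a and b agree (0-0 and 1-1 both count)."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Shared 1s divided by the positions where at least one object has a 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Convention chosen here: two all-zero vectors are treated as identical.
    return both / either if either else 1.0


def mixed_dissimilarity(num_i, num_j, bin_i, bin_j, weight=0.5, similarity=jaccard):
    """Hypothetical weighted dissimilarity between objects i and j.

    Combines a Euclidean distance on the numeric part with (1 - similarity)
    on the binary part. `weight` in [0, 1] is an assumed balancing parameter.
    """
    d_num = sqrt(sum((x - y) ** 2 for x, y in zip(num_i, num_j)))
    d_bin = 1.0 - similarity(bin_i, bin_j)
    return weight * d_num + (1.0 - weight) * d_bin
```

    In practice the numeric attributes would usually be standardized first so that the weight balances comparable scales; that step is omitted here.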

Clustering-based identification for the prediction of splitting tensile strength of concrete

  • Tutmez, Bulent
    • Computers and Concrete / v.6 no.2 / pp.155-165 / 2009
  • Splitting tensile strength (STS) of high-performance concrete (HPC) is one of the important mechanical properties for structural design. This property is related to compressive strength (CS), water/binder (W/B) ratio, and concrete age. This paper presents a clustering-based fuzzy model for predicting STS from CS and W/B at a fixed age (28 days). The data-driven fuzzy model consists of three main steps: fuzzy clustering, inference system, and prediction. The model can analyze the system directly from measured data. Performance evaluations showed that the fuzzy model is more accurate than the other prediction models considered.

Support Vector Machine based Cluster Merging (Support Vector Machines 기반의 클러스터 결합 기법)

  • Choi, Byung-In;Rhee, Frank Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.3 / pp.369-374 / 2004
  • A cluster merging algorithm is proposed that merges the convex clusters produced by the Fuzzy Convex Clustering (FCC) method into non-convex clusters. This is achieved with a fast and reliable distance measure between two convex clusters based on Support Vector Machines (SVM), which improves accuracy and speed over existing conventional methods. As a result, the number of clusters can be reduced without losing the representation of the data. Results for several data sets are given to show the validity of the distance measure and the algorithm.

Entropy-based Correlation Clustering for Wireless Sensor Networks in Multi-Correlated Regional Environments

  • Nga, Nguyen Thi Thanh;Khanh, Nguyen Kim;Hong, Son Ngo
    • IEIE Transactions on Smart Processing and Computing / v.5 no.2 / pp.85-93 / 2016
  • The existence of correlation characteristics brings significant potential advantages to the development of efficient routing protocols in wireless sensor networks. This research proposes a new, simple method of clustering sensor nodes into correlation groups in multiple-correlation areas. First, the evaluation of joint entropy for multiple-sensed data is considered. Based on this evaluation, a definition of correlation region grounded in entropy theory is proposed. Following that, a correlation clustering scheme with less computation is developed. The results are validated with a real data set.
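
    The joint entropy evaluation mentioned in this abstract can be illustrated with the standard empirical estimate. This is a sketch under stated assumptions: readings are assumed to be already discretized into symbols, the function name is hypothetical, and the paper's own definition of a correlation region is not reproduced here.

```python
from collections import Counter
from math import log2


def joint_entropy(*series):
    """Empirical joint entropy, in bits, of aligned discretized reading sequences."""
    # Each time step contributes one tuple holding the readings of all sensors.
    joint = Counter(zip(*series))
    n = sum(joint.values())
    return -sum((c / n) * log2(c / n) for c in joint.values())


# Two perfectly correlated sensors add no information to each other:
# H(a, b) equals H(a). Two independent sensors give H(a, c) = H(a) + H(c).
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]
c = [0, 1, 0, 1]
correlated = joint_entropy(a, b)
independent = joint_entropy(a, c)
```

    A low joint entropy relative to the sum of the individual entropies signals strong correlation, which is the kind of evidence a correlation-based grouping of nodes could use.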

Similarity measure for P2P processing of semantic data (시맨틱웹 데이터의 P2P 처리를 위한 유사도 측정)

  • Kim, Byung Gon;Kim, Youn Hee
    • Journal of Korea Society of Digital Industry and Information Management / v.6 no.4 / pp.11-20 / 2010
  • Ontology plays an important role in the semantic web for constructing and querying semantic data. Because of the dynamic nature of ontologies, a P2P environment is considered for ontology processing on the web. For efficient ontology processing in a P2P environment, clustering of peers should be considered. When a new peer is added to the network, allocating it to a cluster is important for system efficiency. To cluster peers with similar characteristics, a method is needed for measuring the similarity between the ontology of the added peer and the ontologies in existing clusters. In this paper, we propose similarity measure techniques for ontologies to support clustering of peers. The proposed method considers the structural characteristics of an ontology, such as schema, class, and property. Experimental results show that ontologies with similar topics, classes, and properties can be allocated to the same cluster.

Clustering Analysis with Spring Discharge Data and Evaluation of Groundwater System in Jeju Island (용천수 유출량 클러스터링 해석을 이용한 제주도 지하수 순환 해석)

  • Kim Tae-Hui;Mun Deok-Cheol;Park Won-Bae;Park Gi-Hwa;Go Gi-Won
    • Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2005.04a / pp.296-299 / 2005
  • Time series of spring discharge data on Jeju Island can provide abundant information on the spatial groundwater system. In this study, springs were classified by clustering analysis of time series of discharge rate and EC. Peak discharges are mainly observed in August or September. However, double peaks and late peaks of discharge are also observed at many springs. Based on the results of the clustering analysis, it can be deduced that the GH model is not appropriate as a conceptual model of the groundwater system on Jeju Island. EC distributions in the dry season also support this conclusion.

A Study of Simulation Method and New Fuzzy Cluster Analysis (새로운 Fuzzy 집락분석방법과 Simulation기법에 관한 연구)

  • Im Dae-Heug
    • Management & Information Systems Review / v.14 / pp.51-65 / 2004
  • We consider fuzzy clustering, which partitions a set of objects into a certain number of groups by assigning membership probabilities to each object. Previous research in this field shows that fuzziness is involved to such an extent that, for certain data sets, the main purpose of clustering cannot be attained as desired. We therefore propose a new objective function, named the Fuzzy-Entropy Function, to satisfy the main motivation of clustering, which is to classify the data clearly. Because the objective function is changed, we also suggest the Mean Field Annealing Algorithm as the optimization algorithm in place of the ISODATA traditionally used in this field. We show that the Mean Field Annealing Algorithm works well not only for the new objective function but also for the classical fuzzy objective function, by indicating that the local minimum problem arising from ISODATA can be mitigated.

Color image segmentation using the possibilistic C-mean clustering and region growing (Possibilistic C-mean 클러스터링과 영역 확장을 이용한 칼라 영상 분할)

  • 엄경배;이준환
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.3 / pp.97-107 / 1997
  • Image segmentation is an important step in image information extraction for computer vision systems. Fuzzy clustering methods have been used extensively in color image segmentation. Most analytic fuzzy clustering approaches are derived from the fuzzy c-means (FCM) algorithm. The FCM algorithm uses the probabilistic constraint that the memberships of a data point across classes sum to 1. However, the memberships resulting from the FCM do not always correspond to the intuitive concept of degree of belonging or compatibility. Moreover, the FCM algorithm has considerable trouble in noisy environments in the feature space. Recently, the possibilistic C-mean (PCM) was proposed to solve the above problems, and here it is combined with region growing for color image segmentation. In the PCM, the membership values may be interpreted as degrees of possibility of the data points belonging to the classes, so the problems in the FCM can be solved by the PCM. The clustering results from the PCM alone are not smoothly bounded and often have holes, so region growing is used as a postprocessing step. Our experiments illustrate that the proposed method is more reasonable than the FCM in noisy environments.
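
    The contrast between the FCM sum-to-1 constraint and possibilistic memberships can be made concrete with a small sketch. The FCM update below is the standard one; the possibilistic typicality uses the commonly cited Krishnapuram-Keller form, and whether the paper uses exactly this variant is an assumption. The function names, the fuzzifier `m`, and the scale parameter `eta` are illustrative.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard FCM memberships of one point to each center; they always sum to 1."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
             for center in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]


def pcm_typicality(point, center, eta=1.0, m=2.0):
    """Possibilistic typicality of a point for one cluster (assumed form).

    Depends only on the distance to this cluster, so values across clusters
    need not sum to 1. `eta` is an assumed per-cluster scale parameter.
    """
    d2 = sum((p - c) ** 2 for p, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


centers = [(1.0, 0.0), (-1.0, 0.0)]
outlier = (0.0, 100.0)
# FCM gives the distant noise point a membership of 0.5 in each cluster,
# while its possibilistic typicality in either cluster is close to 0.
fcm_u = fcm_memberships(outlier, centers)
pcm_t = [pcm_typicality(outlier, c) for c in centers]
```

    This is the noise sensitivity the abstract refers to: under the sum-to-1 constraint an outlier is forced to belong somewhere, whereas a possibilistic membership can stay low for every class.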

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large-vocabulary speech recognition, the sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme constructs a supervised decision rule and applies it to the pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, the data-driven method is used as the clustering algorithm, and its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be particularly effective, in terms of recognition performance, on databases with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the WER by 3.93% compared to the existing rule-based methods.
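
    The step of extracting "the attribute that best separates the given examples" is based, in C4.5, on the gain ratio criterion. The sketch below shows that criterion for a single categorical attribute. It is an illustration only: in the setting of this abstract the labels would be cluster identifiers from the data-driven clustering and the attributes would describe phonetic context, but the function names and toy data here are hypothetical, and a full C4.5 implementation also handles continuous attributes and pruning.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy, in bits, of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a categorical attribute, divided by split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected label entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy example: cluster labels for four triphones and two candidate attributes.
cluster_ids = [0, 0, 1, 1]
left_context = ["vowel", "vowel", "nasal", "nasal"]   # separates the clusters perfectly
right_context = ["stop", "glide", "stop", "glide"]    # carries no information
best = max(
    [("left_context", left_context), ("right_context", right_context)],
    key=lambda item: gain_ratio(item[1], cluster_ids),
)[0]
```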

Design of Spatial Clustering Method for Data Mining of Various Spatial Objects (다양한 공간객체의 데이터 마이닝을 위한 공간 클러스터링 기법의 설계)

  • 문상호;최진오;김진덕
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.4 / pp.955-959 / 2004
  • Existing clustering methods for spatial data mining process only point objects, not spatial objects with polygonometry such as lines and areas. This is because computing distances between objects with polygonometry is more complex than computing distances between point objects. To solve this problem, we design a clustering method based on regular grid cell structures. In detail, it reduces the cost and time of distance computation by using cell relationships in the grid cell structures.
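
    One way to picture how cell relationships can stand in for exact geometric distance is sketched below. This is not the paper's design: it assumes each line or area object has already been mapped to the set of grid cells it covers, treats two objects as neighbors when any of their cells coincide or are adjacent, and forms clusters as connected components. All names and the adjacency rule are assumptions made for illustration.

```python
def cells_touch(cells_a, cells_b):
    """True if any cell of one object is identical or 8-adjacent to a cell of the other."""
    return any(abs(ax - bx) <= 1 and abs(ay - by) <= 1
               for ax, ay in cells_a for bx, by in cells_b)


def cluster_by_cells(objects):
    """Group objects (name -> set of (col, row) cells) into connected components."""
    names = list(objects)
    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        # Depth-first search over the "cells touch" relation.
        while stack:
            current = stack.pop()
            group.append(current)
            for other in names:
                if other not in seen and cells_touch(objects[current], objects[other]):
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(group))
    return clusters


# A line-like object, an area-like object next to it, and a distant object.
objects = {
    "road": {(0, 0), (1, 0), (2, 0)},
    "park": {(3, 1), (4, 1)},
    "lake": {(10, 10)},
}
clusters = cluster_by_cells(objects)
```

    The comparison works on integer cell indices only, which is where a grid-based scheme can save the cost of exact distance computation between lines and polygons.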