Search | Korea Science

A Simple Tandem Method for Clustering of Multimodal Dataset

Cho C.;Lee J.W.;Lee J.W.
- Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.729-733 / 2003
Local features within clusters, caused by the multi-modal nature of the data, prevent many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when the technique implicitly assumes a Gaussian distribution. This study proposes a simple tandem clustering method, composed of a k-means type algorithm and a hierarchical method, to solve such problems. The multi-modal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again using an agglomerative hierarchical clustering method with the Kullback-Leibler divergence as the measure of dissimilarity. The method is not only effective at extracting multi-modal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly known real-world datasets.
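The two-step idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 1-D data, summarizes each pre-cluster by a fitted Gaussian, uses a symmetrized Kullback-Leibler divergence, and merges with single linkage. All of these choices, and the function names, are assumptions made for illustration only.

```python
import math
import random

def kmeans_1d(xs, k, iters=50):
    """Plain Lloyd's k-means on 1-D data; centers start at evenly spaced quantiles."""
    s = sorted(xs)
    centers = [s[(2 * i + 1) * len(s) // (2 * k)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        for j in range(k):
            members = [x for x, l in zip(xs, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def gaussian_fit(points):
    """Mean and (floored) variance of a pre-cluster. Assumption: Gaussian summary."""
    m = sum(points) / len(points)
    v = sum((p - m) ** 2 for p in points) / len(points)
    return m, max(v, 1e-6)

def sym_kl(a, b):
    """Symmetrized KL divergence between two 1-D Gaussians given as (mean, variance)."""
    (m1, v1), (m2, v2) = a, b
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + (m1 - m2) ** 2) / v1 - 1.0)
    return kl_ab + kl_ba

def tandem_cluster(xs, n_pre, n_final):
    """Step 1: many small k-means pre-clusters. Step 2: agglomerative merging by KL."""
    pre = kmeans_1d(xs, n_pre)
    ids = sorted(set(pre))
    stats = {i: gaussian_fit([x for x, l in zip(xs, pre) if l == i]) for i in ids}
    groups = [[i] for i in ids]
    while len(groups) > n_final:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single linkage (an assumption): closest pair of pre-clusters.
                d = min(sym_kl(stats[i], stats[j]) for i in groups[a] for j in groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups.pop(b)
    final_of = {i: g for g, members in enumerate(groups) for i in members}
    return [final_of[l] for l in pre]

# Synthetic data: two clusters, each made of two modes (first 100 points vs. last 100).
random.seed(0)
modes = [0.0, 3.0, 20.0, 23.0]
xs = [random.gauss(m, 0.5) for m in modes for _ in range(50)]
labels = tandem_cluster(xs, n_pre=8, n_final=2)
```

On this toy data the pre-clusters belonging to the two nearby modes are merged before any pair that spans the large gap, so each bimodal cluster is recovered as one group.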

Workflow Clustering Methodology Using Structural Similarity Metrics (프로세스 유사성을 이용한 워크플로우 클러스터링)

Jung, Jae-Yoon;Bae, Joonsoo;Kang, Suk-Ho
- Journal of Korean Institute of Industrial Engineers / v.33 no.1 / pp.99-109 / 2007
To realize process-driven management, many companies have been launching business process management systems. A business process is a collection of standardized and structured tasks that create value for a company. Moreover, it is recognized as one of the significant intangible business assets for achieving competitive advantage. This research introduces a novel approach to workflow process analysis, which is becoming more and more significant as process-aware information systems spread widely among companies. In this paper, a methodology of workflow clustering based on process similarity is proposed. The purpose of workflow clustering is to analyze accumulated process definitions in order to assist the design of new processes and the improvement of existing ones. The proposed methodology exploits measures of structural similarity of workflow processes. The methodology was tested on synthetic process models to illustrate the implications of workflow clustering.

Industrial load forecasting using the fuzzy clustering and wavelet transform analysis

Yu, In-Keun
- Journal of IKEEE / v.4 no.2 s.7 / pp.233-240 / 2000
This paper presents a technique based on fuzzy clustering and wavelet transform analysis for industrial hourly load forecasting for the purpose of peak demand control. First, one year of historical load data was sorted and clustered into several groups using fuzzy clustering, and then the wavelet transform with a biorthogonal mother wavelet was applied in order to forecast the peak load one hour ahead. A 5-level decomposition of the daily industrial load curve is implemented to consider the weather-sensitive component of the loads effectively. The wavelet coefficients associated with certain frequency and time localization are adjusted using the conventional multiple regression method, and the components are reconstructed to predict the final loads through a five-scale synthesis technique. The outcome of the study indicates that the proposed composite model of fuzzy clustering and wavelet transform can be used as an attractive and effective means for industrial hourly peak load forecasting.

Application of Genetic and Local Optimization Algorithms for Object Clustering Problem with Similarity Coefficients (유사성 계수를 이용한 군집화 문제에서 유전자와 국부 최적화 알고리듬의 적용)

Yim, Dong-Soon;Oh, Hyun-Seung
- Journal of Korean Institute of Industrial Engineers / v.29 no.1 / pp.90-99 / 2003
Object clustering classifies a set of objects into a number of groups such that objects within a group have similar characteristics and objects in different groups have dissimilar characteristics. It has been exploited in diverse areas such as information retrieval, data mining, and group technology. In this study, an object-clustering problem with similarity coefficients between objects is considered. First, an evaluation function for the optimization problem is defined. Then, a genetic algorithm and a local optimization technique based on a heuristic method are proposed and used to obtain near-optimal solutions. Solutions from the genetic algorithm are improved by local optimization techniques based on object relocation and cluster merging. The validity and effectiveness of the proposed algorithms are tested through extensive experiments.

Technology Clustering Using Textual Information of Reference Titles in Scientific Paper (과학기술 논문의 참고문헌 텍스트 정보를 활용한 기술의 군집화)

Park, Inchae;Kim, Songhee;Yoon, Byungun
- Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.25-32 / 2020
Data on patents and scientific papers are considered a useful information source for analyzing technological information and have been widely utilized. Technology big data is analyzed in various ways to identify the latest technological trends and predict promising future technologies. Clustering is one way to discover new features by creating groups from technology big data. Patents include refined bibliographic information such as patent classification codes, whereas scientific papers do not have appropriate bibliographic information for clustering. This research proposes a new approach for clustering scientific paper data by utilizing the reference titles in each paper. In this approach, the reference titles are treated as textual information because each reference contains the title of a paper, which represents that paper's core content. We collected the scientific paper data, extracted the titles of the references, and conducted clustering by measuring text-based similarity. The results from the proposed approach are compared with the results of two existing methodologies: one that utilizes textual information from titles and abstracts, and a citation-based approach. The suggested approach shows a statistically significant difference compared to the existing approaches, and it shows better clustering performance. The proposed approach can be considered a useful method for clustering scientific papers.
https://doi.org/10.11627/jkise.2020.43.2.025

A Layer-based Dynamic Unequal Clustering Method in Large Scale Wireless Sensor Networks (대규모 무선 센서 네트워크에서 계층 기반의 동적 불균형 클러스터링 기법)

Kim, Jin-Su
- Journal of the Korea Academia-Industrial cooperation Society / v.13 no.12 / pp.6081-6088 / 2012
Unequal clustering in wireless sensor networks is a technique that forms clusters of different sizes. This method decreases overall energy consumption by solving the hot spot problem. In this paper, I propose a layer-based dynamic unequal clustering method using the unequal clustering model. This method decreases overall energy consumption and keeps it balanced by using the optimal number of clusters and cluster head positions. I also show that the proposed method is better than the previous clustering method in terms of network lifetime.
https://doi.org/10.5762/KAIS.2012.13.12.6081

Estimation of Defect Clustering Parameter Using Markov Chain Monte Carlo (Markov Chain Monte Carlo를 이용한 반도체 결함 클러스터링 파라미터의 추정)

Ha, Chung-Hun;Chang, Jun-Hyun;Kim, Joon-Hyun
- Journal of Korean Society of Industrial and Systems Engineering / v.32 no.3 / pp.99-109 / 2009
The negative binomial yield model for semiconductor manufacturing has two parameters: the average number of defects per die and the clustering parameter. Estimating the clustering parameter is quite complex because the parameter has no clear closed form. In this paper, a Bayesian approach using Markov Chain Monte Carlo is proposed to estimate the clustering parameter. To find an appropriate estimation method for the clustering parameter, two typical estimators, the method of moments estimator and the maximum likelihood estimator, are compared with the proposed Bayesian estimator with respect to the mean absolute deviation between the real yield and the estimated yield. Experimental results show that both the proposed Bayesian estimator and the maximum likelihood estimator have excellent performance and that the choice of method depends on the purpose of use.
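For readers unfamiliar with the model, the following minimal sketch shows the commonly used textbook form of the negative binomial yield, Y = (1 + λ/α)^(-α), and the method of moments estimate of the clustering parameter α, one of the two baseline estimators the abstract mentions. The formulas are the standard ones, not taken from the paper itself; the function names and the sample defect counts are hypothetical, and the Bayesian MCMC estimator is not reproduced here.

```python
import statistics

def nb_yield(defects_per_die, alpha):
    """Negative binomial yield: Y = (1 + lambda/alpha) ** (-alpha).

    defects_per_die (lambda) is the average number of defects per die;
    alpha is the clustering parameter (smaller alpha = stronger clustering).
    """
    return (1.0 + defects_per_die / alpha) ** (-alpha)

def alpha_method_of_moments(defect_counts):
    """Method of moments estimate of alpha from per-die defect counts.

    For a negative binomial count, variance = mean + mean**2 / alpha,
    so alpha = mean**2 / (variance - mean). Population variance is used
    here, which is an assumption of this sketch.
    """
    mean = statistics.mean(defect_counts)
    var = statistics.pvariance(defect_counts)
    if var <= mean:
        raise ValueError("variance must exceed the mean to estimate alpha")
    return mean ** 2 / (var - mean)

# Hypothetical defect counts on four dies (illustration only).
counts = [0, 0, 2, 6]
alpha_hat = alpha_method_of_moments(counts)
estimated_yield = nb_yield(statistics.mean(counts), alpha_hat)
```

As α grows large the expression approaches the Poisson yield exp(-λ), which is why α is read as a measure of how strongly defects cluster.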

A Study of optimized clustering method based on SOM for CRM

Jong T. Rhee;Lee, Joon.
- Proceedings of the Korea Inteligent Information System Society Conference / 2001.01a / pp.464-469 / 2001
CRM (Customer Relationship Management) is an advanced marketing support system that analyzes customers' transaction data and classifies or targets customer groups to effectively increase market share and profit. Many engines have been developed to implement this function, and those for classification and clustering are considered the core ones. In this study, an improved clustering method based on SOM (Self-Organizing Maps) is proposed. The proposed clustering method finds the optimal number of clusters so that the effectiveness of clustering is increased. It considers all the data types existing in CRM data warehouses. In particular, an adaptive algorithm is used in which the concepts of degeneration and fusion are applied to find the optimal number of clusters. The feasibility and efficiency of the proposed method are demonstrated through simulation with simplified customer data.

Design of Hierarchically Structured Clustering Algorithm and its Application (계층 구조 클러스터링 알고리즘 설계 및 그 응용)

Bang, Young-Keun;Park, Ha-Yong;Lee, Chul-Heui
- Journal of Industrial Technology / v.29 no.B / pp.17-23 / 2009
In many cases, clustering algorithms have been used for extracting and discovering useful information from non-linear data. They have had a great effect on the performance of systems dealing with non-linear data. This paper therefore presents a new approach called the hierarchically structured clustering algorithm and applies it to a prediction system for non-linear time series data. The proposed algorithm (called HCKA: Hierarchical Cross-correlation and K-means clustering Algorithms), in which the cross-correlation and k-means clustering algorithms are combined, can take into account the correlation of non-linear time series as well as their statistical characteristics. First, the optimal differences of the data are generated, which can suitably reveal the characteristics of non-linear time series. Second, the generated differences are classified into upper clusters for their predictors by the cross-correlation clustering algorithm, and then each group of classified differences is classified again into lower fuzzy sets by the k-means clustering algorithm. As a result, the proposed method can give an efficient classification and improve performance. Finally, we demonstrate the effectiveness of the proposed HCKA on typical time series examples.

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

Shin, Hyung-Won;Sohn, So-Young
- Journal of Korean Institute of Industrial Engineers / v.27 no.1 / pp.47-53 / 2001
In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) to that of logistic regression, taking into account various characteristics of the input data. The four factors used to simulate the logistic model are (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) input-output function. Because the relationship between input and output is unknown, we use a Taguchi design and treat the input-output function as a noise factor, which improves the practicality of our results. The experimental results indicate the following. When the level of variance is medium, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, and to our disappointment, the Parameter Combining algorithm appears to be the worst.
