• Title/Summary/Keyword: Cluster-label

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.33-40 / 2023
  • Machine-learning techniques are proving effective for data analytics. Classification and regression support prediction on different kinds of data, and various classification techniques are used depending on the nature of the data. Threshold determination is essential to building a better model for unlabelled data. In this paper, a threshold value is applied as a range, based on the min-max normalization technique, to create labels, and multiclass classification is performed on rainfall data. Binary classification is applied to autism data, and classification techniques are applied to child abuse data. The performance of each technique is analysed with evaluation metrics.
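
The range-based labelling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cut points (0.33, 0.66), the class names, and the rainfall values are assumptions chosen for illustration only.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_by_ranges(values, cut_points, labels):
    """Label each value by the normalized range it falls into.

    cut_points must be sorted ascending; len(labels) == len(cut_points) + 1.
    Both are hypothetical parameters, not taken from the paper.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    result = []
    for score in min_max_normalize(values):
        # Count how many cut points the score has reached or passed.
        index = sum(score >= cut for cut in cut_points)
        result.append(labels[index])
    return result


# Made-up rainfall readings in millimetres.
rainfall_mm = [0.0, 12.5, 40.0, 75.0, 100.0]
classes = label_by_ranges(rainfall_mm, [0.33, 0.66], ["low", "medium", "high"])
print(classes)
```

The resulting labels could then serve as targets for a multiclass classifier, which is the role the abstract assigns to them.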

COUNTING OF FLOWERS BASED ON K-MEANS CLUSTERING AND WATERSHED SEGMENTATION

  • PAN ZHAO;BYEONG-CHUN SHIN
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.27 no.2 / pp.146-159 / 2023
  • This paper proposes a hybrid algorithm that combines K-means clustering and the watershed algorithm for flower segmentation and counting. We use K-means clustering to obtain the main colors in a complex background from the cluster centers, and then perform a color space transformation to extract the hue, saturation, and value of the flower color. Next, we apply threshold segmentation to segment the flowers precisely and obtain a binary image of the flowers. From this image, we compute the Euclidean distance transform to obtain a distance map and use it to find the local maxima of the connected components. The proposed algorithm then adaptively determines a minimum distance between peaks and applies it to label connected components using watershed segmentation with eight-connectivity. On a dataset of 30 images, the test results show that the proposed method counts overlapped flowers more efficiently and precisely, regardless of the degree of overlap, the number of overlaps, and relatively irregular shapes.

Classification Tree-Based Feature-Selective Clustering Analysis: Case of Credit Card Customer Segmentation (분류나무를 활용한 군집분석의 입력특성 선택: 신용카드 고객세분화 사례)

  • Yoon Hanseong
    • Journal of Korea Society of Digital Industry and Information Management / v.19 no.4 / pp.1-11 / 2023
  • Clustering analysis is used in various fields, including customer segmentation, and clustering methods such as k-means are actively applied to credit card customer segmentation. In this paper, we summarize an input feature selection method for k-means clustering in the credit card customer segmentation problem and evaluate its feasibility through the analysis results. Using the label values of the k-means clustering results as the target feature of a decision tree classifier, we devise a method for prioritizing input features by the information gain of each branch. Effectiveness is not easy to judge with clustering validity indices, but for the CH index, cluster validity clearly improves with the presented method compared with randomly determined priorities. The suggested method can be used to improve the effectiveness of widely used clustering analyses, including the k-means method.
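
The idea of ranking input features by information gain against cluster labels can be sketched as follows. This is a simplified illustration, not the paper's procedure: it scores each feature by its best single threshold split rather than by branches of a fitted tree, and the feature names, values, and cluster labels are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def rank_features(features, cluster_labels):
    """Return (feature names sorted by best single-split gain, per-feature scores)."""
    scores = {}
    for name, values in features.items():
        candidates = sorted(set(values))[:-1]  # the largest value cannot split
        scores[name] = max(
            (information_gain(values, cluster_labels, t) for t in candidates),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical k-means output: cluster id per customer.
cluster_labels = [0, 0, 0, 1, 1, 1]
features = {
    "age": [25, 60, 41, 33, 58, 47],
    "monthly_spend": [10, 12, 11, 90, 95, 88],
}
ranking, scores = rank_features(features, cluster_labels)
print(ranking)
```

In this toy data, `monthly_spend` separates the two clusters perfectly and ranks first, while `age` carries little information about cluster membership.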

Intelligent DB Retrieval System for Marine Accidents Using FCM (FCM을 이용한 지능형 해양사고 DB 검색시스템 구축)

  • Park, Gyei-Kark;Han, Xu;Kim, Young-Ki;Oh, Se-Woong
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.4 / pp.568-573 / 2009
  • Marine accidents have always caused huge economic losses as well as environmental pollution, and their prevention has become a focus of debate. Analyzing past accident cases and reviewing the experience and lessons learned is important and necessary for preventing marine accidents. To this end, the Korean Maritime Safety Tribunal provides written judgments on past marine accidents, analyses of those judgments, and an associated retrieval system on its homepage. In this system, the name of the ship, the time of the accident, the accident pattern, or related keywords are used as search conditions. However, most marine accidents are due not to a single cause but to multiple ones, and one accident can often fall under several categories. The database of the retrieval system currently used on the Korean Maritime Safety Tribunal homepage was built on a single category and cannot retrieve cases by multiple causes or multiple categories. A more practical retrieval approach is needed to solve this problem. Therefore, this paper proposes a new retrieval system that analyzes the relational properties between marine accidents, clusters them with the FCM algorithm, describes each cluster with a linguistic label, and adds an interface that lets users obtain the results they want by choosing multiple causes or multiple categories.
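
Assuming FCM here denotes fuzzy c-means, the reason it suits multi-category retrieval is that each record receives a graded membership in every cluster rather than a single hard assignment. The sketch below shows only the standard membership formula for fixed cluster centers, not the full iterative algorithm and not the paper's system; the centers, the linguistic labels, and the 0.3 cutoff are hypothetical.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    distances = [dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in distances):
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def matching_labels(point, centers, labels, min_membership=0.3):
    """Return every cluster label whose membership reaches the cutoff."""
    memberships = fcm_memberships(point, centers)
    return [label for label, u in zip(labels, memberships) if u >= min_membership]


# Hypothetical 2-D feature space with three labelled cluster centers.
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 10.0)]
labels = ["collision", "grounding", "fire"]
accident = (2.0, 0.0)

memberships = fcm_memberships(accident, centers)
print(matching_labels(accident, centers, labels))
```

Here the sample record sits midway between two centers, so it is returned under both labels, which is the multi-category behavior the abstract asks of the retrieval system.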

A study on the improvement of concrete defect detection performance through the convergence of transfer learning and k-means clustering (전이학습과 k-means clustering의 융합을 통한 콘크리트 결함 탐지 성능 향상에 대한 연구)

  • Younggeun Yoon;Taekeun Oh
    • The Journal of the Convergence on Culture Technology / v.9 no.2 / pp.561-568 / 2023
  • Various defects occur in concrete structures due to internal and external environments. Because a defect compromises the structural safety of concrete, it is important to identify and maintain it efficiently. However, recent deep learning research has focused on cracks in concrete, and studies on exfoliation and contamination are lacking. In this study, focusing on exfoliation and contamination, which are difficult to label, four models were developed and their performance was evaluated using an unlabelling method, a filtering method, and the convergence of transfer learning and k-means clustering. The analysis showed that the convergence model classified the defects in the most detail and could increase efficiency compared with direct labeling. It is hoped that the results of this study will contribute to the future development of deep learning models for various types of defects that are difficult to label.

Effects of Food-related Lifestyle on the Importance of Selected Attributes of Diet Lunch Box (식생활 라이프스타일 유형이 다이어트 도시락 선택속성의 중요도에 미치는 영향)

  • Kim, Binna;Sim, Ki Hyeon
    • The Korean Journal of Food And Nutrition / v.30 no.3 / pp.413-426 / 2017
  • The study subjects were 302 adult males and females aged over 20 years living in the metropolitan area of South Korea. This study was conducted to obtain baseline data for establishing proper development and marketing strategies by examining the effects of food-related lifestyles on the importance of diet, purchasing behavior towards diet lunch boxes, and their selected attributes such as menu, packaging, and services. With respect to food-related lifestyle, a cluster analysis was performed using five factors obtained from factor analysis (convenience, health, safety, taste, and economy) to derive the economy type, the taste and economy type, the convenience type, the safety type, and the health type. As a result, the respondents regarded 'food hygiene (4.59)', 'freshness (4.47)', 'taste (4.28)', and 'nutrient balance (4.19)' as the important selected attributes of diet lunch box menus. Moreover, the importance of diet lunch box menus ($\beta=0.179$) increased with increasing safety orientation. 'Shelf life label (4.42)' was the most important selected attribute of diet lunch box packaging, followed by 'ingredient label (4.19)', 'nutrition facts label (4.16)', and 'indication of origin (4.15)'. In particular, the importance of packaging for diet lunch boxes ($\beta=0.203$) increased with increasing safety orientation. With respect to the selected attributes of services in purchasing diet lunch boxes, 'provision of personalized menus (4.07)' was the most important, and the importance of services for diet lunch boxes ($\beta=0.160$) increased with increasing taste and economy orientation. Based on these results, the respondents gave importance to selected attributes related to food safety and health, such as hygiene and freshness. They also placed emphasis on hygiene and safety factors such as shelf life, ingredient, and nutrition facts labels. Therefore, it is considered necessary to develop diet lunch boxes with these factors in mind. Furthermore, for diet lunch box services, it is considered necessary to establish a service system capable of providing consumers with specialized menus or nutrition counseling according to their food-related lifestyle for proper health management. In particular, because consumers place emphasis on food hygiene, safety, and health, it is considered necessary to manage hygiene, safety, and nutrition thoroughly in menus and packaging so that customer satisfaction can be enhanced by considering these selected attributes in greater detail.

Hierarchical Ann Classification Model Combined with the Adaptive Searching Strategy (적응적 탐색 전략을 갖춘 계층적 ART2 분류 모델)

  • 김도현;차의영
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.649-658 / 2003
  • We propose a hierarchical architecture of ART2 networks for improved performance and a fast pattern classification model using fitness selection. This hierarchical network creates coarse clusters in the first ART2 network layer by unsupervised learning, then creates fine clusters within each first-layer cluster in the second network layer by supervised learning. First, it compares the input pattern with each cluster of the first layer and selects candidate clusters by a fitness measure. We design an optimized fitness function for pruning clusters by measuring the relative distance ratio between an input pattern and the clusters. This makes it possible to improve speed and accuracy. Next, it compares the input pattern with each cluster connected to the selected clusters and finds the winner cluster. Finally, it classifies the pattern by the label of the winner cluster. Our experimental results show that the proposed method is more accurate and faster than other approaches.

Image Clustering Using Machine Learning : Study of InceptionV3 with K-means Methods. (머신 러닝을 사용한 이미지 클러스터링: K-means 방법을 사용한 InceptionV3 연구)

  • Nindam, Somsauwt;Lee, Hyo Jong
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.681-684 / 2021
  • In this paper, we study image clustering without labeling using machine learning techniques. We propose an unsupervised machine learning technique to design an image clustering model that automatically categorizes images into groups. Our experiment focuses on the Inception convolutional neural network (Inception V3) combined with k-means to cluster images. For this, we collected the public datasets Food-K5, Flowers, Handwritten Digit, and Cats-dogs, our own dataset Rice Germination, and the owner dataset Palm print. The experiment consists of three parts. First, all images are stripped of their labels and merged into a whole dataset. Second, the dataset is loaded into Inception V3 to extract image features, which are passed to k-means to form cluster groups over the six classes. Lastly, modeling accuracy is evaluated using a confusion matrix based on precision, recall, and F1. With this method, we obtained the following results: 1) Handwritten Digit (precision = 1.000, recall = 1.000, F1 = 1.00), 2) Food-K5 (precision = 0.975, recall = 0.945, F1 = 0.96), 3) Palm print (precision = 1.000, recall = 0.999, F1 = 1.00), 4) Cats-dogs (precision = 0.997, recall = 0.475, F1 = 0.64), 5) Flowers (precision = 0.610, recall = 0.982, F1 = 0.75), and our dataset 6) Rice Germination (precision = 0.997, recall = 0.943, F1 = 0.97). The model reached an accuracy rate of 0.8908; the outcomes indicate that the proposed model is strong enough to differentiate the images and classify them into clusters.
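
The F1 values quoted in this abstract follow from the reported precision and recall through the harmonic mean, F1 = 2PR / (P + R). The short check below recomputes them from the figures in the abstract; it is only an arithmetic illustration and says nothing about how the authors computed their confusion matrices.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "Handwritten Digit": (1.000, 1.000),
    "Food-K5": (0.975, 0.945),
    "Palm print": (1.000, 0.999),
    "Cats-dogs": (0.997, 0.475),
    "Flowers": (0.610, 0.982),
    "Rice Germination": (0.997, 0.943),
}

f1_by_dataset = {name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()}
for name, f1 in f1_by_dataset.items():
    print(f"{name}: F1 = {f1:.2f}")
```

The Cats-dogs and Flowers rows show why F1 is reported alongside precision and recall: a near-perfect value on one metric is pulled down sharply by a weak value on the other.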

A Multi-Layer Graphical Model for Constrained Spectral Segmentation

  • Kim, Tae Hoon;Lee, Kyoung Mu;Lee, Sang Uk
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.437-438 / 2011
  • Spectral segmentation is a major trend in image segmentation. In particular, constrained spectral segmentation, guided by user-given inputs, remains a challenging task. Since it makes use of the spectrum of the affinity matrix of a given image, its overall quality depends mainly on how the graphical model is designed. In this work, we propose a sparse, multi-layer graphical model in which the pixels and the over-segmented regions are the graph nodes. The graph affinities are computed using the must-link and cannot-link constraints as well as the likelihoods that each node has a specific label. They are then used to simultaneously cluster all pixels and regions into visually coherent groups across all layers in a single multi-layer framework of Normalized Cuts. Although we incorporate only the adjacent connections in the multi-layer graph, the foreground object can be efficiently extracted in the spectral framework. The experimental results demonstrate the relevance of our algorithm compared with existing popular algorithms.

Unsupervised Outpatients Clustering: A Case Study in Avissawella Base Hospital, Sri Lanka

  • Hoang, Huu-Trung;Pham, Quoc-Viet;Kim, Jung Eon;Kim, Hoon;Park, Junseok;Hwang, Won-Joo
    • Journal of Korea Multimedia Society / v.22 no.4 / pp.480-490 / 2019
  • Nowadays, Electronic Medical Record (EMR) systems have been implemented for the Outpatient Department (OPD) at only a few hospitals. OPD data are diverse, including patients' demographics and diseases, so they need to be clustered in order to explore hidden rules and the relationships among the types of patient information. In this paper, we propose a novel approach for unsupervised clustering of patients' demographics and diseases in the OPD. First, we collect OPD data from a hospital. Then, we preprocess and transform the data using techniques such as standardization, label encoding, and categorical encoding. After obtaining the transformed data, we use experiments and evaluation techniques to select the best number of clusters and the best clustering algorithm. In addition, we use tests and measurements to analyze and evaluate cluster tendency, models, and algorithms. Finally, we analyze the results to discover new knowledge, meanings, and rules. The clusters found in this research provide knowledge to medical managers and doctors. With this information, they can improve patient management methods, patient arrangement methods, and doctors' abilities. The work also serves as a reference for medical data scientists mining OPD datasets.
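
Two of the preprocessing steps named in this abstract, standardization and label encoding, can be sketched in a few lines. This is a generic illustration under assumed inputs, not the authors' pipeline: the age and sex columns are made up, and the alphabetical code assignment is one common convention rather than something the abstract specifies.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code, in sorted order."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping


# Hypothetical outpatient columns.
ages = [20, 40, 60]
sexes = ["F", "M", "F"]

scaled_ages = standardize(ages)
encoded_sexes, sex_mapping = label_encode(sexes)
print(scaled_ages, encoded_sexes, sex_mapping)
```

Standardizing numeric columns keeps features with large ranges from dominating distance-based clustering, while integer codes make categorical columns usable by algorithms that expect numeric input.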