• Title/Abstract/Keywords: Cluster label

40 search results (processing time: 0.026 s)

A Novel Thresholding for Prediction Analytics with Machine Learning Techniques

  • Shakir, Khan;Reemiah Muneer, Alotaibi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 1 / pp. 33-40 / 2023
  • Machine-learning techniques perform effectively in data analytics. Classification and regression support prediction on different kinds of data, and various classification techniques are used depending on the nature of the data. Threshold determination is essential to building a better model for unlabelled data. In this paper, the threshold value is applied as a range, based on the min-max normalization technique, to create labels, and multiclass classification is performed on rainfall data. Binary classification is applied to autism data, and classification techniques are applied to child abuse data. The performance of each technique is analysed with evaluation metrics.
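
The abstract does not give the paper's actual ranges, so the following is only a minimal sketch of range-based labelling after min-max normalization. The cut points, label names, and rainfall values are all made-up assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def label_by_ranges(values, cut_points, labels):
    """Give each value the label of the normalized range it falls into.

    cut_points must be sorted ascending; labels needs one more entry than cut_points.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    result = []
    for score in min_max_normalize(values):
        # The number of cut points at or below the score is the range index.
        index = sum(score >= cut for cut in cut_points)
        result.append(labels[index])
    return result


# Hypothetical rainfall readings and thresholds (not taken from the paper).
rainfall_mm = [0.0, 2.5, 12.0, 48.0, 100.0]
rain_labels = label_by_ranges(rainfall_mm, [0.1, 0.5], ["low", "medium", "high"])
print(rain_labels)
```

The resulting labels could then serve as targets for a multiclass classifier, which is the role the abstract assigns to them.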

COUNTING OF FLOWERS BASED ON K-MEANS CLUSTERING AND WATERSHED SEGMENTATION

  • PAN ZHAO;BYEONG-CHUN SHIN
    • Journal of the Korean Society for Industrial and Applied Mathematics / Vol. 27, No. 2 / pp. 146-159 / 2023
  • This paper proposes a hybrid algorithm combining K-means clustering and watershed algorithms for flower segmentation and counting. We use the K-means clustering algorithm to obtain the main colors in a complex background from the cluster centers, and then apply a color space transformation to extract the hue, saturation, and value of the flower color. Next, we apply threshold segmentation to segment the flowers precisely and obtain a binary image of the flowers. On this binary image, we take the Euclidean distance transform to obtain a distance map and use it to find the local maxima of the connected components. The proposed algorithm then adaptively determines a minimum distance between peaks and applies it to label connected components using watershed segmentation with eight-connectivity. On a dataset of 30 images, the test results show that the proposed method counts overlapped flowers more efficiently and precisely, regardless of the degree of overlap, the number of overlapping flowers, and relatively irregular shapes.
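
As a rough illustration of only the final labelling idea, the sketch below counts connected components of a binary mask with eight-connectivity in plain Python. It is not the paper's method: it omits the K-means color step, the distance transform, and the watershed split, so touching flowers would be counted as one object. The mask is a made-up example.

```python
from collections import deque


def count_components_8(binary):
    """Count groups of truthy pixels that touch horizontally, vertically, or diagonally."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Found an unvisited foreground pixel: flood-fill its whole component.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (dy or dx) and 0 <= ny < rows and 0 <= nx < cols \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Hypothetical thresholded mask: 1 marks a flower pixel.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0],
]
print(count_components_8(mask))
```

The two pixels that touch only diagonally form a single component here, which is the practical difference between eight-connectivity and four-connectivity.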

분류나무를 활용한 군집분석의 입력특성 선택: 신용카드 고객세분화 사례 (Classification Tree-Based Feature-Selective Clustering Analysis: Case of Credit Card Customer Segmentation)

  • 윤한성
    • 디지털산업정보학회논문지 / Vol. 19, No. 4 / pp. 1-11 / 2023
  • Clustering analysis is used in various fields, including customer segmentation, and clustering methods such as k-means are actively applied to credit card customer segmentation. In this paper, we summarize an input feature selection method for k-means clustering in the credit card customer segmentation problem and evaluate its feasibility through the analysis results. Using the label values of the k-means clustering results as the target feature of a decision tree classification, we devised a method for prioritizing input features by the information gain of each branch. Effectiveness is not easy to judge from clustering validity indices, but on the CH index the presented method clearly improves cluster validity compared with assigning priorities at random. The suggested method can help improve the effectiveness of widely used clustering analyses, including the k-means method.
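
The ranking idea can be sketched as follows, assuming cluster labels are already available from a k-means run. This is a simplification rather than the paper's procedure: each feature is scored by the information gain of one fixed threshold split instead of a fitted decision tree, and the feature names, values, and thresholds are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels, threshold):
    """Entropy reduction from splitting the samples at feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted


def rank_features(features, labels, thresholds):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {
        name: information_gain(values, labels, thresholds[name])
        for name, values in features.items()
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical customers: cluster labels stand in for a k-means result.
cluster_labels = [0, 0, 1, 1]
features = {"age": [30, 50, 30, 50], "spend": [10, 20, 80, 90]}
thresholds = {"age": 40, "spend": 50}
ranking = rank_features(features, cluster_labels, thresholds)
print(ranking)
```

In this toy data the split on `spend` separates the two clusters completely, while the split on `age` carries no information about cluster membership, so `spend` is ranked first.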

FCM을 이용한 지능형 해양사고 DB 검색시스템 구축 (Intelligent DB Retrieval System for Marine Accidents Using FCM)

  • 박계각;한욱;김영기;오세웅
    • 한국지능시스템학회논문지 / Vol. 19, No. 4 / pp. 568-573 / 2009
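  • As the economic and environmental damage caused by marine accidents grows, preventing marine accidents has become a major issue. Databases built by analyzing the types and causes of past marine accidents are widely used in accident-prevention research. However, these databases record only a single type and a single cause per case, so accidents that generally arise from multiple causes and fall under multiple types cannot be classified reasonably, and the database cannot be searched with diverse or vague conditions. This study therefore uses FCM to build a marine accident DB linked to multiple accident causes and types, and presents a retrieval system that uses linguistic labels to extract accident cases by various causes and types.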
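
The abstract does not expand the acronym. Assuming FCM here means fuzzy c-means, the sketch below shows the standard membership formula that lets one case belong to several clusters at once, which matches the goal of linking an accident to multiple causes and types. The points, centers, and fuzzifier value are made-up assumptions.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to center i and m > 1 is the fuzzifier.
    """
    distances = [dist(point, center) for center in centers]
    for i, d in enumerate(distances):
        if d == 0.0:
            # The point sits exactly on a center: full membership there.
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** power for d_k in distances)
        for d_i in distances
    ]


# Hypothetical 2-D feature point and two cluster centers.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)
```

The memberships always sum to 1, so a retrieval system could apply a cut-off on them to decide which clusters a case is listed under; that cut-off rule is an illustration, not something the abstract states.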

전이학습과 k-means clustering의 융합을 통한 콘크리트 결함 탐지 성능 향상에 대한 연구 (A study on the improvement of concrete defect detection performance through the convergence of transfer learning and k-means clustering)

  • 윤영근;오태근
    • 문화기술의 융합 / Vol. 9, No. 2 / pp. 561-568 / 2023
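  • Concrete structures develop various defects due to internal and external environments. Defects compromise the structural safety of concrete, so it is important to identify them efficiently for maintenance. However, recent deep learning research has focused on concrete cracks, and studies on spalling and contamination are lacking. This study focuses on spalling and contamination, which are difficult to label, and develops four models using an unlabeled method, a filtering method, and the convergence of transfer learning and k-means clustering, then evaluates their performance. The analysis showed that the convergence model distinguished defects in the finest detail and was more efficient than labeling by hand. We hope these results contribute to the development of deep learning models for various defect types that are difficult to label.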

식생활 라이프스타일 유형이 다이어트 도시락 선택속성의 중요도에 미치는 영향 (Effects of Food-related Lifestyle on the Importance of Selected Attributes of Diet Lunch Box)

  • 김빛나;심기현
    • 한국식품영양학회지 / Vol. 30, No. 3 / pp. 413-426 / 2017
  • The study subjects were 302 adult males and females aged over 20 years living in the metropolitan area of South Korea. This study was conducted to obtain baseline data for establishing proper development and marketing strategies by examining the effects of food-related lifestyles on the importance of diet, purchasing behavior towards diet lunch boxes, and their selected attributes such as menu, packaging, and services. With respect to food-related lifestyle, a cluster analysis was performed using five factors obtained from factor analysis (convenience, health, safety, taste, and economy) to derive the economy type, the taste and economy type, the convenience type, the safety type, and the health type. As a result, the respondents regarded 'food hygiene (4.59)', 'freshness (4.47)', 'taste (4.28)', and 'nutrient balance (4.19)' as important selected attributes of diet lunch box menus. Moreover, the importance of diet lunch box menus ($\beta = 0.179$) increased with increasing safety orientation. 'Shelf life label (4.42)' was the most important selected attribute of diet lunch box packaging, followed by 'ingredient label (4.19)', 'nutrition facts label (4.16)', and 'indication of origin (4.15)'. In particular, the importance of packaging for diet lunch boxes ($\beta = 0.203$) increased with increasing safety orientation. With respect to the selected attributes of services in purchasing diet lunch boxes, 'provision of personalized menus (4.07)' was the most important, and the importance of services for diet lunch boxes ($\beta = 0.160$) increased with increasing taste and economy orientation. Based on these results, the respondents gave importance to selected attributes related to food safety and health, such as hygiene and freshness. They also placed emphasis on hygiene and safety factors such as shelf life, ingredient, and nutrition facts labels. Therefore, it is considered necessary to develop diet lunch boxes with these factors in mind. Furthermore, for diet lunch box services, it is considered necessary to establish a service system that can provide consumers with specialized menus or nutrition counseling according to their food-related lifestyle for proper health management. In particular, because consumers place emphasis on food hygiene, safety, and health, hygiene, safety, and nutrition in menus and packaging should be managed thoroughly, so that customer satisfaction can be enhanced by considering these selected attributes in greater detail.

적응적 탐색 전략을 갖춘 계층적 ART2 분류 모델 (Hierarchical Ann Classification Model Combined with the Adaptive Searching Strategy)

  • 김도현;차의영
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 30, No. 7-8 / pp. 649-658 / 2003
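  • This study proposes a hierarchical structure to improve the performance of the ART2 neural network, together with a fast and effective pattern classification model (HART2) that selects among the constructed clusters by fitness. The proposed network is a hierarchical neural network that roughly forms first-level clusters through unsupervised learning and then, for the patterns assigned to each first-level cluster, generates second-level clusters through supervised learning to classify the patterns. To classify a pattern, the input pattern is first compared with the first-level clusters, and several similar first-level clusters are selected according to fitness. A fitness function based on the relative distance ratio between the input pattern and the clusters is introduced to prune the clusters connected to the first-level clusters, aiming for both speed and accuracy in the hierarchical network. Finally, the input pattern is compared with the second-level clusters connected to the selected first-level clusters to produce the final classification. To verify the approach, experiments were run on test patterns expanded by transforming numeral data from 22 Korean and English fonts in various ways, and the results demonstrated the superior pattern classification ability of the proposed network.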

머신 러닝을 사용한 이미지 클러스터링: K-means 방법을 사용한 InceptionV3 연구 (Image Clustering Using Machine Learning : Study of InceptionV3 with K-means Methods.)

  • 닌담 솜사우트;이효종
    • 한국정보처리학회:학술대회논문집 / 2021 Fall Conference of the Korea Information Processing Society / pp. 681-684 / 2021
  • In this paper, we study image clustering without labeling using machine learning techniques. We propose an unsupervised machine learning technique to design an image clustering model that automatically categorizes images into groups. Our experiment focuses on the Inception convolutional neural network (Inception V3) combined with the k-means method to cluster images. For this, we collect the public datasets Food-K5, Flowers, Handwritten Digit, and Cats-dogs, our own dataset Rice Germination, and the owner dataset Palm print. Our experiment consists of three parts. First, we remove the labels from all images and pool them into whole datasets. Second, we load the dataset into Inception V3 to extract image features, which are passed to k-means to form cluster groups for the six classes. Lastly, we evaluate the model using a confusion matrix, analyzing precision, recall, and F1. With this method, we obtain the following results: 1) Handwritten Digit (precision = 1.000, recall = 1.000, F1 = 1.00), 2) Food-K5 (precision = 0.975, recall = 0.945, F1 = 0.96), 3) Palm print (precision = 1.000, recall = 0.999, F1 = 1.00), 4) Cats-dogs (precision = 0.997, recall = 0.475, F1 = 0.64), 5) Flowers (precision = 0.610, recall = 0.982, F1 = 0.75), and our dataset 6) Rice Germination (precision = 0.997, recall = 0.943, F1 = 0.97). Our experiment showed that the model reaches an accuracy of 0.8908; the outcomes indicate that the proposed model is strong enough to differentiate the images and classify them into clusters.
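
The reported F1 values can be cross-checked against the reported precision and recall with the usual harmonic-mean formula. The sketch below only reuses the numbers printed in the abstract; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "Handwritten Digit": (1.000, 1.000),
    "Food-K5": (0.975, 0.945),
    "Palm print": (1.000, 0.999),
    "Cats-dogs": (0.997, 0.475),
    "Flowers": (0.610, 0.982),
    "Rice Germination": (0.997, 0.943),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the weaker of the two inputs, which is why Cats-dogs scores only 0.64 despite a precision of 0.997.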

A Multi-Layer Graphical Model for Constrained Spectral Segmentation

  • 김태훈;이경무;이상욱
    • 한국방송∙미디어공학회:학술대회논문집 / 2011 Summer Conference of the Korean Society of Broadcast Engineers / pp. 437-438 / 2011
  • Spectral segmentation is a major trend in image segmentation. In particular, constrained spectral segmentation, guided by user-given inputs, remains a challenging task. Since it makes use of the spectrum of the affinity matrix of a given image, its overall quality depends mainly on how the graphical model is designed. In this work, we propose a sparse, multi-layer graphical model in which the pixels and the over-segmented regions are the graph nodes. The graph affinities are computed using the must-link and cannot-link constraints as well as the likelihoods that each node has a specific label. They are then used to simultaneously cluster all pixels and regions into visually coherent groups across all layers in a single multi-layer framework of Normalized Cuts. Although we incorporate only the adjacent connections in the multi-layer graph, the foreground object can be efficiently extracted in the spectral framework. The experimental results demonstrate the relevance of our algorithm compared with existing popular algorithms.

Unsupervised Outpatients Clustering: A Case Study in Avissawella Base Hospital, Sri Lanka

  • Hoang, Huu-Trung;Pham, Quoc-Viet;Kim, Jung Eon;Kim, Hoon;Park, Junseok;Hwang, Won-Joo
    • 한국멀티미디어학회논문지 / Vol. 22, No. 4 / pp. 480-490 / 2019
  • Electronic Medical Records (EMR) have so far been implemented for the Outpatient Department (OPD) at only a few hospitals. OPD data is diverse, covering patient demographics and diseases, so it needs to be clustered in order to explore hidden rules and the relationships among the types of patient information. In this paper, we propose a novel approach for unsupervised clustering of patient demographics and diseases in the OPD. First, we collect OPD data from a hospital. Then, we preprocess and transform the data using techniques such as standardization, label encoding, and categorical encoding. On the transformed data, we run experiments and evaluations to select the best number of clusters and the best clustering algorithm. In addition, we use tests and measurements to analyze and evaluate cluster tendency, models, and algorithms. Finally, we analyze the results to discover new knowledge, meanings, and rules. The clusters found in this research provide knowledge to medical managers and doctors. From this information, they can improve patient management methods, patient arrangement methods, and doctors' abilities. In addition, the work serves as a reference for medical data scientists mining OPD datasets.
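
Two of the preprocessing steps named above, standardization and label encoding, can be sketched in plain Python as follows. The patient values are made up, and the sorted-order integer codes mirror a common convention rather than anything the abstract specifies.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def label_encode(categories):
    """Map each distinct category to an integer code, assigned in sorted order."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(categories)))}
    return [mapping[cat] for cat in categories], mapping


# Hypothetical outpatient records (not from the paper's dataset).
ages = [20, 30, 40]
sexes = ["female", "male", "female"]

age_z = standardize(ages)
sex_codes, sex_mapping = label_encode(sexes)
print(age_z, sex_codes, sex_mapping)
```

Integer codes imply an ordering that nominal categories do not have, which is one reason a separate categorical encoding step, as the abstract mentions, is often used for distance-based clustering.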