• Title/Summary/Keyword: data clustering (데이터 군집화)

Research on the Development of Distance Metrics for the Clustering of Vessel Trajectories in Korean Coastal Waters (국내 연안 해역 선박 항적 군집화를 위한 항적 간 거리 척도 개발 연구)

  • Seungju Lee;Wonhee Lee;Ji Hong Min;Deuk Jae Cho;Hyunwoo Park
    • Journal of Navigation and Port Research / v.47 no.6 / pp.367-375 / 2023
  • This study developed a new distance metric for vessel trajectories, applicable to marine traffic control services in Korean coastal waters. The proposed metric is a weighted sum of three terms: the traditional Hausdorff distance, which measures the similarity between spatiotemporal data; the difference in average Speed Over Ground (SOG) between two trajectories; and the difference in the variance of Course Over Ground (COG) between them. To validate the new metric, a comparative analysis was conducted on actual Automatic Identification System (AIS) trajectory data using an agglomerative clustering algorithm. Data visualizations confirmed that trajectory clustering with the new metric reflects geographical distances and the distribution of vessel behavioral characteristics more accurately than conventional metrics such as the Hausdorff distance and the Dynamic Time Warping (DTW) distance. Quantitatively, the clustering results were superior or comparable in terms of the Davies-Bouldin index, and the new metric was also highly efficient to compute.
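
The abstract defines the metric only as a weighted sum; the weights, the coordinate projection, and the exact way COG variance is computed are not given. The sketch below is a minimal illustration under stated assumptions: positions are already projected to planar units, the Hausdorff term uses positions only, and COG spread is measured with circular variance (1 minus the mean resultant length) because headings wrap at 360°. The function name `trajectory_distance` and the default unit weights are hypothetical, not the authors' code.

```python
import math
from statistics import mean

# One AIS fix: (x, y, sog_knots, cog_degrees). x and y are assumed to be
# already projected to planar coordinates (e.g. metres); this is not from the paper.
Fix = tuple[float, float, float, float]


def directed_hausdorff(p: list[tuple[float, float]], q: list[tuple[float, float]]) -> float:
    """Largest distance from a point of p to its nearest point of q."""
    return max(min(math.dist(a, b) for b in q) for a in p)


def hausdorff(p: list[tuple[float, float]], q: list[tuple[float, float]]) -> float:
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(p, q), directed_hausdorff(q, p))


def circular_variance(degrees: list[float]) -> float:
    """1 - mean resultant length: 0 if all headings agree, approaching 1 if they are spread out."""
    c = mean(math.cos(math.radians(d)) for d in degrees)
    s = mean(math.sin(math.radians(d)) for d in degrees)
    return 1.0 - math.hypot(c, s)


def trajectory_distance(a: list[Fix], b: list[Fix],
                        w_h: float = 1.0, w_sog: float = 1.0, w_cog: float = 1.0) -> float:
    """Hypothetical weighted sum of Hausdorff, mean-SOG and COG-variance differences."""
    d_h = hausdorff([(x, y) for x, y, _, _ in a], [(x, y) for x, y, _, _ in b])
    d_sog = abs(mean(f[2] for f in a) - mean(f[2] for f in b))
    d_cog = abs(circular_variance([f[3] for f in a]) - circular_variance([f[3] for f in b]))
    return w_h * d_h + w_sog * d_sog + w_cog * d_cog


# Two parallel tracks 3 units apart, both heading east (COG 90) at 10 and 12 knots.
t1 = [(0, 0, 10, 90), (1, 0, 10, 90), (2, 0, 10, 90)]
t2 = [(0, 3, 12, 90), (1, 3, 12, 90), (2, 3, 12, 90)]
print(trajectory_distance(t1, t2))  # 3.0 (Hausdorff) + 2 (SOG) + 0.0 (COG) -> 5.0
```

The three terms have different units (distance, knots, and a unitless variance), so the weights also act as scale factors. A matrix of these pairwise distances can then be passed to any agglomerative clustering routine that accepts a precomputed metric.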

Utilization Pattern Analysis of an Enterprise Information System using Event Log Data (로그 데이터를 이용한 기업 정보 시스템의 사용 패턴 분석)

  • Han, Kwan Hee
    • The Journal of the Korea Contents Association / v.22 no.10 / pp.723-732 / 2022
  • The success of an enterprise information system (EIS) is crucial for aligning with corporate strategies and ultimately attaining corporate goals. Since system use is one of the factors in information system success, managerial efforts to measure the level of EIS utilization are vital. In this paper, the EIS utilization level is analyzed using system access log data. In addition to basic access log statistics, process sequence patterns and clusters of similar functions are identified in more detail using a process mining method. The results can be used to improve existing information system design by revealing actual IS usage sequences and function clusters.
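
The abstract does not specify which process mining technique was used. A common first step in process mining is to rebuild per-case activity sequences from an event log and count the directly-follows relation and the complete sequence variants. The sketch below assumes each access-log record can be reduced to `(case_id, timestamp, activity)` and that a user session ID serves as the case ID; all names are hypothetical.

```python
from collections import Counter, defaultdict

# One log record: (case_id, timestamp, activity). Treating a session ID as the
# case ID is an assumption for illustration.
Event = tuple[str, int, str]


def traces_from_log(events: list[Event]) -> dict[str, list[str]]:
    """Group events by case and order each case's activities by timestamp."""
    by_case: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for case_id, ts, activity in events:
        by_case[case_id].append((ts, activity))
    return {case: [a for _, a in sorted(evts)] for case, evts in by_case.items()}


def directly_follows(traces: dict[str, list[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    counts: Counter = Counter()
    for acts in traces.values():
        counts.update(zip(acts, acts[1:]))
    return counts


def variants(traces: dict[str, list[str]]) -> Counter:
    """Frequency of each complete activity sequence (a simple usage pattern)."""
    return Counter(tuple(acts) for acts in traces.values())


log = [("u1", 1, "login"), ("u1", 2, "search"), ("u1", 3, "order"),
       ("u2", 1, "login"), ("u2", 4, "search"), ("u2", 2, "search")]
traces = traces_from_log(log)
print(directly_follows(traces))
print(variants(traces))
```

Sorting each case by timestamp matters because raw access logs are often interleaved across sessions and not strictly ordered.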

Visualization Method of Document Retrieval Result based on Centers of Clusters (군집 중심 기반 문헌 검색 결과의 시각화)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The Journal of the Korea Contents Association / v.7 no.5 / pp.16-26 / 2007
  • Because existing document retrieval systems have difficulty visualizing search results, they display only document titles and short summaries of the passages that contain the search keywords. When the result list is long, it is difficult to examine all the documents at once and to find relations among them. This study uses clustering to group similar documents so that the relations among the retrieved documents are easy to grasp. It also proposes a two-level visualization algorithm. First, the cluster centers are projected into a low-dimensional space using multidimensional scaling (MDS), so that searchers can grasp the relations among clusters at a glance. Second, individual documents are placed in the low-dimensional space around their cluster centers using an orbital model, so that similarities among individual documents can be confirmed easily. The method was tested on benchmark data and real data, and the results show that search results can be visualized in real time.
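
The abstract names multidimensional scaling for placing the cluster centers but not the variant. Below is a minimal sketch of classical (Torgerson) MDS in plain Python, assuming the input is a symmetric matrix of distances between cluster centers. Eigenvectors are found by power iteration with deflation to avoid external dependencies. The orbital placement of individual documents is not shown, and `classical_mds` is a hypothetical name.

```python
import math
import random


def classical_mds(dist: list[list[float]], dims: int = 2,
                  iters: int = 500, seed: int = 0) -> list[list[float]]:
    """Classical MDS: embed n objects in `dims` dimensions from their distance matrix."""
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, where J = I - (1/n) 11^T.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the largest remaining eigenpair of B.
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        lam = max(lam, 0.0)  # negative eigenvalues (non-Euclidean input) are dropped
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate B so the next pass finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Three hypothetical cluster centers in a 4-dimensional term space.
centers = [(0, 0, 0, 0), (3, 0, 0, 0), (0, 4, 0, 0)]
dist = [[math.dist(a, b) for b in centers] for a in centers]
xy = classical_mds(dist)
print(xy)
```

When the centers really lie in a 2-D subspace, as here, the 2-D layout reproduces the original distances (3, 4, and 5) up to rotation and reflection. For higher-dimensional centers, classical MDS gives the best rank-2 approximation of the double-centered Gram matrix.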

A Movie Recommendation System based on Fuzzy-AHP with User Preference and Partition Algorithm (사용자 선호도와 군집 알고리즘을 이용한 퍼지-계층적 분석 기법 기반 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence / v.15 no.11 / pp.425-432 / 2017
  • Current recommendation systems have several problems. It is difficult to tell whether they recommend items that users actually prefer or items in which they have only a passing interest. Data are too scarce to recommend appropriate items when the number of users is extremely small. The cold-start problem also arises: system performance drops when new users join, so the system fails to recommend items that satisfy them. To address these problems, this study implemented a movie recommendation system aimed at user satisfaction. The system combines the Fuzzy Analytic Hierarchy Process (Fuzzy-AHP), which can reflect uncertain situations and problems, with a data partition algorithm that groups similar items. Survey data on the movie preferences of 61 users were applied to the system. The results show that the Fuzzy-AHP resolved the data scarcity problem and that the data partition algorithm recommended suitable items even as new users joined. Future research on density-based clustering will be needed to filter out noise and outlier data.
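
The abstract does not say which Fuzzy-AHP variant was used. A sketch of one widely used variant, Buckley's geometric-mean method with triangular fuzzy numbers and centroid defuzzification, is given below for illustration only. The criteria in the example (genre, director, actors) and the comparison values are hypothetical.

```python
import math

# A triangular fuzzy number (l, m, u) with l <= m <= u.
TFN = tuple[float, float, float]


def reciprocal(t: TFN) -> TFN:
    """Reciprocal judgement: if A vs B is (l, m, u), then B vs A is (1/u, 1/m, 1/l)."""
    l, m, u = t
    return (1 / u, 1 / m, 1 / l)


def fuzzy_ahp_weights(matrix: list[list[TFN]]) -> list[float]:
    """Buckley's geometric-mean fuzzy AHP, defuzzified by the centroid and normalized."""
    n = len(matrix)
    # Fuzzy geometric mean of each row, computed component-wise.
    r = [tuple(math.prod(cell[k] for cell in row) ** (1 / n) for k in range(3))
         for row in matrix]
    sum_l = sum(x[0] for x in r)
    sum_m = sum(x[1] for x in r)
    sum_u = sum(x[2] for x in r)
    # Fuzzy weight w_i = r_i * (sum of r)^-1; the inverse of (L, M, U) is (1/U, 1/M, 1/L).
    fuzzy_w = [(l / sum_u, m / sum_m, u / sum_l) for l, m, u in r]
    crisp = [(l + m + u) / 3 for l, m, u in fuzzy_w]
    total = sum(crisp)
    return [c / total for c in crisp]


one = (1, 1, 1)
# Hypothetical criteria: genre vs director vs actors, with fuzzy judgements.
matrix = [
    [one, (1, 2, 3), (3, 4, 5)],
    [reciprocal((1, 2, 3)), one, (1, 2, 3)],
    [reciprocal((3, 4, 5)), reciprocal((1, 2, 3)), one],
]
print(fuzzy_ahp_weights(matrix))
```

With crisp, perfectly consistent judgements (l = m = u), the method reduces to the ordinary geometric-mean AHP weights.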

A study on development method for practical use of Big Data related to recommendation to financial item (금융 상품 추천에 관련된 빅 데이터 활용을 위한 개발 방법)

  • Kim, Seok-Soo
    • Journal of the Korea Society of Computer and Information / v.19 no.8 / pp.73-81 / 2014
  • This study proposes a development method for the practical use of big data, comprising a data storage layer, a data processing layer, a data analysis layer, and a visualization layer. The data at each phase (storage, processing, and analysis) can be visualized. After the data are processed with Hadoop, the analysis results from Mahout are visualized. Following this process, several customer characteristics can be captured and financial products can be recommended in a timely manner. This study introduces the background and problems of big data and discusses the development method, along with a financial product recommendation case study showing how big data can create new business opportunities.

Clustering and Classifying DNA Chip Data using Particle Swarm Optimization Algorithm (Particle Swarm Optimization 알고리즘을 이용한 바이오칩 데이터의 군집화 및 분류화 기법)

  • Lee, Yoon-Kyung;Yoon, Hye-Jung;Lee, Min-Soo;Yoon, Kyong-Oh;Choi, Hye-Yeon;Kim, Dae-Hyun;Lee, Keun-Il;Kim, Dae-Young
    • Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.151-154 / 2007
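  • A biochip analysis system extracts data from various kinds of biochips and analyzes the data to obtain useful information. Among the many data analysis techniques, the most representative are clustering and classification. Clustering groups similar objects together, whereas classification assigns each data item to the appropriate class among predefined classes. Data can be clustered and classified with various algorithms, but when the data volume is massive, as with biochips, it is efficient to apply algorithms that imitate natural ecosystems. This paper describes a system that uses PSO (Particle Swarm Optimization), one such ecosystem-inspired swarm algorithm, to find cluster centers in biochip data and perform clustering, and to discover classification rules and apply them to classify bio data.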

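The abstract does not give the PSO encoding or parameters. A common formulation, assumed here, lets each particle encode all k cluster centers as one flat vector and minimizes the quantization error (the sum of squared distances from each point to its nearest center). The inertia and acceleration values (0.729 and 1.494) are widely used defaults, not values from the paper, and the classification-rule step is not shown. `pso_cluster` is a hypothetical name.

```python
import math
import random


def sse(centroids: list[tuple[float, ...]], data: list[tuple[float, ...]]) -> float:
    """Quantization error: sum of squared distances to the nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in data)


def pso_cluster(data, k, n_particles=20, iters=200, w=0.729, c1=1.494, c2=1.494, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])

    def decode(pos):
        # A particle position is k centroids flattened into one vector.
        return [tuple(pos[i * dim:(i + 1) * dim]) for i in range(k)]

    # Initialize each particle with k distinct data points as centroids.
    positions = [[x for c in rng.sample(data, k) for x in c] for _ in range(n_particles)]
    velocities = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [p[:] for p in positions]
    pbest_fit = [sse(decode(p), data) for p in positions]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * r1 * (pbest[i][d] - positions[i][d])
                                    + c2 * r2 * (gbest[d] - positions[i][d]))
                positions[i][d] += velocities[i][d]
            fit = sse(decode(positions[i]), data)
            if fit < pbest_fit[i]:  # personal and global bests only ever improve
                pbest[i], pbest_fit[i] = positions[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = positions[i][:], fit
    return decode(gbest), gbest_fit


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, err = pso_cluster(data, k=2)
print(centroids, err)
```

Because the global best is replaced only by strictly better solutions, the returned error never exceeds the best initial configuration. On well-separated data like this toy set, the centers should settle near (0.5, 0.5) and (10.5, 10.5).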

Experimental Analysis of Clustering of Various Product Models for Production and Inventory Policy (생산재고 정책수립을 위한 다품종모델 군집화의 실증적 분석)

  • 김훈태;정재윤;강석호
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.345-350 / 2003
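  • In many modern manufacturing industries, the variety of product models is steadily increasing. Theoretical models for managing product models exist, but real-world variables are hard to handle theoretically, and mathematical approaches also have limitations. This paper presents an empirical study of clustering a large number of product models in order to apply production and inventory policies. Several variables that can affect the supply of each model were defined from three perspectives: demand distribution, production characteristics, and inventory level. First, factor analysis was performed to understand the interrelationships among the variables and identify their main types; an empirical analysis was then conducted by applying them to actual production data. In this paper, the variables were applied to automobile transmission models to cluster the models. Such research on product model management will become increasingly important as multi-echelon supply chains form and the demand for agile production policies grows.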

A Study of Incremental Clustering Technique based on Ontology (온톨로지 기반 점진적 클러스터링 기법에 관한 연구)

  • Kim Je-Min;Park Young-Tack
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.643-645 / 2005
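  • Clustering defines the interrelationships among unordered data and uses them to group the data more systematically. Because a web service system that applies clustering groups similar content together, users can receive information more efficiently. Ontologies, which form the foundation of the Semantic Web, provide ideal input data for clustering. This paper proposes a technique for clustering ontology-based metadata. Its goal is to define an evaluation function that measures the similarity between ontology-based metadata items and to study a hierarchical clustering algorithm that applies this evaluation function.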

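The abstract does not give the evaluation function. As a hypothetical stand-in, the sketch below represents each metadata item as the set of ontology concepts it references and uses Jaccard similarity. It merges clusters by average linkage until the best similarity falls below a threshold. This is a batch procedure, so it does not reproduce the incremental aspect named in the title; the item names and concepts are invented for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Hypothetical similarity: shared concepts over all concepts."""
    return len(a & b) / len(a | b) if a | b else 1.0


def agglomerative(items: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Average-linkage agglomerative clustering that stops below `threshold`."""
    clusters = [{name} for name in items]

    def linkage(c1: set[str], c2: set[str]) -> float:
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard(items[x], items[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the most similar pair of clusters (i < j).
        (i, j), best = max(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1])
        if best < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because j > i
    return clusters


items = {
    "a": {"Car", "Vehicle"},
    "b": {"Car", "Vehicle", "Engine"},
    "c": {"Movie", "Actor"},
    "d": {"Movie", "Director", "Actor"},
}
print(agglomerative(items, threshold=0.3))
```

An ontology-aware evaluation function could replace `jaccard`, for example by also crediting concepts that are close in the class hierarchy, without changing the clustering loop.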

An Efficient Deep Learning Ensemble Using a Distribution of Label Embedding

  • Park, Saerom
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.27-35 / 2021
  • In this paper, we propose a new stacking ensemble framework for deep learning models that reflects the distribution of label embeddings. The framework consists of two phases: training a baseline deep learning classifier, and training sub-classifiers based on the clustering results of the label embeddings. It aims to divide a multi-class classification problem into smaller sub-problems based on the clustering results. Clustering is performed on label embeddings obtained from the weights of the last layer of the baseline classifier. After clustering, sub-classifiers are constructed to classify the sub-classes within each cluster. The experimental results show that the label embeddings reflect the relationships between classification labels well and that the ensemble framework can improve classification performance on the CIFAR-100 dataset.
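
The abstract says the label embeddings come from the last layer's weights but does not name the clustering algorithm. The sketch assumes one weight row per class (as in a PyTorch `nn.Linear` weight of shape `(num_classes, hidden)`; a Keras `Dense` kernel would need transposing). It L2-normalizes the rows and groups classes with plain k-means using farthest-point initialization. `group_classes` is a hypothetical helper, and the toy weights are invented.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v[:]


def kmeans(points: list[list[float]], k: int, iters: int = 100) -> list[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0][:]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(far[:])
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_classes(last_layer_weights: list[list[float]], n_groups: int) -> list[list[int]]:
    """Cluster class embeddings (one weight row per class) into groups of class indices."""
    embeddings = [normalize(row) for row in last_layer_weights]
    labels = kmeans(embeddings, n_groups)
    groups: dict[int, list[int]] = {}
    for cls, g in enumerate(labels):
        groups.setdefault(g, []).append(cls)
    return sorted(groups.values())


# Toy 4-class weight matrix: classes 0/1 and 2/3 have similar directions.
W = [[1.0, 0.1], [0.9, 0.2], [-0.1, 1.0], [0.0, 0.9]]
print(group_classes(W, 2))  # [[0, 1], [2, 3]]
```

Each resulting group would then get its own sub-classifier that distinguishes only the classes inside it.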

Comparison of journal clustering methods based on citation structure (논문 인용에 따른 학술지 군집화 방법의 비교)

  • Kim, Jinkwang;Kim, Sohyung;Oh, Changhyuck
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.827-839 / 2015
  • Extracting communities from a journal citation database according to its citation structure is a useful way to identify groups of closely related journals. Thomson Reuters' SCI and Elsevier's SCOPUS have tried to capture the community structure of the journals in their indices from citation relationships, but no such attempt had yet been made with the Korean Citation Index (KCI). In this study, therefore, we extracted communities of natural science journals in KCI by applying various clustering algorithms to a social network built from citations among the journals, and compared the resulting groups with the KCI classification. The Infomap algorithm, one of the clustering methods applied, gave the best grouping: its groups were closer to the KCI classification than those of the other algorithms considered, and they reflected the citation structure of the journals well. The classification obtained in this study could be taken into consideration when KCI journals are reclassified in the future.
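
The abstract does not say how closeness to the KCI classification was measured. One common way to score agreement between a clustering and a reference classification is normalized mutual information (NMI). The sketch below illustrates that measure only; it is not the authors' procedure, and Infomap itself is not reproduced. The geometric-mean normalization is one of several conventions.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (natural log) of a labelling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a: list, b: list) -> float:
    """Mutual information between two labellings of the same items."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def nmi(a: list, b: list) -> float:
    """NMI normalized by the geometric mean of the two entropies."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


# Hypothetical: clusters found for six journals vs. their subject classes.
found = [0, 0, 1, 1, 2, 2]
reference = ["phys", "phys", "chem", "chem", "bio", "bio"]
print(nmi(found, reference))
```

NMI ignores how clusters are labelled, so a clustering that matches the reference classes exactly scores 1 even when its cluster IDs differ from the class names.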