• Title/Summary/Keyword: Big Data Clustering


Automatic Information Summary System using by Big Data Analysis (빅 데이터의 분석을 통한 정보 자동 요약 시스템)

  • Yun, Da Young;Lee, Hyun Hwa;Song, Jeo;Lee, Sang Moon
    • Proceedings of the Korean Society of Computer Information Conference / 2014.01a / pp.415-416 / 2014
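  • Today, countless digital data are generated on the Internet, and their volume has grown too large for conventional software to process. We propose an automatic information summarization system based on big data analysis that applies sentence analysis, keyword extraction, and summary generation according to the user's search intent in order to provide personalized information to the user.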


Outlier detection of main engine data of a ship using ensemble method (앙상블 기법을 이용한 선박 메인엔진 빅데이터의 이상치 탐지)

  • KIM, Dong-Hyun;LEE, Ji-Hwan;LEE, Sang-Bong;JUNG, Bong-Kyu
    • Journal of the Korean Society of Fisheries and Ocean Technology / v.56 no.4 / pp.384-394 / 2020
  • This paper proposes a machine-learning-based outlier detection model that can diagnose the presence or absence of anomalies in major engine parts through unsupervised learning analysis of a ship's main engine big data. Engine data were collected for more than seven months, and expert knowledge and correlation analysis were used to select features closely related to the operation of the main engine. For the unsupervised learning analysis, an ensemble model, in which many predictive models are strategically combined to improve performance, is used for anomaly detection. The proposed model successfully distinguished anomalous engine status from normal status. To validate the approach, clustering analysis was conducted to identify the different patterns among the anomalous points. By examining the distribution of each cluster, the patterns of anomalies could be identified.
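
The abstract does not say which detectors the ensemble combines, so the following is only a minimal sketch of the general idea: two simple unsupervised detectors (a z-score and a median-absolute-deviation score, both chosen here as assumptions) vote on each reading, and a reading is flagged only when enough detectors agree. The function names, thresholds, and the `readings` values are hypothetical illustrations, not data or methods from the paper.

```python
from statistics import mean, median, pstdev


def zscore_scores(values):
    """Distance from the mean in units of the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [abs(v - mu) / sigma if sigma else 0.0 for v in values]


def mad_scores(values):
    """Distance from the median in units of the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [abs(v - med) / mad if mad else 0.0 for v in values]


def ensemble_outliers(values, z_cut=2.0, mad_cut=3.5, min_votes=2):
    """Return indices flagged by at least `min_votes` of the two detectors."""
    votes = [
        int(z > z_cut) + int(m > mad_cut)
        for z, m in zip(zscore_scores(values), mad_scores(values))
    ]
    return [i for i, v in enumerate(votes) if v >= min_votes]


# Hypothetical sensor readings with one obvious spike at index 7.
readings = [352, 349, 351, 350, 353, 348, 350, 420, 351, 349]
flagged = ensemble_outliers(readings)
print(flagged)
```

Requiring agreement between detectors trades some sensitivity for fewer false alarms; lowering `min_votes` to 1 would flag a reading as soon as either detector fires.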

Perceptions and Trends of Digital Fashion Technology - A Big Data Analysis - (빅데이터 분석을 이용한 디지털 패션 테크에 대한 인식 연구)

  • Song, Eun-young;Lim, Ho-sun
    • Fashion & Textile Research Journal / v.23 no.3 / pp.380-389 / 2021
  • This study aimed to reveal the perceptions and trends of digital fashion technology through an informational approach. A big data analysis was conducted on text collected from the web between April 2019 and April 2021. Keywords were derived through text mining and network analysis, and the structure of perception of digital fashion technology was identified. Using Textom, we collected 8144 texts after data refinement, conducted frequency-of-occurrence and central component analyses, and visualized the results with word clouds and N-grams. Matrices were also generated from the 70 most frequent words, and a structural equivalence analysis was performed. The results were presented with network visualizations and dendrograms. Fashion, digital, and technology were the most frequently mentioned topics, and the frequencies of platform, digital transformation, and start-ups were also high. Through clustering, four clusters of marketing were formed using fashion, digital technology, startups, and augmented reality/virtual reality technology. Future research on startups and smart factories with technologies based on stable platforms is needed. The results of this study contribute to increasing the fashion industry's knowledge of digital fashion technology and can be used as a foundational study for research on related topics.

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.13 no.11 / pp.157-164 / 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for numerical data, they are likely to be limited on categorical data by the absence of ordering, high dimensionality, and sparse frequencies. Hence, a conditional entropy measure is proposed to evaluate the cohesion among attributes within each cluster. We propose a new objective function that reflects the optimal clustering, so that within-cluster dispersion is minimized and between-cluster separation is enhanced. We performed experiments on five real-world datasets, comparing our algorithm with four other algorithms on three evaluation metrics: accuracy, F-measure, and adjusted Rand index. In these experiments, the proposed algorithm outperforms the algorithms considered in the evaluation on the metrics considered.
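
For orientation, here is a minimal sketch of the standard k-modes procedure that the title refers to, assuming the usual simple-matching dissimilarity and a naive initialization from the first `k` distinct records. It does not implement the conditional entropy measure or the new objective function proposed in the paper; the sample `records` are hypothetical.

```python
from collections import Counter


def mismatch(a, b):
    """Simple-matching dissimilarity: number of attributes that differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def k_modes(records, k, max_iter=20):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    # Naive initialization (an assumption): the first k distinct records.
    modes = []
    for r in records:
        if r not in modes:
            modes.append(r)
        if len(modes) == k:
            break
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: each mode takes the most frequent value per attribute.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(col).most_common(1)[0][0] for col in zip(*members)
                )
    return labels, modes


# Hypothetical categorical records: (color, size, shape).
records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
]
labels, modes = k_modes(records, k=2)
print(labels, modes)
```

The mode plays the role that the mean plays in k-means, which is why the method applies to attributes with no ordering.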

A study on the upper body type and size of men aged 30-44 for men jacket pattern design (남성 재킷 패턴 설계를 위한 30-44세 남성의 상반신 체형 및 유형별 사이즈 연구)

  • Kwon, Dong Kuk
    • The Research Journal of the Costume Culture / v.29 no.6 / pp.881-903 / 2021
  • This study aimed to analyze adult men's body sizes and shapes and suggest size specifications to provide preliminary data to academia and industry. A total of 814 adult men aged 30-44 were selected from the 7th Size Korea data, and 55 direct upper body measurement and calculation items were analyzed using SPSS 25.0. Individual differences were high for thickness, circumference, and width and low for height and length. Height above the waist base line and shoulder dimension decreased in the early-40s age group, while height below the waist base line declined as age increased. In addition, changes in buttocks shape were found in the early-40s age group. Factor analysis yielded five factors: 'upper body and upper-extremity horizontal size', 'torso height and upper extremity length', 'shoulder dimension', 'upper body length', and 'shoulder angle'. Using clustering analysis, four body types were classified: i) big abdomen with flat chest, ii) slender with big, raised shoulders, iii) dwarfish with small, droopy shoulders, and iv) obese with large shoulders. 'Slender with big, raised shoulders' was a typical body shape among men aged 30-44. In older participants, the 'big abdomen with flat chest' ratio was low, while 'obese with large shoulders' was more common. This study proposed size specifications by body type considering the above characteristics.

Cultural Region-based Clustering of SNS Big Data and Users Preferences Analysis (문화권 클러스터링 기반 SNS 빅데이터 및 사용자 선호도 분석)

  • Rho, Seungmin
    • Journal of Advanced Navigation Technology / v.22 no.6 / pp.670-674 / 2018
  • Social network service (SNS) data, including comments/text, images, videos, blogs, and user experiences, contain a wealth of information that can be used to build recommendation systems for various clients and to provide insightful data and results to business analysts. Multimedia data, especially visual data such as images and videos, are the richest source of SNS data: they can reflect the values and interests of particular regions and cultures, and they form a very large portion of the overall data. Mining such huge amounts of data to extract actionable intelligence requires efficient and smart data analysis methods. The purpose of this paper is to focus on this particular modality and devise ways to model, index, and retrieve data as and when desired.

Predicting Learning Achievement Using Big Data Cluster Analysis - Focusing on Longitudinal Study (빅데이터 군집 분석을 이용한 학습성취도 예측 - 종단 연구를 중심으로)

  • Ko, Sujeong
    • Journal of Digital Contents Society / v.19 no.9 / pp.1769-1778 / 2018
  • As the value of big data increases, various studies are applying big data analysis technology in education as well as in corporations. In this paper, we propose a method to predict learning achievement using big data cluster analysis. In the proposed method, students in the Korea Children and Youth Panel Survey (KCYPS) are grouped by similar learning habits using the k-means algorithm, based on the learning habits of first-year middle school students, and group features are extracted. Next, using the extracted group features, first-year middle school students in the test group are assigned to groups with similar learning habits using cosine similarity; neighbors are then selected and learning achievement is predicted. The results indicate that learning habits in middle school remain closely related to outcomes at university, making it possible to predict learning achievement in high school and satisfaction with university and major.
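
The assignment-and-prediction step can be sketched as follows, assuming the group centroids have already been obtained (for example by k-means) and simplifying neighbor selection to "all members of the most similar group". The feature vectors, scores, and function names are hypothetical and are not taken from KCYPS or from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_achievement(student, centroids, group_scores):
    """Assign the student to the most similar group and average its scores."""
    best = max(range(len(centroids)), key=lambda g: cosine(student, centroids[g]))
    scores = group_scores[best]
    return best, sum(scores) / len(scores)


# Hypothetical learning-habit centroids (one per group) and the later
# achievement scores observed for the members of each group.
centroids = [(4.0, 1.0, 3.5), (1.5, 4.0, 1.0)]
group_scores = [[82, 90, 86], [61, 70, 64]]

new_student = (3.8, 1.2, 3.0)
group, predicted = predict_achievement(new_student, centroids, group_scores)
print(group, predicted)
```

Cosine similarity compares the direction of the habit profiles rather than their magnitude, so two students with the same relative pattern of habits land in the same group even if one reports uniformly higher values.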

A study on the ordering of similarity measures with negative matches (음의 일치 빈도를 고려한 유사성 측도의 대소 관계 규명에 관한 연구)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.89-99 / 2015
  • The World Economic Forum and the Korean Ministry of Knowledge Economy have selected big data as one of the top 10 core information technologies. The key to big data is to analyze effectively the properties that the data possess. Among big data techniques, clustering analysis assigns a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. The similarity measures used in cluster analysis can be classified into various types depending on the nature of the data. In this paper, we study upper and lower bounds for binary similarity measures with negative matches, such as the Russell and Rao measure, the simple matching measure by Sokal and Michener, the Rogers and Tanimoto measure, the Sokal and Sneath measure, the Hamann measure, and the Baroni-Urbani and Buser measures I and II. Comparative studies of these measures are presented using real data and simulation experiments.
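
The measures named above can be computed from the 2x2 contingency counts of two binary vectors, where `a` counts positive matches (1, 1), `b` and `c` count mismatches, and `d` counts negative matches (0, 0). The sketch below uses the commonly cited textbook definitions; the Sokal and Sneath form shown is one of several that carry that name and is an assumption about which one the paper means. The example vectors are hypothetical, and the code assumes at least one position that is not a negative match, since the Baroni-Urbani and Buser denominators would otherwise be zero.

```python
from math import sqrt


def contingency(x, y):
    """Count (a, b, c, d) = (1/1, 1/0, 0/1, 0/0) pairs of two binary vectors."""
    a = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    b = sum(1 for p, q in zip(x, y) if p == 1 and q == 0)
    c = sum(1 for p, q in zip(x, y) if p == 0 and q == 1)
    d = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    return a, b, c, d


def similarity_measures(x, y):
    """Binary similarity measures that take negative matches (d) into account."""
    a, b, c, d = contingency(x, y)
    n = a + b + c + d
    root = sqrt(a * d)
    return {
        "russell_rao": a / n,
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),
        "sokal_sneath": 2 * (a + d) / (2 * (a + d) + b + c),
        "hamann": (a + d - b - c) / n,
        "baroni_urbani_buser_1": (root + a) / (root + a + b + c),
        "baroni_urbani_buser_2": (root + a - b - c) / (root + a + b + c),
    }


# Hypothetical binary vectors: a = 2, b = 1, c = 1, d = 4.
x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]
m = similarity_measures(x, y)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these definitions some orderings follow directly from the algebra: Rogers and Tanimoto is never larger than simple matching, which is never larger than this Sokal and Sneath form, and the Hamann measure equals twice the simple matching measure minus one.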

Design of Incremental FCM-based Recursive RBF Neural Networks Pattern Classifier for Big Data Processing (빅 데이터 처리를 위한 증분형 FCM 기반 순환 RBF Neural Networks 패턴 분류기 설계)

  • Lee, Seung-Cheol;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.6 / pp.1070-1079 / 2016
  • This paper introduces the design of recursive radial basis function neural networks based on incremental fuzzy c-means for processing big data. Radial basis function neural networks consist of condition, conclusion, and inference phases. A Gaussian function is generally used as the activation function of the condition phase, but in this study incremental fuzzy clustering is used as the activation function so that big data can be processed effectively. In the conclusion phase, the connection weights of the networks are given as linear functions, and they are calculated by recursive least squares estimation. In the inference phase, the final output is obtained by a fuzzy inference method. Machine learning datasets are employed to demonstrate the superiority of the proposed classifier, and the results are described from the viewpoint of algorithm complexity and performance index.
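
To illustrate how fuzzy clustering can stand in for a Gaussian activation, the sketch below computes the standard fuzzy c-means membership degrees of one input with respect to fixed cluster centers. It covers only the textbook batch membership formula with fuzzifier `m`; the incremental update and the recursive least squares step of the paper are not reproduced, and the centers and input are hypothetical.

```python
from math import sqrt


def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster (values sum to 1)."""
    dists = [sqrt(sum((a - b) ** 2 for a, b in zip(x, c))) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0 for d in dists):
        hits = sum(1 for d in dists if d == 0)
        return [1.0 / hits if d == 0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


# Hypothetical cluster centers and one input vector.
centers = [(0.0, 0.0), (3.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
print(u)
```

Unlike independent Gaussian units, these activations are normalized across clusters, so each hidden node's output reflects the input's distance to every center rather than only its own.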

Types and Characteristics Analysis of Human Dynamics in Seoul Using Location-Based Big Data (위치기반 빅데이터를 활용한 서울시 활동인구 유형 및 유형별 지역 특성 분석)

  • Jung, Jae-Hoon;Nam, Jin
    • Journal of Korea Planning Association / v.54 no.3 / pp.75-90 / 2019
  • As the 24-hour society arrives, human activities in daytime and nighttime urban spaces are changing drastically, and the need for new urban management policies is steadily increasing. This study analyzes the types and characteristics of Seoul's human dynamics using location-based big data, and the results are summarized as follows. First, the pattern of human dynamics in Seoul repeats every 7 days. Second, human dynamics in Seoul can be classified into five types, each with its own time-series and local characteristics. Third, the degree of match between human dynamics and the zoning system in urban planning legislation was highest for the 'Type 1' residence pattern and low for the other types. The following implications can be drawn from these results. First, this paper examined a methodology for analyzing the regional characteristics of Seoul through human dynamics and obtained meaningful results. Second, this paper derives reliable and objective pattern analysis results using big data that reflect overall population characteristics. Third, the scale of night-time activity in Seoul's urban space was assessed, and its distribution, patterns, and characteristics were identified.