• Title/Summary/Keyword: Big Data Clustering

Clustering Corporate Brands based on Opinion Mining: A Case Study of the Automobile Industry (오피니언 마이닝을 통한 브랜드 클러스터링: 자동차 산업 사례연구)

  • Hwang, Hyun-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.11 / pp.453-462 / 2016
  • Since the Internet provides a way of expressing and sharing Internet users' mindsets, corporate marketers want to acquire measurable and actionable insights from web data. In the past, companies analyzed the attitude, satisfaction, and loyalty of consumers toward their brands using survey data, whereas nowadays this is done using big data extracted from social network services. In this study, we propose a framework for clustering brand names using the social metrics gathered on social media. We also conduct a case study of the automobile industry to verify the feasibility of the proposed framework. We calculate the brand name distance for each pair of brand names based on the total number of times that they are mentioned together. These distances are used to project the brand names onto a 3-dimensional space using multidimensional scaling. After the projection, we find the clusters of brand names and identify the characteristics of each cluster. Finally, we conclude the paper with a discussion of the limitations and future directions of this research.
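
The distance-then-projection step described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not give the distance formula, so `1 / (1 + count)` is an assumption, classical (Torgerson) MDS stands in for whichever MDS variant the paper used, and the brand names and co-mention counts are made up.

```python
import numpy as np

def comention_distance(counts):
    """Convert a symmetric co-mention count matrix into distances.

    Assumption: distance = 1 / (1 + count), so brands mentioned together
    more often end up closer. The paper's actual formula is not stated.
    """
    counts = np.asarray(counts, dtype=float)
    dist = 1.0 / (1.0 + counts)
    np.fill_diagonal(dist, 0.0)
    return dist

def classical_mds(dist, n_components=3):
    """Project points into n_components dimensions with classical MDS."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives to zero.
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Hypothetical brands and co-mention counts, for illustration only.
brands = ["A", "B", "C", "D"]
counts = [[0, 30, 2, 1],
          [30, 0, 3, 2],
          [2, 3, 0, 25],
          [1, 2, 25, 0]]
dist = comention_distance(counts)
coords = classical_mds(dist, n_components=3)
for name, point in zip(brands, coords):
    print(name, np.round(point, 3))
```

The resulting 3-dimensional coordinates could then be passed to any clustering routine to find groups of brand names.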

OrdinalEncoder based DNN for Natural Gas Leak Prediction (천연가스 누출 예측을 위한 OrdinalEncoder 기반 DNN)

  • Khongorzul, Dashdondov;Lee, Sang-Mu;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.10 no.10 / pp.7-13 / 2019
  • Natural gas (NG), which is mostly methane, leaks into the air and is a big problem for the climate. NG leaks were detected under U.S. city streets and the data were collected. In this paper, we introduce a Deep Neural Network (DNN) classifier for predicting the level of NG leak. The proposed method combines OrdinalEncoder (OE) based k-means clustering with a Multilayer Perceptron (MLP) to predict NG leaks. The 15 features serve as the input neurons, and the network is trained using backpropagation. We propose the OE method for labeling target data using k-means clustering and compare the performance of normalization methods for NG leak prediction. Five normalization methods are used. We show that the proposed OE based MLP method achieves an accuracy of 97.7% and an F1-score of 96.4%, which is relatively higher than the other methods. The system is implemented in SPSS and Python, and its performance is tested on real open data.
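
One way to read "labeling target data using k-means clustering" is to cluster a continuous leak measurement and map the clusters to ordered integer levels. The sketch below illustrates that idea under stated assumptions: it uses a small NumPy k-means on a single variable rather than the authors' pipeline, the choice of three levels is arbitrary, and the readings are invented.

```python
import numpy as np

def kmeans_ordinal_labels(values, k=3, n_iter=100):
    """Cluster 1-D values with k-means and return ordinal levels 0..k-1.

    Level 0 is the cluster with the smallest centre. Initial centres are
    evenly spaced quantiles, which keeps the result deterministic.
    """
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))

    def assign(c):
        # Index of the nearest centre for every value.
        return np.argmin(np.abs(values[:, None] - c[None, :]), axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Re-rank clusters so that labels increase with the cluster centre.
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[assign(centers)], centers[order]

# Hypothetical leak readings, for illustration only.
readings = [0.1, 0.2, 0.15, 1.0, 1.1, 5.0, 5.5, 6.0]
levels, centers = kmeans_ordinal_labels(readings, k=3)
print(levels, centers)
```

The integer levels could then serve as the classification target for an MLP.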

A Big Data Analysis of Public Interest in Defense Reform 2.0 and Suggestions for Policy Completion

  • Kim, Tae Kyoung;Kang, Wonseok
    • Journal of East Asia Management / v.4 no.1 / pp.1-22 / 2023
  • This study conducted a big data analysis through text mining and semantic network analysis to explore public perception of Defense Reform 2.0. The collected data were analyzed using the top 70 keywords, an appropriate range for network visualization. Through word frequency analysis, degree centrality analysis, and N-gram analysis, we identified issues that received much attention, such as troop reduction, shortening of the military service period, dismantling of border area units, and the return of wartime operational control. In particular, the results of clustering words through CONCOR analysis showed great interest in pursuing the technical group, concerns about reduced military capacity, and reorganization of the manpower structure. The results of the text mining analysis are as follows. First, while troop reduction in Defense Reform 2.0 received much attention, there was a lack of awareness of the measures to reinforce the reduced troops. Second, active communication with local communities is needed, because the dismantling and relocation of border area units lead to problems such as a decline in the regional population and the collapse of local commercial areas. Third, substantial results need to be shown through active policies promoting barracks culture and the defense industry, which drew less public interest than military structure and defense operations. Through this study, we analyzed the public's interest in Defense Reform 2.0, a representative defense policy, and suggested a plan to draw support for national policy.
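
The word frequency and N-gram steps mentioned above can be illustrated with a short sketch. It assumes the text has already been tokenized (the morphological analysis needed for Korean text is out of scope here), and the token list is invented for illustration; it does not reproduce the study's data or tools.

```python
from collections import Counter

def top_keywords(tokens, k=70):
    """Return the k most frequent tokens with their counts."""
    return Counter(tokens).most_common(k)

def ngram_counts(tokens, n=2):
    """Count contiguous n-grams in a token sequence."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical, already-tokenized text, for illustration only.
tokens = ["troop", "reduction", "service", "period",
          "troop", "reduction", "plan"]
keywords = top_keywords(tokens, k=3)
bigrams = ngram_counts(tokens, n=2)
print(keywords)
print(bigrams.most_common(2))
```

The bigram counts can be used as edge weights when building a keyword co-occurrence network for centrality or CONCOR analysis.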

A Technology Analysis Model using Dynamic Time Warping

  • Choi, JunHyeog;Jun, SungHae
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.113-120 / 2015
  • Technology analysis examines technological data, such as patents and papers, for a given technology field. From the results of technology analysis, we can obtain novel knowledge for R&D planning and management. Diverse statistical methods can be used for technology analysis. Time series analysis is an efficient approach, because most technologies are researched and developed over time, so much technological data takes the form of time series. In this paper, we propose a methodology for technology forecasting using dynamic time warping (DTW) from time series analysis. To illustrate how to apply our methodology to a real problem, we perform a case study of patent documents in a target technology field. This research will contribute to R&D planning and technology management.
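
For readers unfamiliar with DTW, the following is a minimal sketch of the standard dynamic-programming distance, using absolute difference as the local cost. It is a textbook formulation, not the paper's specific model, and the yearly patent counts are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

# Hypothetical yearly patent counts for two technology fields.
field_a = [1, 2, 3]
field_b = [1, 2, 2, 3]
print(dtw_distance(field_a, field_b))
```

Because DTW allows one series to be stretched against the other, two fields that follow the same growth pattern at different speeds can still be measured as similar.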

Trends Analysis on Research Articles of the Sharing Economy through a Meta Study Based on Big Data Analytics (빅데이터 분석 기반의 메타스터디를 통해 본 공유경제에 대한 학술연구 동향 분석)

  • Kim, Ki-youn
    • Journal of Internet Computing and Services / v.21 no.4 / pp.97-107 / 2020
  • This study aims to conduct a comprehensive meta-study from the perspective of content analysis to explore trends in Korean academic research on the sharing economy by using big data analytics. A comprehensive meta-analysis methodology can examine the entire set of research results historically and as a whole to illuminate the tendencies or properties of the overall research trend. Academic research related to the sharing economy first appeared in 2008, the year in which Professor Lawrence Lessig introduced the concept of the sharing economy to the world, but research began in earnest in 2013. In particular, between 2006 and 2008, research improved dramatically. In order to grasp the overall flow of domestic academic research trends, 8 years of papers from 2013 to the present were selected for analysis, focusing on titles, keywords, and abstracts obtained from an electronic journal database. Big data analysis was performed in the order of cleaning, analysis, and visualization of the collected data to derive research trends and insights by year and type of literature. We used Python 3.7 and the Textom analysis tool for data preprocessing, text mining, and frequency analysis for keyword extraction, and UCINET6/NetDraw and Textom for N-gram charts, centrality and social network analysis, and CONCOR clustering visualization. The keywords, clustered into 8 groups, were used to derive the typology of each research trend. The outcomes of this study will provide useful theoretical insights and guidelines for future studies.

Discovery of Travel Patterns in Seoul Metropolitan Subway Using Big Data of Smart Card Transaction Systems (스마트카드 빅데이터를 이용한 서울시 지하철 이동패턴 분석)

  • Kim, Kwanho;Oh, Kyuhyup;Lee, Yeong Kyu;Jung, Jae-Yoon
    • The Journal of Society for e-Business Studies / v.18 no.3 / pp.211-222 / 2013
  • Discovering zones, which are sets of geographically adjacent regions, is essential for sophisticated urban development and for improving people's movement. While some studies focus separately on movements between particular regions and on zone discovery, they are limited in their ability to explain people's movements from a wider viewpoint. Therefore, in this research, we propose a clustering-based analysis method that aims to discover movement patterns, which involve zones and their relations, from the big data of smart card transaction systems. Moreover, the effectiveness of the discovered movement patterns is quantitatively evaluated using the proposed metrics. Using a real-world dataset obtained from the Seoul metropolitan subway network, we investigate and visualize hidden movement patterns in Seoul.

Building Energy Time Series Data Mining for Behavior Analytics and Forecasting Energy consumption

  • Balachander, K;Paulraj, D
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.6 / pp.1957-1980 / 2021
  • The main aim of this research is to evaluate mechanisms for efficient and inherently aware use of energy in home devices, thus improving the information of smart metering systems with regard to the usage of selected homes and the time of use. Advances in information processing are commonly used to quantify very large volumes of building activity data and to boost the efficiency of building energy systems. Here, several data mining models are offered to measure and predict energy time series in order to expose different temporal principles of energy use. Such considerations illustrate the use of appliances in relation to time, such as hour of day, time of day, week, month, and year relationships within a household, which are key components in gathering and separating the effect of consumers' behaviors on energy use and their patterns of energy prediction. It is necessary to determine the multiple relations among the usage of different appliances from simultaneous information flows. In comparison, specific relations among interval-based instances, where the use of multiple appliances continues for a certain duration, are difficult to determine. To resolve these difficulties, unsupervised clustering of energy time-series data and a frequent pattern mining study, as well as a deep learning technique for estimating energy use, are presented. A broad test was conducted using real data sets that are rich in smart meter data. The results for the appliance patterns recognized by the proposed model were obtained with Deep Convolutional Neural Networks (CNN) and Recurrent Neural Networks (LSTM and GRU) at each stage, with consolidated accuracy of 94.79%, 97.99%, and 99.61% for 25%, 50%, and 75%, respectively.

Proposition of balanced comparative confidence considering all available diagnostic tools (모든 가능한 진단도구를 활용한 균형비교신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.611-618 / 2015
  • According to Wikipedia, big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Data mining is the computational process of discovering patterns in huge data sets, involving methods such as association rules, decision trees, clustering, artificial intelligence, and machine learning. Association rule mining is a well-researched method for discovering interesting relationships between itemsets in huge databases and has been applied in various fields. There are positive, negative, and inverse association rules according to the direction of association. When setting the evaluation criteria for association rules, it may be desirable to consider the three types of association rules at the same time. To this end, we propose a balanced comparative confidence that considers sensitivity, specificity, false positives, and false negatives, check the conditions for an association threshold given by Piatetsky-Shapiro, and compare it with comparative confidence and inversely comparative confidence through a few experiments.

Clustering Algorithm using the DFP-Tree based on the MapReduce (맵리듀스 기반 DFP-Tree를 이용한 클러스터링 알고리즘)

  • Seo, Young-Won;Kim, Chang-soo
    • Journal of Internet Computing and Services / v.16 no.6 / pp.23-30 / 2015
  • As big data has become an issue, many applications that operate on the results of data analysis have been developed; typical examples are product recommendation services in e-commerce systems, search services in search engines, and friend recommendation in social network services. In this paper, we propose a decision frequent pattern tree (DFP-tree) that combines the original frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree from computer science theory. The DFP-tree algorithm addresses a problem of the frequent pattern tree, which generates so many patterns that the data become hard to analyze. We also propose a model for the MapReduce framework, a programming model that supports operation in a distributed environment.

Morphometric Analyses on 24 Species (13 Families of Six Orders) of Korean Mammals (한국산 포유동물 24종(13과 6목)의 형태적 형질의 분석)

  • 고홍선
    • The Korean Journal of Zoology / v.32 no.1 / pp.14-21 / 1989
  • Four external and 22 cranial characters of 279 specimens representing 24 species of six orders of Korean mammals were measured. The data were analyzed by phenetic methods such as ordination and clustering techniques. Morphological distances were also calculated. Phenetic studies yield taxonomic placements of Siberian mink, Palearctic squirrel, and big white-toothed shrew which are incorrect. Morphological differences among Korean mammals at the ordinal level in the taxonomic hierarchy are larger than those among other mammals; morphological differences below the ordinal level are comparable to those among other mammals. Average taxonomic distances and morphological differences among Korean mammals at various levels in the taxonomic hierarchy are jointly monotonic, although the value of Pearson's product-moment correlation coefficient between the average taxonomic distance matrix and the morphological difference matrix is 0.59.