• 제목/요약/키워드: Big Data Clustering

검색결과 146건 처리시간 0.019초

방대한 기상 레이더 데이터의 원할한 처리를 위한 순환 가중최소자승법 기반 RBF 뉴럴 네트워크 설계 및 응용 (Design of RBF Neural Networks Based on Recursive Weighted Least Square Estimation for Processing Massive Meteorological Radar Data and Its Application)

  • 강전성;오성권
    • 전기학회논문지
    • /
    • 제64권1호
    • /
    • pp.99-106
    • /
    • 2015
  • In this study, we propose Radial basis function Neural Network(RBFNN) using Recursive Weighted Least Square Estimation(RWLSE) to effectively deal with big data class meteorological radar data. In the condition part of the RBFNN, Fuzzy C-Means(FCM) clustering is used to obtain fitness values taking into account characteristics of input data, and connection weights are defined as linear polynomial function in the conclusion part. The coefficients of the polynomial function are estimated by using RWLSE in order to cope with big data. As recursive learning technique, RWLSE which is based on WLSE is carried out to efficiently process big data. This study is experimented with both widely used some Machine Learning (ML) dataset and big data obtained from meteorological radar to evaluate the performance of the proposed classifier. The meteorological radar data as big data consists of precipitation echo and non-precipitation echo, and the proposed classifier is used to efficiently classify these echoes.

빅데이터 분석을 이용한 이러닝 수강 후기 분석 (e-Learning Course Reviews Analysis based on Big Data Analytics)

  • 김장영;박은혜
    • 한국정보통신학회논문지
    • /
    • 제21권2호
    • /
    • pp.423-428
    • /
    • 2017
  • 인터넷과 스마트 기기의 사용량 증가로 인해 다양한 교육정보와 많은 양의 데이터가 생성되어 빠르게 확산되고 있다. 최근 이러닝 이용률이 증가하면서 발생하는 빅데이터를 활용하여 학습자들의 교육 성과와 교육 시스템의 효과성을 극대화 하는 것을 목표로 하는 교육 데이터 관련 연구 분야에 대한 관심이 높아지고 있으며 온라인에서 학습자들이 학습한 수많은 기록과 데이터들이 정보로 쌓이게 된다. 이에 본 논문에서는 이러닝 학습자들이 시스템에 남긴 수강 기록을 기반으로 학습자 현황에 대해 객관적으로 파악할 수 있도록 신경망 알고리즘인 Word2Vec을 적용하여 단어 간 유사도를 구하고 클러스터링 알고리즘을 이용하여 군집화 하였다. Word2vec을 이용하여 학습을 시키면 연관된 의미의 단어가 나타나게 되고 학습을 반복해 나가는 과정에서 점차 가까운 벡터를 지니게 된다. 또한 클러스터 알고리즘을 이용하여 명사, 동사, 형용사, 부사가 중심점에서 최소의 거리를 두고 같은 거리에 위치해 있음을 실험 검증하였다.

Similarity Analysis of Hospitalization using Crowding Distance

  • Jung, Yong Gyu;Choi, Young Jin;Cha, Byeong Heon
    • International journal of advanced smart convergence
    • /
    • 제5권2호
    • /
    • pp.53-58
    • /
    • 2016
  • With the growing use of big data and data mining, it serves to understand how such techniques can be used to understand various relationships in the healthcare field. This study uses hierarchical methods of data analysis to explore similarities in hospitalization across several New York state counties. The study utilized methods of measuring crowding distance of data for age-specific hospitalization period. Crowding distance is defined as the longest distance, or least similarity, between urban cities. It is expected that the city of Clinton have the greatest distance, while Albany the other cities are closer because they are connected by the shortest distance to each step. Similarities were stronger across hospital stays categorized by age. Hierarchical clustering can be applied to predict the similarity of data across the 10 cities of hospitalization with the measurement of crowding distance. In order to enhance the performance of hierarchical clustering, comparison can be made across congestion distance when crowding distance is applied first through the application of converting text to an attribute vector. Measurements of similarity between two objects are dependent on the measurement method used in clustering but is distinguished from the similarity of the distance; where the smaller the distance value the more similar two things are to one other. By applying this specific technique, it is found that the distance between crowding is reduced consistently in relationship to similarity between the data increases to enhance the performance of the experiments through the application of special techniques. Furthermore, through the similarity by city hospitalization period, when the construction of hospital wards in cities, by referring to results of experiments, or predict possible will land to the extent of the size of the hospital facilities hospital stay is expected to be useful in efficiently managing the patient in a similar area.

Interaction of the Lysophospholipase PNPLA7 with Lipid Droplets through the Catalytic Region

  • Chang, Pingan;Sun, Tengteng;Heier, Christoph;Gao, Hao;Xu, Hongmei;Huang, Feifei
    • Molecules and Cells
    • /
    • 제43권3호
    • /
    • pp.286-297
    • /
    • 2020
  • Mammalian patatin-like phospholipase domain containing proteins (PNPLAs) play critical roles in triglyceride hydrolysis, phospholipids metabolism, and lipid droplet (LD) homeostasis. PNPLA7 is a lysophosphatidylcholine hydrolase anchored on the endoplasmic reticulum which associates with LDs through its catalytic region (PNPLA7-C) in response to increased cyclic nucleotide levels. However, the interaction of PNPLA7 with LDs through its catalytic region is unknown. Herein, we demonstrate that PNPLA7-C localizes to the mature LDs ex vivo and also colocalizes with pre-existing LDs. Localization of PNPLA7-C with LDs induces LDs clustering via non-enzymatic intermolecular associations, while PNPLA7 alone does not induce LD clustering. Residues 742-1016 contains four putative transmembrane domains which act as a LD targeting motif and are required for the localization of PNPLA7-C to LDs. Furthermore, the N-terminal flanking region of the LD targeting motif, residues 681-741, contributes to the LD targeting, whereas the C-terminal flanking region (1169-1326) has an anti-LD targeting effect. Interestingly, the LD targeting motif does not exhibit lysophosphatidylcholine hydrolase activity even though it associates with LDs phospholipid membranes. These findings characterize the specific functional domains of PNPLA7 mediating subcellular positioning and interactions with LDs, as wells as providing critical insights into the structure of this evolutionarily conserved phospholipid-metabolizing enzyme family.

머신러닝 알고리즘 분석 및 비교를 통한 Big-5 기반 성격 분석 연구 (A Study on Big-5 based Personality Analysis through Analysis and Comparison of Machine Learning Algorithm)

  • 김용준
    • 한국인터넷방송통신학회논문지
    • /
    • 제19권4호
    • /
    • pp.169-174
    • /
    • 2019
  • 본 연구에서는 설문지를 이용한 데이터 수집과 데이터 마이닝에서 클러스터링 기법으로 군집하여 지도학습을 이용하여 유사성을 판단하고, 성격들의 상관 관계의 적합성을 분석하기 위해 특징 추출 알고리즘들과 지도학습을 이용하는 것을 목표로 진행한다. 연구 수행은 설문조사를 진행 후 그 설문조사를 토대로 모인 데이터들을 정제하고, 오픈 소스 기반의 데이터 마이닝 도구인 WEKA의 클러스터링 기법들을 통해 데이터 세트를 분류하고 지도학습을 이용하여 유사성을 판단한다. 그리고 특징 추출 알고리즘들과 지도학습을 이용하여 성격에 대해 적합한 결과가 나오는지에 대한 적합성을 판단한다. 그 결과 유사성 판단에 가장 정확도 높게 도움을 주는 것은 EM 클러스터링으로 3개의 분류하고 Naïve Bayes 지도학습을 시킨 것이 가장 높은 유사성 분류 결과를 도출하였고, 적합성을 판단하는데 도움이 되도록 특징추출과 지도학습을 수행하였을 때, Big-5 각 성격마다 문항에 추가되고 삭제되는 것에 따라 정확도가 변하는 모습을 찾게 되었고, 각 성격 마다 차이에 대한 분석을 완료하였다.

Identification of Plastic Wastes by Using Fuzzy Radial Basis Function Neural Networks Classifier with Conditional Fuzzy C-Means Clustering

  • Roh, Seok-Beom;Oh, Sung-Kwun
    • Journal of Electrical Engineering and Technology
    • /
    • 제11권6호
    • /
    • pp.1872-1879
    • /
    • 2016
  • The techniques to recycle and reuse plastics attract public attention. These public attraction and needs result in improving the recycling technique. However, the identification technique for black plastic wastes still have big problem that the spectrum extracted from near infrared radiation spectroscopy is not clear and is contaminated by noise. To overcome this problem, we apply Raman spectroscopy to extract a clear spectrum of plastic material. In addition, to improve the classification ability of fuzzy Radial Basis Function Neural Networks, we apply supervised learning based clustering method instead of unsupervised clustering method. The conditional fuzzy C-Means clustering method, which is a kind of supervised learning based clustering algorithms, is used to determine the location of radial basis functions. The conditional fuzzy C-Means clustering analyzes the data distribution over input space under the supervision of auxiliary information. The auxiliary information is defined by using k Nearest Neighbor approach.

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Misuta, Masahiro
    • Communications for Statistical Applications and Methods
    • /
    • 제21권3호
    • /
    • pp.225-234
    • /
    • 2014
  • We propose a novel hierarchical clustering for distribution valued dissimilarities. Analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA) was proposed by Diday in 1980's, which provides a new framework for statistical analysis. In SDA, we analyze an object with internal variation, including an interval, a histogram and a distribution, called a symbolic object. In the study, we focus on a cluster analysis for distribution valued dissimilarities, one of the symbolic objects. A hierarchical clustering has two steps in general: find out step and update step. In the find out step, we find the nearest pair of clusters. We extend it for distribution valued dissimilarities, introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined by mixture of distributions with a mixing ratio. We show an actual example of the proposed method and a simulation study.

군집분석과 연관규칙을 활용한 고객 분류 및 장바구니 분석: 소매 유통 빅데이터를 중심으로 (Customer Classification and Market Basket Analysis Using K-Means Clustering and Association Rules: Evidence from Distribution Big Data of Korean Retailing Company)

  • 리우룬칭;이영찬;무홍레이
    • 지식경영연구
    • /
    • 제19권4호
    • /
    • pp.59-76
    • /
    • 2018
  • With the arrival of the big data era, customer data and data mining analysis have gradually dominated the process of Customer Relationship Management (CRM). This phenomenon indicates that customer data along with the use of information techniques (IT) have become the basis for building a successful CRM strategy. However, some companies can not discover valuable information through a large amount of customer data, which leads to the failure of making appropriate business strategy. Without suitable strategies, the companies may lose the competitive advantage or probably go bankrupt. The purpose of this study is to propose CRM strategies by segmenting customers into VIPs and Non-VIPs and identifying purchase patterns using the the VIPs' transaction data and data mining techniques (K-means clustering and association rules) of online shopping mall in Korea. The results of this paper indicate that 227 customers were segmented into VIPs among 1866 customers. And according to 51,080 transactions data of VIPs, home product and women wear are frequently associated with food, which means that the purchase of home product or women wears mainly affect the purchase of food. Therefore, marketing managers of shopping mall should consider these shopping patterns when they build CRM strategy.

An Optimization Method for the Calculation of SCADA Main Grid's Theoretical Line Loss Based on DBSCAN

  • Cao, Hongyi;Ren, Qiaomu;Zou, Xiuguo;Zhang, Shuaitang;Qian, Yan
    • Journal of Information Processing Systems
    • /
    • 제15권5호
    • /
    • pp.1156-1170
    • /
    • 2019
  • In recent years, the problem of data drifted of the smart grid due to manual operation has been widely studied by researchers in the related domain areas. It has become an important research topic to effectively and reliably find the reasonable data needed in the Supervisory Control and Data Acquisition (SCADA) system has become an important research topic. This paper analyzes the data composition of the smart grid, and explains the power model in two smart grid applications, followed by an analysis on the application of each parameter in density-based spatial clustering of applications with noise (DBSCAN) algorithm. Then a comparison is carried out for the processing effects of the boxplot method, probability weight analysis method and DBSCAN clustering algorithm on the big data driven power grid. According to the comparison results, the performance of the DBSCAN algorithm outperforming other methods in processing effect. The experimental verification shows that the DBSCAN clustering algorithm can effectively screen the power grid data, thereby significantly improving the accuracy and reliability of the calculation result of the main grid's theoretical line loss.

Digital Forensic for Location Information using Hierarchical Clustering and k-means Algorithm

  • Lee, Chanjin;Chung, Mokdong
    • 한국멀티미디어학회논문지
    • /
    • 제19권1호
    • /
    • pp.30-40
    • /
    • 2016
  • Recently, the competition among global IT companies for the market occupancy of the IoT(Internet of Things) is fierce. Internet of Things are all the things and people around the world connected to the Internet, and it is becoming more and more intelligent. In addition, for the purpose of providing users with a customized services to variety of context-awareness, IoT platform and related research have been active area. In this paper, we analyze third party instant messengers of Windows 8 Style UI and propose a digital forensic methodology. And, we are well aware of the Android-based map and navigation applications. What we want to show is GPS information analysis by using the R. In addition, we propose a structured data analysis applying the hierarchical clustering model using GPS data in the digital forensics modules. The proposed model is expected to help support the IOT services and efficient criminal investigation process.