• Title/Summary/Keyword: Big Data Clustering

A study on development method for practical use of Big Data related to recommendation to financial item (금융 상품 추천에 관련된 빅 데이터 활용을 위한 개발 방법)

  • Kim, Seok-Soo
    • Journal of the Korea Society of Computer and Information / v.19 no.8 / pp.73-81 / 2014
  • This study proposes a development method for the practical use of big data, comprising a data storage layer, a data processing layer, a data analysis layer, and a visualization layer. The results of the storage, processing, and analysis phases can each be visualized: after the data are processed with Hadoop, the results are visualized with Mahout. Through this course, several customer features can be captured so that a financial item can be recommended at the right time. The study introduces the background and problems of big data and, through a financial item recommendation case, discusses a development method showing how big data can create new business opportunities.
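
A minimal sketch of the recommendation idea described above, using scikit-learn k-means in place of the Hadoop/Mahout pipeline the paper actually targets; the customer features and the cluster-to-item mapping are illustrative assumptions, not the paper's data.

```python
# Sketch only: segment customers by k-means, then attach a financial item to each
# segment. In the paper this role is played by Mahout on top of Hadoop output.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical customer features: [age, monthly_income, avg_balance, risk_score]
customers = rng.normal(loc=[40, 300, 1200, 0.5],
                       scale=[12, 120, 800, 0.2], size=(500, 4))

X = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Illustrative mapping from customer segment to a recommended financial item.
item_by_cluster = {0: "installment savings", 1: "pension fund",
                   2: "balanced fund", 3: "short-term deposit"}
for cid, count in zip(*np.unique(kmeans.labels_, return_counts=True)):
    print(f"segment {cid}: {count} customers -> recommend {item_by_cluster[cid]}")
```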

Clustering of Smart Meter Big Data Based on KNIME Analytic Platform (KNIME 분석 플랫폼 기반 스마트 미터 빅 데이터 클러스터링)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.13-20 / 2020
  • One of the major issues surrounding big data is the availability of massive time-based or telemetry data. The appearance of low-cost capture and storage devices has now made it possible to obtain very detailed time data for further analysis. These time data can be used to gain more knowledge about the underlying system or to predict future events with higher accuracy. In particular, it is very important to define custom-tailored contract offers for the many households and businesses with smart meter records and to predict future electricity usage, in order to protect electricity companies from power shortages or surpluses. A few groups with common electricity behavior must be identified to make the creation of customized contract offers worthwhile. This study suggests a big data transformation step and a clustering technique to understand electricity usage patterns, using open data related to smart meters and KNIME, an open-source platform for data analytics that provides a user-friendly graphical workbench for the entire analysis process. While the big data components are not open source, they are available for a trial if required. After importing, cleaning, and transforming the smart meter big data, each meter's data can be interpreted in terms of electricity usage behavior through a dynamic time warping method.
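
A minimal sketch of the dynamic time warping idea outside KNIME: compute pairwise DTW distances between daily load profiles and group the meters by hierarchical clustering. The synthetic profiles stand in for the open smart meter data used in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(1)
hours = np.arange(24)
morning = 1 + np.exp(-(hours - 8) ** 2 / 4)    # morning-peak households
evening = 1 + np.exp(-(hours - 19) ** 2 / 4)   # evening-peak households
profiles = np.vstack([morning + 0.1 * rng.normal(size=24) for _ in range(10)] +
                     [evening + 0.1 * rng.normal(size=24) for _ in range(10)])

# Pairwise DTW distances -> condensed form -> average-linkage clustering.
n = len(profiles)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(profiles[i], profiles[j])
labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels)
```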

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive yearly growth of structured and unstructured data demands new decision making and services based on big data in all fields. However, there have been few reference cases of using the Hadoop ecosystem to collect and load this rapidly growing big data onto a standard platform applicable in a practical environment, and then to store and process the curated big data in a relational database. Therefore, in this study, unstructured data retrieved by keyword from social network services was collected on Hadoop 2.0 through three virtual machine servers in the Spring Framework environment; the collected unstructured data was loaded into the Hadoop Distributed File System and HBase, and, based on the loaded data, a system was designed and implemented to store standardized big data in a relational database using a morpheme analyzer. In the future, research on clustering, classification, and analysis with machine learning using Hive or Mahout should be continued for deeper data analysis.
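
A minimal sketch of only the last step of the pipeline described above, i.e. turning collected text into standardized records in a relational database. The Hadoop/HBase collection layers are omitted, a naive split() stands in for the Korean morpheme analyzer, and the table schema is an illustrative assumption.

```python
import sqlite3
from collections import Counter

# Stand-in for text collected from social network services by keyword.
collected_posts = [
    "big data platform design with hadoop",
    "hadoop hbase big data collection and processing",
]

# A real pipeline would run a Korean morpheme analyzer here; split() is a stand-in.
tokens = Counter(tok for post in collected_posts for tok in post.split())

conn = sqlite3.connect(":memory:")                       # stand-in for the relational database
conn.execute("CREATE TABLE keyword_count (keyword TEXT PRIMARY KEY, freq INTEGER)")
conn.executemany("INSERT INTO keyword_count VALUES (?, ?)", tokens.items())
conn.commit()

for keyword, freq in conn.execute(
        "SELECT keyword, freq FROM keyword_count ORDER BY freq DESC LIMIT 5"):
    print(keyword, freq)
```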

Data-Compression-Based Resource Management in Cloud Computing for Biology and Medicine

  • Zhu, Changming
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.21-31 / 2016
  • With the application and development of biomedical techniques such as next-generation sequencing, mass spectrometry, and medical imaging, the amount of biomedical data has been growing explosively. In processing such data, we face the problems of big data, highly intensive computation, and high-dimensional data. Fortunately, cloud computing offers significant advantages in resource allocation, data storage, computation, and sharing, and provides a way to solve the big data problems of biomedical research. In order to improve the efficiency of resource management in cloud computing, this paper proposes a clustering method and adopts radial basis functions to compress comprehensive biological and medical data sets with high quality, storing these data under cloud resource management. Experiments validate that with such data-compression-based resource management in cloud computing, large biological and medical data sets can be stored using less capacity. Furthermore, through the reverse operation of the radial basis function, the compressed data can be reconstructed with high accuracy.
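
A minimal sketch of the compression idea under simple assumptions: pick a few RBF centers with k-means, fit Gaussian RBF weights by least squares (the "compressed" representation is just the centers and weights), and reconstruct the signal by the reverse operation. The 1-D signal is synthetic, not a biomedical data set.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 400)[:, None]
y = np.sin(x[:, 0]) + 0.05 * rng.normal(size=len(x))    # stand-in 1-D "biomedical" signal

k = 20                                                   # compressed size: k centers + k weights
centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x).cluster_centers_

def gaussian_design(x, centers, width=0.5):
    """Gaussian RBF design matrix: one column per center."""
    return np.exp(-((x - centers.T) ** 2) / (2 * width ** 2))

Phi = gaussian_design(x, centers)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)              # compression: keep only centers and w

y_rec = gaussian_design(x, centers) @ w                  # reverse operation: reconstruction
print("compression ratio:", len(x) / (2 * k))
print("reconstruction RMSE:", float(np.sqrt(np.mean((y - y_rec) ** 2))))
```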

Design of Fuzzy Neural Networks Based on Fuzzy Clustering with Uncertainty (불확실성을 고려한 퍼지 클러스터링 기반 퍼지뉴럴네트워크 설계)

  • Park, Keon-Jun;Kim, Yong-Kab;Hoang, Geun-Chang
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.1 / pp.173-181 / 2017
  • As industries have developed, a myriad of big data has been produced, and the uncertainty inherent in the data has increased accordingly. In this paper, we propose an interval type-2 fuzzy clustering method to deal with this inherent uncertainty and, using this method, design and optimize a fuzzy neural network. Fuzzy rules are designed using the proposed clustering method, and the learning process is carried out. Genetic algorithms are used as the optimization method, and the model parameters are optimally explored. Experiments were performed on two pattern classification problems, and both show superior pattern recognition results. The proposed network provides a way to deal with increasing uncertainty.
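
A minimal sketch of ordinary (type-1) fuzzy c-means as a baseline for the clustering step; the paper's interval type-2 variant, which maintains lower and upper memberships and feeds the clusters into fuzzy rules optimized by a genetic algorithm, is not reproduced here.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: returns cluster centers and fuzzy memberships."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                      # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]     # fuzzily weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / d ** (2 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)                  # standard membership update
    return centers, U

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centers, U = fuzzy_c_means(X, c=2)
print(np.round(centers, 2))
```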

Information Visualization Process for Spatial Big Data (공간빅데이터를 위한 정보 시각화 방법)

  • Seo, Yang Mo;Kim, Won Kyun
    • Spatial Information Research / v.23 no.6 / pp.109-116 / 2015
  • This study defines the concept and special features of spatial big data and examines information visualization methodology for increasing insight into the data; it also presents problems and solutions in the visualization process. Spatial big data is defined as the result of the quantitative expansion of spatial information and the qualitative expansion of big data. Its characteristics are defined as 6V (Volume, Variety, Velocity, Value, Veracity, Visualization). As the utilization and service aspects of spatial big data become an issue, visualization has received attention as a way to provide insight into spatial big data and improve its value. Information visualization methods have been organized in various ways by Matthias, Ben, information design textbooks, and others, but visualization of spatial big data must go through a process of organizing the target data because of the vast amount of raw data; information must be extracted from the data before it can be delivered to the user. The extracted information is then given a visual representation suited to its characteristics. Since visually representing large amounts of data directly cannot provide accurate information to the user, data reduction methods such as filtering, sampling, data binning, and clustering are needed.
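
A minimal sketch of the data-reduction step mentioned at the end of the abstract: filter, sample, and bin a large set of synthetic points so that only a coarse grid has to be drawn; clustering could replace binning at the same point in the pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)
points = rng.normal(size=(1_000_000, 2))                 # stand-in for raw spatial big data

points = points[np.abs(points).max(axis=1) < 3]          # filtering: drop extreme outliers
sample = points[rng.choice(len(points), 10_000, replace=False)]   # sampling

# Binning: aggregate the sample onto a coarse 2-D grid for a density-style display.
counts, xedges, yedges = np.histogram2d(sample[:, 0], sample[:, 1], bins=50)
print("grid cells to draw:", counts.size, "instead of", len(points), "raw points")
```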

Decombined Distributed Parallel VQ Codebook Generation Based on MapReduce (맵리듀스를 사용한 디컴바인드 분산 VQ 코드북 생성 방법)

  • Lee, Hyunjin
    • Journal of Digital Contents Society / v.15 no.3 / pp.365-371 / 2014
  • In the era of big data, algorithms designed for the existing IT environment cannot be carried over directly to a distributed architecture such as Hadoop. Thus, new distributed algorithms that fit a distributed framework such as MapReduce are needed. Lloyd's algorithm, commonly used for vector quantization, has recently been implemented with MapReduce. In this paper, we propose a decombined distributed VQ codebook generation algorithm, built on the MapReduce-based distributed VQ codebook generation algorithm, to obtain results faster. Applying the proposed algorithm to big data showed higher performance than the conventional method.
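
A minimal sketch of one Lloyd iteration written in map/reduce form, to illustrate the structure such an algorithm takes; a real deployment would run the two phases as Hadoop MapReduce jobs, and the paper's "decombined" reorganization of the phases is not reproduced here.

```python
import numpy as np
from collections import defaultdict

def map_phase(points, codebook):
    """Map: emit (nearest codeword index, point) pairs."""
    for p in points:
        yield int(np.argmin(np.linalg.norm(codebook - p, axis=1))), p

def reduce_phase(pairs, codebook):
    """Reduce: replace each codeword by the mean of the points assigned to it."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return np.array([np.mean(groups[i], axis=0) if groups[i] else codebook[i]
                     for i in range(len(codebook))])

rng = np.random.default_rng(4)
data = rng.normal(size=(10_000, 2))
codebook = data[rng.choice(len(data), 8, replace=False)]
for _ in range(10):                                      # Lloyd iterations
    codebook = reduce_phase(map_phase(data, codebook), codebook)
print(np.round(codebook, 2))
```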

Analysis of Big Data Visualization Technology Based on Patent Analysis (특허분석을 통한 빅 데이터의 시각화 기술 분석)

  • Rho, Seungmin;Choi, YongSoo
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.7 / pp.149-154 / 2014
  • Modern developments in data computing have led to big improvements in graphics capabilities, and there are many new possibilities for data displays. Visualization has proven effective not only for presenting essential information in vast amounts of data but also for driving complex analyses. Big data analytics and discovery present new research opportunities to the computer graphics and visualization community. In this paper, we discuss a patent analysis of big data visualization technology development in major countries. Specifically, we analyzed 160 patent applications and registered patents in four countries as of November 2012. According to the results of the analysis, text clustering analysis and 2D visualization are the areas where development is most important and urgent. In particular, owing to the increasing domestic use of smart devices and social networks, the development of three-dimensional visualization for big data appears very urgent.
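
A minimal sketch of text clustering with a 2-D layout, one common way to realize the kind of analysis the patent landscape points to; the toy documents are invented and do not come from the analyzed patents.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents standing in for patent abstracts (invented, not from the study).
docs = [
    "interactive dashboard for big data visualization",
    "3d rendering of large scale sensor data",
    "text clustering of patent documents",
    "document clustering and topic visualization",
]

X = TfidfVectorizer(stop_words="english").fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)   # simple 2-D layout

for doc, lab, (x, y) in zip(docs, labels, coords):
    print(f"cluster {lab} at ({x:.2f}, {y:.2f}): {doc}")
```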

Spatial clustering of pedestrian traffic accidents in Daegu (대구광역시 교통약자 보행자 교통사고 공간 군집 분석)

  • Hwang, Yeongeun;Park, Seonghee;Choi, Hwabeen;Yoon, Sanghoo
    • Journal of Digital Convergence / v.20 no.3 / pp.75-83 / 2022
  • Korea, which has the highest pedestrian fatality rate among OECD countries, is making efforts to improve the safe walking environment by enacting laws focused on pedestrians. Spatial clustering was conducted with scan statistics after examining social network data related to traffic accidents involving children and seniors. A word cloud was used to examine public perception: campaigns were the main concern for children, and literature surveys for seniors. Naedang and Yongsan are the regions with the highest relative risk for vulnerable pedestrians, both children and seniors; on the contrary, Bongmu and Beomeo are the regions with the lowest relative risk. Naedang-dong and Yongsan-dong of Daegu Metropolitan City were thus identified as areas vulnerable in terms of pedestrian safety due to the high risk of pedestrian accidents for children and the elderly. This means that scan statistics are effective in searching for traffic accident risk areas.
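
A minimal sketch of a circular Poisson scan statistic in the spirit of the scan statistics used in the study; the locations, populations, and accident counts are synthetic, and the Monte Carlo significance test is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))              # candidate locations (e.g. dong centroids)
pop = rng.integers(100, 1000, size=200).astype(float)   # population at risk per location
cases = rng.poisson(pop / 200)                          # baseline accident counts
hot = np.linalg.norm(coords - np.array([2.0, 2.0]), axis=1) < 1.5
cases[hot] += rng.poisson(3, size=hot.sum())            # planted high-risk area near (2, 2)

C, P = cases.sum(), pop.sum()

def llr(c, e, C):
    """Poisson log-likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0
    outside = (C - c) * np.log((C - c) / (C - e)) if c < C else 0.0
    return c * np.log(c / e) + outside

best_score, best_zone = 0.0, None
for i in range(len(coords)):                            # circles centered at each location
    d = np.linalg.norm(coords - coords[i], axis=1)
    for r in (0.5, 1.0, 2.0):                           # a few candidate radii
        inside = d <= r
        score = llr(cases[inside].sum(), C * pop[inside].sum() / P, C)
        if score > best_score:
            best_score, best_zone = score, (i, r)
print("most likely cluster (center index, radius):", best_zone, "LLR:", round(best_score, 2))
```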

Classification of Seoul Metro Stations Based on Boarding/ Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.4 / pp.13-18 / 2018
  • In this study, we classify Seoul metro stations according to boarding and alighting patterns using machine learning techniques. The target data are the hourly numbers of boarding and alighting passengers at 233 subway stations from 2008 to 2017, provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used as the machine learning techniques for classifying the stations. The distributions of passengers' boarding and alighting times can be modeled by the Gaussian mixture model, and the K-means algorithm is then used for unsupervised learning on the data obtained from the GMM modeling. As a result, Seoul metro stations are classified into four groups according to boarding and alighting patterns. The results of this study can be utilized as basic knowledge for analyzing the characteristics of Seoul subway stations economically, socially, and culturally. The method can be applied to other public data and big data in areas requiring clustering.
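
A minimal sketch of the two-step idea: summarize each station's hourly boarding profile with a small Gaussian mixture, then run k-means on the mixture parameters to group stations into four clusters, as in the study. The hourly counts are synthetic stand-ins for the public-data-portal records.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
hours = np.arange(5, 25)                                 # operating hours

def synthetic_station(peak):
    """Hourly boarding counts with a single rush-hour peak (illustrative only)."""
    return 1000 * np.exp(-(hours - peak) ** 2 / 3) + rng.uniform(20, 80, len(hours))

stations = [synthetic_station(peak) for peak in [8] * 30 + [18] * 30 + [13] * 30]

features = []
for counts in stations:
    # Treat the hourly counts as a histogram of boarding times and fit a 2-component GMM.
    samples = np.repeat(hours, counts.astype(int)).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(samples)
    order = np.argsort(gmm.means_.ravel())
    features.append(np.concatenate([gmm.means_.ravel()[order], gmm.weights_[order]]))

# Group stations into four clusters using the GMM parameters as features.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(np.array(features))
print(np.bincount(labels))
```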