• Title/Summary/Keyword: Big data Problem

A Design of the Cloud Aggregator on the MapReduce in the Multi Cloud

  • Hwang, Chigon;Shin, Hyoyoung;Lee, Jong-Yong;Jung, Kye-Dong
    • International Journal of Internet, Broadcasting and Communication / v.8 no.1 / pp.83-90 / 2016
  • The emergence of cloud computing has made a variety of IT services available to users. As the number of organizations and companies providing cloud services grows, many integration problems arise. However, with the advent of recent technologies such as big data, document-oriented databases, and MapReduce, these problems can be solved more easily. This paper designs a Cloud Aggregator that collects information from the cloud systems providing each service and offers it as a service. To do this, we use DBaaS (Database as a Service) and MapReduce techniques. This makes it possible to maintain the functionality of the existing systems and to correct problems that may occur when they are combined.
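
The map, shuffle, and reduce steps that such an aggregator relies on can be sketched in a few lines of plain Python. This is only an illustration of the MapReduce pattern, not the design from the paper: the record fields (`provider`, `service`, `users`) and the two sample sources are assumptions invented for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical records exported by two cloud providers; the field names
# are assumptions for illustration, not taken from the paper.
CLOUD_A = [
    {"provider": "A", "service": "storage", "users": 120},
    {"provider": "A", "service": "compute", "users": 45},
]
CLOUD_B = [
    {"provider": "B", "service": "storage", "users": 80},
]


def map_phase(record):
    """Emit one (key, value) pair per service record."""
    yield record["service"], record["users"]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single aggregate."""
    return key, sum(values)


def aggregate(*sources):
    """Run map, shuffle, and reduce over records from every source."""
    pairs = chain.from_iterable(map_phase(r) for r in chain(*sources))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = aggregate(CLOUD_A, CLOUD_B)
```

Here `totals` holds one combined user count per service across both providers. A real deployment would run the same three phases on a distributed framework rather than in a single process.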

Guideline on Security Measures and Implementation of Power System Utilizing AI Technology (인공지능을 적용한 전력 시스템을 위한 보안 가이드라인)

  • Choi, Inji;Jang, Minhae;Choi, Moonsuk
    • KEPCO Journal on Electric Power and Energy / v.6 no.4 / pp.399-404 / 2020
  • There are many attempts to apply AI technology to diagnose facilities or improve work efficiency in the power industry. The emergence of new machine learning technologies, such as deep learning, is accelerating the digital transformation of the power sector. The problem is that traditional power systems face security risks when adopting state-of-the-art AI systems. This adoption has the character of a technology convergence and exposes the power system to new cybersecurity threats and vulnerabilities. This paper deals with security measures for power systems that use machine learning and with their implementation. By building a forecasting system for commercial facility operations that applies machine learning to power big data, this paper identifies and addresses the security vulnerabilities that must be mitigated to protect customer information and power system safety. Furthermore, it provides security guidelines by generalizing the security measures to be considered when applying AI.

Anomaly Detection of Big Time Series Data Using Machine Learning (머신러닝 기법을 활용한 대용량 시계열 데이터 이상 시점탐지 방법론 : 발전기 부품신호 사례 중심)

  • Kwon, Sehyug
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.33-38 / 2020
  • Machine learning anomaly detection methods, such as PCA anomaly detection and CNN image classification, have focused on cross-sectional data. In this paper, two approaches are suggested for applying ML techniques to identify the failure time in big time series data. First, PCA anomaly detection is used to classify time rows as normal or abnormal by converting the subject identification problem to the time domain. Second, CNN image classification is used to identify the failure time by restructuring the time series data: the correlation matrix of each one-minute block of data is computed and converted to a TIFF image. In addition, LASSO, a feature selection method, was applied to select the most influential variables for identifying the failure status. For the empirical study, time series data were collected every second from 214 components of a power generator for 25 minutes, including the 20 minutes before the failure time. PCA anomaly detection detected the failure 9 minutes 17 seconds before the failure time, but the combination of LASSO and PCA did not detect it because the target variable was a binary variable assigned on the basis of the failure time. CNN image classification, trained on 10 normal-status images and 5 failure-status images, detected the failure just one minute beforehand.
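
The restructuring step described for the CNN approach, computing one correlation matrix per minute of per-second data, can be sketched as follows. This is a minimal sketch using only the standard library. The function names and the 60-second window are assumptions, and the conversion of each matrix to a TIFF image is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_matrix(window):
    """window: list of rows (one per second), each row a list of signals."""
    columns = list(zip(*window))
    return [[pearson(a, b) for b in columns] for a in columns]


def minute_matrices(rows, seconds_per_window=60):
    """Yield one signal-by-signal correlation matrix per full window."""
    last_start = len(rows) - seconds_per_window
    for start in range(0, last_start + 1, seconds_per_window):
        yield correlation_matrix(rows[start:start + seconds_per_window])


# Synthetic demo: three signals sampled once per second for two minutes.
demo_rows = [[t, 2 * t + 1, -t] for t in range(120)]
matrices = list(minute_matrices(demo_rows))
```

Each matrix in `matrices` is square with one row and column per signal, so it can be treated as a small grayscale image and passed to an image classifier.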

A Safety IO Throttling Method Inducting Differential End of Life to Improving the Reliability of Big Data Maintenance in the SSD based RAID (SSD기반 RAID 시스템에서 빅데이터 유지 보수의 신뢰성을 향상시키기 위한 차등 수명 마감을 유도하는 안전한 IO 조절 기법)

  • Lee, Hyun-Seob
    • Journal of Digital Convergence / v.20 no.5 / pp.593-598 / 2022
  • Recently, data production has grown explosively, and storage systems for storing this big data safely and quickly are evolving in various ways. A typical storage configuration groups SSDs, which offer fast data processing, into a RAID group that can maintain data reliably. However, the NAND flash memory that makes up an SSD deteriorates once writes are repeated more than a certain number of times, which can increase the likelihood of simultaneous failure of multiple SSDs in a RAID group. This can cause a serious reliability problem in which data cannot be recovered. To solve this problem, we propose a method of throttling IOs so that each SSD within a RAID group reaches its end of life at a different time. The proposed technique uses SMART to control, step by step, the state of each SSD and the number of IOs allocated according to the data pattern used. In addition, because the method induces differential end of life among the SSDs, it has the advantage of preventing large numbers of concurrent failures in the RAID.
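
One way to picture differential end of life is a policy that deliberately skews write shares by current wear, so that the drives drift apart instead of wearing out together. The sketch below is a hypothetical policy, not the algorithm from the paper: the `write_shares` function, the `skew` parameter, and the wear percentages (which would come from SMART in practice) are all assumptions.

```python
def write_shares(wear_pct, skew=0.5):
    """Return each SSD's fraction of incoming writes.

    wear_pct: dict mapping SSD name to percent of rated life used
              (assumed to be read from SMART; values here are made up).
    skew:     0 gives equal shares; larger values widen the wear gap.
    """
    # Rank drives from most worn to least worn.
    ranked = sorted(wear_pct, key=wear_pct.get, reverse=True)
    n = len(ranked)
    # Linearly decreasing weights: the most-worn drive gets the largest
    # weight, so it reaches end of life first and can be replaced alone.
    weights = {
        ssd: 1.0 + skew * (n - 1 - rank) for rank, ssd in enumerate(ranked)
    }
    total = sum(weights.values())
    return {ssd: weight / total for ssd, weight in weights.items()}


# Hypothetical wear readings for a three-drive RAID group.
shares = write_shares({"ssd0": 40, "ssd1": 42, "ssd2": 41})
```

With these readings, `ssd1` receives the largest share and `ssd0` the smallest. A real controller would also have to respect RAID striping and parity constraints, which this sketch ignores.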

Design Thinking Methodology for Social Innovation using Big Data and Qualitative Research (사회혁신분야에서 근거이론 기반 질적연구와 빅데이터 분석을 활용한 디자인 씽킹 방법론)

  • Park, Sang Hyeok;Oh, Seung Hee;Park, Soon Hwa
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.13 no.4 / pp.169-181 / 2018
  • As global competition constantly intensifies, many companies are exploring new business opportunities in the field of social innovation by creating shared value. A key starting point for social innovation is to clarify the problem to be solved and to grasp its cause. Among the many problem-solving methodologies, design thinking has recently attracted the most attention in various fields. Design thinking is a creative problem-solving method used as a business innovation tool to empathize with human needs and uncover latent desires that the public is not aware of, and it is actively used as a tool for social innovation to solve social problems. However, many participants in design thinking projects find it difficult to analyze the observed data efficiently. When data are analyzed only offline, analyzing a large amount of data takes a long time, and the processing of unstructured data is limited. This makes it difficult to find fundamental problems in the data collected through observation during design thinking. The purpose of this study is to integrate qualitative and quantitative data analysis methods so that the analysis of data collected at the observation stage of a design thinking project for social innovation becomes more scientific, complementing the limits of the design thinking process. The integrated methodology presented in this study is expected to contribute to innovation performance through design thinking by providing practical guidelines and implications for design thinking practitioners as a valuable tool for social innovation.

A Prediction of Number of Patients and Risk of Disease in Each Region Based on Pharmaceutical Prescription Data (의약품 처방 데이터 기반의 지역별 예상 환자수 및 위험도 예측)

  • Chang, Jeong Hyeon;Kim, Young Jae;Choi, Jong Hyeok;Kim, Chang Su;Aziz, Nasridinov
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.271-280 / 2018
  • Recently, big data has been growing rapidly due to the development of information technology. In the medical field in particular, big data is used to provide services such as patient-customized medical care, disease management, and disease prediction. In Korea, the 'National Health Alarm Service' is provided by the National Health Insurance Corporation. However, its prediction model is limited to short-term prediction within 3 days, and the social data it uses are unreliable. To solve these problems, this paper proposes a disease prediction model that uses medicine prescription data generated from actual patients. The model predicts the total number of patients and the risk of disease in each region and uses the ARIMA model for long-term predictions.

File Formats with a Multi-Layer Structure and API Design (다중 레이어 구조로 된 보안 파일 포맷 및 API 설계)

  • Park, Jong-Moon;Yoon, Jeong-Ho;Jo, Hyeon-Tae;Kim, Ki-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.123-127 / 2012
  • With the spread of computers and the Internet and the proliferation of smartphones, a large amount of data is being produced and modified daily. As data usage soars, storing data securely has emerged as a new problem. This paper introduces a new secure file format and API that store big data in a hierarchical, multi-layer data structure and apply encryption to each layer. We expect the file format presented in this paper to be used in various fields.

A Solution to The Data Dependency Problem from the Big Data on Parallel Distributed Systems (병렬 분산 시스템에서 대용량 데이터의 의존성 해결을 위한 방법)

  • Kim, Hyun-Jun;Kim, Tae-Won;Kim, Joon-Mo
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.163-165 / 2012
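  • This paper studies methods for avoiding or overcoming the problems that can arise from dependencies between data when large-scale data is partitioned and processed in parallel. We develop a parallel distributed processing system that resolves the dependency problem, with the aim of improving the efficiency of large-file processing. To evaluate the performance of the distributed processing, we measure the time taken for distributed storage and re-encoding of video files and use it as a performance indicator.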

A Trip Mobility Analysis using Big Data (빅데이터 기반의 모빌리티 분석)

  • Cho, Bumchul;Kim, Juyoung;Kim, Dong-ho
    • The Journal of Bigdata / v.5 no.2 / pp.85-95 / 2020
  • In this study, a mobility analysis method is suggested for estimating origin-destination (O/D) trip demand using mobile phone signaling data. Using mobile data based on the location information of mobile base stations, a trip chain database was established for each person and daily traffic patterns were analyzed. In addition, a new algorithm was developed to determine the traffic characteristics of each person's mobility. To correct the ping-pong handover problem inherent in the communication data, a methodology was developed, and a stay-time criterion was set to distinguish passing through the influence area from staying within it. The big-data-based method is applied to analyze mobility patterns in inter-regional and intra-regional trips in both an urban area and a rural city. A comparison with the results of traditional methods suggests that the new methodology could be applied to national survey projects in the future.
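
The ping-pong handover correction can be illustrated with a simple filter over a device's cell sequence. This is a minimal sketch of one common heuristic, not the algorithm from the paper: the event layout `(timestamp_seconds, cell_id)`, the function name, and the 60-second threshold are assumptions.

```python
def remove_ping_pong(events, max_gap=60):
    """Collapse brief A -> B -> A bounces between base stations.

    events:  list of (timestamp_seconds, cell_id), sorted by time.
    max_gap: longest visit to B, in seconds, that still counts as a bounce
             (hypothetical threshold chosen for the example).
    """
    cleaned = []
    for ts, cell in events:
        returned_quickly = (
            len(cleaned) >= 2
            and cleaned[-2][1] == cell
            and ts - cleaned[-1][0] <= max_gap
        )
        if returned_quickly:
            # Drop the short excursion; the device is treated as never
            # having left the earlier cell.
            cleaned.pop()
            continue
        if cleaned and cleaned[-1][1] == cell:
            continue  # repeated record for the same cell, no handover
        cleaned.append((ts, cell))
    return cleaned


raw = [(0, "A"), (100, "B"), (110, "A"), (130, "B"), (500, "C")]
trips = remove_ping_pong(raw)
```

In this example the ten-second visit to cell B at `t=100` is discarded, while the later move to B at `t=130` is kept because the device does not bounce back. A stay-time criterion like the one in the abstract could then be applied to the cleaned sequence to separate stays from pass-bys.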

Prediction of spatio-temporal AQI data

  • KyeongEun Kim;MiRu Ma;KyeongWon Lee
    • Communications for Statistical Applications and Methods / v.30 no.2 / pp.119-133 / 2023
  • With the rapid growth of the economy and of fossil fuel consumption, the concentration of air pollutants has increased significantly, and the air pollution problem is no longer limited to small areas. Using R and Python, we conduct a statistical analysis of actual air quality data covering the whole of South Korea. Factors such as SO2, CO, O3, NO2, PM10, precipitation, wind speed, wind direction, vapor pressure, local pressure, sea level pressure, temperature, and humidity are used as covariates. The main goal of this paper is to predict spatio-temporal air quality index (AQI) data. Observations in big spatio-temporal datasets such as AQI data are correlated both spatially and temporally, and computing predictions or forecasts under this dependence structure is often infeasible. The likelihood function based on the spatio-temporal model can therefore be complicated, and specialized models are useful for statistically reliable predictions. In this paper, we propose several methods for big spatio-temporal AQI data. First, a random effects model with spatio-temporal basis functions, a classical statistical approach, is proposed. Next, a neural network model, a deep learning method based on artificial neural networks, is applied. Finally, a random forest model, a machine learning method closer to computational science, is introduced. We then compare the forecasting performance of the methods in terms of predictive diagnostics. The analysis shows that all three methods predicted normal levels of PM2.5 well, but performance appears to be poor at extreme values.