• Title/Summary/Keyword: 시계열 데이터 분석 (time series data analysis)


A Study of Search Methodology for Efficient Clustering (효율적 군집화를 위한 탐색 방법 연구)

  • Jeon, Jin-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.571-573 / 2010
  • Most real-world systems, such as the world economy, management, medical, and engineering applications, involve a series of complex phenomena. A common way to understand such a system is to build a model and analyze the system's behavior. The first step is to determine the best clusters in the data; the second is to determine a model for each cluster. In this paper, we investigate heuristic search methods for efficient clustering.


Development of Web Contents for Statistical Analysis Using Statistical Package and Active Server Page (통계패키지와 Active Server Page를 이용한 통계 분석 웹 컨텐츠 개발)

  • Kang, Tae-Gu;Lee, Jae-Kwan;Kim, Mi-Ah;Park, Chan-Keun;Heo, Tae-Young
    • Journal of Korea Society of Industrial Information Systems / v.15 no.1 / pp.109-114 / 2010
  • In this paper, we developed web content for statistical analysis using a statistical package and Active Server Pages (ASP). Statistical packages such as SAS, S-plus, and R are difficult for non-statisticians to learn and use, yet non-statisticians want to analyze data without learning them. We therefore developed web-based statistical analysis content using ASP and S-plus, a popular statistical package. As a real application, we built web content for various statistical analyses, such as exploratory data analysis, analysis of variance, and time series analysis, using water quality data. The developed content is useful for non-statisticians such as public servants and researchers. By combining web-based content with a statistical package, users can access the site quickly and analyze data easily.

Estimation of Missing Records in Daily Climate Data over the Korean Peninsula (한반도의 과거 기후 데이터 구축을 위한 누락된 기록 추정)

  • Noh, Gyu-Ho;Ahn, Kuk-Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.135-135 / 2020
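  • Climate data for Korea generally come from three sources: the synoptic weather observations (ASOS) and the disaster-prevention weather observations (AWS), both published by the Korea Meteorological Administration, and the North Korean weather observations (NKO) that North Korea transmits over the Global Telecommunication System (GTS) of the World Meteorological Organization (WMO). Of these, only ASOS provides complete records spanning more than 40 years, but its spatial coverage is limited. AWS has the advantage of many stations, but its observation period is short and its records are often discontinuous even within the available period. NKO has 27 stations, but much of its data is missing, which limits its use as daily climate data. Such unobserved periods and missing records cause problems for water resources modeling, which relies on continuous time series analysis. To build a reliable 47-year daily climate data set for the Korean Peninsula covering 1973 to 2019, this study compared several methodologies. Seven estimation methods were examined: the EM algorithm for probabilistic principal components (PPCA-EM), the inverse distance weight method (IDWM), the nearest neighbor method (NNM), multivariate normal copulas (Copula), the elastic net model (Elastic), ordinary kriging (OK), and regularized principal components with the EM algorithm (RPCA-EM). The results were compared under various assumed patterns of missing values and evaluated using the root mean squared error (RMSE), Kling-Gupta efficiency (KGE), and Nash-Sutcliffe efficiency (NSE). The finally selected methodology was used to generate gridded daily precipitation and minimum/maximum temperature data for the entire Korean Peninsula.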
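
The three evaluation scores named in this abstract can be sketched as follows. This is a minimal illustration using the standard textbook definitions (the KGE form is the original 2009 formulation), not the authors' code; the function names and the sample values are hypothetical.

```python
import math

def rmse(obs, sim):
    """Root mean squared error between observed and estimated values."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - residual / spread

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mean_s / mean_o    # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Hypothetical held-out observations and their imputed estimates.
observed = [1.0, 2.0, 3.0, 4.0]
estimated = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(observed, estimated), nse(observed, estimated), kge(observed, estimated))
```

A lower RMSE is better, while NSE and KGE both reach 1 for a perfect estimate, so the three scores penalize different kinds of error (magnitude, variance explained, and bias or variability mismatch).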


Functional clustering for electricity demand data: A case study (시간단위 전력수요자료의 함수적 군집분석: 사례연구)

  • Yoon, Sanghoo;Choi, Youngjean
    • Journal of the Korean Data and Information Science Society / v.26 no.4 / pp.885-894 / 2015
  • Forecasting electricity demand is necessary for reliable and effective operation of the power system. In this study, we categorize functional data, namely the mean curves of the daily power demand pattern over the hours of the day. The data were collected between January 1, 2009 and December 31, 2011, and were converted to time series data consisting of seasonal components and an error component through log transformation and trend removal. The functional clustering method of Ma et al. (2006) is applied, and parameters are estimated using the EM algorithm and generalized cross validation. The number of clusters is determined by classifying days as holidays or weekdays. The mean curves of the daily power demand pattern are described by day type (Monday; weekday, Tuesday to Friday; Saturday; Sunday or holiday) and by season.
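
The preprocessing step mentioned in this abstract (log transformation followed by trend removal) might look like the sketch below. The abstract does not say how the trend was removed, so the least-squares straight line used here is an assumption, and the function name `log_detrend` and the sample series are hypothetical.

```python
import math

def log_detrend(series):
    """Log-transform a positive series, then subtract a fitted linear trend.

    Assumption: the trend is a straight line in time, fitted by ordinary
    least squares on the logged values. The source abstract does not
    specify the detrending method.
    """
    logged = [math.log(v) for v in series]
    n = len(logged)
    t_mean = (n - 1) / 2
    y_mean = sum(logged) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(logged))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # What remains is the seasonal plus error part of the logged series.
    return [y - (intercept + slope * t) for t, y in enumerate(logged)]

# Hypothetical demand that grows exponentially with no seasonal component:
# after log transformation it is an exact line, so nothing should remain.
demand = [100.0 * math.exp(0.1 * t) for t in range(10)]
remainder = log_detrend(demand)
```

Taking logs first turns multiplicative growth into an additive trend, which is why a linear fit is a plausible (though unconfirmed) choice for the second step.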

Topic Modeling-Based Domestic and Foreign Public Data Research Trends Comparative Analysis (토픽 모델링 기반의 국내외 공공데이터 연구 동향 비교 분석)

  • Park, Dae-Yeong;Kim, Deok-Hyeon;Kim, Keun-Wook
    • Journal of Digital Convergence / v.19 no.2 / pp.1-12 / 2021
  • With the recent 4th Industrial Revolution, the growth and value of big data are continuously increasing, and the government is actively working to open and utilize public data. However, the current situation still falls short of citizens' demand for public data use. It is therefore necessary to identify research trends in the public data field and seek directions for development. To understand these research trends, this study used topic modeling, a widely used text mining technique. We collected domestic and foreign research papers containing the keyword 'Public data' (1,437 domestic, 9,607 overseas), performed topic modeling based on the LDA algorithm, and compared domestic and foreign public data research trends. Policy implications were presented based on the analysis. The time series by topic shows that research in the fields of 'personal information protection', 'public data management', and 'urban environment' has increased in Korea. Overseas, research in the fields of 'urban policy', 'cell biology', 'deep learning', and 'cloud·security' was confirmed to be active.

The Effect of Seasonal Input on Predicting Groundwater Level Using Artificial Neural Network (인공신경망을 이용한 지하수위 예측과 계절효과 반영을 위한 입력치의 영향)

  • Kim, Incheol;Lee, Junhwan
    • Ecology and Resilient Infrastructure / v.5 no.3 / pp.125-133 / 2018
  • An artificial neural network (ANN) is a powerful model for predicting time series data and has frequently been adopted to predict groundwater level (GWL). Many researchers have tried to improve ANN prediction performance for GWL in various ways. Dummy variables are commonly used as ANN inputs to reflect the seasonal effect on predicted results, which is necessary for improving the prediction performance of the ANN. In this study, the effect of dummy variables on prediction performance was analyzed qualitatively and quantitatively using several graphical methods, the correlation coefficient, and a performance index. The results predicted with dummy inputs showed worse performance than those predicted without them.
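
As a rough illustration of what seasonal dummy inputs look like, the sketch below builds input rows for a predictor from lagged groundwater levels, with and without one-hot season dummies. The four-season grouping, the two-lag window, and the names `season_dummies` and `build_inputs` are all assumptions for illustration; the abstract does not describe the paper's actual input design.

```python
def season_dummies(month):
    """One-hot vector [winter, spring, summer, autumn] for a month 1-12.

    Assumption: meteorological seasons (Dec-Feb, Mar-May, Jun-Aug, Sep-Nov).
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    season = (month % 12) // 3  # Dec, Jan, Feb -> 0; Mar-May -> 1; and so on
    return [1 if i == season else 0 for i in range(4)]

def build_inputs(levels, months, n_lags=2, use_dummies=True):
    """Pair each target level with its previous n_lags levels as features.

    When use_dummies is True, the season dummies of the target month are
    appended to the lagged levels.
    """
    rows = []
    for i in range(n_lags, len(levels)):
        features = list(levels[i - n_lags:i])
        if use_dummies:
            features += season_dummies(months[i])
        rows.append((features, levels[i]))
    return rows

# Hypothetical monthly groundwater levels (m) for January to April.
levels = [10.2, 10.0, 9.7, 9.9]
months = [1, 2, 3, 4]
with_dummies = build_inputs(levels, months)
without_dummies = build_inputs(levels, months, use_dummies=False)
```

Building both variants from the same series makes it straightforward to train two otherwise identical models and compare their performance, which is the kind of comparison the abstract reports.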

A Study on Price Discovery Process for International Crude Oil using Error Correction Model and Graph Theory (오차수정모형과 그래프 이론을 이용한 국제유가의 동시 및 단기 가격발견과정에 관한 연구)

  • Park, Hojeong;Yun, Won-Cheol
    • Environmental and Resource Economics Review / v.15 no.3 / pp.479-504 / 2006
  • This paper analyzes the price discovery process for international crude oils, including WTI, Brent, and Dubai. An error correction model is employed to account for the non-stationarity of crude oil prices, and contemporaneous causality is constructed using graph theory to analyze short-term causality. The empirical analysis for January 4, 1999 to July 15, 2005 reveals that the Brent price links the WTI price and the Dubai price. This result implies the substantial influence of the Brent price as a marker oil.


Prediction of electricity consumption in A hotel using ensemble learning with temperature (앙상블 학습과 온도 변수를 이용한 A 호텔의 전력소모량 예측)

  • Kim, Jaehwi;Kim, Jaehee
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.319-330 / 2019
  • Forecasting electricity consumption by analyzing past consumption is advantageous for energy planning and policy. Machine learning is widely used to predict electricity consumption. Among machine learning methods, ensemble learning avoids overfitting and reduces variance to improve prediction accuracy. However, when applied to daily data, ensemble learning has the disadvantage of predicting central values without capturing peaks, owing to the characteristics of ensemble learning. In this study, we overcome this shortcoming by considering the temperature trend. We compare nine models and propose a model using random forest with the linear trend of temperature.

Synthetic data generation by probabilistic PCA (주성분 분석을 활용한 재현자료 생성)

  • Min-Jeong Park
    • The Korean Journal of Applied Statistics / v.36 no.4 / pp.279-294 / 2023
  • Generating synthetic data sets by the sequential regression multiple imputation (SRMI) method is well known. The R package synthpop is widely used for generating synthetic data by SRMI approaches. In this paper, I suggest generating synthetic data based on the probabilistic principal component analysis (PPCA) method. Two simple data sets are used in a simulation study to compare the SRMI and PPCA approaches. Simulation results demonstrate that pairwise coefficients in synthetic data sets generated by PPCA can be closer to the original ones than those generated by SRMI. Furthermore, the PPCA approach can be extended to generate synthetic data sets for the various data types for which PPCA applications are well established, such as time series data.

Investigation of Research Trends in the D(Data)·N(Network)·A(A.I) Field Using the Dynamic Topic Model (다이나믹 토픽 모델을 활용한 D(Data)·N(Network)·A(A.I) 중심의 연구동향 분석)

  • Wo, Chang Woo;Lee, Jong Yun
    • Journal of the Korea Convergence Society / v.11 no.9 / pp.21-29 / 2020
  • Topic modeling research, a methodology for deducing keywords within literature, has become active with the explosion of data from the transition to a digital society. The objective of this research is to investigate research trends in the D.N.A. (Data, Network, Artificial Intelligence) field using the DTM (Dynamic Topic Model). The DTM was applied to 1,519 research projects with SW·A.I technology classifications among ICT (Information and Communication Technology) field projects over six years (2015-2020). As a result, the technology keywords for the D.N.A. field (Big data, Cloud, Artificial Intelligence) and the extended keywords (Unstructured, Edge Computing, Learning, Recognition) appeared every year, from which it can be inferred that these technologies are being researched inclusively across other projects. Finally, the results of this paper are expected to be useful for future policy and R&D planning and for corporate technology and marketing strategy.