• Title/Summary/Keyword: Time-Series data

Time-Series based Dataset Selection Method for Effective Text Classification (효율적인 문헌 분류를 위한 시계열 기반 데이터 집합 선정 기법)

  • Chae, Yeonghun;Jeong, Do-Heon
    • The Journal of the Korea Contents Association, v.17 no.1, pp.39-49, 2017
  • As Internet technology advances, the amount of data on the web is increasing sharply. Many studies have investigated incremental learning as a way to classify effectively as data grows. Web documents contain time-series information such as the publication date, and reflecting this information in classification should make classification more effective. In this study, we analyze the time-series variation of words and propose an efficient classification method that divides the dataset based on this analysis. For the experiments, we collected 1 million online news articles that include time-series information. We divided the dataset and classified it using SVM and Naïve Bayes. Classification performance improved for both models, showing that reflecting time-series information can improve classification performance.

Test of Homogeneity for Panel Bilinear Time Series Model (패널 중선형 시계열 모형의 동질성 검정)

  • Lee, ShinHyung;Kim, SunWoo;Lee, SungDuck
    • The Korean Journal of Applied Statistics, v.26 no.3, pp.521-529, 2013
  • When the homogeneity hypothesis is accepted for panel time series models, the series can be pooled to achieve parsimony. In this paper, we introduce a panel bilinear time series model and derive the stationarity condition and the limiting distribution of the homogeneity test statistic for the model. As an application, we use Korean mumps data from January 2001 to December 2008. Finally, we perform the test of homogeneity on panel data consisting of 8 independent bilinear time series.

Evaluating Efficacy of Hilbert-Huang Transform in Analyzing Manufacturing Time Series Data with Periodic Components (제조업의 주기성 시계열분석에서 힐버트 황 변환의 효용성 평가)

  • Lee, Sae-Jae;Suh, Jung-Yul
    • Journal of Korean Society of Industrial and Systems Engineering, v.35 no.2, pp.106-112, 2012
  • Real-life time series characteristic data contains a significant amount of non-stationary components, especially periodic components. Extracting such components has required many ad hoc techniques with external parameters set by users on a case-by-case basis. In this study, we evaluate whether the Hilbert-Huang Transform, a new tool for time-series analysis, can be used to analyze such data effectively. The evaluation addresses two points: 1) how effective the transform is in finding periodic components, and 2) whether its results can be used directly to detect values outside control limits, a task for which traditional methods such as ARIMA have been used. We use glass furnace temperature data to illustrate the method.

Control Limits of Time Series Data using Hilbert-Huang Transform : Dealing with Nested Periods (힐버트-황 변환을 이용한 시계열 데이터 관리한계 : 중첩주기의 사례)

  • Suh, Jung-Yul;Lee, Sae Jae
    • Journal of Korean Society of Industrial and Systems Engineering, v.37 no.4, pp.35-41, 2014
  • Real-life time series characteristic data contains a significant amount of non-stationary components, especially periodic components. Extracting such components has required many ad hoc techniques with external parameters set by users on a case-by-case basis. In this study, we use the Empirical Mode Decomposition method of the Hilbert-Huang Transform to extract them systematically, with the fewest possible ad hoc parameters set by users. After the periodic components are removed, the remaining time-series data can be analyzed with traditional methods such as the ARIMA model. We then suggest a different way of setting control chart limits for characteristic data that has periodic components in addition to ARIMA components.

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods, v.27 no.6, pp.589-602, 2020
  • The development of smart grids has made it easy to collect large amounts of power data. Power consumption shows some common patterns, which makes clustering those patterns useful when analyzing big power data. In this paper, clustering analysis is based on distance functions for time series and on clustering algorithms that discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study compares the distance measures for clustering. Cluster validity measures such as error rate, similarity index, Dunn index, and silhouette values are also calculated and compared. Real power consumption data are then clustered using the five distance measures that performed better than the others in the simulation.
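
The abstract does not list its 10 distance measures, so the following is only an illustrative sketch of why the choice of distance matters for time series. It contrasts the Euclidean distance with dynamic time warping (DTW), a commonly used time-series distance; whether the paper uses DTW is an assumption, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; requires series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier b
                cost[i][j - 1],      # b[j-1] also matched to an earlier a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Two made-up load shapes: same peak, one delayed by a step.
early_peak = [0, 1, 2, 1, 0]
late_peak = [0, 0, 1, 2, 1, 0]
shift_cost = dtw_distance(early_peak, late_peak)
```

DTW lets one series stretch in time to match the other, so two consumption profiles with the same shape but a delayed peak end up close together, whereas a point-by-point distance would penalize the delay.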

Reconstruction of gusty wind speed time series from autonomous data logger records

  • Amezcua, Javier;Munoz, Raul;Probst, Oliver
    • Wind and Structures, v.14 no.4, pp.337-357, 2011
  • The collection of wind speed time series by means of digital data loggers occurs in many domains, including civil engineering, environmental sciences and wind turbine technology. Since averaging intervals are often significantly larger than typical system time scales, the information lost has to be recovered in order to reconstruct the true dynamics of the system. In the present work we present a simple algorithm capable of generating a real-time wind speed time series from data logger records containing the average, maximum, and minimum values of the wind speed in a fixed interval, as well as the standard deviation. The signal is generated from a generalized random Fourier series. The spectrum can be matched to any desired theoretical or measured frequency distribution. Extreme values are specified through a postprocessing step based on the concept of constrained simulation. Applications of the algorithm to 10-min wind speed records logged at a test site at 60 m height above the ground show that the recorded 10-min values can be reproduced by the simulated time series to a high degree of accuracy.
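
As a rough illustration of the idea of generating a signal from a random Fourier series, the sketch below sums cosines with random phases and rescales the result so that its mean and standard deviation match the logged values. It is not the authors' algorithm: the 1/k amplitude law, the function name, and all parameters are assumptions, and neither the spectrum matching nor the constrained-simulation step for the logged maximum and minimum is implemented.

```python
import math
import random


def synth_wind(mean, std, n_samples=600, n_harmonics=50, seed=0):
    """Random-phase Fourier series rescaled to a target mean and std."""
    rng = random.Random(seed)
    # Assumed amplitude law: amplitude decays as 1/k with harmonic index k.
    amps = [1.0 / k for k in range(1, n_harmonics + 1)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in amps]
    raw = [
        sum(
            a * math.cos(2.0 * math.pi * k * t / n_samples + p)
            for k, (a, p) in enumerate(zip(amps, phases), start=1)
        )
        for t in range(n_samples)
    ]
    raw_mean = sum(raw) / n_samples
    raw_std = math.sqrt(sum((x - raw_mean) ** 2 for x in raw) / n_samples)
    # Standardize, then scale and shift to the logged statistics.
    return [mean + std * (x - raw_mean) / raw_std for x in raw]


# Hypothetical 10-min record: mean 8.0 m/s, standard deviation 1.5 m/s.
series = synth_wind(mean=8.0, std=1.5)
```

With 600 samples, the example corresponds to one value per second over a 10-min interval; that sampling rate is also an assumption made for illustration.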

Time-series Analysis and Prediction of Future Trends of Groundwater Level in Water Curtain Cultivation Areas Using the ARIMA Model (ARIMA 모델을 이용한 수막재배지역 지하수위 시계열 분석 및 미래추세 예측)

  • Baek, Mi Kyung;Kim, Sang Min
    • Journal of The Korean Society of Agricultural Engineers, v.65 no.2, pp.1-11, 2023
  • This study analyzed the impact of greenhouse cultivation area and groundwater level changes due to water curtain cultivation in greenhouse complexes. Groundwater observation data from the Miryang study area were classified into greenhouse and field cultivation areas to compare the groundwater impact of water curtain cultivation in the greenhouse complex. We identified the characteristics of the groundwater time series data by terrain type within the study area and selected the optimal model through time series analysis, using two representative groundwater observation wells for each terrain type. The Seasonal ARIMA model was chosen as the optimal model for the riverside wells, while for the plain and mountain wells both the ARIMA model and the Seasonal ARIMA model were selected as optimal. Because the pattern of groundwater level fluctuation changes with the surrounding environment, the suitable prediction model is not limited to a single model and may change over time. It is therefore necessary to check and revise the optimal model periodically rather than applying one selected ARIMA model continuously. Groundwater forecasts obtained through time series analysis can be used for sustainable groundwater resource management.

Comparison of long-term forecasting performance of export growth rate using time series analysis models and machine learning analysis (시계열 분석 모형 및 머신 러닝 분석을 이용한 수출 증가율 장기예측 성능 비교)

  • Seong-Hwi Nam
    • Korea Trade Review, v.46 no.6, pp.191-209, 2021
  • In this paper, various time series analysis models and machine learning models are presented for long-term prediction of the export growth rate, and their prediction performance is compared using RMSE and MAE. The export growth rate is one of the major indicators used to evaluate economic status, and it is also used in economic forecasting. The export growth rate can take negative as well as positive values. Therefore, instead of the ReLU function, which is often used for time series prediction with deep learning models, the PReLU function, whose output can be negative, was used as the activation function of the deep learning models. The time series prediction performance of each model was compared for three types of data. Long-term forecasts of the export growth rate were produced by three methods: a fixed forecast method, a recursive forecast method, and a rolling forecast method. The traditional time series analysis model, ARDL, showed excellent performance, but as the time span of the training data increased, the performance of the machine learning models, including LSTM, improved relative to it.
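
The abstract's point about activation functions and its two error metrics can be shown in a minimal sketch. The function names and the fixed slope `alpha=0.25` are assumptions for illustration: in a PReLU layer the slope is a learned parameter, and the sample growth rates below are made up rather than taken from the paper.

```python
import math


def relu(x):
    """ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def prelu(x, alpha=0.25):
    """PReLU: negative inputs are scaled by alpha instead of clipped."""
    return x if x >= 0 else alpha * x


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up growth rates (%), including a negative one.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
errors = (rmse(actual, predicted), mae(actual, predicted))
negative_outputs = (relu(-2.0), prelu(-2.0))
```

For an input of -2.0, ReLU returns zero while PReLU returns a negative value, which is why a ReLU-type output cannot represent a negative growth rate.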

Outlier detection in time series data (시계열 자료에서의 특이치 발견)

  • Choi, Jeong In;Um, In Ok;Choa, Hyung Jun
    • The Korean Journal of Applied Statistics, v.29 no.5, pp.907-920, 2016
  • This study suggests an outlier detection algorithm for time series data that uses a quantile autoregressive model, compares its performance with existing methods, and applies it to actual stock manipulation cases. Studies on outlier detection have traditionally been conducted mostly on general data, and studies on time series data are insufficient. They have also been limited to parametric models, which are inconvenient because the analysis is complicated and takes a long time. We therefore suggest a new algorithm for outlier detection in time series data and compare it with existing algorithms through various simulations. Outlier detection in time series data can be especially useful for finding stock manipulation: if a stock price that had followed a certain pattern departs from it and generates an outlier, the cause may be intentional intervention and manipulation. We examined how quickly the model can detect stock manipulation by applying it to actual cases.

Efficient Compression Algorithm with Limited Resource for Continuous Surveillance

  • Yin, Ling;Liu, Chuanren;Lu, Xinjiang;Chen, Jiafeng;Liu, Caixing
    • KSII Transactions on Internet and Information Systems (TIIS), v.10 no.11, pp.5476-5496, 2016
  • Energy efficiency of resource-constrained wireless sensor networks is critical in applications such as real-time monitoring and surveillance. To improve energy efficiency and reduce energy consumption, time series data can be compressed before transmission. However, most compression algorithms for time series data were developed only for univariate scenarios, while in practice there are often multiple sensor nodes in one application and the collected data is actually a multivariate time series. In this paper, we propose to compress time series data by Lasso (least absolute shrinkage and selection operator) approximation. We show that our approach can be naturally extended to compress multivariate time series data. The extension is novel in that it constructs an optimal projection of the original multivariate series in which the best energy efficiency can be realized. The two algorithms are named ULasso (Univariate Lasso) and MLasso (Multivariate Lasso), and we provide practical guidance for parameter selection for both. Finally, an empirical evaluation is carried out on several publicly available real-world data sets from different application domains. We quantify algorithm performance by measuring approximation error, compression ratio, and computational complexity. The results show that ULasso and MLasso are superior, or at least equivalent, in compression performance to LTC and PLAMlis. In particular, MLasso can significantly compress smooth multivariate time series data without breaking the major trends and important changes of the sensor network system.