• Title/Abstract/Keywords: Time-Series data

3,627 search results; processing time 0.045 s

Time-Series based Dataset Selection Method for Effective Text Classification

  • 채영훈;정도헌
    • 한국콘텐츠학회논문지 / Vol. 17, No. 1 / pp. 39-49 / 2017
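  • As Internet technology advances, online data is growing rapidly, and research is under way on learning from this growing data efficiently through incremental machine learning. Most online documents carry time-series information such as posting or publication dates, and reflecting this information in classification should make classification more efficient. This study analyzes time-series changes in the vocabulary of web documents and proposes an efficient classification learning method that partitions the dataset based on the analyzed time-series information. For experiments and validation, one million online news articles were collected together with their time-series information. The collected data was partitioned and used in experiments with Naïve Bayes and SVM classifiers; relative to training on the full dataset, the two models improved by up to 2.02 and 2.32 percentage points, respectively. The study confirms that reflecting time-series vocabulary changes in classification can improve classification performance.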
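The abstract does not say how finely the dataset is partitioned, so the following is only a minimal sketch of the general idea of splitting a dated corpus into time-based training subsets. The yearly granularity, the function name `partition_by_year`, and the three sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(records):
    """Group (published, text, label) records into per-year subsets.

    Yearly granularity is an assumption; the source does not state one.
    """
    subsets = defaultdict(list)
    for published, text, label in records:
        subsets[published.year].append((text, label))
    return dict(subsets)


# Hypothetical records, not taken from the paper's news corpus.
records = [
    (date(2015, 3, 1), "election results announced", "politics"),
    (date(2015, 7, 9), "team wins league title", "sports"),
    (date(2016, 1, 20), "new phone released", "tech"),
]
subsets = partition_by_year(records)
for year in sorted(subsets):
    print(year, len(subsets[year]))
```

Each subset could then be passed to a classifier separately, so that the vocabulary seen in training stays close in time to the documents being classified.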

Test of Homogeneity for Panel Bilinear Time Series Model

  • 이신형;김선우;이성덕
    • 응용통계연구 / Vol. 26, No. 3 / pp. 521-529 / 2013
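  • In the analysis of panel time series data, a test of homogeneity is performed to adhere to the principle of parsimony. To test the homogeneity of independent bilinear time series panel data, this paper first derives the stationarity condition of the bilinear time series model, and then derives the maximum likelihood estimator, the homogeneity test statistic, and its limiting distribution. As an empirical analysis, the homogeneity test is applied to mumps panel data from the eight provinces of Korea to compare incidence trends across the eight regions.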

Evaluating Efficacy of Hilbert-Huang Transform in Analyzing Manufacturing Time Series Data with Periodic Components

  • 이세재;서정렬
    • 산업경영시스템학회지 / Vol. 35, No. 2 / pp. 106-112 / 2012
  • Real-life time series characteristic data contains a significant amount of non-stationary components, especially periodic components. Extracting such components has required many ad hoc techniques with external parameters set by users on a case-by-case basis. In this study, we evaluate whether the Hilbert-Huang Transform, a new tool for time-series analysis, can be used for effective analysis of such data. The evaluation addresses two questions: 1) how effective it is in finding periodic components, and 2) whether its results can be used directly to detect values outside control limits, a task for which traditional methods such as ARIMA have been used. We use glass furnace temperature data to illustrate the method.

Control Limits of Time Series Data using Hilbert-Huang Transform: Dealing with Nested Periods

  • 서정열;이세재
    • 산업경영시스템학회지 / Vol. 37, No. 4 / pp. 35-41 / 2014
  • Real-life time series characteristic data contains a significant amount of non-stationary components, especially periodic components. Extracting such components has required many ad hoc techniques with external parameters set by users on a case-by-case basis. In this study, we use the Empirical Mode Decomposition method from the Hilbert-Huang Transform to extract them systematically, with the fewest possible ad hoc parameters set by users. After the periodic components are removed, the remaining time-series data can be analyzed with traditional methods such as the ARIMA model. We then suggest a different way of setting control chart limits for characteristic data that has periodic components in addition to ARIMA components.

Comparison of time series clustering methods and application to power consumption pattern clustering

  • Kim, Jaehwi;Kim, Jaehee
    • Communications for Statistical Applications and Methods / Vol. 27, No. 6 / pp. 589-602 / 2020
  • The development of smart grids has made it easy to collect large amounts of power data. Power consumption shows common patterns, which makes clustering those patterns useful when analyzing big power data. In this paper, clustering analysis combines distance functions for time series with clustering algorithms to discover patterns in power consumption data. We use 10 distance measures to find clusters that reflect the characteristics of time series data. A simulation study compares the distance measures for clustering, and cluster validity measures such as error rate, similarity index, Dunn index, and silhouette values are calculated and compared. Real power consumption data are then clustered with the five distance measures that performed best in the simulation.
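The abstract does not list its 10 distance measures. As an illustration of why the choice of distance matters for time series, the sketch below contrasts two widely used ones, Euclidean distance and dynamic time warping (DTW); whether the paper uses these two is an assumption, and the sample series are made up.

```python
import math


def euclidean(a, b):
    """Point-by-point distance; only defined for equal-length series."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] matched again to an earlier b
                cost[i][j - 1],      # b[j-1] matched again to an earlier a
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


# Same shape, but the second series lingers one step longer at 2.
print(dtw([1, 2, 3], [1, 2, 2, 3]))
print(euclidean([0, 0], [3, 4]))
```

DTW tolerates the time shift that makes a point-by-point comparison impossible here, which is one reason shape-aware distances are often considered for load curves.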

Reconstruction of gusty wind speed time series from autonomous data logger records

  • Amezcua, Javier;Munoz, Raul;Probst, Oliver
    • Wind and Structures / Vol. 14, No. 4 / pp. 337-357 / 2011
  • The collection of wind speed time series by means of digital data loggers occurs in many domains, including civil engineering, environmental sciences and wind turbine technology. Since averaging intervals are often significantly larger than typical system time scales, the information lost has to be recovered in order to reconstruct the true dynamics of the system. In the present work we present a simple algorithm capable of generating a real-time wind speed time series from data logger records containing the average, maximum, and minimum values of the wind speed in a fixed interval, as well as the standard deviation. The signal is generated from a generalized random Fourier series. The spectrum can be matched to any desired theoretical or measured frequency distribution. Extreme values are specified through a postprocessing step based on the concept of constrained simulation. Applications of the algorithm to 10-min wind speed records logged at a test site at 60 m height above the ground show that the recorded 10-min values can be reproduced by the simulated time series to a high degree of accuracy.
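The first part of that idea, generating a random Fourier series and rescaling it to the logged average and standard deviation, can be sketched as follows. This is not the authors' algorithm: the flat amplitude spectrum, the function names, and the target values are assumptions, and the constrained-simulation step that enforces the recorded maximum and minimum is omitted.

```python
import math
import random


def random_fourier_series(n, n_modes, rng):
    """Sum of unit-amplitude cosines with random phases (flat spectrum assumed)."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_modes)]
    return [
        sum(
            math.cos(2.0 * math.pi * (k + 1) * t / n + phases[k])
            for k in range(n_modes)
        )
        for t in range(n)
    ]


def mean_std(signal):
    """Mean and population standard deviation of a sequence."""
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    return mean, math.sqrt(var)


def match_mean_std(signal, target_mean, target_std):
    """Shift and scale the signal so its mean and std equal the logged values."""
    mean, std = mean_std(signal)
    return [target_mean + target_std * (x - mean) / std for x in signal]


# Hypothetical logger record: 600 one-second samples in a 10-min interval,
# mean 8.0 m/s, standard deviation 1.5 m/s.
raw = random_fourier_series(600, 20, random.Random(0))
series = match_mean_std(raw, 8.0, 1.5)
print(mean_std(series))
```

Replacing the flat amplitudes with values drawn from a measured or theoretical spectrum would be the natural next step toward the spectrum matching the abstract describes.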

Time-series Analysis and Prediction of Future Trends of Groundwater Level in Water Curtain Cultivation Areas Using the ARIMA Model

  • 백미경;김상민
    • 한국농공학회논문집 / Vol. 65, No. 2 / pp. 1-11 / 2023
  • This study analyzed the impact of greenhouse cultivation area and the groundwater level changes caused by water curtain cultivation in greenhouse complexes. Groundwater observation data from the Miryang study area were classified into greenhouse and field cultivation areas to compare the groundwater impact of water curtain cultivation in the greenhouse complex. We identified the characteristics of the groundwater time series data by terrain type and selected the optimal model through time series analysis, analyzing data from two representative groundwater observation wells for each terrain. The Seasonal ARIMA model was chosen as the optimal model for the riverside wells, and the ARIMA and Seasonal ARIMA models were selected for the plain and mountain wells. Because changes in the surrounding environment alter groundwater level fluctuation patterns, the suitable prediction model is not fixed and may change over time. It is therefore necessary to check and revise the optimal model periodically rather than to keep applying a single selected ARIMA model. Groundwater forecasts obtained through time series analysis can be used for sustainable groundwater resource management.
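What separates an ARIMA model from a Seasonal ARIMA model is, in part, the differencing applied before fitting: ordinary differencing at lag 1 removes a trend, while seasonal differencing at lag s removes a repeating pattern of period s. The sketch below shows only that step; the series and the period of 4 are made-up values, not the Miryang data or the orders chosen in the study.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical trending series: first differencing exposes the increments.
trend = [1, 3, 6, 10]
print(difference(trend))

# Hypothetical series with period 4 plus a slow rise: seasonal differencing
# at lag 4 removes the repeating pattern and leaves the rise per cycle.
seasonal = [1, 5, 2, 6, 2, 6, 3, 7]
print(difference(seasonal, lag=4))
```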

Comparison of long-term forecasting performance of export growth rate using time series analysis models and machine learning analysis

  • 남성휘
    • 무역학회지 / Vol. 46, No. 6 / pp. 191-209 / 2021
  • This paper presents various time series analysis models and machine learning models for long-term prediction of the export growth rate and compares their prediction performance using RMSE and MAE. The export growth rate is one of the major indicators used to evaluate economic conditions, and it is also used in economic forecasting. Because the export growth rate can take negative as well as positive values, the PReLU function, which can output negative values, was used as the activation function of the deep learning models instead of the ReLU function that is often used for time series prediction. The time series prediction performance of each model was compared on three types of data. Long-term forecasts of the export growth rate were produced by three methods: a fixed forecast, a recursive forecast, and a rolling forecast. The traditional time series model ARDL showed excellent performance, but the performance of machine learning models, including LSTM, improved in relative terms as the time span of the training data increased.
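The difference between the two activation functions is easy to see on scalar inputs. In the sketch below, the slope `alpha = 0.25` is a fixed illustrative value; in an actual PReLU layer the slope is a learned parameter, and the abstract does not report the value its models arrived at.

```python
def relu(x):
    """ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def prelu(x, alpha=0.25):
    """PReLU: negative inputs are scaled by alpha instead of being clipped.

    alpha is fixed here for illustration; in a network it is learned.
    """
    return x if x > 0 else alpha * x


for growth in (-2.0, 0.0, 3.0):
    print(growth, relu(growth), prelu(growth))
```

With ReLU, an output unit can never emit a negative growth rate, whereas PReLU keeps the sign of a negative input.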

Outlier detection in time series data

  • 최정인;엄인옥;조형준
    • 응용통계연구 / Vol. 29, No. 5 / pp. 907-920 / 2016
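  • The goal of this paper is to propose an algorithm that detects outliers in time series data using a quantile autoregressive model, to compare its performance with existing methods, and to apply it to real cases of stock price manipulation. Outlier detection research to date has mostly dealt with general data, so work on time series data remains scarce. It has also been limited to parametric methods, which are inconvenient because parametric models are complex and take a long time to analyze. This study therefore presents a new outlier detection algorithm based on the quantile autoregressive model and compares it with existing algorithms through simulations under various scenarios. Outlier detection in time series data is particularly useful for uncovering stock price manipulation: if a stock price observed over time suddenly departs from its previous course and is flagged as an outlier, one can suspect that it was manipulated through artificial intervention. We therefore applied the algorithm to real cases of stock price manipulation to examine how quickly manipulation can be detected.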

Efficient Compression Algorithm with Limited Resource for Continuous Surveillance

  • Yin, Ling;Liu, Chuanren;Lu, Xinjiang;Chen, Jiafeng;Liu, Caixing
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 11 / pp. 5476-5496 / 2016
  • Energy efficiency of resource-constrained wireless sensor networks is critical in applications such as real-time monitoring and surveillance. To improve energy efficiency and reduce energy consumption, time series data can be compressed before transmission. However, most compression algorithms for time series data were developed only for univariate scenarios, while in practice one application often has multiple sensor nodes and the collected data is a multivariate time series. In this paper, we propose to compress time series data by Lasso (least absolute shrinkage and selection operator) approximation. We show that our approach extends naturally to compressing multivariate time series data. The extension is novel in that it constructs an optimal projection of the original multivariate series in which the best energy efficiency can be realized. The two algorithms are named ULasso (Univariate Lasso) and MLasso (Multivariate Lasso), and we provide practical guidance for selecting their parameters. Finally, an empirical evaluation is carried out on several publicly available real-world data sets from different application domains. We quantify algorithm performance by measuring approximation error, compression ratio, and computational complexity. The results show that ULasso and MLasso match or exceed the compression performance of LTC and PLAMlis. In particular, MLasso can significantly reduce smooth multivariate time series data without breaking the major trends and important changes of the sensor network system.