• Title/Summary/Keyword: Time-Series clustering

Daily Behavior Pattern Extraction using Time-Series Behavioral Data of Dairy Cows and k-Means Clustering (행동 시계열 데이터와 k-평균 군집화를 통한 젖소의 일일 행동패턴 검출)

  • Lee, Seonghun; Park, Gicheol; Park, Jaehwa
    • Journal of Software Assessment and Valuation / v.17 no.1 / pp.83-92 / 2021
  • Sensor systems and ICT are being applied continuously and widely in dairy science to accumulate data and improve dairy productivity. However, these efforts address only factors that directly affect productivity, such as the number of animals and milk yield, while research on the physiology of dairy cows, which fundamentally underlies dairy productivity, remains insufficient. This paper proposes a basic approach for extracting daily behavior patterns from hourly behavioral data of dairy cows in order to identify health status and stress. The data were grouped into four clusters by k-means clustering, and the validity of the grouping was demonstrated by visualizing the data in each group together with each group's representative. We hope that these results will lead to further research on detecting abnormalities and signs of disease in dairy cows.
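
The clustering step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 24-value daily profiles are synthetic, `synthetic_day` and `init_maximin` are hypothetical helpers, and the deterministic farthest-point seeding is an assumption chosen only to make the example reproducible.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_maximin(profiles, k):
    """Deterministic seeding (an assumption): start from the first profile,
    then repeatedly add the profile farthest from the centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means: alternate assignment and centroid update until stable."""
    centroids = init_maximin(profiles, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each day goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def synthetic_day(peak_hour, rng):
    """Hypothetical hourly activity profile with a burst around peak_hour."""
    return [(10.0 if abs(h - peak_hour) <= 2 else 0.0) + rng.uniform(-0.5, 0.5)
            for h in range(24)]

rng = random.Random(0)
days = [synthetic_day(peak, rng) for peak in (3, 9, 15, 21) for _ in range(3)]
labels, centroids = kmeans(days, k=4)
print(labels)
```

Each returned centroid is a 24-value mean profile, which plays the role of the cluster "representative" mentioned in the abstract.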

Trend Analysis of Data Mining Research Using Topic Network Analysis

  • Kim, Hyon Hee; Rhee, Hey Young
    • Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.141-148 / 2016
  • In this paper, we propose a topic network analysis approach that integrates topic modeling and social network analysis. We collected 2,039 scientific papers published from 1996 to 2015 in five top data mining journals and analyzed them with the proposed approach. To identify topic trends, we performed a time-series analysis of the topic network over four intervals. Our experimental results show that centralization of the topic network is highest from 1996 to 2000, decreases over the next five years, and then increases again. In the last five years, centralization of degree centrality increases, while centralization of betweenness centrality and closeness centrality decreases again. Clustering is also identified as the topic most interrelated with other topics. The topic with the highest degree centrality changes over time: clustering, web applications, clustering, and dimensionality reduction. Our approach extracts the interrelationships among topics, which cannot be detected with conventional topic modeling approaches, and reveals topical trends in data mining research.
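
To make "centralization of degree centrality" concrete, the sketch below computes Freeman's degree centralization for an undirected network given as an edge list. It is an illustration only: the topic names and edges are invented, the abstract does not say which formula or software the authors used, and the function assumes a simple graph without self-loops or isolated nodes.

```python
def degree_centralization(edges):
    """Freeman degree centralization of an undirected simple graph.

    Returns 1.0 for a star (one hub linked to every other node) and 0.0
    when all nodes have the same degree.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    degrees = [len(nb) for nb in neighbors.values()]
    top = max(degrees)
    # Sum of gaps to the most central node, divided by the maximum possible
    # sum, which is reached by a star graph: (n - 1) * (n - 2).
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Hypothetical topic networks for two time intervals.
star = [("clustering", t) for t in ("web", "text", "graphs", "streams")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(degree_centralization(star), degree_centralization(ring))
```

Computing this value once per time interval gives the kind of centralization series the abstract compares across periods.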

An Ensemble Model for Machine Failure Prediction (앙상블 모델 기반의 기계 고장 예측 방법)

  • Cheon, Kang Min; Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.1 / pp.123-131 / 2020
  • Many studies have addressed methods for predicting machine failure, and recently much research and many applications have emerged that diagnose the physical condition of machines and their parts and estimate remaining life through various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, special machines that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so many factors need to be considered, such as process and material data as well as the use of derived variables. In this paper, the data were preprocessed through unsupervised time-series anomaly detection in order to predict abnormalities of these special machines. Next, clustering results that reflect the characteristics of the data were used to produce additional variables, and a training data set was created based on the history of past facility abnormalities. Finally, a prediction methodology based on a supervised learning algorithm was applied, and updating the model was confirmed to improve the accuracy of facility failure prediction. This is expected to improve the efficiency of facility operation by allowing maintenance timing and parts supply and demand to be adjusted flexibly through predicting machine abnormalities and extracting key factors.

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain; Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. Hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before deriving the daily demand patterns. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. Classification analysis was conducted to understand the relationship between the clusters and external factors: day of the week, holidays, and weather. The data were divided into training data and test data. The training data consisted of external factors and cluster labels for 2008 to 2011. The test data were daily external-factor data for 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.

A Study On Predicting Stock Prices Of Hallyu Content Companies Using Two-Stage k-Means Clustering (2단계 k-평균 군집화를 활용한 한류컨텐츠 기업 주가 예측 연구)

  • Kim, Jeong-Woo
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.169-179 / 2021
  • This study shows that the two-stage k-means clustering method can improve performance in predicting stock prices. To this end, it introduces the two-stage k-means clustering algorithm and tests its prediction performance through comparison with various machine learning techniques. The method selects the cluster obtained from k-means clustering that is closest to the prediction target, and reapplies k-means clustering to that cluster to search for a cluster closer to the actual value. As a result, the predicted value of this method is shown to be closer to the actual stock price than the predicted values of other machine learning techniques. Furthermore, it yields a relatively stable predicted value despite using a relatively small cluster. Accordingly, this method can improve the accuracy and stability of prediction simultaneously, and it can be considered a new clustering method useful for small data. In the future, the two-stage k-means clustering method needs to be developed further for application to large-scale data.
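
The selection-then-reclustering idea can be sketched as below. This is one possible reading of the abstract, not the paper's algorithm: the scalar feature, the evenly spaced centroid seeding, the choice of k = 2 in both stages, and using the sub-cluster mean as the prediction are all assumptions, and `kmeans_1d`, `nearest_cluster`, and `two_stage_predict` are hypothetical names.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on scalars, seeded with k evenly spaced centroids (k >= 2)."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids

def nearest_cluster(features, targets, query, k):
    """Cluster the features and keep only the (feature, target) pairs in the
    non-empty cluster whose centroid lies closest to the query."""
    labels, centroids = kmeans_1d(features, k)
    best = min(set(labels), key=lambda c: abs(query - centroids[c]))
    kept = [(f, t) for f, t, lab in zip(features, targets, labels) if lab == best]
    return [f for f, _ in kept], [t for _, t in kept]

def two_stage_predict(features, targets, query, k=2):
    """Stage 1 narrows to the nearest cluster; stage 2 re-clusters inside it.
    The prediction is the mean target of the final sub-cluster."""
    f1, t1 = nearest_cluster(features, targets, query, k)
    _, t2 = nearest_cluster(f1, t1, query, k)
    return sum(t2) / len(t2)

# Toy data: the target is twice the feature, so the ideal answer for 7.2 is 14.4.
features = list(range(16))
targets = [2 * f for f in features]
_, stage1_targets = nearest_cluster(features, targets, 7.2, 2)
one_stage = sum(stage1_targets) / len(stage1_targets)
two_stage = two_stage_predict(features, targets, 7.2)
print(one_stage, two_stage)
```

On this toy data the second stage moves the estimate closer to the ideal value while averaging over fewer points, which mirrors the trade-off the abstract describes.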

Sensor clustering technique for practical structural monitoring and maintenance

  • Celik, Ozan; Terrell, Thomas; Gul, Mustafa; Catbas, F. Necati
    • Structural Monitoring and Maintenance / v.5 no.2 / pp.273-295 / 2018
  • This study investigates a damage detection methodology for global condition assessment. Particular emphasis is placed on using wireless sensors for more practical, less time-consuming, less expensive, and safer monitoring and, eventually, maintenance. Wireless sensors are deployed with a sensor roving technique to maintain a dense sensor field while requiring fewer sensors. A time series analysis method based on ARX models (Auto-Regressive models with eXogenous input) is implemented for different sensor clusters to detect artificially induced damage and its location. The performance of the technique is verified using data sets acquired from a 4-span bridge-type steel structure in a controlled laboratory environment. Specifically, the free-response vibration data of the structure for a given sensor cluster are measured by both wired and wireless sensors, and the acceleration output of each sensor is used as an input to the ARX model to estimate the response of the reference channel of that cluster. With both data types, the ARX-based time series analysis method is shown to be effective for damage detection and localization, and interpretations and conclusions are provided.
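
As an illustration of the ARX form, the sketch below fits a single-input model y[t] = a1*y[t-1] + ... + b1*u[t-1] + ... by ordinary least squares. It is a simplified stand-in, not the study's procedure: the paper uses several sensor outputs as inputs, whereas this example uses one synthetic input, and the sign convention, the model orders, and the helper names `solve_linear` and `fit_arx` are assumptions.

```python
import random

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is square and non-singular."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_arx(y, u, na, nb):
    """Least-squares estimate of ARX coefficients.

    Each regressor row holds the na previous outputs and nb previous inputs;
    the coefficients solve the normal equations (X^T X) theta = X^T y.
    """
    start = max(na, nb)
    rows = [[y[t - i] for i in range(1, na + 1)] +
            [u[t - j] for j in range(1, nb + 1)]
            for t in range(start, len(y))]
    target = y[start:]
    p = na + nb
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    theta = solve_linear(XtX, Xty)
    return theta[:na], theta[na:]

# Synthetic, noise-free data from known coefficients (hypothetical values).
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [0.0, 0.0]
for t in range(2, 200):
    y.append(0.5 * y[t - 1] - 0.2 * y[t - 2] + 0.8 * u[t - 1])
a_hat, b_hat = fit_arx(y, u, na=2, nb=1)
print(a_hat, b_hat)
```

In a damage detection setting, a model fitted on baseline data would be compared with data from the current state, for example through changes in coefficients or prediction error; the abstract does not specify which damage feature was used.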

An Empirical Study for the Existence of Long-term Memory Properties and Influential Factors in Financial Time Series (주식가격변화의 장기기억속성 존재 및 영향요인에 대한 실증연구)

  • Eom, Cheol-Jun; Oh, Gab-Jin; Kim, Seung-Hwan; Kim, Tae-Hyuk
    • The Korean Journal of Financial Management / v.24 no.3 / pp.63-89 / 2007
  • This study empirically verifies whether long-memory properties exist in the returns and volatility of financial time series and then examines the factors that influence those properties. The presence of long memory is tested with the Hurst exponent, measured by DFA (detrended fluctuation analysis). The empirical results are summarized as follows. First, no significant long-memory properties are found in the return time series. In the volatility time series, however, the Hurst exponent is high on average, indicating a strong presence of long memory. Second, regarding the influential factors, the Hurst exponent measured on the volatility of residual returns filtered by a GARCH(1, 1) model, which reflects volatility clustering, is $H \approx 0.5$ on average, so the long-memory properties present in the data before filtering are no longer observed. That is, we find that the observed long-memory properties are largely due to the volatility clustering effect.
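
The DFA estimate of the Hurst exponent can be sketched as follows. This is a minimal first-order DFA under stated assumptions (non-overlapping windows, linear detrending, a hand-picked set of scales); the abstract does not give the authors' window sizes or detrending order, and the input here is synthetic white noise, for which the exponent should come out near 0.5.

```python
import math
import random

def dfa_exponent(series, scales):
    """First-order detrended fluctuation analysis.

    Returns the slope of log F(n) against log n, where F(n) is the
    root-mean-square residual after removing a linear trend from the
    integrated series in each non-overlapping window of length n.
    """
    mean = sum(series) / len(series)
    profile, total = [], 0.0
    for x in series:  # integrate the mean-centered series
        total += x - mean
        profile.append(total)

    log_n, log_f = [], []
    for n in scales:
        t_mean = (n - 1) / 2
        t_var = sum((t - t_mean) ** 2 for t in range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            window = profile[start:start + n]
            w_mean = sum(window) / n
            # Least-squares slope of the local linear trend.
            slope = sum((t - t_mean) * (w - w_mean)
                        for t, w in enumerate(window)) / t_var
            sq_sum += sum((w - w_mean - slope * (t - t_mean)) ** 2
                          for t, w in enumerate(window))
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))

    # Slope of the log-log fit is the scaling exponent.
    mx = sum(log_n) / len(log_n)
    my = sum(log_f) / len(log_f)
    return (sum((a - mx) * (b - my) for a, b in zip(log_n, log_f)) /
            sum((a - mx) ** 2 for a in log_n))

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
alpha = dfa_exponent(noise, [16, 32, 64, 128, 256])
print(round(alpha, 2))
```

An exponent clearly above 0.5 on a stationary series suggests long memory, which is the reading the abstract applies to volatility before GARCH filtering.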

Design of Fuzzy Prediction System based on Dual Tuning using Enhanced Genetic Algorithms (강화된 유전알고리즘을 이용한 이중 동조 기반 퍼지 예측시스템 설계 및 응용)

  • Bang, Young-Keun; Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.1 / pp.184-191 / 2010
  • Many researchers have applied genetic algorithms to system optimization problems. In particular, real-coded genetic algorithms are very effective because their coding procedures are simpler than those of binary-coded genetic algorithms and they avoid the extra work of lengthening chromosomes to cover a wide search space. This paper therefore presents a design technique for improving the performance of a fuzzy system. The proposed system consists of two procedures. The primary tuning procedure coarsely tunes the fuzzy sets of the system using the k-means clustering algorithm, whose structure is very simple, and the secondary tuning procedure then finely tunes the fuzzy sets using enhanced real-coded genetic algorithms, starting from the result of the primary procedure. In addition, this paper constructs multiple fuzzy systems using a data preprocessing procedure designed to reflect the various characteristics of nonlinear data. Finally, the proposed fuzzy system is applied to time series prediction, and the effectiveness of the proposed techniques is verified by simulations of typical time series examples.

Design of Multiple Model Fuzzy Predictors using Data Preprocessing and its Application (데이터 전처리를 이용한 다중 모델 퍼지 예측기의 설계 및 응용)

  • Bang, Young-Keun; Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.1 / pp.173-180 / 2009
  • It is difficult to predict non-stationary or chaotic time series that include drift and/or nonlinearity as well as uncertainty. To address this, we propose an effective prediction method that combines data preprocessing with multiple-model TS fuzzy predictors and a model selection mechanism. In the data preprocessing procedure, candidates for the optimal difference interval are determined by correlation analysis, and the corresponding difference data sets are generated for use as predictor input in place of the original data, because difference data can stabilize the statistical characteristics of such time series and better reveal their implicit properties. TS fuzzy predictors are then constructed for a multiple-model bank, where the k-means clustering algorithm is used for fuzzy partitioning of the input space and the least squares method is applied to identify the parameters of the fuzzy rules. Among the predictors in the model bank, the one that best minimizes the performance index is selected and used for prediction thereafter. Finally, an error compensation procedure based on correlation analysis is added to improve prediction accuracy. Computer simulations are performed to verify the effectiveness of the proposed method.
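
The preprocessing step (choose candidate difference intervals by correlation analysis, then build difference data) might look like the sketch below. This is an assumed reading rather than the authors' procedure: ranking lags by sample autocorrelation, keeping the top three, and the function names are all hypothetical choices, and the test series is a synthetic sine with period 12.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return cov / var

def candidate_intervals(series, max_lag, top=3):
    """Lags 1..max_lag ranked by autocorrelation, highest first (assumed rule)."""
    lags = range(1, max_lag + 1)
    return sorted(lags, key=lambda d: autocorrelation(series, d),
                  reverse=True)[:top]

def difference(series, interval):
    """Difference data: x[t] - x[t - interval]."""
    return [series[t] - series[t - interval]
            for t in range(interval, len(series))]

series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
candidates = candidate_intervals(series, max_lag=20)
diffed = difference(series, candidates[0])
print(candidates[0], max(abs(d) for d in diffed))
```

For this purely periodic series the best-ranked interval equals the period, so differencing at that interval removes the pattern almost entirely; each candidate interval would yield one difference data set for the model bank described in the abstract.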

Time Series Patterns and Clustering of Rotifer Community in Relation with Topographical Characteristics in Lentic Ecosystems (정수생태계의 지형적인 요인 변화와 윤충류 출현 종 수 및 개체군 밀도 변동에 대한 연구)

  • Oh, Hye-Ji; Heo, Yu-Ji; Chang, Kwang-Hyeon; Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment / v.54 no.4 / pp.390-397 / 2021
  • Time series data on the rotifer community, focusing on species number and total density, were collected quarterly from 2008 to 2016 from 29 reservoirs in Jeonnam Province. The reservoirs had similar weather conditions during the study period, but their sizes and water quality differed. To analyze the temporal dynamics of the rotifer community, the medians, ranges, outliers, and coefficients of variation (CV) of rotifer species number and abundance were compared. For the temporal trend analysis, the time series of each reservoir were compared and clustered using the dynamic time warping function of the R package "dtwclust". Small reservoirs showed higher variability in rotifer abundance, with more frequent outliers, than large reservoirs. In contrast, no apparent pattern was observed for rotifer species number. For the temporal pattern of rotifer density, COD, phytoplankton abundance fluctuation, and cladoceran abundance fluctuation were suggested as potential factors affecting rotifer abundance dynamics.
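
The dynamic time warping comparison underlying this clustering can be sketched in a few lines. This is a generic textbook DTW with absolute-difference cost and no window constraint, written in Python for illustration; it does not reproduce the API or defaults of the R package "dtwclust", and the reservoir series are invented.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    Uses absolute difference as the local cost and allows match, insertion,
    and deletion steps; only two rows of the cost table are kept in memory.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]

# Hypothetical quarterly density series for three reservoirs.
reservoirs = {
    "A": [1, 2, 3, 2, 1],
    "B": [1, 2, 2, 3, 2, 1],   # same shape as A, stretched in time
    "C": [3, 1, 3, 1, 3],
}
names = sorted(reservoirs)
matrix = {(p, q): dtw_distance(reservoirs[p], reservoirs[q])
          for p in names for q in names}
print(matrix[("A", "B")], matrix[("A", "C")])
```

A pairwise distance matrix like this can then be passed to a hierarchical or partitional clustering step, so that reservoirs with similarly shaped but time-shifted series end up in the same group.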