• Title/Summary/Keyword: time series clustering

Hierarchical Regression for Single Image Super Resolution via Clustering and Sparse Representation

  • Qiu, Kang;Yi, Benshun;Li, Weizhong;Huang, Taiqi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2539-2554 / 2017
  • Regression-based image super resolution (SR) methods are considerably faster than other learning-based methods while maintaining similar or better quality. In this paper, we propose a novel single image SR method based on hierarchical regression to further improve quality. As an improvement over other regression-based methods, we introduce a hierarchical scheme into the process of learning multiple regressors. First, training samples are grouped into clusters according to their geometric similarity, which generates the structure layer. Then, in each cluster, a compact dictionary is learned by the Sparse Coding (SC) method, and the training samples are further grouped by dictionary atoms to form the detail layer. Finally, a series of projection matrices, each anchored to a dictionary atom, is learned by linear regression. Experimental results show that the hierarchical scheme leads to more precise regression. Our method achieves higher-quality results than several state-of-the-art methods.

Finding Pseudo Periods over Data Streams based on Multiple Hash Functions (다중 해시함수 기반 데이터 스트림에서의 아이템 의사 주기 탐사 기법)

  • Lee, Hak-Joo;Kim, Jae-Wan;Lee, Won-Suk
    • Journal of Information Technology Services / v.16 no.1 / pp.73-82 / 2017
  • Recently, in-memory data stream processing has been actively applied to various subjects such as query processing, OLAP, and data mining (e.g., frequent itemsets, association rules, clustering). However, finding regular periodic patterns of events in an infinite data stream has received less attention. Most research on finding periods uses autocorrelation functions to find certain changes in periodic patterns, not the period itself, and usually finds periodic patterns in time-series databases, not in data streams. Literally, a period means the length of time after which some phenomenon recurs. In real applications, however, a data set evolves with tiny differences as time elapses. Such a period is called a pseudo-period. This paper proposes a new scheme called the FPMH (Finding Periods using Multiple Hash functions) algorithm to find a set of such pseudo-periods over a data stream based on multiple hash functions. According to the type of pseudo-period, this paper categorizes FPMH into three variants: FPMH-E, FPMH-PC, and FPMH-PP. To maximize the performance of the algorithm in the data stream environment and to keep the most recent periodic patterns in memory, we apply a decay mechanism to the FPMH algorithms. The FPMH algorithm minimizes memory usage as well as processing time with acceptable accuracy.

Structural Design of FCM-based Fuzzy Inference System : A Comparative Study of WLSE and LSE (FCM기반 퍼지추론 시스템의 구조 설계: WLSE 및 LSE의 비교 연구)

  • Park, Wook-Dong;Oh, Sung-Kwun;Kim, Hyun-Ki
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.5 / pp.981-989 / 2010
  • In this study, we introduce a new architecture for a fuzzy inference system. In this system, we use the Fuzzy C-Means (FCM) clustering algorithm to form the premise part of the rules. The membership functions in the premise part of the fuzzy rules do not assume any explicit functional form; for any input, the resulting activation levels of such radial basis functions depend directly on the distance between data points by means of Fuzzy C-Means clustering. For the consequent part of the fuzzy rules (the local model representing the input-output relation in the corresponding sub-space), four types of polynomial are considered: constant, linear, quadratic, and modified quadratic. This offers a significant level of design flexibility, as each rule can come with a different type of local model in its consequent. Either Least Square Estimator (LSE) or weighted Least Square Estimator (WLSE)-based learning is used to estimate the coefficients of the consequent polynomials of the fuzzy rules. In fuzzy modeling, complexity and interpretability (or simplicity) as well as accuracy of the obtained model are essential design criteria. The performance of the fuzzy inference system is directly affected by parameters such as the fuzzification coefficient used in FCM, the number of rules (clusters), and the order of the polynomial in the consequent part of the rules. Accordingly, a preferred model structure can be obtained by adjusting these parameters. Moreover, a comparative experimental study of WLSE and LSE is carried out with respect to changes in the number of clusters (rules) and in the polynomial type. The superiority of the proposed model is demonstrated using the Automobile Miles per Gallon (MPG) and Boston housing machine learning datasets and the Mackey-Glass time series dataset.
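
The distance-based activation levels described in this abstract can be illustrated with the textbook FCM membership formula. This is a minimal sketch, not the authors' implementation: the function name `fcm_memberships`, the Euclidean distance, and the default fuzzification coefficient `m=2.0` are all assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy membership of `point` in each cluster center.

    Uses the standard FCM update u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)),
    where d_i is the Euclidean distance from the point to center i and
    m > 1 is the fuzzification coefficient.
    """
    if m <= 1.0:
        raise ValueError("fuzzification coefficient m must be greater than 1")
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


# Hypothetical example: two cluster centers on a line, one query point.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)
```

The memberships always sum to 1, and a larger `m` pushes them toward a uniform split, which is why the abstract lists the fuzzification coefficient among the parameters that shape the model.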

An Exploratory Methodology for Longitudinal Data Analysis Using SOM Clustering (자기조직화지도 클러스터링을 이용한 종단자료의 탐색적 분석방법론)

  • Cho, Yeong Bin
    • Journal of Convergence for Information Technology / v.12 no.5 / pp.100-106 / 2022
  • A longitudinal study is a research method based on longitudinal data measured repeatedly on the same subjects. Most longitudinal analysis methods are suited to prediction or inference and are often unsuitable for exploratory study. This study presents an exploratory method for analyzing longitudinal data: the data are clustered using the self-organizing map technique, the best number of clusters is determined, and the longitudinal trajectories are then found. The proposed methodology was applied to longitudinal data from the Employment Information Service, and a total of 2,610 samples were analyzed. Applying the methodology to the actual data yielded time-series clustering results for each panel. This indicates that it is more effective to cluster longitudinal data in advance and then perform multilevel longitudinal analysis.

A ground condition prediction ahead of tunnel face utilizing time series analysis of shield TBM data in soil tunnel (토사터널의 쉴드 TBM 데이터 시계열 분석을 통한 막장 전방 예측 연구)

  • Jung, Jee-Hee;Kim, Byung-Kyu;Chung, Heeyoung;Kim, Hae-Mahn;Lee, In-Mo
    • Journal of Korean Tunnelling and Underground Space Association / v.21 no.2 / pp.227-242 / 2019
  • This paper presents a method to predict ground types ahead of a tunnel face using operational data of an earth pressure-balanced (EPB) shield tunnel boring machine (TBM) running through soil ground. A time series analysis model that was applicable to predicting mixed ground composed of soils and rocks was modified to be applicable to soil tunnels. Using the modified model, the feasibility of choosing soil conditioning materials according to soil type was studied. To do this, self-organizing map (SOM) clustering was performed. First, it was confirmed that the ground types should be classified based on a threshold of 35% passing through the #200 sieve. Then, the possibility of predicting the ground types with the modified model, in which the TBM operational data were analyzed, was studied. The efficacy of the modified model is demonstrated by its 98% accuracy in predicting ground types ten rings ahead of the tunnel face. In particular, the average prediction accuracy was approximately 93% in areas where ground type variations occur.

Machine Learning Approach for Pattern Analysis of Energy Consumption in Factory (머신러닝 기법을 활용한 공장 에너지 사용량 데이터 분석)

  • Sung, Jong Hoon;Cho, Yeong Sik
    • KIPS Transactions on Computer and Communication Systems / v.8 no.4 / pp.87-92 / 2019
  • This paper describes a pattern analysis of factory energy consumption data using machine learning methods. While conventional statistical methods require specific equations to represent the physical characteristics of the plant, a machine learning based approach uses historical data and calculates the result effectively. A rule-based approach calculates energy usage with physical equations, but it is hard to identify the exact equations that represent the factory's characteristics and the hidden variables affecting the results. The machine learning approach, in contrast, is useful for quickly finding relations in the data. The factory has several components that directly affect electricity consumption: machines, lighting, computers, and indoor systems such as HVAC (heating, ventilation and air conditioning). The energy loads from these components are generated in real time and can be represented as time series. Various sensors were installed in the factory to build a database by collecting energy usage data from the components. After a preliminary statistical analysis for data mining, time-series clustering techniques are applied to extract the energy load patterns. This research can contribute to the development of a Factory Energy Management System (FEMS).

Analysis of Temporal and Spatial Distribution of Traffic Accidents in Jinju (진주시 교통사고의 시계열적 공간분포특성 분석)

  • Sung, Byeong Jun;Bae, Gyu Han;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.23 no.2 / pp.3-9 / 2015
  • Changes in land use in urban space generate traffic volume, which is closely related to traffic accidents. An analysis of the causes of traffic accidents is therefore essential for establishing measures to reduce them. In this regard, clustering was analyzed using nearest neighbor indexes for the occurrence frequencies in commercial and residential zones, based on traffic accident data from five years (2009-2013) for Jinju-si, a small-to-medium-sized local city. The results are as follows. The occurrence frequency of traffic accidents was highest in spring and lowest in winter. The clustering of traffic accidents was stronger at nighttime than at daytime. In addition, in the analysis of clustering by land use, seasonal changes were not significant in commercial areas, while clustering density tended to become significantly lower in winter in residential areas. The analysis of traffic accident types showed that side right-angle collisions between cars were the most frequent and were widespread in both commercial and residential areas. These results provide important information for identifying the occurrence pattern of traffic accidents in the structure of urban space, and they are expected to be used to establish measures to reduce traffic accidents.
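
The nearest neighbor index mentioned in this abstract is commonly computed in the Clark-Evans form: the observed mean nearest-neighbor distance divided by the distance expected under complete spatial randomness. The abstract does not say which variant the authors used, so the sketch below assumes that common form; the function name, the coordinates, and the study-area value are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Clark-Evans nearest neighbor index for 2D points in a region of `area`.

    Values below 1 suggest clustering, values near 1 a random pattern,
    and values above 1 a dispersed pattern. No edge correction is applied.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    # Observed mean distance from each point to its nearest other point.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance for a random pattern with density n / area.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical accident locations packed into one corner of a 10 x 10 area.
clustered = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
nni = nearest_neighbor_index(clustered, area=100.0)
print(nni)
```

Comparing the index across subsets (for example daytime versus nighttime records over the same area) is one way to arrive at statements such as "clustering was stronger at nighttime".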

An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams (적응적 격자기반 다차원 데이터 스트림 클러스터링 방법)

  • Park, Nam-Hun;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.14D no.7 / pp.733-742 / 2007
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, memory usage for data stream analysis must be bounded even though new data elements are continuously generated. To satisfy this requirement, data stream processing sacrifices the correctness of its analysis result by allowing some error. The old distribution statistics are diminished by a predefined decay rate as time goes by, so that the effect of obsolete information on the current clustering result can be eliminated without physically maintaining any data element. This paper proposes a grid-based clustering algorithm for a data stream. Given a set of initial grid cells, the dense range of a grid cell is recursively partitioned into smaller cells in a top-down manner, based on the distribution statistics of data elements, until the smallest cell, called a unit cell, is identified. Since only the distribution statistics of data elements are maintained by the dynamically partitioned grid cells, the clusters of a data stream can be found effectively without physically maintaining the data elements. Furthermore, the memory usage of the proposed algorithm adapts to the size of the confined memory space by flexibly resizing the unit cell. As a result, the confined memory space can be fully utilized to generate a clustering result that is as accurate as possible. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
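
The decay idea in this abstract, where old statistics fade at a predefined rate so that no individual element has to be stored, can be sketched with an exponentially decayed counter. This is an illustration under assumptions, not the paper's data structure: the class name `DecayedCount`, the per-time-unit multiplicative decay, and the example values are all hypothetical.

```python
class DecayedCount:
    """A count whose past contributions shrink by `decay_rate` per time unit."""

    def __init__(self, decay_rate):
        if not 0.0 < decay_rate <= 1.0:
            raise ValueError("decay_rate must be in (0, 1]")
        self.decay_rate = decay_rate
        self.value = 0.0
        self.last_time = None

    def current(self, time):
        """Return the count as seen at `time`, without modifying state."""
        if self.last_time is None:
            return 0.0
        return self.value * self.decay_rate ** (time - self.last_time)

    def add(self, time, weight=1.0):
        """Decay the stored count up to `time`, then add a new element."""
        self.value = self.current(time) + weight
        self.last_time = time


# Hypothetical cell statistic: elements arrive at t=0 and t=1.
cell = DecayedCount(decay_rate=0.5)
cell.add(0)
cell.add(1)
print(cell.value, cell.current(3))
```

Only one number and one timestamp are kept per cell, which matches the abstract's point that obsolete information can be discounted without retaining the data elements themselves.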

Spatial pattern and temporal mode analysis of microarray time-series data by independent component analysis (독립성분분석에 의한 유전자 발현 시계열 데이터의 공간적 패턴과 시간적 모드 분석)

  • Kim, Sookjeong;Choi, Seungjin
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.250-252 / 2004
  • In this paper we apply several variations of independent component analysis (ICA) methods, such as spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to yeast cell cycle datasets, and compare their performance in finding components that result in gene clusters coherent with annotations and in extracting meaningful temporal modes. It turns out that the results of tICA are superior to those of PCA, sICA, and stICA in terms of gene clustering, and that the temporal modes extracted by stICA highlight particular cellular processes.

R programming: Language and Environment for Statistical Computing and Data Visualization (R 프로그래밍: 통계 계산과 데이터 시각화를 위한 환경)

  • Lee, D.H.;Ren, Ye
    • Electronics and Telecommunications Trends / v.28 no.1 / pp.42-51 / 2013
  • The R language is an open source programming language and software environment for statistical computing and data visualization. It is widely used by statisticians and data scientists to develop statistical software and perform data analysis. The R language provides a variety of statistical and graphical techniques, including basic descriptive statistics, linear and nonlinear modeling, conventional and advanced statistical tests, time series analysis, clustering, simulation, and others. In this paper, we first introduce the R language and investigate its features as a data analytics tool. We then explore the potential applications of the R language in the field of data analytics.
