• Title/Summary/Keyword: 시계열 데이터 마이닝 (time-series data mining)

Search Result 70, Processing Time 0.028 seconds

Effectiveness Evaluations of Subsequence Matching Methods Using KOSPI Data (한국 주식 데이터를 이용한 서브시퀀스 매칭 방법의 효과성 평가)

  • Yoo Seung Keun;Lee Sang Ho
    • The KIPS Transactions:PartD
    • /
    • v.12D no.3 s.99
    • /
    • pp.355-364
    • /
    • 2005
  • Previous research on subsequence matching has focused on how to build indexes that speed up matching, and has not taken into account the effectiveness of subsequence matching methods. This paper considers the effectiveness of subsequence matching methods and proposes two metrics for evaluating the effectiveness of subsequence matching algorithms. We applied the proposed metrics to Korean stock data and five known matching algorithms. The analysis of the empirical data shows that two methods (i.e., the method supporting normalization, and the method supporting scaling and shifting) outperform the others in terms of the effectiveness of subsequence matching.
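
As a rough illustration of what "subsequence matching supporting normalization" means, the following sketch scans every window of a series, normalizes it to zero mean and unit standard deviation, and reports windows whose normalized shape lies within a tolerance of the normalized query. This is a naive sequential scan and not one of the five algorithms evaluated in the paper; the function names, the sample prices, and the `epsilon` tolerance are assumptions made for illustration.

```python
import math

def z_normalize(seq):
    """Shift seq to zero mean and scale it to unit (population) standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n  # a flat sequence has no shape to compare
    return [(x - mean) / std for x in seq]

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalized_matches(data, query, epsilon):
    """Return start offsets of windows whose normalized form is within epsilon of the normalized query."""
    q = z_normalize(query)
    m = len(query)
    hits = []
    for start in range(len(data) - m + 1):
        window = z_normalize(data[start:start + m])
        if euclidean(window, q) <= epsilon:
            hits.append(start)
    return hits

# Hypothetical prices: the same rise-and-fall shape at two very different price levels.
prices = [10, 11, 12, 11, 10, 100, 110, 120, 110, 100]
pattern = [1, 2, 3, 2, 1]
matches = normalized_matches(prices, pattern, epsilon=1e-6)
print(matches)
```

Both occurrences of the rise-and-fall shape are reported even though their price levels differ by a factor of ten, which is the kind of similarity a plain Euclidean comparison on raw values would miss.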

Constructing Gene Regulatory Networks using Frequent Gene Expression Pattern and Chain Rules (빈발 유전자 발현 패턴과 연쇄 규칙을 이용한 유전자 조절 네트워크 구축)

  • Lee, Heon-Gyu;Ryu, Keun-Ho;Joung, Doo-Young
    • The KIPS Transactions:PartD
    • /
    • v.14D no.1 s.111
    • /
    • pp.9-20
    • /
    • 2007
  • Groups of genes control the functioning of a cell through complex interactions. Such interactions among gene groups are called Gene Regulatory Networks (GRNs). Two previous data mining approaches, clustering and classification, have been used to analyze gene expression data. Though these mining tools are useful for determining membership of genes by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand the mechanism of how genes relate and how they regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and we detect gene expression patterns by applying the FP-growth algorithm. Next, we construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method through experimental results, which are consistent with published results.

Time Series Analysis of Patent Keywords for Forecasting Emerging Technology (특허 키워드 시계열분석을 통한 부상기술 예측)

  • Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.650-652
    • /
    • 2014
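  • Forecasting emerging technologies plays a very important role when nations and companies establish R&D investment and management policy strategies. Various methods are used for technology forecasting, and technology forecasting based on patents is also being actively pursued. Recently, text mining has been used to analyze patent data quantitatively. This paper proposes a technology forecasting method that uses text mining and exponential smoothing.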
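
The abstract does not say which variant of exponential smoothing is used, so the following is only a minimal sketch of simple (single) exponential smoothing applied to a keyword-frequency series. The yearly counts and the smoothing factor `alpha` are hypothetical values chosen for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}, initialized with the first value.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# Hypothetical yearly occurrence counts of one patent keyword.
keyword_counts = [4, 6, 9, 13, 18]
levels = simple_exponential_smoothing(keyword_counts, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast is the last level.
forecast = levels[-1]
print(levels, forecast)
```

Simple smoothing lags behind a steadily rising series like this one; a trend-aware variant (for example Holt's method) would be a natural alternative, but whether the paper uses one is not stated.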

Technology Development Strategy of Piggyback Transportation System Using Topic Modeling Based on LDA Algorithm

  • Jun, Sung-Chan;Han, Seong-Ho;Kim, Sang-Baek
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.12
    • /
    • pp.261-270
    • /
    • 2020
  • In this study, we identify promising technologies for the Piggyback transportation system by analyzing the relevant patent information. To this end, we first build a patent database by extracting relevant technology keywords from the pioneering research papers on the Piggyback flatcar system. We then employ text mining to identify frequently occurring words in the patent database and, using these words, apply the LDA (Latent Dirichlet Allocation) algorithm to identify "topics" that correspond to "key" technologies for the Piggyback system. Finally, we employ the ARIMA model to forecast the trends of these "key" technologies and identify the promising technologies for the Piggyback system. The results show that a data-driven integrated management system, an operation planning system, and special cargo (especially fluid and gas) handling/storage technologies are the "key" promising technologies for the future of the Piggyback system, and that data reception/analysis techniques must be developed in order to improve system performance. The proposed procedure and analysis method provide useful insights for developing the R&D strategy and the technology roadmap for the Piggyback system.

Identifying Seoul city issues based on topic modeling of news article (토픽 모델링 기반 뉴스기사 분석을 통한 서울시 이슈 도출)

  • Kwon, Min-Ji
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2019.11a
    • /
    • pp.11-13
    • /
    • 2019
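  • News articles, a representative medium for delivering information to the public quickly and accurately, are published at an average rate of more than 15,000 per day. To grasp overall trends in a particular topic or field, studies that collect large volumes of text data and apply text mining and machine learning are being actively conducted. In this study, to identify the issues and problems of Seoul, we collected news articles over about five years and applied keyword analysis and topic modeling. The analysis derived the keywords that appear frequently in the five years of news articles and compared the keywords derived for each year. In addition, topic modeling yielded 20 topics that make up the news articles, on the basis of which the major issues of Seoul can be identified. This study is expected to be used for detailed analysis by year and by field, for time-series analysis, and for deriving the issues and problems of other cities.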

Wind power forecasting based on time series and machine learning models (시계열 모형과 기계학습 모형을 이용한 풍력 발전량 예측 연구)

  • Park, Sujin;Lee, Jin-Young;Kim, Sahm
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.5
    • /
    • pp.723-734
    • /
    • 2021
  • Wind energy is one of the rapidly developing renewable energies, attracting development and investment in response to climate change. As renewable energy policies and power plant installations are promoted, the supply of wind power in Korea is gradually expanding, and attempts to predict it accurately are increasing. In this paper, the ARIMA and ARIMAX models, which are time series techniques, and the SVR, Random Forest, and XGBoost models, which are machine learning models, were compared and analyzed for predicting wind power generation in the Jeonnam and Gyeongbuk regions. Mean absolute error (MAE) and mean absolute percentage error (MAPE) were used as indicators to compare the predictions of the models. After subtracting the hourly raw data from January 1, 2018 to October 24, 2020, the models were trained to predict wind power generation for the 168 hours from October 25, 2020 to October 31, 2020. Comparing the predictive power of the models, the Random Forest and XGBoost models showed the best performance for Jeonnam and Gyeongbuk, respectively. In future research, we will try forecasting wind power generation not only with machine learning models but also with data mining techniques that have been actively researched recently.
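
For reference, the two evaluation indicators named in the abstract can be computed as in the sketch below. The formulas are the standard definitions of MAE and MAPE; the sample values are made up and are not results from the paper.

```python
def mean_absolute_error(actual, predicted):
    """MAE: the average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: the average of |actual - predicted| / |actual|, times 100."""
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero (e.g. an hour with no generation).
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Made-up hourly generation values, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(mae, mape)
```

Because hourly wind generation can drop to zero, the zero check matters in practice; how the paper handles such hours is not described in the abstract.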

Investigation of Research Trends in the D(Data)·N(Network)·A(A.I) Field Using the Dynamic Topic Model (다이나믹 토픽 모델을 활용한 D(Data)·N(Network)·A(A.I) 중심의 연구동향 분석)

  • Wo, Chang Woo;Lee, Jong Yun
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.9
    • /
    • pp.21-29
    • /
    • 2020
  • Topic modeling research, a methodology for deducing keywords from literature, has become active with the explosion of data that accompanies the transition to a digital society. The objective of this research is to investigate research trends in the D.N.A. (Data, Network, Artificial Intelligence) field using the DTM (Dynamic Topic Model). The DTM was applied to 1,519 research projects with SW·A.I technology classifications among ICT (Information and Communication Technology) projects over six years (2015~2020). As a result, the technology keywords for the D.N.A. field (Big data, Cloud, Artificial Intelligence) and the extended keywords (Unstructured, Edge Computing, Learning, Recognition) appeared every year, from which it can be inferred that these technologies are being researched inclusively across other projects. Finally, the results of this paper are expected to be useful for future policy and R&D planning and for corporate technology and marketing strategy.

Short-term demand forecasting Using Data Mining Method (데이터마이닝을 이용한 단기부하예측)

  • Choi, Sang-Yule;Kim, Hyoung-Joong
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.21 no.10
    • /
    • pp.126-133
    • /
    • 2007
  • This paper proposes information-technology-based data mining to forecast short-term power demand. Time-series analysis has been applied to power demand forecasting, but it requires not only heavy computation but also a large amount of coefficient data, which makes it hard to analyze data quickly. To overcome this time-consuming process, the authors use a widely and easily available information-technology-based data mining technique to analyze the patterns of days and special days (holidays, etc.). The technique consists of two steps: constructing a decision tree, and estimating and forecasting power flow using decision tree analysis. To validate its efficiency, the authors compare the estimated demand with the real demand from the Korea Power Exchange.

The Development of the Short-Term Predict Model for Solar Power Generation (태양광발전 단기예측모델 개발)

  • Kim, Kwang-Deuk
    • Journal of the Korean Solar Energy Society
    • /
    • v.33 no.6
    • /
    • pp.62-69
    • /
    • 2013
  • This paper proposes a solar power generation forecast model that uses data from the building-integrated renewable energy monitoring system of the Korea Institute of Energy Research. A database was built from the data of the integrated real-time renewable energy monitoring system and was structured so that its correlation with weather conditions could be studied. Using forecast cloud cover data, generation data, and solar radiation data, models for forecasting solar power were developed with data mining and time series analysis methods. Securing weather forecast data is important for developing a solar power forecast model. To this end, the three-hourly, three-day forecast (including the current day) offered on the KMA (Korea Meteorological Administration) site was used. To verify accuracy and applicability to a real environment, each prediction was analyzed against the actual generation.

A Single Index Approach for Subsequence Matching that Supports Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 단일 색인을 사용한 정규화 변환 지원 서브시퀀스 매칭)

  • Moon Yang-Sae;Kim Jin-Ho;Loh Woong-Kee
    • The KIPS Transactions:PartD
    • /
    • v.13D no.4 s.107
    • /
    • pp.513-524
    • /
    • 2006
  • Normalization transform is very useful for finding the overall trend of the time-series data since it enables finding sequences with similar fluctuation patterns. The previous subsequence matching method with normalization transform, however, would incur index overhead both in storage space and in update maintenance since it should build multiple indexes for supporting arbitrary length of query sequences. To solve this problem, we propose a single index approach for the normalization transformed subsequence matching that supports arbitrary length of query sequences. For the single index approach, we first provide the notion of inclusion-normalization transform by generalizing the original definition of normalization transform. The inclusion-normalization transform normalizes a window by using the mean and the standard deviation of a subsequence that includes the window. Next, we formally prove correctness of the proposed method that uses the inclusion-normalization transform for the normalization transformed subsequence matching. We then propose subsequence matching and index building algorithms to implement the proposed method. Experimental results for real stock data show that our method improves performance by up to $2.5{\sim}2.8$ times over the previous method. Our approach has an additional advantage of being generalized to support many sorts of other transforms as well as normalization transform. Therefore, we believe our work will be widely used in many sorts of transform-based subsequence matching methods.
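
The inclusion-normalization transform described above can be sketched as follows: a window is normalized with the mean and standard deviation of a longer subsequence that contains it, rather than with its own statistics. This is only an illustration of the definition as stated in the abstract, not the paper's indexing or matching algorithm; the function names, the argument layout, and the use of the population standard deviation are assumptions.

```python
import math

def mean_std(seq):
    """Return the mean and population standard deviation of seq."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    return mean, std

def inclusion_normalize(data, win_start, win_len, sub_start, sub_len):
    """Normalize data[win_start : win_start + win_len] using the statistics of
    the including subsequence data[sub_start : sub_start + sub_len]."""
    win_end = win_start + win_len
    sub_end = sub_start + sub_len
    if not (sub_start <= win_start and win_end <= sub_end <= len(data)):
        raise ValueError("the subsequence must lie in data and include the window")
    mean, std = mean_std(data[sub_start:sub_end])
    if std == 0:
        raise ValueError("cannot normalize with a zero standard deviation")
    return [(x - mean) / std for x in data[win_start:win_end]]

series = [2.0, 4.0, 6.0, 8.0]

# Window [4, 6] normalized with the statistics of the whole series (mean 5, variance 5).
inner = inclusion_normalize(series, win_start=1, win_len=2, sub_start=0, sub_len=4)

# When the subsequence is the window itself, this reduces to the ordinary normalization transform.
plain = inclusion_normalize(series, win_start=1, win_len=2, sub_start=1, sub_len=2)
print(inner, plain)
```

The second call shows why the abstract calls this a generalization: choosing the including subsequence equal to the window recovers the original normalization transform.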