• Title/Summary/Keyword: time series data mining (시계열 데이터 마이닝)

Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for information Management / v.36 no.4 / pp.83-105 / 2019
  • This study proposes an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. For this purpose, a large volume of Web news articles was collected, and time series data composed of 120 time segments were built. To build a training data set for testing the proposed model, 440 representative keywords were manually classified into 8 trend types. This study introduces the Dynamic Time Warping (DTW) method, which has been commonly used in time series analytics, and proposes an applied model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with a k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved a best micro-averaged F1 score of 74.3%. In every respect of the comprehensive experiments, the proposed model outperformed ED and DTW.
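
The abstract above does not spell out how MA-DTW combines its two parts. The following is a minimal sketch under one plausible reading: each series is smoothed with a trailing moving average, the smoothed curves are compared with classic DTW, and a kNN majority vote assigns the label. The function names, the window size, and the value of k are illustrative assumptions, not the author's implementation.

```python
from collections import Counter
from math import inf


def moving_average(series, window=3):
    """Trailing moving average; early points average over the values seen so far."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference step cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ma_dtw_distance(a, b, window=3):
    """Assumed combination: smooth both series, then apply DTW."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))


def knn_classify(query, training, k=3, window=3):
    """training is a list of (series, label); returns the majority label of the k nearest."""
    ranked = sorted(training, key=lambda item: ma_dtw_distance(query, item[0], window))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [
        ([1, 2, 3, 4, 5], "rising"),
        ([2, 3, 4, 5, 6], "rising"),
        ([5, 4, 3, 2, 1], "falling"),
        ([6, 5, 4, 3, 2], "falling"),
    ]
    print(knn_classify([1, 3, 2, 4, 6], training))
```

Smoothing before warping suppresses short spikes, so the distance reflects the overall trend shape rather than pointwise noise, which is the intuition the abstract attributes to the moving average.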

The Method for Extracting Meaningful Patterns Over the Time of Multi Blocks Stream Data (시간의 흐름과 위치 변화에 따른 멀티 블록 스트림 데이터의 의미 있는 패턴 추출 방법)

  • Cho, Kyeong-Rae;Kim, Ki-Young
    • KIPS Transactions on Computer and Communication Systems / v.3 no.10 / pp.377-382 / 2014
  • Techniques for analyzing data collected over time from mobile and IoT environments are mainly used to extract patterns from the collected data and find meaningful information. However, existing analytical methods assume that data collection is complete before analysis begins, which makes it difficult to reflect changes in time series data as time passes. In this paper, we introduce a method for analyzing multi-block stream data (AM-MBSD: Analysis Method for Multi-Block Stream Data), aimed at data streams with multiple properties such as pattern variability, large volume, and continuity. The method defines continuously generated data as a plurality of blocks and applies the proposed analysis to each block to extract meaningful patterns. The extracted patterns are collected with consideration of generation time, frequency, and errors. The method was examined through analysis experiments using time series data.

Time Series Analysis of Patent Keywords for Forecasting Emerging Technology (특허 키워드 시계열 분석을 통한 부상 기술 예측)

  • Kim, Jong-Chan;Lee, Joon-Hyuck;Kim, Gab-Jo;Park, Sang-Sung;Jang, Dong-Sick
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.355-360 / 2014
  • Forecasting emerging technology plays an important role in business strategy and R&D investment. There are various approaches to technology forecasting, including patent analysis. Qualitative methods based on experts' evaluations and opinions have mainly been used for patent-based technology forecasting. However, qualitative methods do not assure the objectivity of analysis results and require high cost and a long time. To make up for these weaknesses, patent data can be analyzed quantitatively and statistically using text mining techniques. In this paper, we suggest a new method of technology forecasting using text mining and ARIMA analysis.
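
To make the ARIMA step concrete, here is a hand-rolled sketch of the simplest useful case, ARIMA(1,1,0) without a constant: the yearly keyword counts are differenced once, an AR(1) coefficient is fitted to the differences by least squares, and forecasts are rebuilt by accumulating the predicted differences. The model order, the function names, and the sample counts are assumptions for illustration. The abstract does not state the orders the authors used, and a real analysis would normally rely on a statistics library with proper order selection.

```python
def difference(series):
    """First-order differencing: d[t] = y[t+1] - y[t]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] (no intercept)."""
    numerator = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast_arima_110(series, steps=1):
    """Forecast future values by accumulating predicted differences."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next difference
        level = level + last_diff     # undo the differencing
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical yearly occurrence counts of one patent keyword.
    yearly_counts = [12, 15, 19, 25, 33, 44]
    print(forecast_arima_110(yearly_counts, steps=3))
```

A keyword whose forecast keeps climbing would be a candidate signal of an emerging technology under this kind of analysis.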

Hybrid Lower-Dimensional Transformation for Similar Sequence Matching (유사 시퀀스 매칭을 위한 하이브리드 저차원 변환)

  • Moon, Yang-Sae;Kim, Jin-Ho
    • The KIPS Transactions:PartD / v.15D no.1 / pp.31-40 / 2008
  • In similar sequence matching, lower-dimensional transformations are generally used to convert high-dimensional sequences into low-dimensional points. These traditional transformations, however, show different indexing performance depending on the type of time-series data. This means that the choice of lower-dimensional transformation has a significant influence on indexing performance in similar sequence matching. To solve this problem, in this paper we propose a hybrid approach that integrates multiple transformations and uses them in a single multidimensional index. We first propose a new notion of hybrid lower-dimensional transformation that exploits different lower-dimensional transformations for a sequence. We next define the hybrid distance to compute the distance between the transformed sequences. We then formally prove that the hybrid approach performs similar sequence matching correctly. We also present index building and similar sequence matching algorithms that use the hybrid approach. Experimental results on various time-series data sets show that our hybrid approach outperforms the single transformation-based approach. These results indicate that the hybrid approach can be widely used for time-series data with different characteristics.
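
The correctness claim in this abstract rests on a standard property: the distance computed in the reduced space must never exceed the true distance, so filtering on it cannot dismiss a real match. The sketch below illustrates that property with a single well-known transformation, Piecewise Aggregate Approximation (PAA). It is not the paper's hybrid transformation or hybrid distance, which the abstract does not define, and all function names are illustrative.

```python
from math import sqrt


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by the segment count")
    size = n // segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(segments)]


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(reduced_a, reduced_b, original_length):
    """Reduced-space distance that never exceeds the true Euclidean distance."""
    segment_length = original_length / len(reduced_a)
    return sqrt(segment_length) * euclidean(reduced_a, reduced_b)


def range_query(query, database, epsilon, segments):
    """Filter with the cheap lower bound, then verify candidates on the full sequences."""
    reduced_query = paa(query, segments)
    candidates = [
        seq for seq in database
        if paa_lower_bound(reduced_query, paa(seq, segments), len(query)) <= epsilon
    ]
    return [seq for seq in candidates if euclidean(query, seq) <= epsilon]


if __name__ == "__main__":
    database = [
        [1, 2, 3, 4, 5, 6, 7, 8],
        [2, 2, 2, 2, 9, 9, 9, 9],
        [8, 7, 6, 5, 4, 3, 2, 1],
    ]
    print(range_query([1, 2, 3, 4, 5, 6, 7, 9], database, epsilon=2.0, segments=2))
```

Because the filter only discards sequences whose lower bound already exceeds `epsilon`, the final answer matches a brute-force scan. A hybrid scheme would need the same guarantee for its combined distance.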

Predicting changes of realtime search words using time series analysis and artificial neural networks (시계열분석과 인공신경망을 이용한 실시간검색어 변화 예측)

  • Chong, Min-Yeong
    • Journal of Digital Convergence / v.15 no.12 / pp.333-340 / 2017
  • Because realtime search words are driven by how rapidly searches for an issue grow over a short period, they cannot represent an issue that holds interest over a sustained period. To overcome this limitation, this paper evaluates the daily and hourly persistence of realtime search words that rank in the top 10 over a certain period and extracts the search words that attract sustained interest. We then present a method that uses time series analysis and a neural network to track how interest in the top search words changes, and show forecasts of near-future changes for an actual example derived with the method. Forecasting with time series analysis by date and with artificial neural networks trained by hour both show good results.
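
The persistence filtering described above can be sketched as a simple counting step: for each word, count the distinct days and the distinct hourly snapshots in which it appeared in the top 10, and keep words that clear both thresholds. The snapshot layout, the threshold values, and the function name are hypothetical. The abstract does not give the paper's actual persistence measure.

```python
from collections import defaultdict


def persistent_words(snapshots, min_days=2, min_hours=3):
    """snapshots: iterable of (day, hour, top_words) tuples.

    Returns, in sorted order, the words seen on at least min_days distinct days
    and in at least min_hours distinct hourly snapshots.
    """
    days_seen = defaultdict(set)
    hours_seen = defaultdict(set)
    for day, hour, top_words in snapshots:
        for word in top_words:
            days_seen[word].add(day)
            hours_seen[word].add((day, hour))
    return sorted(
        word for word in days_seen
        if len(days_seen[word]) >= min_days and len(hours_seen[word]) >= min_hours
    )


if __name__ == "__main__":
    # Hypothetical top-ranked words captured at two hours on two days.
    snapshots = [
        ("2017-10-01", 9, ["election", "typhoon"]),
        ("2017-10-01", 10, ["election", "concert"]),
        ("2017-10-02", 9, ["election", "typhoon"]),
        ("2017-10-02", 10, ["concert"]),
    ]
    print(persistent_words(snapshots))
```

The per-word daily or hourly counts produced this way are the kind of series that could then be passed to a time series model or a neural network for forecasting.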

Trend of Research and Industry-Related Analysis in Data Quality Using Time Series Network Analysis (시계열 네트워크분석을 통한 데이터품질 연구경향 및 산업연관 분석)

  • Jang, Kyoung-Ae;Lee, Kwang-Suk;Kim, Woo-Je
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.295-306 / 2016
  • The purpose of this paper is both to analyze research trends and to predict industrial flows using metadata from previous studies on data quality. There have been many recent attempts to analyze research trends in various fields. However, analyses of previous studies on data quality have produced poor results because of the field's vast scope and volume of data. Therefore, in this paper, we used text mining and social network analysis to carry out a time series network analysis of data quality papers collected from the Web of Science index database and published in international data quality journals over 10 years. The analysis results are as follows. Research decreased in Mathematical & Computational Biology, Chemistry, Health Care Sciences & Services, Biochemistry & Molecular Biology, and Medical Information Science. In contrast, research increased in Environmental Sciences, Water Resources, Geology, and Instruments & Instrumentation. In addition, the social network analysis shows that the subjects with high centrality are analysis, algorithm, and network, and that image, model, sensor, and optimization are growing subjects in the data quality field. Furthermore, the industrial connection analysis on data quality shows a high correlation among technique, industry, health, infrastructure, and customer service, and predicts that Environmental Sciences, Biotechnology, and the Health Industry will continue to develop. This paper will be useful not only for people in the data quality industry but also for researchers who analyze research patterns and investigate industry connections in data quality.

A Statistical Analysis of the Causes of Marine Incidents occurring during Berthing (정박 중 발생한 준해양사고 원인에 대한 통계 분석 연구)

  • Roh, Boem-Seok;Kang, Suk-Young
    • Journal of Navigation and Port Research / v.45 no.3 / pp.95-101 / 2021
  • In light of Heinrich's law, marine incidents are very important in preventing accidents. However, marine incident data are mainly qualitative and are used to prevent similar accidents through case sharing rather than statistical analysis, as can be seen in the marine incident data posted by the Korea Maritime Safety Tribunal. Therefore, this study derived quantitative results by analyzing the causes of marine incidents during berthing using various methods of statistical analysis. To this end, data on marine incidents from various shipping companies were collected and reclassified for easy analysis. The main keywords were derived via primary analysis using text mining. Only meaningful words were selected through verification by an expert group, and time series and cluster analyses were performed to predict marine incidents that may occur during berthing. Although an expert group was still required during the analysis, the study confirmed that quantitative analysis of marine incidents is feasible and can be used to provide information on causes and accident prevention.

Streaming Decision Tree for Continuity Data with Changed Pattern (패턴의 변화를 가지는 연속성 데이터를 위한 스트리밍 의사결정나무)

  • Yoon, Tae-Bok;Sim, Hak-Joon;Lee, Jee-Hyong;Choi, Young-Mee
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.94-100 / 2010
  • Data mining is mainly used for pattern extraction and information discovery from collected data. However, previous methods have difficulty reflecting patterns that change over time. In this paper, we introduce the Streaming Decision Tree (SDT), which analyzes data with continuity, large scale, and changing patterns. SDT defines continuous data as blocks and extracts rules using a decision tree learning method. The extracted rules are combined considering time of occurrence, frequency, and contradiction. In experiments, we applied SDT to time series data and confirmed reasonable results.

The Prediction of Cryptocurrency on Using Text Mining and Deep Learning Techniques : Comparison of Korean and USA Market (텍스트 마이닝과 딥러닝을 활용한 암호화폐 가격 예측 : 한국과 미국시장 비교)

  • Won, Jonggwan;Hong, Taeho
    • Knowledge Management Research / v.22 no.2 / pp.1-17 / 2021
  • In this study, we predicted bitcoin prices on Bithumb and Coinbase, leading exchanges in Korea and the USA respectively, using ARIMA and Recurrent Neural Networks (RNNs). We also used news articles from each country to propose a separated RNN model. The proposed model divides the training data into datasets according to the changing trend of prices and then applies a time series prediction technique (RNNs) to create multiple models. We then used daily news data to create a term-based dictionary for each trend change point. We explored trend change points in the test data using the daily news keyword data of the test set and the term-based dictionary, and applied a matching model to produce prediction results. With this approach, we obtained higher accuracy than a model that predicted prices using the time series prediction technique alone. This study shows that the limitations of time series prediction techniques can be overcome by exploring trend change points using news data, and that various time series prediction techniques combined with text mining could be applied in further research to improve model performance.

Building Data Warehouse System for Weblog Analysis (웹로그 분석을 위한 데이터 웨어하우스 시스템 구축)

  • Lee, Joo-Il;Baek, Kyung-Min;Shin, Joo-Hahn;Lee, Won-Suk
    • 한국IT서비스학회:학술대회논문집 / 2010.05a / pp.291-295 / 2010
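  • Recent rapid advances in hardware technology and database systems have made it possible to automatically collect data from many fields around us. Techniques for efficiently processing the continuously produced, large-volume data commonly referred to as data streams, and for obtaining useful information from them, are already being studied extensively in many application areas. The Internet is one of the main sources that mass-produce such data streams. With the growth of Internet business, weblog data streams have come to be widely used in marketing, strategy formulation, customer management, and other areas, and the demand for more accurate and efficient analysis is steadily increasing. A data warehouse is a repository that integrates collected data by subject and loads it in time series form, and it has been widely used for analysis and decision making. Because a data warehouse provides functions for summarizing, integrating, and cleansing data, it is well suited to processing large volumes of data and improves data quality, so it has also been widely used as a preprocessing step in data mining. In this paper, we propose a system that builds a data warehouse for weblog data streams in order to efficiently obtain useful, higher-quality information.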
