• Title/Summary/Keyword: Time-Series Data Analysis (시계열 데이터 분석)


The Performance Bottleneck of Subsequence Matching in Time-Series Databases: Observation, Solution, and Performance Evaluation (시계열 데이타베이스에서 서브시퀀스 매칭의 성능 병목 : 관찰, 해결 방안, 성능 평가)

  • 김상욱
    • Journal of KIISE:Databases
    • /
    • v.30 no.4
    • /
    • pp.381-396
    • /
    • 2003
  • Subsequence matching is an operation that finds, in a time-series database, subsequences whose changing patterns are similar to a given query sequence. This paper points out the performance bottleneck in subsequence matching and then proposes an effective method that significantly improves the performance of entire subsequence matching by resolving that bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post processing steps through preliminary experiments. Based on the results, we show that the post processing step is the main performance bottleneck in subsequence matching and then claim that its optimization is a crucial issue overlooked in previous approaches. To resolve the bottleneck, we propose a simple but quite effective method that performs the post processing step in the optimal way. By rearranging the order in which candidate subsequences are compared with the query sequence, our method completely eliminates the redundant disk accesses and CPU processing that occur in the post processing step. We formally prove that our method is optimal and does not incur any false dismissal. We show its effectiveness through extensive experiments: our method speeds up the post processing step by 3.91 to 9.42 times on a data set of real-world stock sequences and by 4.97 to 5.61 times on data sets of a large volume of synthetic sequences. The results also show that our method reduces the weight of the post processing step in entire subsequence matching from about 90% to less than 70%, which implies that it successfully resolves the performance bottleneck. As a result, our method provides excellent performance in entire subsequence matching: it is 3.05 to 5.60 times faster on the real-world stock data set and 3.68 to 4.21 times faster on the large synthetic data sets compared with the previous method.
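
A minimal sketch of the candidate-reordering idea described in this abstract: grouping the candidates returned by the index search by the data page that stores them, so that each page is read from disk only once during post processing. The page layout, the `read_page` helper, and the Euclidean distance check are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict
import numpy as np

def post_process(candidates, read_page, query, epsilon):
    """candidates: (page_id, offset) pairs from the index search;
    read_page: callable that fetches one data page from disk (hypothetical)."""
    by_page = defaultdict(list)
    for page_id, offset in candidates:      # group candidates by the page that stores them
        by_page[page_id].append(offset)

    matches = []
    for page_id, offsets in sorted(by_page.items()):
        page = read_page(page_id)           # each data page is fetched exactly once
        for offset in offsets:
            subseq = np.asarray(page[offset])
            dist = np.linalg.norm(subseq - np.asarray(query))
            if dist <= epsilon:             # keep true matches, drop false alarms
                matches.append((page_id, offset, dist))
    return matches
```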

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.139-153
    • /
    • 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through economic crises and bankruptcy prediction models have become increasingly important, so corporate bankruptcy has been regarded as one of the major research topics in business management, and many studies are also in progress in industry. Previous studies attempted various methodologies to improve prediction accuracy and to resolve the overfitting problem, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), which are based on statistics. More recently, researchers have used machine learning methodologies such as Support Vector Machine (SVM) and Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time, and the market situation changes as well, so it is difficult to predict bankruptcy using information from only a single point in time. Although such traditional research ignores this time effect, dynamic models have not been studied much. Ignoring the time effect yields biased results, so a static model may not be suitable for predicting bankruptcy, and a dynamic model may improve prediction. In this paper, we propose the RNN (Recurrent Neural Network), one of the deep learning methodologies; the RNN learns time-series data and is known to perform well. Prior to the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 for estimation of the bankruptcy prediction model and comparison of forecasting performance. To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. Bankruptcy was defined as delisting due to sluggish earnings, confirmed through KIND, a corporate stock information website. Variables were selected from previous papers: the first set consists of Z-score variables, which are traditional variables for predicting bankruptcy, and the second is a dynamic variable set. We finally selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second. We built a model that reflects dynamic changes in time-series financial data, and by comparing the proposed model with existing bankruptcy prediction models we found that it can help improve the accuracy of bankruptcy prediction. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model known as logistic regression (GLM), Support Vector Machine (SVM), and Artificial Neural Network (ANN) models as benchmarks. The experiment showed that the RNN performed better than the comparative models. The accuracy of the RNN was high for both sets of variables, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the ratio of poorly performing companies that the RNN predicted to go bankrupt was higher than that of the other comparative models. A limitation of this paper is that an overfitting problem occurs during RNN learning, but we expect it can be addressed by selecting more learning data and appropriate variables. These results suggest that this research will contribute to the development of bankruptcy prediction by proposing a new dynamic model.
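
A minimal sketch, using Keras, of a recurrent network trained on multi-year financial ratios for binary bankruptcy classification. The tensor shapes, layer sizes, and synthetic data are assumptions for illustration and do not reproduce the paper's variable sets or configuration.

```python
import numpy as np
import tensorflow as tf

# hypothetical shapes: 5 yearly observations of 12 financial ratios per firm
n_samples, n_timesteps, n_features = 466, 5, 12
X = np.random.rand(n_samples, n_timesteps, n_features).astype("float32")
y = np.random.randint(0, 2, size=(n_samples, 1)).astype("float32")  # 1 = bankrupt, 0 = normal

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_timesteps, n_features)),
    tf.keras.layers.SimpleRNN(32, activation="tanh"),
    tf.keras.layers.Dropout(0.3),            # dropout to mitigate the overfitting noted above
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC()])  # accuracy and AUC, as reported
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)
```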

A Study on Micro Clustering Technology for Breeding Pig Behavior Analysis (모돈 행동 특성 분석을 위한 마이크로 클러스터링 기술 연구)

  • Cho, Jinho;Oh, Jong-woo;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.165-165
    • /
    • 2017
  • Because of how they are raised, breeding sows spend long periods in a confined stall and must therefore be continuously monitored for leg abnormalities caused by their heavy body weight, poor feeding, and poor sleep. In this study, multiple ultrasonic sensors were installed on the side of the stall to analyze standing status and the trajectory of the body during movement, with the aim of comprehensively quantifying sow behavior. In the process, it was found that algebraically comparing the raw measurement values had limitations, so the time-series relative-distance trajectory signals, sampled at about 10 Hz per channel, were converted to the frequency domain for analysis. Based on changes in the magnitude of the signal concentrated at certain frequencies (power spectral density), it was possible to determine whether a sow's movement was in a normal state. However, because these results were obtained by batch processing of the measured data, an improvement was needed so that measurement and quantitative analysis could be performed simultaneously. The microprocessor used in the measurement system, a Nucleo-446 (STMicroelectronics, CA, USA), operates at a 180 MHz clock speed but lacked the computing power to additionally perform frequency-domain signal processing such as the FFT on a total of roughly 100 Hz of 16-bit measurement signals. If the frequency analysis is performed in 1-minute units, the amount of data to be processed is $100 \times 60 \times 5 \times 2$ bytes, so an additional computing device capable of finishing the computation within one minute was required. To perform measurement and frequency-domain conversion simultaneously, a NanoPi Neo Air (FriendlyARM, Guangzhou, China), a compact multi-core ARM A9-class application processor with 1 GHz of computing power, was selected. Its four cores were assigned to measurement, median filtering, smoothing, and FFT analysis, respectively, so that frequency analyses in 1-minute, 2-minute, and 5-minute units could be performed at the same time. The open-source MPICH library (www.mpich.org) was used for parallel computation. The core with relatively spare resources was chosen in real time to additionally serve as the network connection for simultaneous monitoring of multiple sows. A factorial experiment of about one week accumulated roughly 70 MB of data, and the results of the 1-minute, 2-minute, and 5-minute frequency-domain conversions could be obtained simultaneously. Some power density values in the frequency domain were found to provide useful information for analyzing sow behavior. Research on developing an on-site diagnosis system using a compact application processor deployable in sow barns and multi-core parallel processing techniques will continue.
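
A minimal sketch of the batch frequency-domain analysis described above: median filtering, smoothing, and power spectral density per time window on a roughly 10 Hz relative-distance signal. The filter sizes, window handling, and synthetic signal are assumptions; the study distributes this work across cores with MPICH on the ARM board rather than running it on a PC.

```python
import numpy as np
from scipy.signal import medfilt, welch

FS = 10.0                                    # sampling rate per channel, Hz

def window_psd(signal, window_minutes):
    """Median-filter, smooth, and compute PSD for each non-overlapping window."""
    n = int(FS * 60 * window_minutes)
    filtered = medfilt(signal, kernel_size=5)
    smoothed = np.convolve(filtered, np.ones(5) / 5, mode="same")
    spectra = []
    for start in range(0, len(smoothed) - n + 1, n):
        freqs, psd = welch(smoothed[start:start + n], fs=FS, nperseg=min(256, n))
        spectra.append((freqs, psd))         # power spectral density per window
    return spectra

signal = np.random.rand(int(FS * 60 * 10))   # 10 minutes of synthetic trajectory data
for minutes in (1, 2, 5):                    # the three analysis periods used in the study
    print(minutes, "min windows:", len(window_psd(signal, minutes)))
```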


Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications (텍스트 마이닝을 이용한 매체별 에볼라 주제 분석 - 바이오 분야 연구논문과 뉴스 텍스트 데이터를 이용하여 -)

  • An, Juyoung;Ahn, Kyubin;Song, Min
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.2
    • /
    • pp.289-307
    • /
    • 2016
  • Infectious diseases such as Ebola virus disease have become a social issue and draw public attention as a major topic in the news and in research. As a result, there have been many studies on infectious diseases using text-mining techniques. However, there has been no content analysis comparing two media channels with distinct characteristics. Accordingly, in this study, we conduct topic analysis of news (representing a social perspective) and academic research papers (representing the perspective of bio-professionals). As text-mining techniques, topic modeling is applied to extract topics from each type of material, and a word co-occurrence map based on selected bio entities is used to compare the perspectives of the materials in detail. For network analysis, a topic map is built using Gephi. These approaches uncovered differences in topics between the two materials and the characteristics of each. In the word co-occurrence map, however, most of the entities are shared by both materials. These results indicate that there are both differences and commonalities between social and academic materials.
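
A minimal sketch of topic modeling applied separately to two text collections, here with scikit-learn's LDA. The documents, topic count, and preprocessing are placeholders for the study's news and research-paper corpora; the bio-entity co-occurrence map and the Gephi topic map are not shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# placeholder documents standing in for the two corpora
news_docs = ["ebola outbreak response in west africa", "vaccine trial begins for ebola"]
paper_docs = ["ebola virus glycoprotein structure", "antiviral compound inhibits ebola replication"]

for name, docs in [("news", news_docs), ("papers", paper_docs)]:
    vectorizer = CountVectorizer(stop_words="english")
    dtm = vectorizer.fit_transform(docs)                      # document-term matrix
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
    terms = vectorizer.get_feature_names_out()
    for k, topic in enumerate(lda.components_):               # top words per topic
        top = [terms[i] for i in topic.argsort()[-3:][::-1]]
        print(name, "topic", k, top)
```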

Analysis of Changes in Urban Spatial Structure for Balanced Urban Development (도시균형발전을 위한 도시공간구조 변화 진단)

  • KIM, Ho-Yong
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.24 no.2
    • /
    • pp.40-51
    • /
    • 2021
  • The purpose of this study is to diagnose urban spatial structure using spatial modeling techniques for balanced urban development, as part of sustainable urban growth management. Since urban spatial structure arises from the interaction of various activities, the analysis results must be interpreted in conjunction with analysis of changes in the elements of spatial structure. In this study, population and transportation were examined for this purpose. Population data were analyzed with the Getis-Ord Gi* method, a spatial statistical technique, to identify regions where population is concentrating or declining. For transportation, commuting O-D data were analyzed with Social Network Analysis techniques to trace trends in centrality change. The analysis showed that urban imbalance was growing and that the centrality of transportation was changing. The results for each spatial structure element can be interpreted jointly at the neighborhood level, predicting changes in urban spatial structure and suggesting directions for sustainable urban growth management. These results can also serve as a decision-making tool for the various urban growth management policies introduced to cope with rapid and uncontrolled development in many cities around the world.
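
A minimal sketch of the network-analysis half of the method: building a directed, weighted graph from commuting O-D counts and ranking zone centrality with NetworkX. The zones and trip counts are invented, and the Getis-Ord Gi* analysis of population data is not shown.

```python
import networkx as nx

# hypothetical commuting O-D flows: (origin zone, destination zone, trips)
od_flows = [("ZoneA", "ZoneB", 1200), ("ZoneA", "ZoneC", 300),
            ("ZoneB", "ZoneC", 900), ("ZoneC", "ZoneA", 450)]

G = nx.DiGraph()
for origin, dest, trips in od_flows:
    G.add_edge(origin, dest, weight=trips)   # edge weight = number of commuting trips

centrality = nx.in_degree_centrality(G)      # share of zones sending commuters to each zone
in_strength = {n: sum(d["weight"] for _, _, d in G.in_edges(n, data=True)) for n in G}
for zone in sorted(in_strength, key=in_strength.get, reverse=True):
    print(zone, round(centrality[zone], 2), in_strength[zone])
```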

A Study on Resolving Barriers to Entry into the Resell Market by Exploring and Predicting Price Increases Using the XGBoost Model (XGBoost 모형을 활용한 가격 상승 요인 탐색 및 예측을 통한 리셀 시장 진입 장벽 해소에 관한 연구)

  • Yoon, HyunSeop;Kang, Juyoung
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.3
    • /
    • pp.155-174
    • /
    • 2021
  • This study focuses on resell investment in the fashion market, one of the emerging investment techniques. Worldwide, the market is growing rapidly, and a craze is currently under way throughout Korea. We therefore use shoe data from StockX, the representative resell site, to present basic guidelines to consumers and to break down barriers to entry into the resell market. The study first describes the current status of the resell craze based on information from various media outlets, and then presents the current state of the resell market and a research model drawn from prior research. Raw data were collected and analyzed using the XGBoost algorithm and the Prophet model. The analysis identified the factors that affect the resell market and the shoes suited to it. Furthermore, historical shoe data allowed us to predict future prices and thereby future profitability. Through this study, consumers unfamiliar with the market can actively participate in it using the information provided. The study also offers a variety of vital information regarding resell investment, thus forming a fundamental guideline for the market and further contributing to lowering entry barriers.
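
A minimal sketch combining the two models named above: an XGBoost regressor to inspect price drivers and Prophet to project the future price trend. The feature names, data frame layout, and synthetic records are assumptions, not the StockX schema used in the study.

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from prophet import Prophet

# hypothetical resale records
df = pd.DataFrame({
    "retail_price": np.random.uniform(100, 300, 200),
    "days_since_release": np.random.randint(1, 400, 200),
    "shoe_size": np.random.choice([8, 9, 10, 11], 200),
    "resale_price": np.random.uniform(150, 900, 200),
    "date": pd.date_range("2020-01-01", periods=200, freq="D"),
})

features = ["retail_price", "days_since_release", "shoe_size"]
xgb = XGBRegressor(n_estimators=200, max_depth=4)
xgb.fit(df[features], df["resale_price"])
print(dict(zip(features, xgb.feature_importances_)))   # which factors drive the resale price

trend = df.rename(columns={"date": "ds", "resale_price": "y"})[["ds", "y"]]
m = Prophet().fit(trend)
future = m.make_future_dataframe(periods=30)            # project prices 30 days ahead
print(m.predict(future)[["ds", "yhat"]].tail())
```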

CNN Model-based Arrhythmia Classification using Image-typed ECG Data (이미지 타입의 ECG 데이터를 사용한 CNN 모델 기반 부정맥 분류)

  • Yeon-Suk Bang;Myung-Soo Jang;Yousik Hong;Sang-Suk Lee;Jun-Sang Yu;Woo-Beom Lee
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.24 no.4
    • /
    • pp.205-212
    • /
    • 2023
  • Among cardiac diseases, arrhythmias can lead to serious complications such as stroke, heart attack, and heart failure if left untreated, so continuous and accurate ECG monitoring is crucial for clinical care. However, accurate interpretation of electrocardiogram (ECG) data depends entirely on medical doctors, which requires additional time and cost. Therefore, this paper proposes an arrhythmia recognition module intended for a medical platform that analyzes abnormal pulse waveforms based on lifelogs. The proposed method converts ECG data into image format rather than treating it as time-series data, applies visual pattern recognition technology, and then detects arrhythmia using a CNN model. To validate the arrhythmia classification of the proposed CNN model on image-converted ECG data, the MIT-BIH arrhythmia dataset was used, and the result showed an accuracy of 97%.
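
A minimal sketch of a small CNN classifier over ECG beats rendered as grayscale images. The image size, number of classes, and architecture are assumptions rather than the paper's exact network; the MIT-BIH data are replaced by random arrays.

```python
import numpy as np
import tensorflow as tf

n_classes = 5                                           # assumed number of beat classes
X = np.random.rand(512, 64, 64, 1).astype("float32")    # ECG segments drawn as 64x64 images
y = np.random.randint(0, n_classes, size=(512,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```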

The Correlation Factors on the Analysis of Demand Factors for Apartments (주택수요 예측인자 영향도 분석에 의한 상관인자선정)

  • Yang Seung-Won;Park Keun-Joon
    • Korean Journal of Construction Engineering and Management
    • /
    • v.6 no.1 s.23
    • /
    • pp.80-88
    • /
    • 2005
  • This research describes an interactive process for analyzing the demand factors for apartments in the Cheonan area. Using subjective statistical data on demand factors, the process categorizes them into main factors according to the sensitivity of their correlation coefficients. The investigation is based on an analysis of time-series data. One purpose of this research is to determine the correlation factors that can be used effectively in a forecasting model. The results show significant correlation coefficients in the correlation matrix used to find the optimum correlation factors. The paper thus shows how to obtain the more influential factors through principal component analysis. Consequently, this paper provides useful information about the correlation relationships, although its effectiveness is limited by the regional boundary of the study.
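
A minimal sketch of the screening step described here: ranking candidate demand factors by their correlation with demand and then applying principal component analysis. The factor names and random data are placeholders for the Cheonan time-series data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
df = pd.DataFrame({                          # hypothetical monthly series
    "population": rng.normal(size=60),
    "income": rng.normal(size=60),
    "interest_rate": rng.normal(size=60),
    "housing_demand": rng.normal(size=60),
})

corr = df.corr()                             # correlation matrix of candidate factors
ranked = corr["housing_demand"].drop("housing_demand").abs().sort_values(ascending=False)
print(ranked)                                # factors ranked by correlation with demand

pca = PCA(n_components=2).fit(StandardScaler().fit_transform(df[ranked.index]))
print(pca.explained_variance_ratio_)         # weight of each principal component
```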

Analysis of time series models for PM10 concentrations at the Suwon city in Korea (경기도 수원시 미세먼지 농도의 시계열모형 연구)

  • Lee, Hoon-Ja
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.6
    • /
    • pp.1117-1124
    • /
    • 2010
  • PM10 (particulate matter of 10 micrometers or less) data are among the important environmental data for measuring a country's atmospheric condition. In this article, the Autoregressive Error (ARE) model is considered for analyzing the monthly PM10 data at the Suwon monitoring site in the southern part of Gyeonggi-Do, Korea. In the ARE model, six meteorological variables and four pollution variables are used as explanatory variables for the PM10 data set. The six meteorological variables are daily maximum temperature, wind speed, relative humidity, rainfall, radiation, and amount of cloud. The four air pollution explanatory variables are sulfur dioxide ($SO_2$), nitrogen dioxide ($NO_2$), carbon monoxide (CO), and ozone ($O_3$). The results showed that the monthly ARE models explained about 13-49% of the variation in PM10 concentration.
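
A minimal sketch of a regression with autoregressive errors, in the spirit of the ARE model above, using statsmodels' GLSAR. The synthetic series and coefficients stand in for the Suwon monthly PM10 data and its meteorological and pollutant covariates.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 120                                           # ten years of monthly observations
exog = sm.add_constant(rng.normal(size=(n, 4)))   # e.g. temperature, wind speed, SO2, NO2
pm10 = exog @ np.array([40.0, 3.0, -2.0, 5.0, 4.0]) + rng.normal(scale=5.0, size=n)

model = sm.GLSAR(pm10, exog, rho=1)               # regression with AR(1) errors
result = model.iterative_fit(maxiter=10)          # alternate between OLS and rho estimation
print(result.summary())                           # coefficients and fit statistics
```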

A Study on Prediction of PM2.5 Concentration Using DNN (Deep Neural Network를 활용한 초미세먼지 농도 예측에 관한 연구)

  • Choi, Inho;Lee, Wonyoung;Eun, Beomjin;Heo, Jeongsook;Chang, Kwang-Hyeon;Oh, Jongmin
    • Journal of Environmental Impact Assessment
    • /
    • v.31 no.2
    • /
    • pp.83-94
    • /
    • 2022
  • In this study, DNN-based models were trained using air quality measurement data for 2017, 2019, and 2020 provided by the National Measurement Network (Air Korea), and the models were evaluated using data from 2016 and 2018. Based on a Pearson correlation coefficient threshold of 0.2, four items (SO2, CO, NO2, PM10) were initially used as independent variables. To improve prediction accuracy, independent monthly models were also built. The error was calculated by the RMSE (Root Mean Square Error) method; the RMSE of the initial model was 5.78, about 46% better than the national moving-average model result (10.77). In addition, compared with the initial model, the independent monthly models showed improved performance in every month except November. This study therefore confirms that DNN modeling is effective in predicting PM2.5 concentrations from air pollutant concentrations, and that the learning performance of the model can be improved by selecting additional independent variables.
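
A minimal sketch of a DNN regressor over the four pollutant inputs named above (SO2, CO, NO2, PM10), with RMSE as the evaluation metric. Layer sizes, training settings, and the synthetic data are assumptions, and the per-month modeling is reduced to a single model here.

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 4).astype("float32")     # SO2, CO, NO2, PM10 (synthetic)
y = (20 * X[:, 3] + 5 * X[:, 2] + np.random.normal(scale=2, size=1000)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])  # RMSE, as used in the study
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))            # [mse, rmse]
```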