• Title/Summary/Keyword: time series data prediction (시계열 데이터 예측)

'Hot Search Keyword' Rank-Change Prediction (인기 검색어의 순위 변화 예측)

  • Kim, Dohyeong;Kang, Byeong Ho;Lee, Sungyoung
    • Journal of KIISE / v.44 no.8 / pp.782-790 / 2017
  • The 'Hot Search Keywords' service provides a list of the hottest search terms on different web services such as Naver and Daum. The rank of a specific search keyword changes as its users' interest changes. This paper introduces a temporal modelling framework for predicting the rank change of hot search keywords using past rank data and machine learning. Past rank data show that more than 70% of hot search keywords tend to disappear and reappear later. The authors handled missing rank values using deletion, dummy variables, mean substitution, and expectation maximization. However, it is crucial to determine the optimal window size of the past rank data. We propose an optimal window size selection approach based on the minimum amount of time a topic within the same or a differing context disappeared. The experiments were conducted with four different machine-learning techniques using the Naver, Daum, and Nate 'Hot Search Keywords' datasets, which were collected over 2 years.
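  The abstract lists several ways of handling ranks that are missing from a fixed window of past data. Below is a minimal sketch of two of them, mean substitution plus a 0/1 dummy indicator, with deletion of windows that contain no observed rank at all. It assumes the rank history is a Python list holding None for periods when the keyword was off the chart; the function name `build_window_features` and the window size are hypothetical, and the paper's own window-selection rule is not reproduced.

```python
from statistics import mean
from typing import List, Optional, Tuple

def build_window_features(
    ranks: List[Optional[int]], window: int
) -> List[Tuple[List[float], List[int]]]:
    """Turn a rank history into (filled_ranks, missing_flags) windows.

    Hypothetical helper: missing ranks (None) are replaced by the mean of the
    observed ranks in the same window (mean substitution), and a parallel 0/1
    dummy vector records which positions were missing.
    """
    samples = []
    for start in range(len(ranks) - window + 1):
        chunk = ranks[start:start + window]
        observed = [r for r in chunk if r is not None]
        # Deletion: a window where the keyword never appears has nothing to average.
        if not observed:
            continue
        fill = float(mean(observed))
        filled = [float(r) if r is not None else fill for r in chunk]
        flags = [1 if r is None else 0 for r in chunk]
        samples.append((filled, flags))
    return samples

# Hypothetical rank history; None marks hours when the keyword was not ranked.
history = [3, None, 5, 1, None, None, 2]
for filled, flags in build_window_features(history, window=3):
    print(filled, flags)
```

  Keeping the dummy vector next to the filled values lets a downstream model distinguish a real rank from an imputed one, which matters when most keywords disappear and reappear.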

Development of a Machine Learning Model for Imputing Time Series Data with Massive Missing Values (결측치 비율이 높은 시계열 데이터 분석 및 예측을 위한 머신러닝 모델 구축)

  • Bangwon Ko;Yong Hee Han
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.3 / pp.176-182 / 2024
  • In this study, we compared and analyzed various methods of missing data handling to build a machine learning model that can effectively analyze and predict time series data with a high percentage of missing values. For this purpose, Predictive State Model Filtering (PSMF), MissForest, and Imputation By Feature Importance (IBFI) methods were applied, and their prediction performance was evaluated using LightGBM, XGBoost, and Explainable Boosting Machines (EBM) machine learning models. The results of the study showed that MissForest and IBFI performed the best among the methods for handling missing values, reflecting the nonlinear data patterns, and that XGBoost and EBM models performed better than LightGBM. This study emphasizes the importance of combining nonlinear imputation methods and machine learning models in the analysis and prediction of time series data with a high percentage of missing values, and provides a practical methodology.

Stochastic Volatility Model vs. GARCH Model : A Comparative Study (확률적 변동성 모형과 자기회귀이분산 모형의 비교분석)

  • 이용흔;김삼용;황선영
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.217-224 / 2003
  • Volatility in financial data is usually measured by the conditional variance. The two main approaches to gauging conditional variance are the stochastic volatility (SV) model and the autoregressive-type approach (GARCH). This article compares SV and GARCH using Korea Composite Stock Price Index (KOSPI) data. The SV model performs slightly better than GARCH(1,1) in analyzing the KOSPI data.
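  To make the GARCH(1,1) side of the comparison concrete, the sketch below computes the conditional variance recursion sigma^2_t = omega + alpha * r_{t-1}^2 + beta * sigma^2_{t-1} for fixed parameters. It assumes demeaned returns and already-estimated parameters; the parameter values and the function name `garch11_variance` are hypothetical, no estimation step is shown, and the SV model, whose log-variance follows its own stochastic process, is not sketched.

```python
from typing import List

def garch11_variance(
    returns: List[float], omega: float, alpha: float, beta: float
) -> List[float]:
    """Conditional variance sigma^2_t of a GARCH(1,1) model for each period t.

    sigma^2_t = omega + alpha * r_{t-1}^2 + beta * sigma^2_{t-1}
    The recursion starts at the unconditional variance omega / (1 - alpha - beta).
    Returns are assumed to be demeaned, so r_t stands in for the shock epsilon_t.
    """
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    if not returns:
        return []
    sigma2 = [omega / (1.0 - alpha - beta)]
    # Each period's squared return and variance feed the next period's variance.
    for r in returns[:-1]:
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2

# Hypothetical parameters and returns, chosen only to show the recursion.
print(garch11_variance([0.0, 2.0, 0.0], omega=0.1, alpha=0.1, beta=0.8))
```

  With these illustrative values the large return in the second period lifts the next period's variance from about 0.9 to about 1.22, which is the volatility clustering that both model families try to capture.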

Forecasting Model Based on Analogical Reasoning (유사추론 기반 예측모형)

  • Jang, Yong-Sik;Choe, Yun-Jeong
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.11a / pp.581-585 / 2007
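  • This study proposes a forecasting algorithm based on analogical reasoning that searches nonlinear time series data for past cases similar to the most recent data in order to predict the future. Previous studies compare the latest data with past cases using Euclidean distance or mean squared error, but do not consider the similarity of trends. This study presents a forecasting method that uses multidimensional analogical-reasoning factors, including case window size, prediction error, a test of mean difference, and trend similarity between cases, and demonstrates its effectiveness.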
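  The idea of matching the latest window against past cases on both distance and trend can be sketched as follows. This is a minimal illustration, assuming a univariate series, Euclidean distance, and a trend score equal to the share of first differences whose signs agree; the scoring formula trend / (1 + distance), the helper names, and the step-based forecast are hypothetical choices, and the paper's mean-difference test and error-based window sizing are not modeled.

```python
import math
from typing import List

def _sign(x: float) -> int:
    return (x > 0) - (x < 0)

def analog_forecast(series: List[float], window: int, k: int = 3) -> float:
    """Forecast the next value from the k past windows most similar to the latest one.

    Hypothetical scoring: trend agreement / (1 + Euclidean distance).
    The forecast adds the score-weighted next-step change of the analogs
    to the last observed value.
    """
    if window < 2 or len(series) < window + 1:
        raise ValueError("need window >= 2 and at least one past window")
    query = series[-window:]
    q_diff = [b - a for a, b in zip(query, query[1:])]
    scored = []
    # Only windows followed by a known value can serve as analogs.
    for start in range(len(series) - window):
        cand = series[start:start + window]
        dist = math.dist(query, cand)
        c_diff = [b - a for a, b in zip(cand, cand[1:])]
        trend = sum(_sign(a) == _sign(b) for a, b in zip(q_diff, c_diff)) / len(q_diff)
        step = series[start + window] - cand[-1]
        scored.append((trend / (1.0 + dist), step))
    scored.sort(key=lambda s: s[0], reverse=True)
    top = scored[:k]
    total = sum(w for w, _ in top)
    if total == 0:
        # No analog shares the trend; fall back to an unweighted average step.
        return query[-1] + sum(s for _, s in top) / len(top)
    return query[-1] + sum(w * s for w, s in top) / total

# A repeating 1, 2, 3 pattern: the best analogs of [1, 2] were followed by 3.
print(analog_forecast([1, 2, 3, 1, 2, 3, 1, 2], window=2, k=2))
```

  Forecasting the change rather than the raw next value lets an analog with a similar shape but a different level still contribute, which is one reason trend similarity is useful alongside distance.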

An Analysis of the street structure and the Morphological Change using Space Syntax in Kangnam, Seoul (공간구문론을 활용한 가로체계와 공간변화 분석 - 서울 강남구를 사례로)

  • Kim, Hye-Young;Joo, Yong-Jin;Jun, Chul-Min
    • Proceedings of the Korean Association of Geographic Information Studies Conference / 2010.06a / pp.69-70 / 2010
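  • In Korea, research on analyzing and predicting the trends and types of land-use change over time remains insufficient despite its importance. This study therefore aims to analyze the patterns of change over time in the spatial structure of Gangnam-gu, Seoul, using street-network and land-use data from a time series dataset built for the district. It also compares the process of land-use change. Gangnam-gu has changed considerably through development since the early 1970s. With this in mind, space syntax is applied to each point in time (1960, 1970, 1980, and 1990), and a quantitative analysis is performed using axial maps. If linked in future with a methodology for predicting land-use change in terms of road accessibility, this approach is expected to estimate spatial change effectively.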
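  Axial-map analysis in space syntax typically treats each axial line as a graph node and each intersection as an edge, then measures how many line-to-line steps separate a line from all others. The sketch below computes mean depth by breadth-first search and an integration value of 1 / RA, with relative asymmetry RA = 2(MD - 1) / (k - 2). It assumes a connected map given as an adjacency list; the line labels are hypothetical, and the further normalisation to RRA that space syntax tools commonly apply is deliberately left out, so this is not necessarily the measure used in the study.

```python
from collections import deque
from typing import Dict, List

# An axial map as an adjacency list: each key is an axial line and its
# neighbours are the lines it intersects.
AxialMap = Dict[str, List[str]]

def mean_depth(graph: AxialMap, root: str) -> float:
    """Average number of line-to-line steps from root to every other line (BFS)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    if len(depth) != len(graph):
        raise ValueError("axial map must be connected")
    return sum(depth.values()) / (len(graph) - 1)

def integration(graph: AxialMap, root: str) -> float:
    """Integration as 1 / RA, with relative asymmetry RA = 2(MD - 1) / (k - 2)."""
    k = len(graph)
    if k < 3:
        raise ValueError("need at least three axial lines")
    ra = 2 * (mean_depth(graph, root) - 1) / (k - 2)
    # A line that meets every other line has RA = 0, i.e. it is maximally integrated.
    return float("inf") if ra == 0 else 1 / ra

# Hypothetical map: four streets in a chain A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
for line in chain:
    print(line, round(integration(chain, line), 3))
```

  In the chain, the inner lines B and C come out more integrated than the end lines, matching the intuition that they are fewer turns away from the rest of the network; repeating the calculation on maps from different years is what allows change over time to be compared.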

Hourly Prediction of Particulate Matter (PM2.5) Concentration Using Time Series Data and Random Forest (시계열 데이터와 랜덤 포레스트를 활용한 시간당 초미세먼지 농도 예측)

  • Lee, Deukwoo;Lee, Soowon
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.129-136 / 2020
  • PM2.5, airborne particulate matter even finer than PM10, has emerged as an environmental problem. Because PM2.5 can cause eye diseases and respiratory problems and can infiltrate even deep blood vessels in the brain, predicting it is important. However, PM2.5 is difficult to predict because its formation and movement are not yet clearly explained. Prediction methods are therefore needed that not only predict PM2.5 accurately but also produce interpretable results. To predict hourly PM2.5 in Seoul, we propose a method that uses a random forest with an adjusted bootstrap number, trained on ground-based time series data preprocessed from different sources. With this method, the prediction model can be trained uniformly on hourly information, and its results are interpretable. To evaluate prediction performance, we conducted comparative experiments, in which the proposed method outperformed the other models on all labels. The proposed method also revealed the importance of variables related to PM2.5 formation and the influence of China.

The Study on Traffic Accident Trend by Age with Time Series Models (연령별 사고 추세 및 시계열 분석모형에 관한 연구)

  • Yoon, Byoung-Jo;Ko, Eun-Hyeck;Yang, Sung-Ryong
    • Proceedings of the Korean Society of Disaster Information Conference / 2016.11a / pp.255-256 / 2016
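  • In 2015, the elderly accounted for 13.1% of Korea's total population, and according to the Korean National Police Agency's 2015 traffic accident statistics, the traffic accident fatality rate of people aged 65 and over was about 2.57 times the overall traffic accident fatality rate. This study identified time series models of fatal accidents for elderly and adult drivers and examined whether their trends differ substantially. Time series analysis, the method used here, is known to be more reliable for short-term forecasting. Because time series analysis with an ARIMA model requires at least 50 to 60 observations, this study divided six years (2010 to 2015) of traffic accident data for Incheon Metropolitan City into elderly and adult drivers and identified time series models of fatal accidents.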

Prediction of the interest spread using VAR model (벡터자기회귀모형에 의한 금리스프레드의 예측)

  • Kim, Junhong;Jin, Dalae;Lee, Jisun;Kim, Suji;Son, Young Sook
    • Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1093-1102 / 2012
  • In this paper, we predicted the interest spread using the VAR (vector autoregressive) model. The variables used in the VAR model were selected from 56 domestic and foreign macroeconomic time series through cross-correlation and Granger causality tests. The performance of the VAR model was compared with that of a univariate time series model, the AR (autoregressive) model, in terms of the MAPE (mean absolute percentage error) and RMSE (root mean square error) of forecasts for the last twelve months.
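  The two error measures used to compare the VAR and AR forecasts can be written down directly. A minimal sketch, assuming the observed and forecast values arrive as equal-length sequences; the function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence

def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")

def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; undefined if any actual value is 0."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, in the units of the series."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical values: two observed spreads and their forecasts.
print(mape([2.0, 4.0], [1.0, 5.0]), rmse([2.0, 4.0], [1.0, 5.0]))
```

  For a twelve-month evaluation, `actual` would hold the twelve observed spreads and `forecast` the model's twelve predictions. Because MAPE divides by the observed value, it can become very large when a spread is close to zero, which is one reason to report RMSE alongside it.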

Development of Prediction of Electric Arc Risk using Object Detection Model (객체 탐지 모델을 활용한 전기 아크 위험성 예측 시스템 개발)

  • Lee, Gyu-bin;Kim, Seung-yeon;An, Donghyeok
    • Smart Media Journal / v.9 no.1 / pp.38-44 / 2020
  • Because of the high dependence on electric energy, electrical fires make up a significant portion of fires in Korea. Electric arcs caused by short circuits or poor contact account for three out of four electrical fires. An electric arc is a discharge of electric current between insulators that instantaneously produces high temperatures. To reduce fires caused by electric arcs, this study aims to predict electric arc risk. We collected arc data from arc detectors and converted the temporal arc data into graphs. We then trained a machine learning model on the converted graphs using different numbers of temporal arc data points, and measured its performance on test data. When the number of temporal arc data points was 20, the prediction rate reached 86%.

Prediction of Cryptocurrency Prices Using Text Mining and Deep Learning Techniques : Comparison of Korean and USA Markets (텍스트 마이닝과 딥러닝을 활용한 암호화폐 가격 예측 : 한국과 미국시장 비교)

  • Won, Jonggwan;Hong, Taeho
    • Knowledge Management Research / v.22 no.2 / pp.1-17 / 2021
  • In this study, we predicted bitcoin prices on Bithumb and Coinbase, leading exchanges in Korea and the USA respectively, using ARIMA and recurrent neural networks (RNNs). We also used news articles from each country to propose separate RNN models. The proposed model divides the training data into datasets according to changes in the price trend and then applies a time series prediction technique (RNNs) to build multiple models. We then used daily news data to create a term-based dictionary for each trend change point. We explored trend change points in the test data using the daily news keyword data of the test set and the term-based dictionary, and applied a matching model to produce prediction results. This approach achieved higher accuracy than a model that predicted prices using a time series prediction technique alone. This study shows that the limitations of time series prediction techniques could be overcome by exploring trend change points with news data, and that various time series prediction techniques combined with text mining could be applied in further research to improve model performance.