• Title/Summary/Keyword: support vector regression machine


Differences of Cold-heat Patterns between Healthy and Disease Group (건강군과 질환군의 한열지표 차이에 관한 고찰)

  • Kim Ji-Eun;Lee Seung-Gi;Ryu Hwa-Seung;Park Kyung-Mo
    • Journal of Physiology & Pathology in Korean Medicine / v.20 no.1 / pp.224-228 / 2006
  • Pattern identification of exterior-interior and cold-heat syndromes is one of the most frequently used diagnostic methods in Oriental medicine, yet there have been no systematic studies analyzing how exterior-interior and cold-heat characteristics differ between healthy and disease groups. In this study, cold-heat pattern scores, blood pressure, pulse rate, height, and weight were recorded from 100 healthy subjects and 196 disease subjects aged 30 to 59 years. Descriptive statistics were used to analyze the differences between the two groups, and a linear regression function, a linear support vector machine, and a Bayesian classifier were used to distinguish the healthy group from the disease group. Both the exterior-heat and interior-cold scores were higher in the healthy group than in the disease group, meaning that in the disease group the exterior tends to be colder and the interior hotter; this result was independent of age. However, attempts to separate the healthy group from the disease group using the exterior-interior and cold-heat scores together with other vital signs did not perform well. Although the two groups show different trends, this information alone cannot classify healthy and disease subjects.
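For readers who want to reproduce this kind of group comparison, the sketch below shows the general setup with scikit-learn: a linear SVM, logistic regression (standing in for the linear regression function), and a Gaussian naive Bayes classifier (standing in for the Bayesian classifier), each scored by cross-validation. The data, feature layout, and model settings are illustrative assumptions, not the authors' materials.

```python
# A minimal sketch (not the authors' code) of distinguishing a healthy group
# from a disease group using cold-heat scores and vital signs.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Placeholder data: 296 subjects x 5 features
# (exterior-heat score, interior-cold score, blood pressure, pulse rate, BMI).
X = rng.normal(size=(296, 5))
y = rng.integers(0, 2, size=296)  # 0 = healthy, 1 = disease

for name, clf in [("linear SVM", LinearSVC()),
                  ("logistic regression", LogisticRegression(max_iter=1000)),
                  ("naive Bayes", GaussianNB())]:
    model = make_pipeline(StandardScaler(), clf)
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```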

Effective Drought Prediction Based on Machine Learning (머신러닝 기반 효과적인 가뭄예측)

  • Kim, Kyosik;Yoo, Jae Hwan;Kim, Byunghyun;Han, Kun-Yeun
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.326-326 / 2021
  • Many scholars have made technical and academic attempts to predict droughts, which occur over long periods and wide areas. In this study, future droughts were forecast using both scenario-based outlook methods and non-scenario-based methods that predict drought in real time, among approaches for forecasting droughts with complex time series. As the scenario-based method, a short-term forecast of drought severity was produced by computing the 2009 PDSI (Palmer Drought Severity Index) from three-month GCM (General Circulation Model) predictions. Non-scenario-based droughts were predicted using statistical methods and deterministic numerical methods based on physical models. As a representative statistical approach, to overcome the predictive limitations of the ARIMA (Autoregressive Integrated Moving Average) model, the SPI was estimated using support vector regression (SVR) and a wavelet neural network. The optimal model structure was selected by RMSE (root mean square error), MAE (mean absolute error), and R (correlation coefficient), and droughts were forecast with lead times of one to six months. Using the SPI, a Markov chain and a log-linear model were applied to verify the accuracy of SPI-based drought prediction, and a neuro-fuzzy model was applied to the Anatolia region of Turkey to predict droughts from monthly mean precipitation and the SPI for 1964-2006. As drought frequency and patterns vary irregularly and regional precipitation becomes increasingly polarized, the demand for more accurate drought prediction is growing. This study aims to develop an improved prediction model for complex, nonlinear drought patterns by applying the monthly and daily SPEI (Standardized Precipitation Evapotranspiration Index), which quantifies the degree of meteorological drought, to machine learning models.
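The SVR-based index forecasting mentioned in the abstract can be sketched roughly as follows: lagged values of a drought index predict its value several months ahead, scored by RMSE and MAE. The synthetic series, lag count, lead time, and SVR hyperparameters are assumptions for illustration only.

```python
# A hedged sketch of forecasting a drought index (e.g., SPI/SPEI) with
# support vector regression over lagged values of the index.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(1)
spi = np.sin(np.linspace(0, 30, 500)) + 0.3 * rng.normal(size=500)  # toy index

def make_lagged(series, n_lags=12, lead=1):
    """Build (X, y): n_lags past values predict the value `lead` steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - lead):
        X.append(series[t - n_lags:t])
        y.append(series[t + lead - 1])
    return np.array(X), np.array(y)

X, y = make_lagged(spi, n_lags=12, lead=3)   # 3-month lead time, as one example
split = int(0.8 * len(X))
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("RMSE:", mean_squared_error(y[split:], pred) ** 0.5)
print("MAE :", mean_absolute_error(y[split:], pred))
```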


An Ensemble Classification of Mental Health in Malaysia related to the Covid-19 Pandemic using Social Media Sentiment Analysis

  • Nur 'Aisyah Binti Zakaria Adli;Muneer Ahmad;Norjihan Abdul Ghani;Sri Devi Ravana;Azah Anir Norman
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.370-396 / 2024
  • COVID-19 was declared a pandemic by the World Health Organization (WHO) on 30 January 2020, and lifestyles all over the world have changed since then. In many cases, the pandemic appears to have created severe mental disorders, anxiety, and depression. Researchers have mostly conducted surveys to identify the pandemic's impact on people's mental health. Although surveys can generate higher-quality, tailored, and more specific data, social media offers great insight into the pandemic's impact on mental health. Since people feel connected on social media, this study aims to capture people's sentiments about pandemic-related mental health issues. A word cloud was used to visualize and identify the most frequent keywords related to COVID-19 and mental health disorders. The study employs Majority Voting Ensemble (MVE) classification alongside individual classifiers such as Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR) to classify sentiment in tweets. The tweets were labeled positive, neutral, or negative using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Confusion matrices and classification reports provide the precision, recall, and F1-score used to identify the best algorithm for classifying the sentiments.
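A minimal sketch of the majority-voting setup described above, assuming TF-IDF features and scikit-learn's VotingClassifier; the tiny corpus and its labels are placeholders (in the study, labels come from VADER).

```python
# A hedged sketch of hard-voting ensemble sentiment classification with
# NB, SVM, and logistic regression over TF-IDF features of tweets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline

tweets = ["staying home again, feeling anxious", "grateful for frontline workers",
          "lockdown extended", "cannot sleep, too worried", "good news on vaccines"]
labels = ["negative", "positive", "neutral", "negative", "positive"]  # e.g., from VADER

ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[("nb", MultinomialNB()),
                    ("svm", LinearSVC()),
                    ("lr", LogisticRegression(max_iter=1000))],
        voting="hard"))  # majority vote across the three classifiers
ensemble.fit(tweets, labels)
print(ensemble.predict(["another week of isolation, feeling low"]))
```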

An Accurate Cryptocurrency Price Forecasting using Reverse Walk-Forward Validation (역순 워크 포워드 검증을 이용한 암호화폐 가격 예측)

  • Ahn, Hyun;Jang, Baekcheol
    • Journal of Internet Computing and Services / v.23 no.4 / pp.45-55 / 2022
  • The cryptocurrency market is growing; for example, the market capitalization of Bitcoin has exceeded 500 trillion won. Accordingly, many studies have been conducted to predict cryptocurrency prices, and most follow methodologies similar to stock price prediction. However, unlike in stock price prediction, machine learning models have become the best performers in cryptocurrency price prediction; conceptually, cryptocurrency yields no passive income from ownership; and statistically, cryptocurrency has at least three times the liquidity of stocks. We therefore argue that cryptocurrency price prediction studies should apply a methodology different from stock price prediction. We propose Reverse Walk-forward Validation (RWFV), a modification of Walk-forward Validation (WFV). Unlike WFV, RWFV measures validation accuracy by pinning the validation dataset directly in front of the test dataset in the time series and gradually increasing the size of the training dataset that precedes it. The training data are then truncated to the training-set size with the highest validation accuracy and combined with the validation data to measure accuracy on the test data. Logistic regression and Support Vector Machine (SVM) were used as analysis models, and various algorithms and parameters such as L1, L2, rbf, and poly were applied to establish the reliability of the proposed RWFV. As a result, all analysis models showed improved accuracy compared with existing studies; on average, accuracy increased by 1.23 percentage points. This is a significant improvement, given that the accuracy of cryptocurrency price prediction in most previous studies remains between 50% and 60%.
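The RWFV procedure, as the abstract describes it, can be sketched as follows; the window sizes, the step by which the training window grows, and the classifier are illustrative assumptions.

```python
# A hedged sketch of Reverse Walk-forward Validation: fix a validation window
# just before the test window, grow the training window backwards, keep the
# training size with the best validation accuracy, then refit and test.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))          # placeholder features (e.g., lagged returns)
y = rng.integers(0, 2, size=1000)       # placeholder up/down labels

val_len, test_len = 100, 100
test_start = len(X) - test_len
val_start = test_start - val_len        # validation sits directly before test

best_acc, best_train_len = -1.0, None
for train_len in range(100, val_start + 1, 100):   # grow training set backwards
    tr = slice(val_start - train_len, val_start)
    clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    acc = clf.score(X[val_start:test_start], y[val_start:test_start])
    if acc > best_acc:
        best_acc, best_train_len = acc, train_len

# Refit on the best-sized training window plus the validation window, then test.
tr = slice(val_start - best_train_len, test_start)
clf = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
print("test accuracy:", clf.score(X[test_start:], y[test_start:]))
```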

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products based on their purchase history. However, the traditional collaborative filtering technique has difficulty calculating similarity for new customers or products, because it computes similarities from direct connections and common features among customers. For this reason, hybrid techniques were designed that also use content-based filtering. In parallel, efforts have been made to solve these problems by applying the structural characteristics of social networks: similarities are calculated indirectly through the similar customers placed between two customers. This means creating a customer network from purchase data and computing the similarity between two customers from the features of the network that indirectly connects them. Such similarity can be used to predict whether the target customer will accept a recommendation, and the centrality metrics of the network can be used to calculate it. Different centrality metrics may affect recommendation performance differently, and in this study their effect may also vary by recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but for all customers and products. By treating a customer's purchase of an item as a link between the customer and the item on the network, predicting user acceptance of a recommendation reduces to predicting whether a new link will be created between them. Because classification models fit this binary link-prediction problem, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) models were selected for the research. Performance was evaluated on order data collected from an online shopping mall over four years and two months: orders from the first three years and eight months were organized into the social network, and the following four months of records were used to train and evaluate the recommender models. Experiments applying the centrality metrics to each model show that the recommendation acceptance rates of the centrality metrics differ meaningfully across algorithms. This work analyzed four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks mid-range across the models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality shows distinct performance differences depending on the model: it ranks first with numerically high performance in the logistic regression, artificial neural network, and decision tree models, but records very low rankings and low performance in the support vector machine and k-nearest neighbors models. As the experimental results reveal, network centrality metrics over the subnetwork connecting two nodes can effectively predict connectivity between them in a social network, and each metric performs differently depending on the classification model. This implies that choosing appropriate metrics for each algorithm can achieve higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance for certain models.
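The four centrality metrics compared in the study can be computed with networkx as in the sketch below; the toy customer-item graph is a placeholder for the shopping-mall network.

```python
# A small sketch (not the paper's code) of computing degree, betweenness,
# closeness, and eigenvector centrality on a toy customer-item purchase graph.
import networkx as nx

G = nx.Graph()
# Placeholder purchase links: customer -> item
G.add_edges_from([("c1", "i1"), ("c1", "i2"), ("c2", "i2"),
                  ("c2", "i3"), ("c3", "i1"), ("c3", "i3")])

metrics = {
    "degree": nx.degree_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
}
for name, values in metrics.items():
    print(name, {n: round(v, 3) for n, v in values.items()})
```

In the study's setting, such values would then feed the link-prediction classifiers (decision tree, KNN, logistic regression, ANN, SVM) as features.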

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through economic crises, and bankruptcy prediction models have become more and more important; corporate bankruptcy is therefore regarded as one of the major research topics in business management, with many studies also in progress in industry. Previous studies attempted various statistics-based methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. More recently, researchers have used machine learning methodologies such as Support Vector Machine (SVM) and Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. With this change, many bankruptcy models have been developed and performance has improved. In general, a company's financial and accounting information changes over time, and the market situation changes as well, so it is difficult to predict bankruptcy from information at a single point in time. Nevertheless, although traditional research suffers from not taking the time effect into account, dynamic models have not been studied much. Ignoring the time effect biases the results, so a static model may not be suitable for predicting bankruptcy; a dynamic model may therefore improve bankruptcy prediction. In this paper, we propose an RNN (Recurrent Neural Network), a deep learning methodology known to perform well on time series data. For model estimation and comparison of forecasting performance, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016. To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings, confirmed through KIND, a corporate disclosure website. Variables were then selected from previous studies: the first variable set comprises the Z-score variables, which are traditional in bankruptcy prediction, and the second is a dynamic variable set. We selected 240 normal and 226 bankrupt companies for the first variable set, and 229 normal and 226 bankrupt companies for the second. We built a model that reflects dynamic changes in time-series financial data, and by comparing it with existing bankruptcy prediction models, we found that the suggested model can help improve the accuracy of bankruptcy prediction. We used financial data from KIS Value (a financial database) and selected MDA, GLM (logistic regression), SVM, and ANN models as benchmarks. The experiments showed that the RNN outperformed the comparative models: its accuracy was high for both variable sets, its Area Under the Curve (AUC) was also high, and in the hit-ratio table the RNN's rate of correctly predicting that a troubled company would go bankrupt exceeded that of the other models. A limitation of this paper is that an overfitting problem occurs during RNN training, which we expect can be addressed by selecting more training data and appropriate variables. From these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.
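A hedged PyTorch sketch of the general idea follows: an RNN reads a few years of per-firm financial ratios and ends in a binary bankruptcy logit. The shapes, layer sizes, and synthetic data are assumptions, not the authors' architecture.

```python
# A minimal sketch of an RNN bankruptcy classifier over multi-year
# financial-ratio sequences, in the spirit of the proposed dynamic model.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(3)
# Placeholder: 466 firms x 3 years of history x 5 financial ratios.
X = torch.tensor(rng.normal(size=(466, 3, 5)), dtype=torch.float32)
y = torch.tensor(rng.integers(0, 2, size=(466, 1)), dtype=torch.float32)

class BankruptcyRNN(nn.Module):
    def __init__(self, n_features=5, hidden=16):
        super().__init__()
        self.rnn = nn.RNN(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        _, h = self.rnn(x)              # h: last hidden state per firm
        return self.head(h.squeeze(0))  # one bankruptcy logit per firm

model = BankruptcyRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
print("final training loss:", loss.item())
```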

Estimation of GARCH Models and Performance Analysis of Volatility Trading System using Support Vector Regression (Support Vector Regression을 이용한 GARCH 모형의 추정과 투자전략의 성과분석)

  • Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.107-122 / 2017
  • Volatility in stock market returns is a measure of investment risk. It plays a central role in portfolio optimization, asset pricing, and risk management, as well as in most theoretical financial models. Engle (1982) presented a pioneering paper on stock market volatility that explains the time-variant characteristics embedded in stock market return volatility; his Autoregressive Conditional Heteroscedasticity (ARCH) model was generalized by Bollerslev (1986) as the GARCH models. Empirical studies have shown that GARCH models describe well the fat-tailed return distributions and the volatility clustering observed in stock prices. The parameters of GARCH models are generally estimated by maximum likelihood estimation (MLE) based on the standard normal density. However, since Black Monday in 1987, stock market prices have become very complex and noisy, and recent studies have started to apply artificial intelligence approaches to estimating the GARCH parameters as a substitute for MLE. This paper presents an SVR-based GARCH process and compares it with an MLE-based GARCH process for estimating the parameters of GARCH models, which are known to forecast stock market volatility well. The kernel functions used in the SVR estimation are linear, polynomial, and radial. We analyzed the suggested models on the KOSPI 200 Index, which comprises 200 blue-chip stocks listed on the Korea Exchange. We sampled KOSPI 200 daily closing values from 2010 to 2015, giving 1,487 observations; 1,187 days were used to train the suggested GARCH models and the remaining 300 days served as testing data. First, symmetric and asymmetric GARCH models were estimated by MLE. In forecasting KOSPI 200 Index return volatility, the MSE metric favored the asymmetric GARCH models such as E-GARCH and GJR-GARCH, consistent with the documented non-normal return distribution characterized by fat tails and leptokurtosis. Compared with the MLE estimation process, the SVR-based GARCH models outperform the MLE methodology in forecasting KOSPI 200 Index return volatility; the polynomial kernel shows exceptionally low forecasting accuracy. We then suggest an Intelligent Volatility Trading System (IVTS) that utilizes the forecasted volatility. The IVTS entry rules are: if tomorrow's forecasted volatility will increase, buy volatility today; if it will decrease, sell volatility today; if the forecasted direction does not change, hold the existing position. IVTS is assumed to buy and sell historical volatility values, which is somewhat unrealistic because historical volatility values themselves cannot be traded, but the simulation results remain meaningful since the Korea Exchange introduced tradable volatility futures contracts in November 2014. In the testing period, the trading systems with SVR-based GARCH models show higher returns than those with MLE-based GARCH: the profitable-trade percentages of MLE-based GARCH IVTS models range from 47.5% to 50.0%, while those of SVR-based GARCH IVTS models range from 51.8% to 59.7%. MLE-based symmetric S-GARCH returns +150.2% versus +526.4% for SVR-based symmetric S-GARCH; MLE-based asymmetric E-GARCH returns -72% versus +245.6% for the SVR-based version; and MLE-based asymmetric GJR-GARCH returns -98.7% versus +126.3%. The linear kernel shows higher trading returns than the radial kernel; the best SVR-based IVTS performance is +526.4% against +150.2% for the best MLE-based IVTS, and the SVR-based GARCH IVTS trades more frequently. This study has some limitations: the models are based solely on SVR, and other artificial intelligence models should be explored for better performance; costs incurred in trading, including brokerage commissions and slippage, are not considered; and the IVTS trading performance is unrealistic since historical volatility values are used as trading objects. Exact forecasting of stock market volatility is essential in real trading as well as in asset pricing models, and further studies on other machine-learning-based GARCH models can give better information to stock market investors.
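One plausible reading of the SVR-based GARCH estimation is sketched below: the GARCH(1,1) recursion sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2 is approximated by regressing a variance proxy on its own lag and the lagged squared return with SVR. The return series, the choice of variance proxy, and the hyperparameters are assumptions; the paper's exact construction may differ.

```python
# A hedged sketch: approximate the GARCH(1,1) variance recursion with SVR
# instead of maximum likelihood, using a rolling realized-variance proxy.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
r = 0.01 * rng.standard_t(df=5, size=1487)      # placeholder fat-tailed daily returns
r2 = r ** 2
# Rolling mean of squared returns as a simple realized-variance proxy.
proxy = np.convolve(r2, np.ones(5) / 5, mode="valid")

# Features: lagged squared return and lagged variance proxy; target: next proxy.
X = np.column_stack([r2[4:-1], proxy[:-1]])
y = proxy[1:]
split = 1187
svr = SVR(kernel="rbf", C=1.0, epsilon=1e-5).fit(X[:split], y[:split])
forecast = svr.predict(X[split:])
mse = np.mean((forecast - y[split:]) ** 2)
print("out-of-sample MSE of variance forecast:", mse)
```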

A study on entertainment TV show ratings and the number of episodes prediction (국내 예능 시청률과 회차 예측 및 영향요인 분석)

  • Kim, Milim;Lim, Soyeon;Jang, Chohee;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.6 / pp.809-825 / 2017
  • The number of TV entertainment shows is increasing, and competition among programs is intensifying as cable channels air many entertainment shows, creating a need for research on program ratings and episode counts. This study presents predictive models for entertainment TV show ratings and the number of episodes, using various data mining techniques: linear regression, logistic regression, LASSO, random forests, gradient boosting, and support vector machine. The analysis shows that a program's average rating before its first broadcast is affected by the broadcasting company, the average rating of the previous season, the starting year, and the number of press articles, while the average rating after the first broadcast is influenced by the rating of the first broadcast, the broadcasting company, and the program type. We also found that the predicted average rating, starting year, type, and broadcasting company are important variables in predicting the number of episodes.
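As one illustration of the model family compared here, the sketch below fits a random forest regressor to invented program metadata; the feature names mirror the variables the abstract reports as important, but the data and settings are placeholders.

```python
# A hedged sketch of ratings prediction from program metadata with a
# random forest, one of the methods the study compares.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 300
X = np.column_stack([
    rng.integers(0, 5, n),        # broadcasting company (encoded)
    rng.uniform(0, 10, n),        # previous-season average rating
    rng.integers(2010, 2018, n),  # starting year
    rng.integers(0, 500, n),      # number of press articles
])
ratings = rng.uniform(0, 12, n)   # placeholder target: average rating

X_tr, X_te, y_tr, y_te = train_test_split(X, ratings, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out shows:", rf.score(X_te, y_te))
print("feature importances:", rf.feature_importances_.round(3))
```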

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

  • Bae, Eun Chan;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.21 no.6 / pp.9-19 / 2016
  • In the finance literature, stock liquidity, which shows how readily stocks can be cashed out in the market, has received rich attention from both academics and practitioners. The reasons are plenty: stock liquidity is known to significantly affect asset pricing, and macroeconomic announcements influence liquidity in the stock market, so stock liquidity affects both investors' and managers' decisions. Although a great deal of finance literature exists on stock liquidity, no studies have attempted to investigate it as a decision-making problem; most have dealt with limited views, such as how much liquidity influences stock price or which variables significantly describe it. This paper posits that stock liquidity may become a serious decision-making problem that can be handled with data mining techniques to estimate its future extent with statistical validity. We collected financial data from manufacturing companies listed on the KRX (Korea Exchange) during 2010 to 2013; the period starts in 2010 to avoid the aftershocks of the 2008 financial crisis. We used the Fn-GuidPro system to gather a total of 5,700 financial records. The stock liquidity measure was computed following Amihud (2002), which is known to relate best to daily returns. We applied five data mining techniques (classifiers): Bayesian network, support vector machine (SVM), decision tree, neural network, and an ensemble method. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree-Augmented NBN); the decision trees use CART and C4.5; regression results served as a benchmark. The ensemble method integrates either two or three classifiers by voting. Among the single classifiers, CART showed the best performance at 48.2%, compared with 37.18% for regression; among the ensembles, integrating TAN, CART, and SVM was best at 49.25%. Additional analysis by industry showed that relatively stable industries such as electronic appliances, wholesale and retail, wood, and leather-bags-shoes performed better, above 50%.
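The Amihud (2002) illiquidity measure that serves as the study's target is simple to compute: the average of |daily return| divided by daily trading value. A sketch with placeholder data standing in for the Fn-GuidPro records:

```python
# A hedged sketch of the Amihud (2002) illiquidity measure for one stock:
# mean of |daily return| / daily trading value (higher = less liquid).
import numpy as np

rng = np.random.default_rng(6)
prices = 100 * np.cumprod(1 + 0.01 * rng.normal(size=252))   # one year of closes
value_traded = rng.uniform(1e8, 1e9, size=251)               # daily trading value (KRW)

returns = np.diff(prices) / prices[:-1]
illiq = np.mean(np.abs(returns) / value_traded)
print("Amihud illiquidity:", illiq)
```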

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. In particular, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening in the 1990s, research on recidivism prediction became more active, and empirical studies on recidivism factors began in Korea in the same period. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of prediction, it is also important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of wrongly classifying people who will not reoffend as likely to reoffend is lower than the cost of wrongly classifying people who will reoffend as unlikely to: the former only adds monitoring costs, while the latter increases social and economic costs. Therefore, this paper proposes an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step, XGB, recognized as a high-performance ensemble method in data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, defined as the weighted average of the FNE (False Negative Error) and FPE (False Positive Error). To verify its usefulness, the model was applied to a real recidivism prediction dataset. The results confirmed that the XGB model not only showed better prediction accuracy than the other models but also reduced the misclassification cost most effectively.
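The threshold-optimization step can be sketched as below, with scikit-learn's GradientBoostingClassifier standing in for XGBoost (the xgboost package's XGBClassifier exposes the same fit/predict_proba interface) and synthetic data in place of the recidivism dataset; the 5:1 cost ratio is an assumed illustration, not the paper's weighting.

```python
# A hedged sketch of choosing a classification threshold that minimizes a
# weighted misclassification cost under asymmetric error costs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_val)[:, 1]

COST_FN, COST_FP = 5.0, 1.0   # assumed: missing a reoffender costs 5x a false alarm
best_t, best_cost = 0.5, float("inf")
for t in np.linspace(0.05, 0.95, 91):
    pred = (proba >= t).astype(int)
    fn = np.sum((pred == 0) & (y_val == 1))   # false negatives
    fp = np.sum((pred == 1) & (y_val == 0))   # false positives
    cost = COST_FN * fn + COST_FP * fp
    if cost < best_cost:
        best_t, best_cost = t, cost
print(f"optimal threshold: {best_t:.2f}, total cost: {best_cost:.0f}")
```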