• Title/Summary/Keyword: Decision-Tree-Model


VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to the area of computer science concerned with giving machines the ability to perform their own data analysis, decision making and forecasting. One representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by the structure of biological neural networks. Other machine learning models include the decision tree, naive Bayes and SVM (support vector machine) models. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our problem well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given data from two groups, the SVM model decides which group a new data point belongs to, based on the hyperplane obtained from the given data set. Thus, the more meaningful data are available, the better the model learns. In recent years, many financial experts have turned their attention to machine learning, seeing the potential of combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and many studies have successfully forecast stock prices with machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (the name combines "robot" and "advisor"), which perform various financial tasks through advanced algorithms applied to huge, rapidly changing amounts of data. A Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically. 
In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, with the SVM model and applying the forecasts to real option trading to improve trading performance. VKOSPI measures the expected future volatility of the KOSPI 200 index and is derived from KOSPI 200 index option prices. It is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and publishes VKOSPI in real time. Like volatility in general, VKOSPI affects option prices: VKOSPI and option prices move in the same direction regardless of option type (calls and puts at various strike prices). When volatility increases, all call and put premiums increase because the probability that the option will be exercised increases. Investors can see in real time how much an option price rises for a given rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurately forecasting VKOSPI movements is one of the important factors in generating profit in option trading. In this study, we verified with real option data that accurate VKOSPI forecasts can produce large profits in real option trading. To the best of our knowledge, no previous study has predicted the direction of VKOSPI with machine learning and applied the prediction to actual option trading. We predicted daily VKOSPI changes with the SVM model and entered an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the trading day. We then analyzed the results and tested whether trading on the SVM's predictions is applicable in practice. 
The results showed that the prediction accuracy for VKOSPI was 57.83% on average, and the number of position entries averaged 43.2, less than half of the benchmark (100). A small number of trades indicates trading efficiency. In addition, the experiment showed that trading performance was significantly higher than the benchmark.
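
The positive link between volatility and option premiums, and why a strangle seller gains when volatility falls, can be checked with the standard Black-Scholes formulas. The sketch below is not the authors' trading system: it assumes European options, a flat risk-free rate and hypothetical inputs (index level, strikes, maturity), and it holds the index level and time to expiry fixed so that only volatility changes. Since VKOSPI is quoted in percentage points, a one-point move corresponds to a change of 0.01 in `sigma`.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(S: float, K: float, T: float, r: float, sigma: float) -> tuple[float, float]:
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    return d1, d1 - sigma * math.sqrt(T)


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1, d2 = _d1_d2(S, K, T, r, sigma)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def bs_put(S, K, T, r, sigma):
    """Black-Scholes price of a European put."""
    d1, d2 = _d1_d2(S, K, T, r, sigma)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)


def bs_vega(S, K, T, r, sigma):
    """Vega: premium change per 1.00 change in sigma (identical for calls and puts)."""
    d1, _ = _d1_d2(S, K, T, r, sigma)
    return S * norm_pdf(d1) * math.sqrt(T)


def short_strangle_pnl(S, K_put, K_call, T, r, sigma_open, sigma_close):
    """Sell an OTM put and an OTM call, then buy both back after volatility moves.

    S and T are held fixed, so the result isolates the volatility effect;
    transaction costs are ignored.
    """
    credit = bs_put(S, K_put, T, r, sigma_open) + bs_call(S, K_call, T, r, sigma_open)
    debit = bs_put(S, K_put, T, r, sigma_close) + bs_call(S, K_call, T, r, sigma_close)
    return credit - debit


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the study.
    S, T, r = 300.0, 30 / 365, 0.02
    print("call vega:", round(bs_vega(S, 310.0, T, r, 0.18), 3))
    print("P&L, vol 18% -> 16%:", round(short_strangle_pnl(S, 290.0, 310.0, T, r, 0.18, 0.16), 3))
    print("P&L, vol 18% -> 20%:", round(short_strangle_pnl(S, 290.0, 310.0, T, r, 0.18, 0.20), 3))
```

Because vega is positive for both legs, the strangle seller profits when volatility falls and loses when it rises, which is why the study enters the position only when VKOSPI is forecast to decline.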

A Study on the Prediction Model of the Elderly Depression

  • SEO, Beom-Seok;SUH, Eung-Kyo;KIM, Tae-Hyeong
    • The Journal of Industrial Distribution & Business / v.11 no.7 / pp.29-40 / 2020
  • Purpose: Modern society faces many urban problems, such as population aging, the hollowing out of old city centers and polarization within cities. In this study, we apply big data and machine learning methodologies to predict depression symptoms in the elderly population early, thus contributing to solving the problem of elderly depression. Research design, data and methodology: We used random forest as the machine learning technique and analyzed the correlation between the CES-D10, a depression scale widely used worldwide, and other variables to identify important variables. Two dependent variables were defined: one distinguishing normal from depressed, and one distinguishing moderate from severe depression. A total of 106 independent variables were included, covering the objective characteristics of the elderly as well as subjective health, health status, cognitive ability, employment, household background, income, consumption, assets, subjective expectations, and daily-life and quality-of-life surveys. Results: Satisfaction with the residential area, quality of life and cognitive ability scores had important effects in classifying elderly depression; satisfaction with living quality and economic conditions, and the number of outpatient visits in the living area and to clinics, were also important variables. In the random forest performance evaluation, the model classifying whether an elderly person is depressed achieved an accuracy of 86.3%, a sensitivity of 79.5% and a specificity of 93.3%, and the model classifying the degree of elderly depression achieved an accuracy of 86.1%, a sensitivity of 93.9% and a specificity of 74.7%. Conclusions: This study identified the important variables of the estimated predictive model with the random forest technique, focusing on predictive performance itself. 
Although the research has limitations, such as the lack of clear criteria for classifying depression levels and the failure to reflect variables beyond the KLoSA data, we expect that if additional variables are secured and high-performance predictive models are estimated with various machine learning techniques, the results can support ways to improve senior citizens' quality of life through early detection of depression and help inform public policy decisions.
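
Accuracy, sensitivity and specificity as reported above all come from a binary confusion matrix. A minimal sketch, treating "depressed" as the positive class and using hypothetical counts rather than the study's data. Because accuracy equals sensitivity and specificity weighted by the shares of positive and negative cases, it always lies between the two, as both reported models show.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FN, FP, TN) for a binary classifier."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def accuracy_sensitivity_specificity(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of truly depressed cases that are caught
    specificity = tn / (tn + fp)  # share of non-depressed cases correctly cleared
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts: 100 depressed and 100 non-depressed respondents.
    acc, sens, spec = accuracy_sensitivity_specificity(tp=80, fn=20, fp=10, tn=90)
    prevalence = (80 + 20) / 200
    print(acc, sens, spec)
    # Accuracy is the prevalence-weighted average of sensitivity and specificity.
    print(sens * prevalence + spec * (1 - prevalence))
```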

Performance Comparison of Machine Learning based Prediction Models for University Students Dropout (머신러닝 기반 대학생 중도 탈락 예측 모델의 성능 비교)

  • Seok-Bong Jeong;Du-Yon Kim
    • Journal of the Korea Society for Simulation / v.32 no.4 / pp.19-26 / 2023
  • The rising dropout rate of college students nationwide has a serious negative impact not only on individual students but also on universities and society. To proactively identify students at risk of dropping out, this study built decision tree, random forest, logistic regression and deep learning-based dropout prediction models using academic data that can be easily obtained from each university's academic management system, and then analyzed and compared their performance. The analysis revealed that while the logistic regression model had the highest recall, its F1 score and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) were comparatively low. The random forest model, on the other hand, performed best on every metric except recall. In addition, to assess model performance over different prediction horizons, we divided them into short-term (within one semester), medium-term (within two semesters) and long-term (within three semesters). The results showed that long-term prediction yielded the highest predictive performance. With this approach, each university is expected to be able to identify students at risk of dropping out early, reduce the dropout rate through intensive management and, further, contribute to stabilizing university finances.
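
The gap between high recall and low F1/ROC-AUC reported above is easy to reproduce. The sketch below, with hypothetical labels and scores, computes precision, recall and F1 from hard predictions, and ROC-AUC from scores as the probability that a randomly chosen dropout is scored above a randomly chosen non-dropout (ties count one half). A model that flags every student reaches perfect recall while its precision, and hence F1, stays low.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Pairwise (Mann-Whitney) form of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical cohort: 2 dropouts among 10 students, everyone flagged as at risk.
    y = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    print(precision_recall_f1(y, [1] * 10))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```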

Monetary policy synchronization of Korea and United States reflected in the statements (통화정책 결정문에 나타난 한미 통화정책 동조화 현상 분석)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.115-126 / 2021
  • Central banks communicate with the market through statements on the direction of monetary policy while implementing it. The rapid contraction of the global economy caused by the recent Covid-19 pandemic is comparable to the 2008 global financial crisis. In this paper, we analyzed the text of the monetary policy statements of the Bank of Korea and the Fed, focusing on how they were affected in the face of a global crisis. We collected the text of both countries' monetary policy statements published from October 1999 to September 2020. We examined semantic features with word clouds and word embeddings, and analyzed the trend in similarity between the two countries' documents with a piecewise regression tree model. The visualization shows that both the Bank of Korea and the US Fed have published statements in refined words with clear meanings for transparent and effective communication with the market. The analysis of the dissimilarity trend between the two countries' documents also shows a degree of synchronization between them, as rapid changes in the global economic environment affect monetary policy.
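
Two ingredients of this analysis can be sketched under simplifying assumptions: a similarity score between two statements, and a tree-style search for a break in the similarity series over time. The sketch substitutes bag-of-words counts for the word embeddings used in the paper, and fits a regression tree with a single split and constant leaves, which is simpler than a full piecewise regression tree; the statements and similarity values are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_breakpoint(series):
    """Single-split regression tree on the time index.

    Tries every split point t and returns the one minimising the total squared
    error of the two segments around their own means, plus that error.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_t, best_err = None, float("inf")
    for t in range(1, len(series)):
        err = sse(series[:t]) + sse(series[t:])
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err


if __name__ == "__main__":
    print(cosine_similarity("the committee kept the base rate unchanged",
                            "the committee raised the policy rate"))
    # Hypothetical yearly similarity scores between the two banks' statements.
    print(best_breakpoint([0.20, 0.22, 0.21, 0.60, 0.62, 0.61]))
```

A full regression tree would apply the same search recursively inside each segment until a stopping rule is met.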

A Study on Foreign Exchange Rate Prediction Based on KTB, IRS and CCS Rates: Empirical Evidence from the Use of Artificial Intelligence (국고채, 금리 스왑 그리고 통화 스왑 가격에 기반한 외환시장 환율예측 연구: 인공지능 활용의 실증적 증거)

  • Lim, Hyun Wook;Jeong, Seung Hwan;Lee, Hee Soo;Oh, Kyong Joo
    • Knowledge Management Research / v.22 no.4 / pp.71-85 / 2021
  • The purpose of this study is to find out which artificial intelligence methodology is most suitable for building a foreign exchange rate prediction model from bond market and interest rate market indicators. KTBs and MSBs, the representative products of the Korean bond market, are sold on a large scale when risk aversion arises, and in such cases the USD/KRW exchange rate often rises. When USD liquidity problems occur in the onshore Korean market, the KRW Cross-Currency Swap price in the interest rate market falls, which acts as a signal to buy USD/KRW in the foreign exchange market. Given that the prices and movements of products traded in the bond and interest rate markets directly or indirectly affect the foreign exchange market, the three markets can be regarded as closely and complementarily related. Previous studies have revealed the relationship and correlation between the bond market, the interest rate market and the foreign exchange market, but most past exchange rate prediction studies focused on macroeconomic indicators such as GDP, current account surplus/deficit and inflation, and active research using artificial intelligence to predict exchange rates from bond and interest rate market indicators has not yet been conducted. Using bond market and interest rate market indicators, this study runs an artificial neural network, suited to nonlinear data analysis; logistic regression, suited to linear data analysis; and a decision tree, suited to both nonlinear and linear data analysis, and shows that the artificial neural network is the most suitable methodology for predicting foreign exchange rates, which are nonlinear time-series data. 
Going beyond the simple correlation between the bond, interest rate and foreign exchange markets, capturing the trading signals among the three markets to reveal their active correlation and organic co-movement not only provides foreign exchange traders with a new trading model but is also expected to contribute to the efficiency and knowledge management of the entire financial market.

Analysis of Traffic Accidents Injury Severity in Seoul using Decision Trees and Spatiotemporal Data Visualization (의사결정나무와 시공간 시각화를 통한 서울시 교통사고 심각도 요인 분석)

  • Kang, Youngok;Son, Serin;Cho, Nahye
    • Journal of Cadastre & Land InformatiX / v.47 no.2 / pp.233-254 / 2017
  • The purpose of this study is to analyze the main factors influencing the severity of traffic accidents and to visualize the spatiotemporal characteristics of traffic accidents in Seoul. To do this, we collected data on traffic accidents that occurred in Seoul over the four years from 2012 to 2015 and classified them by severity as slight, serious or fatal. The spatiotemporal characteristics were analyzed with kernel density analysis, hotspot analysis, space time cube analysis and Emerging Hot Spot Analysis. The factors affecting severity were analyzed with a decision tree model. The results show that traffic accidents in Seoul are more frequent in the suburbs than in central areas. In particular, accidents were concentrated in some commercial and entertainment areas in Seocho and Gangnam, and became increasingly intense over time. For fatal accidents, there were statistically significant hotspots in Yeongdeungpo-gu, Guro-gu, Jongno-gu, Jung-gu and Seongbuk-gu; however, hotspots of fatal accidents by time of day showed different patterns. In terms of severity, the type of accident is the most important factor, followed in order of importance by road type, vehicle type, time of the accident and type of traffic violation. Regarding the decision rules that lead to serious accidents, for vans or trucks there is a high probability of a serious accident where the road is wide and vehicle speed is high. For bicycles, cars, motorcycles and other vehicles, there is a high probability of a serious accident under the same circumstances at dawn.
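
A decision tree of the kind used here grows by repeatedly choosing the split that makes the severity classes in each branch as pure as possible, and the resulting paths read as decision rules like those above. The summary does not state the splitting criterion, so this sketch assumes CART-style Gini impurity with binary "feature equals value" splits on categorical attributes; the feature names and records are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two labels drawn with replacement differ."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Find the binary split `feature == value` with the lowest weighted Gini impurity.

    Returns (value, impurity); value is None if no split beats the unsplit node.
    """
    best_value, best_impurity = None, gini(labels)
    for value in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] == value]
        right = [y for row, y in zip(rows, labels) if row[feature] != value]
        if not left or not right:
            continue
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_value, best_impurity = value, impurity
    return best_value, best_impurity


if __name__ == "__main__":
    # Hypothetical accident records.
    rows = [
        {"vehicle": "truck", "time": "dawn"},
        {"vehicle": "truck", "time": "day"},
        {"vehicle": "car", "time": "dawn"},
        {"vehicle": "car", "time": "day"},
    ]
    labels = ["serious", "serious", "slight", "slight"]
    for feature in ("vehicle", "time"):
        print(feature, best_split(rows, labels, feature))
```

Comparing the best impurity across features picks the root split; repeating the search within each branch builds the rest of the tree.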

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used for personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, because traditional collaborative filtering calculates similarity from direct connections and common features among customers, it has difficulty calculating similarity for new customers or products. For this reason, hybrid techniques that also use content-based filtering were designed. Meanwhile, efforts have been made to solve these problems by applying the structural characteristics of social networks, calculating similarity indirectly through similar customers placed between two customers. This means building a customer network from purchase data and calculating the similarity between two customers from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether a target customer will accept a recommendation. Network centrality metrics can be used to calculate these similarities, and the choice of metric matters because different centrality metrics may affect recommendation performance differently. Furthermore, this study considers that the effect of centrality metrics on recommendation performance may vary with the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance when applied not only to new customers or products but to all customers or products. By treating a customer's purchase of an item as a link between the customer and the item in the network, predicting whether a user will accept a recommendation becomes predicting whether a new link will be created between them. 
Because classification models suit the binary problem of whether a link is formed, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network and support vector machine (SVM) models were selected. Performance was evaluated on order data collected from an online shopping mall over four years and two months. The first three years and eight months of data were used to construct the social network, and the records from the following four months were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that recommendation acceptance rates differ meaningfully across centrality metrics for each algorithm. We analyzed four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality and eigenvector centrality. Eigenvector centrality recorded the lowest performance in all models except the support vector machine. Closeness centrality and betweenness centrality showed similar performance across all models. Degree centrality ranked moderately across models, while betweenness centrality always ranked higher than degree centrality. Finally, closeness centrality showed distinct differences in performance depending on the model: it ranked first, with numerically high performance, in logistic regression, artificial neural network and decision tree, but recorded very low rankings and low performance in the support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork connecting two nodes can effectively predict the connectivity between those nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model. 
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can deliver a high level of performance in any model, and introducing closeness centrality could be considered to obtain higher performance for certain models.
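
The four centrality metrics compared above have standard definitions on an unweighted graph. The sketch below computes them for a small hypothetical network stored as an adjacency dictionary; it is not the paper's pipeline, which measures centrality over the subnetwork linking two nodes and feeds it to a classifier. Betweenness uses Brandes' algorithm, and eigenvector centrality uses power iteration on A + I, which has the same leading eigenvector as the adjacency matrix A but also converges on bipartite graphs such as customer-item networks.

```python
import math
from collections import deque

Graph = dict[str, list[str]]  # undirected: each edge is listed under both endpoints


def degree_centrality(g: Graph) -> dict[str, float]:
    """Number of neighbours, normalised by the n - 1 possible neighbours."""
    n = len(g)
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def closeness_centrality(g: Graph) -> dict[str, float]:
    """(reachable nodes) / (sum of shortest-path distances to them), via BFS."""
    result = {}
    for s in g:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(g: Graph) -> dict[str, float]:
    """Brandes' algorithm, unnormalised, for an unweighted undirected graph."""
    bc = dict.fromkeys(g, 0.0)
    for s in g:
        stack, preds = [], {v: [] for v in g}
        sigma = dict.fromkeys(g, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in g[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(g, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: c / 2.0 for v, c in bc.items()}  # each pair was counted from both ends


def eigenvector_centrality(g: Graph, iterations: int = 200) -> dict[str, float]:
    """Power iteration on A + I, normalised to unit Euclidean length."""
    x = dict.fromkeys(g, 1.0)
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in g[v]) for v in g}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


if __name__ == "__main__":
    # Hypothetical path-shaped network a - b - c - d.
    g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    for metric in (degree_centrality, closeness_centrality,
                   betweenness_centrality, eigenvector_centrality):
        print(metric.__name__, metric(g))
```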

Cost-Effectiveness Analysis of Different Management Strategies for Detection CIN2+ of Women with Atypical Squamous Cells of Undetermined Significance (ASC-US) Pap Smear in Thailand

  • Tantitamit, Tanitra;Termrungruanglert, Wichai;Oranratanaphan, Shina;Niruthisard, Somchai;Tanbirojn, Patuou;Havanond, Piyalamporn
    • Asian Pacific Journal of Cancer Prevention / v.16 no.16 / pp.6857-6862 / 2015
  • Background: To identify the optimal cost-effective strategy for managing women with ASC-US results who attended King Chulalongkorn Memorial Hospital (KCMH). Design: An economic analysis based on a retrospective study. Subjects: Women referred to the gynecological department because of an ASC-US screening result at King Chulalongkorn Memorial Hospital, a general and tertiary referral center in Bangkok, Thailand, from January 2008 to December 2012. Materials and Methods: A decision tree-based model was constructed to evaluate the cost-effectiveness of three follow-up strategies for managing ASC-US results: repeat cytology, triage with HPV testing and immediate colposcopy. Each woman with ASC-US chose a strategy after a doctor explained the algorithm and the advantages and disadvantages of each strategy. The model compared the incremental cost per case of high-grade cervical intraepithelial neoplasia (CIN2+) detected, measured by the incremental cost-effectiveness ratio (ICER). Results: From the provider's perspective, immediate colposcopy is the least costly and also the most effective of the three follow-up strategies. Of the other two, repeat cytology triage is less costly than HPV triage, whereas HPV triage is more effective, at an ICER of 56,048 Baht per additional case of CIN2+ detected. From the patient's perspective, repeat cytology triage is the least costly and least effective strategy. Repeat colposcopy has an ICER of 2,500 Baht per additional case of CIN2+ detected compared with colposcopy. In the sensitivity analysis, immediate colposcopy triage is no longer cost-effective when the cost exceeds 2,250 Baht or the cost of cytology is less than 50 Baht (1 USD = 31.58 THB). 
Conclusions: In women with ASC-US cytology, colposcopy is more cost-effective than repeat cytology or triage with HPV testing from both the provider's and the patient's perspectives.
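
The decision-tree model and the ICER described above can be sketched as follows. Each strategy is represented as a chance tree whose leaves carry a cost and an effect (here, 1 if a CIN2+ case is detected, else 0); rolling the tree back gives the expected cost and effect, and two strategies are then compared by dominance or, failing that, by ICER = (difference in cost) / (difference in effect). The probabilities and costs in the usage example are hypothetical and are not the study's inputs.

```python
def roll_back(node):
    """Expected (cost, effect) of a chance tree.

    A leaf is a (cost, effect) tuple; a chance node is a list of
    (probability, subtree) pairs whose probabilities sum to 1.
    """
    if isinstance(node, tuple):
        return node
    cost = effect = 0.0
    for prob, child in node:
        c, e = roll_back(child)
        cost += prob * c
        effect += prob * e
    return cost, effect


def compare(new, reference):
    """Classify `new` against `reference`; both are (expected cost, expected effect)."""
    d_cost = new[0] - reference[0]
    d_effect = new[1] - reference[1]
    if d_cost <= 0 and d_effect >= 0:
        return "dominant", None       # no more costly and at least as effective
    if d_cost >= 0 and d_effect <= 0:
        return "dominated", None      # no cheaper and no more effective
    return "icer", d_cost / d_effect  # cost difference per CIN2+ case difference


if __name__ == "__main__":
    # Hypothetical strategies: probability of detecting CIN2+ and pathway costs in Baht.
    reference = roll_back([(0.3, (1000.0, 1.0)), (0.7, (400.0, 0.0))])
    new = roll_back([(0.4, (1500.0, 1.0)), (0.6, (500.0, 0.0))])
    print(reference, new, compare(new, reference))
```

A strategy that is both cheaper and more effective, as immediate colposcopy is from the provider's perspective above, is reported as dominant and needs no ICER.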

Analysis of the Difference Between Purchasing Decision Factors and Quality Satisfaction of Community Social Service Investment (지역사회서비스투자사업의 구매결정 요인과 품질만족 차이 분석)

  • Jang, Chun_Ok;Lee, Jung-Eun
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.251-256 / 2021
  • Demand in the field of community services is expected to grow further as more types of services become available. However, although the local community service investment project created a structure for fair competition by introducing a market mechanism grounded in psychological principles that affect human behavior in the field, systematic management and monitoring of social service quality remain insufficient. The purpose of this study is to identify the relationship between service selection factors and service quality, in order to improve the quality of social services from the consumer's perspective and meet these environmental needs, and to use the results for quality improvement. The research model measures the five dimensions of service satisfaction used to assess the quality of local community service investment projects: reliability, responsiveness, empathy, assurance and tangibility. In addition, by identifying the main factors behind the study's four research hypotheses and applying the results, we present various strategic implications for improving the quality of local community service investment projects.

The Comparison of Risk-adjusted Mortality Rate between Korea and United States (한국과 미국 의료기관의 중증도 보정 사망률 비교)

  • Chung, Tae-Kyoung;Kang, Sung-Hong
    • Journal of Digital Convergence / v.11 no.5 / pp.371-384 / 2013
  • The purpose of this study was to develop risk-adjusted mortality models using the Korean Hospital Discharge Injury data and the US National Hospital Discharge Survey data, and to suggest ways to manage hospital mortality rates by comparing Korean and US Hospital Standardized Mortality Ratios (HSMR). The study used two data mining techniques, decision tree and logistic regression, to develop Korean and US risk-adjustment models of in-hospital mortality. Comparing HSMRs with standardized variables reveals concrete differences between the two countries. While the Korean HSMR increased every year (101.0 in 2006, 101.3 in 2007, 103.3 in 2008), the US HSMR decreased (102.3 in 2006, 100.7 in 2007, 95.9 in 2008). Korean HSMRs by hospital bed size were higher than those of the United States. A two-level approach to managing hospital mortality rates is suggested, at the national and hospital levels: the government should release the HSMRs of large hospitals and offer consulting on effective mortality management to small and medium-sized hospitals.
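
HSMR compares the deaths a hospital actually had with the deaths a risk-adjustment model expects given its case mix, scaled so that 100 means "as expected"; a value above 100, such as Korea's 103.3 in 2008, means more deaths than the model predicts. A minimal sketch, assuming a logistic regression risk model (one of the two techniques named above) whose intercept, coefficients and feature names are hypothetical rather than the study's variables.

```python
import math


def predicted_death_probability(intercept, coefs, features):
    """Logistic risk model: P(death) = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    z = intercept + sum(b * features.get(name, 0.0) for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


def hsmr(patients, intercept, coefs):
    """HSMR = 100 * observed deaths / expected deaths (sum of predicted risks)."""
    observed = sum(1 for p in patients if p["died"])
    expected = sum(predicted_death_probability(intercept, coefs, p["features"])
                   for p in patients)
    return 100.0 * observed / expected


if __name__ == "__main__":
    # Hypothetical coefficients and patients, for illustration only.
    coefs = {"age_75_plus": 1.2, "emergency_admission": 0.8}
    patients = [
        {"died": True, "features": {"age_75_plus": 1, "emergency_admission": 1}},
        {"died": False, "features": {"age_75_plus": 1, "emergency_admission": 0}},
        {"died": False, "features": {"age_75_plus": 0, "emergency_admission": 1}},
        {"died": False, "features": {"age_75_plus": 0, "emergency_admission": 0}},
    ]
    print(round(hsmr(patients, -2.5, coefs), 1))
```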