• Title/Summary/Keyword: decision tree and system analysis

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

  • Bae, Eun Chan;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / v.21 no.6 / pp.9-19 / 2016
  • In the finance literature, stock liquidity, which shows how readily stocks can be cashed out in the market, has received considerable attention from both academics and practitioners. There are several reasons. First, stock liquidity is known to significantly affect asset pricing. Second, macroeconomic announcements influence liquidity in the stock market. Therefore, stock liquidity itself affects the decisions of both investors and managers. Although there is a great deal of finance literature on stock liquidity, it is quite clear that no studies have attempted to investigate it as a decision-making problem. Most stock liquidity studies have taken limited views, such as how much liquidity influences stock price or which variables are significantly associated with it. This paper, however, posits that stock liquidity can be treated as a serious decision-making problem and handled with data mining techniques that estimate its future extent with statistical validity. To this end, we collected financial data from a number of manufacturing companies listed on the KRX (Korea Exchange) for the period 2010 to 2013. We started from 2010 to avoid the aftershocks of the financial crisis that occurred in 2008. We used the Fn-GuidPro system to gather a total of 5,700 financial data records. The stock liquidity measure was computed by the procedure proposed by Amihud (2002), which is known to be the best metric for showing the relationship with daily return. We applied five types of data mining techniques (classifiers): Bayesian network, support vector machine (SVM), decision tree, neural network, and ensemble method. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree Augmented NBN). The decision trees use CART and C4.5. Regression results were used as the performance benchmark.
The ensemble method comes in two types: integration of two classifiers and integration of three classifiers. It integrates the classifiers by voting. Among the single classifiers, CART performed best with 48.2%, compared with 37.18% for regression. Among the ensemble methods, integrating TAN, CART, and SVM performed best with 49.25%. In additional analysis by industry, relatively stable industries such as electronic appliances, wholesale & retailing, woods, and leather-bags-shoes showed better performance, above 50%.
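
The two mechanical steps named in this abstract, the Amihud (2002) illiquidity ratio and voting over classifier outputs, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the input layout, and the tie-breaking rule are hypothetical, and the abstract does not say how the authors discretized liquidity into classes or resolved ties.

```python
from collections import Counter
from statistics import mean


def amihud_illiquidity(returns, volumes):
    """Amihud ratio: average over days of |daily return| / daily traded value.

    Days with zero volume are skipped (an assumption of this sketch).
    """
    ratios = [abs(r) / v for r, v in zip(returns, volumes) if v > 0]
    return mean(ratios)


def majority_vote(predictions):
    """Combine label lists from several classifiers, sample by sample.

    predictions: one list of labels per classifier, all of equal length.
    On a tie, the label seen first (earliest classifier listed) wins.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers (e.g. TAN, CART, SVM) on three samples.
tan = ["high", "low", "low"]
cart = ["high", "high", "low"]
svm = ["low", "high", "low"]
print(majority_vote([tan, cart, svm]))
```

With three voters and two classes a strict majority always exists, which may be one reason three-classifier integration is a convenient design; with two voters, the tie rule decides every disagreement.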

A Comparative Study on the Accuracy of Important Statistical Prediction Techniques for Marketing Data (마케팅 데이터를 대상으로 중요 통계 예측 기법의 정확성에 대한 비교 연구)

  • Cho, Min-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.4 / pp.775-780 / 2019
  • Techniques for predicting the future can be categorized into statistics-based and deep-learning-based techniques. Among them, statistics-based techniques are widely used because they are simple and highly accurate. However, working-level staff have difficulty applying the many analytical techniques correctly. In this study, we compared prediction accuracy by applying multinomial logistic regression, decision tree, random forest, support vector machine, and Bayesian inference to marketing-related data. The same marketing data was used for every technique, and the analysis was conducted in R. The prediction results of the various techniques, which reflect the data characteristics of the marketing field, should be a good reference for practitioners.

A Recommending System for Care Plan(Res-CP) in Long-Term Care Insurance System (데이터마이닝 기법을 활용한 노인장기요양급여 권고모형 개발)

  • Han, Eun-Jeong;Lee, Jung-Suk;Kim, Dong-Geon;Ka, Im-Ok
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1229-1237 / 2009
  • In the long-term care insurance (LTCI) system, the question of how to provide the most appropriate care has become a major issue for the elderly, their families, and policy makers. To help beneficiaries use LTC services in line with their care needs, the National Health Insurance Corporation (NHIC) provides them with an individualized care plan, named the Long-term Care User Guide. It includes recommendations for the most appropriate type of care for each beneficiary. The purpose of this study is to develop a recommending system for care plans (Res-CP) in the LTCI system. We used the data set for the Long-term Care User Guide from the 3rd long-term care insurance pilot program. To develop the model, we tested four models: a decision-tree model from data mining, a logistic regression model, and bagging and boosting techniques as ensemble models. The decision-tree model was selected to describe the Res-CP because its algorithm is easy to explain to the working groups. Res-CP might be useful for evidence-based care planning in the LTCI system and may help support efficient use of LTC services.

Analysis for Changes of Mode Choice Behavior from Providing Real-time Schedule for Public Transportation by Smartphone Application (스마트폰 애플리케이션을 이용한 대중교통 운행정보 제공에 따른 통행자 수단선택 행태변화 분석)

  • Choi, Sung-Taek;Rho, Jeong-Hyun
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.11 no.6 / pp.60-69 / 2012
  • Public transport information services delivered through smartphone apps have received attention as a way to reduce transport problems. Smartphones can offer real-time information through an LBS (Location Based Service) system. This study tries to identify which factors affect the mode choice ratio of public transport, with particular attention to smartphone apps. In a paired comparison, rising oil prices, traffic congestion, public information services via smartphone apps, and BIS (Bus Information System) scored 0.39, 0.27, 0.18, and 0.16, respectively. Younger respondents and students prefer the smartphone public information service. The decision tree shows that the most important decision factor is the smartphone information service.

Development and Effect Analysis of a Learning Support System for Underachievers Using Psychological Learning Style Tests (학습 스타일 심리검사를 이용한 부진아 학습 지원 시스템의 개발 및 효과 분석)

  • Lee, Jong-Suk;Jang, Eun-Sill;Lee, Yong-Kyu
    • Journal of The Korean Association of Information Education / v.11 no.3 / pp.299-306 / 2007
  • According to a survey by the government educational organization, learning support for children with learning disabilities is urgently needed. To this end, we developed a learning support system for children with learning disabilities. First, the system diagnoses children with learning disabilities using a decision tree based on pre-test results. Second, it assigns each child one of the auditory-, visual-, and tactile-oriented learning types according to the results of the psychological learning style test. Third, one-to-one study is provided for students who fail the achievement test. To evaluate the system, the children were divided into an experimental group and a control group, and their educational achievement was assessed. We found that achievement improved by 10% on average when learning followed the psychological learning style test.

Short-term demand forecasting method at both direction power exchange which uses a data mining (데이터 마이닝을 이용한 양방향 전력거래상의 단기수요예측기법)

  • Kim Hyoung Joong;Lee Jong Soo;Shin Myong Chul;Choi Sang Yeoul
    • Proceedings of the KIEE Conference / summer / pp.722-724 / 2004
  • Demand estimates in electric power systems have traditionally consisted of time-series analyses over long time periods. The resulting database consisted of huge amounts of data that were then analyzed to create the various coefficients used to forecast power demand. In this research, we take advantage of universally used analysis techniques, but we also use easily available data-mining techniques to analyze the patterns of ordinary days and special days (holidays, etc.). We then present a new method for estimating and forecasting power flow using decision tree analysis. Because we also analyze the relationship between the estimate and the power system ceiling prices currently set by the Korea Power Exchange, we included the power system ceiling prices in our estimation coefficients and estimation method.

Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining (데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발)

  • Yoon, Seungjin;Kim, Suhwan;Shin, Kyungshik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.1-17 / 2015
  • In this paper, we report our observations on an accident prediction model for the military based on enlisted men's internal data (cumulative records) and external data (SNS data). This work is significant for the military's efforts to supervise enlisted men. In spite of their efforts, many commanders have failed to prevent accidents by their subordinates. One of an officer's important duties is to take care of subordinates so as to prevent unexpected accidents. However, accidents are hard to prevent, so a proper method must be sought. Our motivation for presenting this paper is to make it possible to predict accidents using enlisted men's internal and external data. The biggest issue facing the military is the occurrence of accidents by enlisted men related to maladjustment and the relaxation of military discipline. The core method of preventing accidents by soldiers is to identify problems and manage them quickly. Commanders predict accidents by interviewing their soldiers and observing their surroundings. This requires considerable time and effort, and the results differ significantly depending on the capabilities of the commanders. In this paper, we seek to predict accidents with objective data that can easily be obtained. Recently, records of enlisted men, as well as SNS communication between commanders and soldiers, have made it possible to predict and prevent accidents. This paper concerns the application of data mining to identify soldiers' interests and predict accidents using internal and external (SNS) data. We propose combining topic analysis with a decision tree method. The study is conducted in two steps. First, topic analysis is conducted on the SNS data of enlisted men. Second, the decision tree method is used to analyze the internal data together with the results of the first analysis. The dependent variable for these analyses is the presence of any accidents.
In order to analyze the SNS data, we require tools such as text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module. Our approach for finding soldiers' interests is composed of three main phases: collecting, topic analysis, and converting the topic analysis results into points for use as independent variables. In the first phase, we collect enlisted men's SNS data by commander's ID. After the unstructured SNS data is gathered, the topic analysis phase extracts issues from it. For simplicity, 5 topics (vacation, friends, stress, training, and sports) are extracted from 20,000 articles. In the third phase, using these 5 topics, we quantify them as personal points. After quantifying the topics, we add these results to the independent variables, which are composed of 15 internal data sets. Then, we build two decision trees. The first tree is composed of the internal data only. The second tree is composed of the external data (SNS) as well as the internal data. After that, we compare the misclassification results from SAS E-miner. The first model's misclassification rate is 12.1%, whereas the second model's is 7.8%, so the second model predicts accidents with an accuracy of approximately 92%. The gap between the two models is 4.3 percentage points. Finally, we test whether the difference between them is meaningful, using the McNemar test. The test result is significant (p-value: 0.0003). This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of enlisted men's data. Additionally, various independent variables in the decision tree model are used as categorical variables instead of continuous variables, which causes a loss of information. In spite of extensive efforts to provide prediction models for the military, commanders' predictions are accurate only when they have sufficient data about their subordinates.
Our proposed methodology can support decision-making in the military. This study is expected to contribute to the prevention of accidents in the military through scientific analysis and proper management of enlisted men.
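
The model comparison in this abstract relies on the McNemar test, which looks only at the cases where the two decision trees disagree. A minimal Python sketch follows. The abstract does not report the disagreement counts, so `b` and `c` below are hypothetical inputs, and the continuity correction is an assumption, since the abstract does not say which variant was used.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar chi-square statistic and p-value (1 degree of freedom).

    b: cases the first model classified correctly and the second did not
    c: cases the second model classified correctly and the first did not
    """
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: P(Z^2 > stat)
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical disagreement counts, for illustration only.
stat, p = mcnemar(b=10, c=30)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

Because the test ignores cases both models get right or wrong, a modest gap in misclassification rates can still be significant when the disagreements are strongly one-sided.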

Study on Developing Program for Efficient Landscape Woody Plants Management - Mainly Focused on the Development of a Tree Inventory System - (조경수목의 효율적 관리를 위한 프로그램 개발에 관한 연구 - 관리대장(Tree Inventory) 개발을 중심으로 -)

  • 조영환;곽행구
    • Journal of the Korean Institute of Landscape Architecture / v.24 no.4 / pp.1-22 / 1997
  • This paper focuses on the efficient management of landscape woody plants and their important role in the urban environment. Based on the philosophy that nothing can be done without an inventory, the purpose of this study was to develop an inventory system and apply it properly to a site in order to establish a management plan. Two approaches were used. The first was to build a newly structured inventory system by collecting, analyzing, and evaluating various types of inventories used in Korea, the U.S.A., and Japan. The second was to apply the newly designed inventory system to the case study area, using GIS as a tool for spatial analysis and statistics for decision making. The results can be summarized as follows. 1. In Korea, most landscape woody plant inventories held only data on the possession of trees and on the work done according to traditional practice. There was no data on the condition, management needs, and site conditions of individual trees, which is essential information for organizing an inventory system. 2. The data need to be balanced, covering both tree characteristics and site characteristics. With such information, management needs can be adjusted properly. The inventory list described in this paper was determined by botanical identity, placement condition, condition of the tree, and types of work for maintaining and improving the condition of each tree. One of the most important tasks was to determine the location data of each tree so that it could be compared with other trees. The data gained from the field survey still had some problems because of the lack of a scientific method for supporting objective views and because of actual field situations, especially in evaluating site conditions and management needs. All data should be revised to fit a computer data management system, if possible.
3. The GIS (Geographic Information System) application performed well in handling inventory data for decision making. All the data used for the GIS application was divided into location and non-spatial data. Using the location data, it was easy to find the exact location of each tree on the monitor and on the computer-generated maps, even in the actual managed site, along with various attribute data. Therefore, the entire management plan should start from data on individual trees with their exact locations, so that concrete management goals can be set through actual budget planning.

The Role of Data Technologies with Machine Learning Approaches in Makkah Religious Seasons

  • Waleed Al Shehri
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.26-32 / 2023
  • Hajj is a fundamental pillar of Islam that all Muslims must perform at least once in their lives. Umrah, however, can be performed several times a year, depending on people's abilities. Every year, Muslims from all over the world travel to Saudi Arabia to perform Hajj. Hajj and Umrah pilgrims face multiple issues because of the large volume of people in the same place at the same time during the event. Therefore, a system is needed to help people carry out Hajj and Umrah procedures smoothly. Multiple devices are already installed in Makkah, but it would be better to suggest data architectures supported by machine learning approaches. The proposed system analyzes the services provided to the pilgrims with regard to gender, location, and foreign pilgrims. The proposed system addressed the research problem of analyzing the Hajj pilgrim dataset most effectively. In addition, visualizations of the proposed method showed the system's performance using data architectures. Machine learning algorithms classify whether male pilgrims are more significant than female pilgrims. Several algorithms were used to classify the data, including logistic regression, Naive Bayes, K-nearest neighbors, decision trees, random forests, and XGBoost. The decision tree accuracy was 62.83%, whereas K-nearest neighbors reached 62.86%; the other classifiers had lower accuracy than these. The open-source dataset was analyzed using different data architectures to store the data, and then machine learning approaches were used to classify the dataset.

A Method of BDD Restructuring for Efficient MCS Extraction in BDD Converted from Fault Tree and A New Approximate Probability Formula (고장수목으로부터 변환된 BDD에서 효율적인 MCS 추출을 위한 BDD 재구성 방법과 새로운 근사확률 공식)

  • Cho, Byeong Ho;Hyun, Wonki;Yi, Woojune;Kim, Sang Ahm
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.6 / pp.711-718 / 2019
  • The BDD (binary decision diagram) is a well-known alternative to the conventional Boolean logic method in fault tree analysis. As the size of the fault tree increases, the calculation time and computer resources required for the BDD increase dramatically. A new failure path search and path restructuring method is proposed for efficient calculation of CS (cut sets) and MCS (minimal cut sets) from a BDD. Failure path grouping and bottom-up path search are shown to be efficient for failure path search in a BDD, and path restructuring is shown to reduce the number of CS comparisons needed for MCS extraction. With these newly proposed methods, the top event probability can be calculated using ASDMP (Approximate Sum of Disjoint MCS Products), which is shown to be equivalent to the result of the conventional MCUB (Minimal Cut Upper Bound) probability.
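
For orientation, the baseline quantities in this abstract can be sketched in Python: reducing a list of cut sets to minimal cut sets by subset comparison, and computing the conventional MCUB, 1 − Π(1 − P(MCS_i)). This is not the paper's restructuring method or its ASDMP formula, neither of which the abstract specifies; the pairwise comparison below is the naive step whose cost the paper aims to reduce. The event names and probabilities are hypothetical, and basic events are assumed independent.

```python
def minimal_cut_sets(cut_sets):
    """Keep only cut sets that have no other cut set as a proper subset."""
    unique = {frozenset(cs) for cs in cut_sets}
    return [cs for cs in unique if not any(other < cs for other in unique)]


def mcub(mcs_list, prob):
    """Minimal Cut Upper Bound on the top event probability.

    prob maps each basic event to its failure probability;
    basic events are assumed to be independent.
    """
    none_fail = 1.0
    for cs in mcs_list:
        p_cs = 1.0
        for event in cs:
            p_cs *= prob[event]  # probability that every event in the cut set occurs
        none_fail *= 1.0 - p_cs
    return 1.0 - none_fail


# Hypothetical cut sets read off the failure paths of a small BDD.
cut_sets = [{"A", "B"}, {"A"}, {"B", "C"}, {"A", "B", "C"}]
prob = {"A": 0.1, "B": 0.2, "C": 0.3}

mcs = minimal_cut_sets(cut_sets)
print(sorted(sorted(cs) for cs in mcs), mcub(mcs, prob))
```

The subset check compares every pair of cut sets, so its cost grows quadratically with the number of failure paths, which illustrates why reducing the number of CS comparisons matters for large fault trees.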