• Title/Summary/Keyword: classification and regression tree (분류 회귀 나무)

Analysis of Important Indicators of TCB Using GBM (일반화가속모형을 이용한 기술신용평가 주요 지표 분석)

  • Jeon, Woo-Jeong(Michael);Seo, Young-Wook
    • The Journal of Society for e-Business Studies / v.22 no.4 / pp.159-173 / 2017
  • To provide technology-based financial support to small and medium-sized venture companies, the government implemented the TCB evaluation, a kind of technology rating evaluation carried out by Kibo and qualified private TCBs. In this paper, we briefly review the current state of TCB evaluation and the available technology-evaluation indicators accumulated in the Korea Credit Information Services (TDB), and then use multiple regression to explore the indicators that have a significant effect on the technology rating score. The key indicators were then applied as independent features to the generalized boosting model (GBM), a representative machine learning classifier, and the relative importance of the indicators and the classification accuracy were calculated as the class influence and the fitness of each model. The analysis found that the relative importance of the indicators did not differ significantly between the two models. However, the GBM model placed more weight on the InnoBiz certification, R&D department, patent registration, and venture confirmation indicators than the regression model did.

Metabolic Diseases Classification Models according to Food Consumption using Machine Learning (머신러닝을 활용한 식품소비에 따른 대사성 질환 분류 모델)

  • Hong, Jun Ho;Lee, Kyung Hee;Lee, Hye Rim;Cheong, Hwan Suk;Cho, Wan-Sup
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.354-360 / 2022
  • Metabolic disease has a prevalence of 26% among Koreans and is characterized by having three of the following five conditions at the same time: abdominal obesity, hypertension, impaired fasting glucose, high triglycerides, and low HDL cholesterol. This paper links the consumer panel data of the Rural Development Agency (RDA) with the medical care data of the National Health Insurance Service (NHIS) to build a classification model that separates a metabolic disease group from a control group by food consumption characteristics, and compares the differences between the groups. Many existing domestic and foreign studies on metabolic diseases and food consumption are disease correlation studies of specific food groups or specific ingredients, whereas this paper considers all food groups included in the general diet. We created a classification model using logistic regression, a decision tree-based classification model, and a classification model using XGBoost. Of the three, the XGBoost classification model performed best, but its accuracy was not high, at less than 0.7. As future work, the observation period for food consumption in the patient group should be extended to more than 5 years, and the metabolic disease classification model should be studied after converting the food consumed into nutritional characteristics.

Machine Learning-Based Gesture Recognition Technology for Intelligent IoT Services (지능형 IoT서비스를 위한 기계학습 기반 동작 인식 기술)

  • Choe, Dae-Ung;Jo, Hyeon-Jung
    • The Proceeding of the Korean Institute of Electromagnetic Engineering and Science / v.27 no.4 / pp.19-28 / 2016
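  • With the rapid development of wireless sensing network technologies such as RFID, sensing devices for object tracking, and various computing resources, the web is naturally evolving from the social web to the ubiquitous computing web. In the ubiquitous computing web, the Internet of Things (IoT) can replace conventional computers, which means that the network connecting a person with surrounding objects expands while the amount of data generated within the network grows exponentially. More intelligent IoT services therefore need to identify a person's intention and situation accurately and in real time from a vast amount of raw data. Gesture recognition technology for interacting with objects does not require direct contact, so it has the potential to be applied to future human-object interaction. Meanwhile, state-of-the-art machine learning algorithms often outperform human cognitive ability on a variety of problems; among them, the Decision Forest, which is based on decision trees, shows superior performance across all areas, including classification and regression. This paper therefore surveys various gesture recognition technologies for intelligent IoT services and describes in detail the basic concepts of the Decision Forest for gesture recognition, along with the training and testing needed to implement it. In particular, it introduces three representative learning methods, bagging, boosting, and Random Forest, and examines their characteristics for gesture recognition on the basis of existing research results.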
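
As an illustration of bagging, one of the three learning methods named in the abstract above, the following is a minimal sketch rather than the authors' implementation. It assumes depth-1 trees (stumps) as base learners and synthetic two-feature data standing in for gesture features; the function names, the data, and the labelling rule are all hypothetical.

```python
import random
from collections import Counter

def train_stump(sample):
    """Fit a depth-1 tree: pick the feature, threshold, and leaf labels with fewest training errors."""
    best = None
    n_features = len(sample[0][0])
    for f in range(n_features):
        for x, _ in sample:
            thr = x[f]
            for left_label, right_label in ((0, 1), (1, 0)):
                errors = sum(
                    (left_label if xi[f] <= thr else right_label) != yi
                    for xi, yi in sample
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    """Predict with a single stump."""
    f, thr, left_label, right_label = stump
    return left_label if x[f] <= thr else right_label

def bagging_fit(data, n_estimators, rng):
    """Train each stump on a bootstrap sample (drawn with replacement, same size as the data)."""
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(data) for _ in data]
        stumps.append(train_stump(bootstrap))
    return stumps

def bagging_predict(stumps, x):
    """Aggregate the stumps by majority vote."""
    votes = Counter(stump_predict(s, x) for s in stumps)
    return votes.most_common(1)[0][0]

# Synthetic stand-in for gesture features (an assumption, not data from the paper).
rng = random.Random(42)
data = []
for _ in range(100):
    x = (rng.random(), rng.random())
    data.append((x, 1 if x[0] + x[1] > 1.0 else 0))

stumps = bagging_fit(data, n_estimators=15, rng=rng)
print(bagging_predict(stumps, (0.95, 0.95)), bagging_predict(stumps, (0.05, 0.05)))
```

Random Forest extends this scheme by also restricting each split to a random subset of features, while boosting trains the base learners sequentially instead of independently.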

Cloud Computing Adoption Decision-Making Modeling Using CART (CART 방법론을 사용한 클라우드 컴퓨팅 도입 의사 결정 모델링)

  • Baek, Seung Hyun;Chang, Byeong-Yun
    • Journal of the Korea Society for Simulation / v.23 no.4 / pp.189-195 / 2014
  • In this paper, we study a decision-making model for adopting place-free and time-free cloud computing (CC). Panel survey data collected from 65 people and CART (classification and regression tree), a data mining approach, are used to construct the model. The modeling has two steps: first, significant questions (variables) are selected; then the CART decision-making model is constructed using the selected variables. In the variable selection stage, the 25 questions are reduced to 5. Reducing the number of questions yields quicker responses from respondents and a shorter model-construction time.
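
The two-step procedure in the abstract above (select variables, then build a CART model on them) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' method: the abstract does not say how questions were selected, so ranking by single-split Gini decrease is an assumption, and the survey answers, the labelling rule, and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature):
    """Return (impurity decrease, threshold) of the best binary split on one feature."""
    parent, n = gini(labels), len(rows)
    best_gain, best_thr = 0.0, None
    for thr in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= thr]
        right = [y for r, y in zip(rows, labels) if r[feature] > thr]
        if not left or not right:
            continue
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if gain > best_gain:
            best_gain, best_thr = gain, thr
    return best_gain, best_thr

def select_features(rows, labels, k):
    """Step 1 (assumed criterion): keep the k questions with the largest Gini decrease."""
    gains = [(best_split(rows, labels, f)[0], f) for f in range(len(rows[0]))]
    return [f for _, f in sorted(gains, reverse=True)[:k]]

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Step 2: grow a small CART-style tree using only the selected questions."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    feat = max(features, key=lambda f: best_split(rows, labels, f)[0])
    _, thr = best_split(rows, labels, feat)
    if thr is None:  # no split reduces impurity
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[feat] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[feat] > thr]
    return {
        "feature": feat,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [y for _, y in left], features, depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], features, depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Synthetic survey: 65 respondents, 25 questions answered on a 1-5 scale (made-up data).
rng = random.Random(7)
rows = [[rng.randint(1, 5) for _ in range(25)] for _ in range(65)]
labels = [1 if r[3] >= 4 and r[7] >= 3 else 0 for r in rows]  # hypothetical adoption rule

selected = select_features(rows, labels, k=5)
tree = build_tree(rows, labels, selected)
accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
baseline = max(Counter(labels).values()) / len(labels)
print(selected, accuracy, baseline)
```

The accuracy printed here is measured on the training data only, so it says nothing about how well such a model would generalize.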

The Development of Models and the Characteristics for Subway Noise Using the Classification and Regression Trees (CART 분석을 이용한 지하철 소음모형 개발 및 특성 연구)

  • Kim, Tae-Ho;Lee, Jae-Myung;Won, Jai-Mu;Song, In-Suk
    • Journal of the Korean Society for Railway / v.10 no.5 / pp.480-486 / 2007
  • The subway is an essential mode of public transportation in big cities and is used by many citizens. However, citizens' demands regarding the interior environment of the subway have been growing recently, and noise is the most pressing issue among them. In this study, we classified the characteristics of subway noise using classification and regression trees (CART), based on noise level data from Line No. 5 in Seoul. We then developed models of the factors affecting subway noise and used them to analyze its characteristics. The results show that the type of geometric design and the operational factors need to be considered when mitigating subway noise, because the factors that influence subway noise differ by geometry type and operational aspect.

Stock Price Direction Prediction Using Convolutional Neural Network: Emphasis on Correlation Feature Selection (합성곱 신경망을 이용한 주가방향 예측: 상관관계 속성선택 방법을 중심으로)

  • Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.22 no.4 / pp.21-39 / 2020
  • Recently, deep learning has shown high performance in various applications such as pattern analysis and image classification. Stock market forecasting, known as an especially difficult task in machine learning research, is an area where many researchers are verifying the effectiveness of deep learning techniques. This study proposed a deep learning Convolutional Neural Network (CNN) model to predict the direction of stock prices. We then used a feature selection method to improve the performance of the model, and compared the performance of machine learning classifiers against the CNN. The classifiers used in this study are Logistic Regression, Decision Tree, Neural Network, Support Vector Machine, Adaboost, Bagging, and Random Forest. The results confirmed that the CNN showed higher performance than the other classifiers when feature selection was applied. The results show that the CNN model effectively predicted the stock price direction by analyzing the embedded values of the financial data.

Development of a Detection Model for the Companies Designated as Administrative Issue in KOSDAQ Market (KOSDAQ 시장의 관리종목 지정 탐지 모형 개발)

  • Shin, Dong-In;Kwahk, Kee-Young
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.157-176 / 2018
  • The purpose of this research is to develop a model, based on financial data, for detecting companies designated as administrative issues in the KOSDAQ market. Administrative issue designation flags companies with a high potential for delisting and gives them time, under certain restrictions of the Korean stock market, to resolve the grounds for delisting. It acts as an alarm that tells investors and market participants which companies are likely to be delisted and warns them to invest safely. Despite this importance, there are relatively few studies on administrative issue prediction models compared with the many studies on bankruptcy prediction models. This study therefore develops and verifies a detection model for companies designated as administrative issues using financial data of KOSDAQ companies. Logistic regression and decision tree are proposed as the data mining models for detecting administrative issues. According to the analysis, the logistic regression model predicted the designated companies using three variables, ROE (Earnings before tax), Cash flows/Shareholder's equity, and Asset turnover ratio, and its overall accuracy was 86% on the validation dataset. The decision tree (Classification and Regression Trees, CART) model applied classification rules using Cash flows/Total assets and ROA (Net income), and its overall accuracy reached 87%. The implications of the financial indicators selected in the logistic regression and decision tree models are as follows. First, ROE (Earnings before tax) in the logistic detection model shows the profit and loss of the continuing business segments, excluding the revenue and expenses of discontinued businesses. A weakening of this variable therefore means that the competitiveness of the core business is weakening. If a large part of the profits is generated from one-off profit, the deterioration of business management is very likely to intensify further. As the ROE of a KOSDAQ company decreases significantly, the company becomes highly likely to be delisted. Second, cash flows to shareholder's equity represents the firm's ability to generate cash flow when the financial condition of subsidiaries is excluded. In other words, a weakening of the parent company's management capacity, excluding the subsidiaries' competence, can be a main reason for an increased possibility of administrative issue designation. Third, a low asset turnover ratio means that the corporation uses its current and non-current assets ineffectively, or that its asset investment is excessive. If the asset turnover ratio of a KOSDAQ-listed company decreases, its corporate activities need to be examined in detail from various perspectives, such as weakening sales or changes in inventories. Cash flow/total assets, a variable selected by the decision tree detection model, is a key indicator of the company's cash condition and its ability to generate cash from operating activities. Cash flow indicates whether a firm can perform its main activities (maintaining its operating ability, repaying debts, paying dividends, and making new investments) without relying on external financial resources. Therefore, if the value of the variable is negative (-), the company may have serious problems in its business activities. If a company's cash flow from operating activities is smaller than its net profit, the net profit has not been converted into cash, which indicates a serious problem in managing the company's trade receivables and inventory assets. It can therefore be understood that as cash flows/total assets decreases, the probability of administrative issue designation and the probability of delisting increase. In summary, the logistic regression-based detection model in this study was found to be affected by the company's financial activities, including ROE (Earnings before tax), whereas the decision tree-based detection model predicts the designation based on the company's cash flows.

A study on the development of severity-adjusted mortality prediction model for discharged patient with acute stroke using machine learning (머신러닝을 이용한 급성 뇌졸중 퇴원 환자의 중증도 보정 사망 예측 모형 개발에 관한 연구)

  • Baek, Seol-Kyung;Park, Jong-Ho;Kang, Sung-Hong;Park, Hye-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.126-136 / 2018
  • The purpose of this study was to develop a severity-adjustment model for predicting mortality in acute stroke patients using machine learning. From the Korean National Hospital Discharge In-depth Injury Survey for 2006 to 2015, patients with disease codes I60-I63 (KCD 7) were extracted as the study population. Three tools were used for the severity adjustment of comorbidity: the Charlson Comorbidity Index (CCI), the Elixhauser comorbidity index (ECI), and the Clinical Classification Software (CCS). The severity-adjustment models for mortality prediction in patients with acute stroke were developed using logistic regression, decision tree, neural network, and support vector machine methods. The most common comorbid disease in stroke patients was hypertension, uncomplicated (43.8%) in the ECI, and essential hypertension (43.9%) in the CCS. Among the CCI, ECI, and CCS, the CCS had the highest AUC value and was confirmed as the best severity correction tool. In addition, the AUC values for the CCS variables, together with main diagnosis, gender, age, hospitalization route, and existence of surgery, were 0.808 for logistic regression, 0.785 for the decision tree, 0.809 for the neural network, and 0.830 for the support vector machine. The best predictive power was therefore achieved by the support vector machine technique. The results of this study can be used in establishing health policy in the future.
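
The abstract above ranks models by AUC. As a minimal sketch of what that metric measures, the code below computes AUC as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The labels and scores are made-up illustrations, not values from the study, and `model_a` and `model_b` are hypothetical names.

```python
def auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up outcomes (1 = died, 0 = survived) and predicted risks from two hypothetical models.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
predictions = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.1, 0.45, 0.3, 0.5],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.2, 0.8, 0.3, 0.1],
}

results = {name: auc(labels, scores) for name, scores in predictions.items()}
best_model = max(results, key=results.get)
print(results, best_model)
```

The pairwise loop is quadratic in the number of cases; for large datasets a rank-based computation gives the same value more efficiently.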

A Study on Propriety of Pilot Aptitude Test Using Phased Analysis of Pilot Training (비행교육과정 단계별 분석을 통한 조종적성검사 항목 타당성 연구)

  • Kim, HeeYoung;Kim, SuHwan;Moon, HoSeok
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.3 / pp.218-225 / 2016
  • It is important to select personnel with ideal pilot aptitude, given the dramatic advances in aircraft performance and the complexity of military operations brought about by highly developed science and technology. The opportunity cost of dropouts, and the fact that human error is the leading cause of aviation accidents, are practical reasons why aptitude-based personnel selection matters. This study analyzes the ROKAF pilot aptitude test, which was improved in 2004, using various classification models. It discusses the significance of the selected variables and the direction for future development of the ROKAF pilot aptitude test. The accuracy of the classification models was improved by taking into account the differing personal characteristics of the individuals tested.

A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents (차대차 교통사고에 대한 상해 심각도 예측 연구)

  • Ko, Changwan;Kim, Hyeonmin;Jeong, Young-Seon;Kim, Jaehee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.13-29 / 2020
  • Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of the national budget of Korea. Hence, it is necessary to establish a prevention and response system for car traffic accidents. To present a model that can classify and predict the degree of injury in car traffic accidents, we used the big data analysis techniques of K-nearest neighbor, logistic regression, naive Bayes classifier, decision tree, and ensemble algorithms. The performance of the models was analyzed using data on nationwide traffic accidents over the past three years. In particular, considering the difference in the number of records among the injury severity levels, we applied down-sampling to the groups with a large number of samples to enhance the classification accuracy of the models, and then verified the statistical significance of the models using ANOVA.
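
The down-sampling step mentioned in the abstract above can be illustrated with a short sketch. The abstract does not state the target class size, so reducing every class to the size of the smallest one is an assumption here, and the records, severity names, and class counts are made up for illustration.

```python
import random
from collections import Counter, defaultdict

def down_sample(records, label_of, seed=0):
    """Randomly drop records from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[label_of(rec)].append(rec)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))  # sample without replacement
    rng.shuffle(balanced)
    return balanced

# Hypothetical accident records as (record_id, severity); the class sizes are invented.
records = (
    [(i, "minor") for i in range(900)]
    + [(i, "serious") for i in range(900, 980)]
    + [(i, "fatal") for i in range(980, 1000)]
)

balanced = down_sample(records, label_of=lambda rec: rec[1])
counts = Counter(severity for _, severity in balanced)
print(counts)
```

Down-sampling discards data from the majority classes, so it trades some information for balanced class sizes; it is normally applied to the training split only.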