• Title/Abstract/Keywords: Decision Tree analysis


Development and application of a floor failure depth prediction system based on the WEKA platform

  • Lu, Yao;Bai, Liyang;Chen, Juntao;Tong, Weixin;Jiang, Zhe
    • Geomechanics and Engineering / Vol. 23, No. 1 / pp.51-59 / 2020
  • In this paper, the WEKA platform was used to mine and analyze measured data of floor failure depth, and a prediction system for floor failure depth was developed in Java. After standardizing and discretizing 35 sets of measured floor failure depth data from China, grey correlation degree analysis was carried out on five factors affecting the floor failure depth. In descending order of correlation, the factors are: mining depth, working face length, floor failure resistance, mining thickness, and dip angle of the coal seam. Naive Bayes, neural network, and decision tree models were trained, and their confusion-matrix accuracy, detailed accuracy, and node error rate were analyzed. The artificial neural network was concluded to be the optimal model. A prediction system for floor failure depth was then developed in Java. Using this easy-to-operate system, predictions and error analyses were performed for nine sets of measured data. The results show that the WEKA prediction formula has the smallest relative error and the best prediction effect. The applicability of the WEKA prediction formula was also analyzed: the results show that WEKA prediction has better applicability for coal seam mining depths of 110 m~550 m, coal seam dip angles of 0°~15°, and working face lengths of 30 m~135 m.
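
The abstract above ranks five factors by grey correlation degree but does not give the calculation. As a rough illustration only, the sketch below uses the commonly cited Deng formulation of grey relational analysis with a distinguishing coefficient of 0.5. The function name, the `rho` default, and the toy series are assumptions, not values or code from the paper.

```python
def grey_relational_grades(reference, factors, rho=0.5):
    """Grey relational grade of each factor series against a reference series.

    All series are assumed to be normalized to a common scale already.
    `rho` is the distinguishing coefficient (0.5 is a conventional choice,
    assumed here). Assumes at least one factor differs from the reference,
    otherwise the denominator would be zero.
    """
    # Absolute differences between the reference and each factor series.
    deltas = {
        name: [abs(r - x) for r, x in zip(reference, series)]
        for name, series in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)

    grades = {}
    for name, ds in deltas.items():
        # Relational coefficient per sample, then averaged into a grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        grades[name] = sum(coeffs) / len(coeffs)
    return grades


# Toy, made-up normalized data (not from the paper).
toy_reference = [0.0, 0.5, 1.0]
toy_factors = {"tracks_reference": [0.0, 0.5, 1.0], "opposes_reference": [1.0, 0.5, 0.0]}
toy_grades = grey_relational_grades(toy_reference, toy_factors)
```

Sorting `toy_grades` by value in descending order gives a ranking of the kind reported in the abstract.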

The Development of Korean Rehabilitation Patient Group Version 1.0 (한국형 재활환자분류체계 버전 1.0 개발)

  • Hwang, Soojin;Kim, Aeryun;Moon, Sunhye;Kim, Jihee;Kim, Jinhwi;Ha, Younghea;Yang, Okyoung
    • Health Policy and Management / Vol. 26, No. 4 / pp.289-304 / 2016
  • Background: Rehabilitation in the subacute phase differs from acute treatment in its characteristics and required resource consumption. The Korean Diagnosis Related Group and Korean Out-Patient Group, designed for acute patients, lack accuracy and validity as case-mix and payment tools for rehabilitation inpatients. The objective of this study was to develop the Korean Rehabilitation Patient Group (KRPG), reflecting the characteristics of rehabilitation inpatients. Methods: In a retrospective medical record survey of rehabilitation inpatients, 4,207 episodes were collected from 42 hospitals. Based on the opinions of clinical experts and decision-tree analysis, the variables for the KRPG system were derived and the splitting criteria for those variables were set. Using the derived variables, a rehabilitation inpatient classification model reflecting the clinical situation in Korea was constructed, and the performance of the KRPG system was evaluated. Results: The KRPG targeted inpatients with brain or spinal cord injury. The classification variables were etiologic disease, functional status (cognitive function, activities of daily living, muscle strength, spasticity, level and grade of spinal cord injury), and patient age. Applying the derived variables, the KRPG algorithm and a total of 204 rehabilitation patient groups were developed. The KRPG explained 11.8% of the variance in charges and 13.8% of the variance in length of stay for rehabilitation inpatients. Conclusion: KRPG version 1.0, reflecting the clinical characteristics of rehabilitation inpatients, classifies patients into 204 groups.
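
The abstract above reports how much variance in charges and length of stay the patient groups explain, without stating the formula. One common way to measure this for a grouping system is the reduction in variance, 1 - SS_within / SS_total. The sketch below assumes that definition; the function name and the toy values are hypothetical and are not data from the study.

```python
from statistics import mean


def variance_explained(values, groups):
    """Share of total variance in `values` explained by the grouping.

    Computed as 1 - (within-group sum of squares / total sum of squares).
    Assumes `values` are not all identical.
    """
    overall = mean(values)
    ss_total = sum((v - overall) ** 2 for v in values)

    # Collect the values belonging to each group label.
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)

    # Squared deviations from each group's own mean.
    ss_within = sum(
        sum((v - mean(vs)) ** 2 for v in vs) for vs in by_group.values()
    )
    return 1.0 - ss_within / ss_total


# Toy, made-up charges: a grouping that separates them perfectly,
# and one that carries no information.
perfect = variance_explained([1.0, 1.0, 3.0, 3.0], ["a", "a", "b", "b"])
useless = variance_explained([1.0, 3.0, 1.0, 3.0], ["a", "a", "b", "b"])
```

On this definition, a value of 0.118 would mean that 11.8% of the spread in charges is accounted for by group membership.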

A Development of PM10 Forecasting System (미세먼지 예보시스템 개발)

  • Koo, Youn-Seo;Yun, Hui-Young;Kwon, Hee-Yong;Yu, Suk-Hyun
    • Journal of Korean Society for Atmospheric Environment / Vol. 26, No. 6 / pp.666-682 / 2010
  • A forecasting system for Today's and Tomorrow's PM10 was developed based on statistical models. Forecasts were made at 9 AM to predict Today's 24-hour average PM10 concentration and at 5 PM to predict Tomorrow's 24-hour average PM10. Today's forecasting model was driven by measured air quality and meteorological data, while Tomorrow's model used monitored data together with meteorological data calculated by a weather forecasting model such as MM5 (Mesoscale Meteorological Model version 5). Observed air quality data from ambient air quality monitoring stations and measured and forecasted meteorological data were examined by regression analysis to find their relationship with the target PM10 concentrations. PM concentration, wind speed, precipitation rate, mixing height, and dew-point deficit temperature were the major variables determining the PM10 level, and wind direction at the 500 hPa height was a good indicator of the influence of long-range transport from other countries. A neural network, a regression model, and a decision tree were used as forecasting models to predict the class of a comprehensive air quality index, and the final forecast index was the most frequent index among the three models' predictions. The accuracy, false alarm rate, and probability of detection were 72.4%, 0.0%, and 42.9% for Tomorrow's model and 80.8%, 12.5%, and 77.8% for Today's model, respectively. The statistical model was limited in predicting rapidly changing PM10 concentrations caused by long-range transport from outside Korea; in such cases a chemical transport model would be an alternative.
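
The abstract above takes the final forecast index as the most frequent index among three model predictions. A minimal sketch of that majority vote follows. The function name and class labels are hypothetical, and the tie-breaking rule (first prediction in list order wins) is an assumption, since the abstract does not say how ties were resolved.

```python
from collections import Counter


def majority_index(predictions):
    """Return the most frequent predicted class.

    Ties are broken in favor of the earliest entry in `predictions`
    (an assumed rule, not taken from the paper).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for p in predictions:
        if counts[p] == best:
            return p


# Hypothetical class labels for the three models
# (neural network, regression model, decision tree).
agreed = majority_index(["moderate", "bad", "moderate"])
three_way_tie = majority_index(["good", "moderate", "bad"])
```

With three voters and more than two classes, a three-way disagreement is possible, so some explicit tie rule is needed in practice.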

Development of Hypertension Predictive Model (고혈압 발생 예측 모형 개발)

  • Yong, Wang-Sik;Park, Il-Su;Kang, Sung-Hong;Kim, Won-Joong;Kim, Kong-Hyun;Kim, Kwang-Kee;Park, No-Yai
    • Korean Journal of Health Education and Promotion / Vol. 23, No. 4 / pp.13-28 / 2006
  • Objectives: This study used knowledge discovery and data mining algorithms to develop a hypertension predictive model for hypertension management, using the Korea National Health Insurance Corporation database (the insureds' screening and health care benefit data). Methods: The predictive power of the data mining algorithms was validated by comparing the performance of logistic regression, a decision tree, and an ensemble technique. Internal and external validation showed that logistic regression performed best among the three techniques. Results: The logistic regression analysis suggested that the probability of hypertension was lower for females than for males (OR=0.834); higher for persons aged 60 or above than for those below 40 (OR=4.628); higher for obese persons than for normal persons (OR=2.103); higher for persons with a high glucose level than for normal persons (OR=1.086); higher for persons with a family history of hypertension than for those without (OR=1.512); and higher for persons who periodically drank alcohol than for those who did not (OR=1.037~1.291). Conclusions: This study identified several factors affecting the onset of hypertension using screening data. Its representative results on the onset and care of hypertension are expected to contribute to building a national Hypertension Management System in the near future.
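
The odds ratios in the abstract above are easier to read when converted. In logistic regression an odds ratio is the exponential of a coefficient, and it scales the odds rather than the probability. The sketch below shows both conversions. The function names and the 20% baseline probability are assumptions chosen only for illustration and do not come from the study.

```python
import math


def coefficient_from_odds_ratio(odds_ratio):
    """Logistic regression coefficient that corresponds to an odds ratio."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by `odds_ratio`."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline risk of 20%; the odds ratios are those in the abstract.
assumed_baseline = 0.20
prob_obese = apply_odds_ratio(assumed_baseline, 2.103)
prob_female = apply_odds_ratio(assumed_baseline, 0.834)
coef_obese = coefficient_from_odds_ratio(2.103)
```

Under the assumed baseline, an odds ratio of 2.103 raises the probability to roughly 34%, not to 42%, which is why odds ratios should not be read as probability multipliers.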

Objective Classification of Fog Type and Analysis of Fog Characteristics Using Visibility Meter and Satellite Observation Data over South Korea (시정계와 위성 관측 자료를 활용한 남한 안개의 객관적인 유형 분류와 특성 분석)

  • Lee, Hyun-Kyoung;Suh, Myoung-Seok
    • Atmosphere / Vol. 29, No. 5 / pp.639-658 / 2019
  • Fog types and fog characteristics over South Korea were investigated on the basis of fog events, using 3 years (2015~2017) of visibility meter data. One-minute visibility meter data were used together with present weather codes and surface observation data to identify fog. The concept of fog events was adopted to define fog properties better and to classify fog more objectively through a detailed investigation of the fog life cycle. A decision tree method was used to classify the fog types, and the final types were radiation fog, advection fog, precipitation fog, cloud base lowering fog, and morning evaporation fog. Objectivity in classifying the fog types was enhanced by adding satellite and buoy observations to the conventional AWS and ceilometer data. Radiation fog, the most common type in South Korea, frequently occurs inland during autumn. A considerable number of advection fogs occur in island areas in summer, especially in July. Precipitation fog accounts for more than a quarter of all fog events and frequently occurs in island and coastal areas. Cloud base lowering fog, classified using the ceilometer, occurs occasionally in all areas, but its occurrence rate is relatively high in the east and west coastal areas. Morning evaporation fog is rarely observed inland. Thick fog with visibility of less than 100 meters amounts to 21% of all fog events. Although advection fog frequently develops into thick fog, radiation fog shows the minimum visibility in some cases.

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD / Vol. 18D, No. 5 / pp.329-338 / 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in a limited range of domains, especially the biomedical domain. Previous approaches were hard to apply to general domains because their resources were domain specific. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over the related approach, C-value, which has been widely used and is based on local domain frequencies. In a second experiment with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. We applied three machine learning methods, logistic regression, C4.5, and SVMs, and obtained the best score with the decision tree method, C4.5.

Development of the Computer-Assisted HACCP System Program and Developing HACCP-Based Evaluation Tools of Sanitation for Institutional Foodservice Operations (단체급식의 HACCP 전산프로그램 및 위생관리 평가도구 개발)

  • 이정숙;홍희정;곽동경
    • Korean Journal of Community Nutrition / Vol. 3, No. 4 / pp.655-667 / 1998
  • A computer-assisted Hazard Analysis and Critical Control Point (HACCP) program has been developed for the systematic implementation of HACCP principles in identifying, assessing, and controlling hazards in institutional foodservice operations. A HACCP-based sanitation evaluation tool has also been developed, based on the results of the computer-assisted HACCP program at 4 service sites of contracted foodservice company C: 2 general hospitals with 650 beds, one office operation serving 400 meals per day, and one factory foodservice serving 1,000 meals per day. All database files and processing programs were created with the Unify Vision tool in a Windows 95 user environment. The results of this study can be summarized as follows: 1. The program consists of the pre-stage for the HACCP study and the implementation stage of the HACCP system. 1) The pre-stage for the HACCP study includes the selection of menu items, the development of the HACCP recipe, the construction of product flow diagrams, and the printing of HACCP recipes and product flow diagrams. 2) The implementation of the HACCP system includes the identification of microbiological hazards and the determination of critical control points based on the decision tree base files. 3) The HACCP-based sanitation evaluation tool consists of 3 dimensions: time-temperature relationship, personal hygiene, and equipment-facility sanitation. Cronbach's alpha indicated that the tool was reliable. The focus groups rated the mean importance of time-temperature relationship, personal hygiene, and equipment-facility sanitation as 4.57, 4.59, and 4.55, respectively. Based on these results, the HACCP-based sanitation evaluation tool was considered effective for assuring product quality. The program will help foodservice managers take a standardized approach to the HACCP study and maintain a systematic approach for ensuring that HACCP principles are applied correctly.


Discriminating Eggs from Two Local Breeds Based on Fatty Acid Profile and Flavor Characteristics Combined with Classification Algorithms

  • Dong, Xiao-Guang;Gao, Li-Bing;Zhang, Hai-Jun;Wang, Jing;Qiu, Kai;Qi, Guang-Hai;Wu, Shu-Geng
    • Food Science of Animal Resources / Vol. 41, No. 6 / pp.936-949 / 2021
  • This study discriminated the fatty acid profile and flavor characteristics of eggs from Beijing You Chicken (BYC), a precious local breed, and Dwarf Beijing You Chicken (DBYC). Fatty acid profile and flavor characteristics were analyzed to identify differences between BYC and DBYC eggs, and four classification algorithms were used to build classification models. Arachidic acid, oleic acid (OA), eicosatrienoic acid, docosapentaenoic acid (DPA), hexadecenoic acid, monounsaturated fatty acids (MUFA), polyunsaturated fatty acids (PUFA), unsaturated fatty acids (UFA), and 35 volatile compounds differed significantly by gas chromatography-mass spectrometry (GC-MS) (p<0.05). For fatty acid data, k-nearest neighbor (KNN) and support vector machine (SVM) achieved 91.7% classification accuracy. Classification models built on SPME-GC-MS data failed. For electronic nose data, the classification accuracy of KNN, linear discriminant analysis (LDA), SVM, and decision tree was 100% in every case. The overall results indicated that BYC and DBYC eggs could be discriminated based on the electronic nose with suitable classification algorithms. This research compared the fatty acid profiles and volatile compounds of various egg yolks. The results could be applied to evaluate egg nutrition and distinguish avian eggs.

Forecasting of the COVID-19 pandemic situation of Korea

  • Goo, Taewan;Apio, Catherine;Heo, Gyujin;Lee, Doeun;Lee, Jong Hyeok;Lim, Jisun;Han, Kyulhee;Park, Taesung
    • Genomics & Informatics / Vol. 19, No. 1 / pp.11.1-11.8 / 2021
  • For the novel coronavirus disease 2019 (COVID-19), predictive modeling in the literature broadly uses susceptible-exposed-infected-recovered (SEIR)/SIR, agent-based, and curve-fitting models. Governments and legislative bodies rely on insights from prediction models to suggest new policies and to assess the effectiveness of enforced policies. Therefore, access to accurate outbreak prediction models is essential to obtain insights into the likely spread and consequences of infectious diseases. The objective of this study is to predict the future COVID-19 situation of Korea. Here, we employed 5 models for this analysis: SEIR, local linear regression (LLR), negative binomial (NB) regression, segmented Poisson, deep-learning based long short-term memory models (LSTM), and tree-based gradient boosting machine (GBM). After prediction, model performance was compared using relative mean squared errors (RMSE) for two sets of training data (January 20, 2020-December 31, 2020 and January 20, 2020-January 31, 2021) and testing data (January 1, 2021-February 28, 2021 and February 1, 2021-February 28, 2021). Except for the segmented Poisson model, the models predicted a decline in daily confirmed cases in the country in the near future. Comparison of RMSE values showed that LLR, GBM, SEIR, NB, and LSTM, respectively, performed well in forecasting the pandemic situation of the country. A good understanding of the epidemic dynamics would greatly enhance the control and prevention of COVID-19 and other infectious diseases. Therefore, with daily confirmed cases increasing since the beginning of this year, these results could help the pandemic response by informing decisions about planning, resource allocation, and social distancing policies.

CNN based Raman Spectroscopy Algorithm That is Robust to Noise and Spectral Shift (잡음과 스펙트럼 이동에 강인한 CNN 기반 라만 분광 알고리즘)

  • Park, Jae-Hyeon;Yu, Hyeong-Geun;Lee, Chang Sik;Chang, Dong Eui;Park, Dong-Jo;Nam, Hyunwoo;Park, Byeong Hwang
    • Journal of the Korea Institute of Military Science and Technology / Vol. 24, No. 3 / pp.264-271 / 2021
  • Raman spectroscopy is widely used for classifying chemicals in chemical defense operations. However, the classification performance on Raman spectra may deteriorate due to dark current noise, background noise, spectral shift caused by equipment vibration, spectral shift caused by pressure change, and similar effects. In this paper, we compare the classification accuracy of various machine learning algorithms, including k-nearest neighbor, decision tree, linear discriminant analysis, linear support vector machine, nonlinear support vector machine, and convolutional neural network, under noisy and spectrally shifted conditions. Experimental results show that the convolutional neural network maintains a high classification accuracy of over 95% despite noise and spectral shift. This implies that a convolutional neural network can be an ideal classification algorithm in a real combat situation with a lot of noise and spectral shift.