• Title/Summary/Keyword: Classification and regression tree

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data (시계열 자료의 데이터마이닝을 위한 패턴분류 모델설계 및 성능비교)

  • Lee, Soo-Yong;Lee, Kyoung-Joung
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.730-736 / 2011
  • In this paper, we designed pattern classification models that can reflect the latest trend in time series. It has been shown that fusion models based on statistical and AI methods are superior to traditional ones as pattern classification models supporting decision making. In particular, the hit rates of pattern classification models combined with fuzzy theory are relatively higher. The statistical SVM models combined with a fuzzy membership function, or the models combining a neural network and FCM, have shown good performance. BPN, PNN, FNN, FCM, SVM, FSVM, Decision Tree, Time Series Analysis, and Regression Analysis were used as pattern classification models in the experiments of this paper. The economic index DB with time series properties of the financial market (Korea, KOSPI200 DB) and the electrocardiogram DB of arrhythmia patients in hospital emergencies (USA, MIT-BIH DB) were used as the databases.

Comparisons of the Accuracy of Classification Methods in Sasang Constitution Diagnosis with Pulse Waves (맥파를 이용한 사상체질의 진단에 있어서 분류방법에 따른 진단의 정확도 비교)

  • Shin, Sang-Hoon;Kim, Jong-Yeol
    • The Journal of the Korea Contents Association / v.9 no.10 / pp.249-257 / 2009
  • The purpose of this study is to find a classification method with high accuracy for Sasang constitutional diagnosis. The BMI, blood pressure, pulse wave, and Sasang constitution diagnosed by a specialist were collected from 2848 apparently healthy subjects. Through a selection procedure, the data of 1635 subjects were used in the analysis. The results of classification methods such as discriminant analysis, regression, decision tree, and neural network were compared with the diagnosis of a Sasang constitutional specialist. As a result, the discriminant analysis method had difficulty satisfying the assumption of equality of covariance matrices within constitutional groups. Moreover, without BMI, the decision tree and neural network methods were very sensitive to changes in the analysis data. Therefore, logistic regression and the decision tree are recommended, on condition that the decisive factors of constitution are properly considered.

Assessing the Impact of Pedestrian Traffic Volumes on Locational Goodwill (보행자통행량이 상가권리금에 미치는 영향의 평가)

  • Jeong, Seung-Young
    • Journal of Cadastre & Land InformatiX / v.45 no.1 / pp.225-240 / 2015
  • The effect of passing pedestrians' characteristics on locational goodwill was empirically modeled and tested. The theoretical basis for the study was central place theory, bid rent and agglomeration theory, and demand externality theory. The data included information on goodwill, retail rents, and passing pedestrians' characteristics in 100 retail trade areas in Seoul. The empirical model was tested with a sample of 1,307 retail units in Seoul, South Korea. The data set was analyzed with the Classification and Regression Tree software. As a result, using the regression tree method, the variables that affect locational goodwill in each retail trade area were the volume of pedestrians around 2:00 pm on weekdays, the volume of pedestrians around 4:00 pm on weekdays, and the volume of pedestrians around 8:00 pm on weekdays. In summary, not only the economic base of the retail trade area but also the volume of passing pedestrians should be considered in determining locational goodwill.

Measurement and Modeling of Job Stress of Electric Overhead Traveling Crane Operators

  • Krishna, Obilisetty B.;Maiti, Jhareswar;Ray, Pradip K.;Samanta, Biswajit;Mandal, Saptarshi;Sarkar, Sobhan
    • Safety and Health at Work / v.6 no.4 / pp.279-288 / 2015
  • Background: In this study, the measurement of job stress of electric overhead traveling crane operators and quantification of the effects of operator and workplace characteristics on job stress were assessed. Methods: Job stress was measured on five subscales: employee empowerment, role overload, role ambiguity, rule violation, and job hazard. The characteristics of the operators that were studied were age, experience, body weight, and body height. The workplace characteristics considered were hours of exposure, cabin type, cabin feature, and crane height. The proposed methodology included administration of a questionnaire survey to 76 electric overhead traveling crane operators followed by analysis using analysis of variance and a classification and regression tree. Results: The key findings were: (1) the five subscales can be used to measure job stress; (2) employee empowerment was the most significant factor, followed by role overload; (3) workplace characteristics contributed more towards job stress than operator characteristics; and (4) of the workplace characteristics, crane height was the major contributor. Conclusion: The issues related to crane height and cabin feature can be fixed by providing engineering or foolproof solutions rather than relying on interventions related to the demographic factors.

A Prediction Model of Timely Processing on Medical Service using Classification and Regression Tree (분류회귀나무를 이용한 의료서비스 적기처리 예측모형)

  • Lee, Jong-Chan;Jeong, Seung-Woo;Lee, Won-Young
    • Journal of IKEEE / v.20 no.1 / pp.16-25 / 2016
  • Turnaround time (TAT) for imaging tests, which are necessary for making a medical diagnosis, is directly related to the patient's waiting time and is one of the important performance criteria for medical services. In this paper, we measured the TAT of major imaging tests to see whether it met the reference point set by the medical institutions. Prediction results from the classification and regression tree (CART) algorithm showed that "clinics", "diagnosis", "modality", and "test month" were the main factors for timely processing. This study contributes a means of preventing delays in medical services in advance.

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • Journal of Korea Multimedia Society / v.13 no.12 / pp.1786-1797 / 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems originating mainly from the over-determined system induced by the document segmentation process. In both academic and industrial areas, there have been incessant and various efforts to improve core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper we proposed classification models for text lines in invoice documents, in which text lines were clustered into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine learning based models of two kinds: linear-discriminant-analysis (LDA) and non-LDA (logic based). In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boost were used. We described the details of feature vector construction and the selection processes for the model and the parameters, including training and validation. We also presented experimental results comparing training and classification error levels for the models employed.

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

  • Kim, Steven H.;Shin, Sung-Woo
    • Journal of Information Technology Application / v.1 / pp.173-211 / 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.

Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • Speech Sciences / v.8 no.1 / pp.77-91 / 2001
  • This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recent published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.

A GA-based Binary Classification Method for Bankruptcy Prediction (도산예측을 위한 유전 알고리듬 기반 이진분류기법의 개발)

  • Min, Jae-H.;Jeong, Chul-Woo
    • Journal of the Korean Operations Research and Management Science Society / v.33 no.2 / pp.1-16 / 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on a genetic algorithm, and to validate its prediction power through empirical analysis. Establishing virtual companies representing bankrupt and non-bankrupt companies respectively, the proposed method measures the similarity between the virtual companies and the subject of prediction, and classifies the subject as either bankrupt or non-bankrupt. The values of the classification variables of the virtual companies and the weights of the variables are determined by the proper model to maximize the hit ratio on the training data set using a genetic algorithm. In order to test the validity of the proposed method, we compare its prediction accuracy with that of other existing methods such as multi-discriminant analysis, logistic regression, decision tree, and artificial neural network, and it is shown that the binary classification method we propose in this paper can serve as a promising alternative to the existing methods for bankruptcy prediction.

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter (HOS 특징 벡터를 이용한 장애 음성 분류 성능의 향상)

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI / no.66 / pp.61-72 / 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Their performance is measured by Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be an effective method for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% compared to the MFCC-based GMM algorithm in terms of error reduction.
