• Title/Abstract/Keywords: Classification and Regression tree

Search results: 209 items; processing time: 0.022 s

Pattern Classification Model Design and Performance Comparison for Data Mining of Time Series Data

  • 이수용;이경중
    • 한국지능시스템학회논문지
    • /
    • Vol. 21, No. 6
    • /
    • pp.730-736
    • /
    • 2011
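  • This study designed pattern classification models that can reflect the most recent trend in sequential time series data. In designing data mining pattern classification models that support decision making, it demonstrated that models fusing statistical techniques with artificial intelligence techniques outperform existing models. In particular, the hit rates of pattern classification models fused with fuzzy theory improved relatively more. For example, models combining the statistically grounded SVM with fuzzy membership functions, or combining neural networks with FCM, performed well. The pattern classification models used in the experiments were BPN, PNN, FNN, FCM, SVM, FSVM, Decision Tree, Time Series Analysis, and Regression Analysis. The databases used were an economic indicator DB of the financial market with time series attributes (Korea, KOSPI200 database) and an electrocardiogram DB of arrhythmia patients in hospital emergency rooms (USA, MIT-BIH database).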

Comparisons of the Accuracy of Classification Methods in Sasang Constitution Diagnosis with Pulse Waves

  • 신상훈;김종열
    • 한국콘텐츠학회논문지
    • /
    • Vol. 9, No. 10
    • /
    • pp.249-257
    • /
    • 2009
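  • Sasang constitutional medicine varies treatment according to constitution, so an objective method of constitution diagnosis is urgently needed. This study aims to find a constitution classification method that is both accurate and practical for objectively diagnosing Sasang constitution from pulse waves. For 2,848 subjects who visited an oriental medicine hospital for health checkups, we obtained the constitution diagnosed by a specialist, body mass index, blood pressure, and pulse wave data. After data screening, data from 1,635 subjects were used in the final analysis. Constitution was predicted with discriminant analysis, regression analysis, decision trees, and neural networks, and the predictions were compared with the specialist's diagnosis to compare the accuracy of the classification methods. Discriminant analysis had difficulty satisfying the assumption that covariance matrices are equal across constitutions, and the results of decision tree and neural network analyses that did not consider body mass index were sensitive to variation in the analysis sample. Logistic regression or decision tree methods that take into account body mass index, a variable with decisive influence on constitution classification, can be recommended as constitution classification methods.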

Assessing the Impact of Pedestrian Traffic Volumes on Locational Goodwill

  • 정승영
    • 지적과 국토정보
    • /
    • Vol. 45, No. 1
    • /
    • pp.225-240
    • /
    • 2015
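  • This study built and tested an empirical model of the effect of the characteristics of passing pedestrians on locational goodwill (the premium paid for a store location). The theoretical bases of the study are central place theory, bid rent, agglomeration theory, and external demand theory. The data include locational goodwill per $3.3m^2$, store deposit per $3.3m^2$, and monthly store rent per $3.3m^2$ for 100 commercial districts in Seoul, together with information on pedestrian characteristics. The empirical analysis was tested on a sample of 1,307 stores in Seoul. The data set was analyzed using regression trees and regression methods. According to the analysis, the variables that affect locational goodwill in each commercial district are pedestrian traffic volume at 2 p.m., 4 p.m., and 8 p.m. on weekdays. In summary, not only the economic base of a commercial district but also the characteristics of passing pedestrians should be considered in determining locational goodwill.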

Measurement and Modeling of Job Stress of Electric Overhead Traveling Crane Operators

  • Krishna, Obilisetty B.;Maiti, Jhareswar;Ray, Pradip K.;Samanta, Biswajit;Mandal, Saptarshi;Sarkar, Sobhan
    • Safety and Health at Work
    • /
    • Vol. 6, No. 4
    • /
    • pp.279-288
    • /
    • 2015
  • Background: In this study, the measurement of job stress of electric overhead traveling crane operators and quantification of the effects of operator and workplace characteristics on job stress were assessed. Methods: Job stress was measured on five subscales: employee empowerment, role overload, role ambiguity, rule violation, and job hazard. The characteristics of the operators that were studied were age, experience, body weight, and body height. The workplace characteristics considered were hours of exposure, cabin type, cabin feature, and crane height. The proposed methodology included administration of a questionnaire survey to 76 electric overhead traveling crane operators followed by analysis using analysis of variance and a classification and regression tree. Results: The key findings were: (1) the five subscales can be used to measure job stress; (2) employee empowerment was the most significant factor, followed by role overload; (3) workplace characteristics contributed more towards job stress than operator characteristics; and (4) of the workplace characteristics, crane height was the major contributor. Conclusion: The issues related to crane height and cabin feature can be fixed by providing engineering or foolproof solutions rather than relying on interventions related to the demographic factors.

A Prediction Model of Timely Processing on Medical Service using Classification and Regression Tree

  • 이종찬;정승우;이원영
    • 전기전자학회논문지
    • /
    • Vol. 20, No. 1
    • /
    • pp.16-25
    • /
    • 2016
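  • The turnaround time (TAT) of tests performed to reach a medical diagnosis is directly tied to patient waiting time and is one of the important evaluation items for medical services. This study measured TAT for major radiological examinations and analyzed whether the results met the threshold set by the medical institution. Prediction with the classification and regression tree algorithm identified "department", "disease", "examination type", and "month performed" as the factors with the greatest influence on achieving timely processing. The study is significant in that, through a model that predicts timely processing of medical services, it provides a means of acting on service delays in advance.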
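
As a rough illustration of how a classification tree ranks candidate factors, the sketch below scores a split by its decrease in Gini impurity. The abstract does not state which impurity criterion was used, so Gini is an assumption here, and the field names (`department`, `month`) and the toy records are hypothetical, not data from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(rows, labels, feature, value):
    """Gini decrease from splitting on rows[i][feature] == value (assumed split form)."""
    left = [y for r, y in zip(rows, labels) if r[feature] == value]
    right = [y for r, y in zip(rows, labels) if r[feature] != value]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical records: did each examination meet the TAT threshold?
rows = [
    {"department": "ER", "month": 1},
    {"department": "ER", "month": 2},
    {"department": "IM", "month": 1},
    {"department": "IM", "month": 2},
]
on_time = [False, False, True, True]

gain_department = split_gain(rows, on_time, "department", "ER")
gain_month = split_gain(rows, on_time, "month", 1)
```

In this toy data the department split separates the classes completely while the month split does not, so a tree builder using this criterion would pick the department split first.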

Machine Learning Based Automatic Categorization Model for Text Lines in Invoice Documents

  • Shin, Hyun-Kyung
    • 한국멀티미디어학회논문지
    • /
    • Vol. 13, No. 12
    • /
    • pp.1786-1797
    • /
    • 2010
  • Automatic understanding of the contents of a document image is a very hard problem because it involves mathematically challenging problems that originate mainly from the over-determined system induced by the document segmentation process. In both academia and industry, there have been incessant and varied efforts to improve the core parts of content retrieval technologies by separating out segmentation-related issues using semi-structured documents, e.g., invoices. In this paper we proposed classification models for text lines in invoice documents, in which text lines were clustered into five categories according to their contents: purchase order header, invoice header, summary header, surcharge header, and purchase items. Our investigation concentrated on the performance of machine learning based models from the viewpoints of linear-discriminant-analysis (LDA) and non-LDA (logic based) approaches. In the LDA group, naïve Bayesian, k-nearest neighbor, and SVM were used; in the non-LDA group, decision tree, random forest, and boost were used. We described the details of feature vector construction and of the model and parameter selection processes, including training and validation. We also presented experimental results comparing training/classification error levels for the models employed.

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management

  • Kim, Steven H.;Shin, Sung-Woo
    • 정보기술응용연구
    • /
    • Vol. 1
    • /
    • pp.173-211
    • /
    • 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.
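
The voting scheme mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rule, so plain majority voting over binary labels is assumed; the function name `majority_vote`, the tie-breaking rule, and the example votes are hypothetical.

```python
from collections import Counter

def majority_vote(predictions, tie_break=0):
    """Return the most common binary label among first-order classifier outputs.

    `predictions` is a list of 0/1 labels, one per classifier (assumed format).
    Ties fall back to `tie_break`, which is an assumption, not from the paper.
    """
    counts = Counter(predictions)
    if counts[0] == counts[1]:
        return tie_break
    return 0 if counts[0] > counts[1] else 1

# Hypothetical outputs of logit, MDA, ANN, and kNN for one case (1 = fraud).
votes = [1, 0, 1, 1]
result = majority_vote(votes)
```

A second-order classifier, as described in the abstract, would instead learn which first-order model to trust for each case; that selection step is not sketched here.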


Analysis of the Timing of Spoken Korean Using a Classification and Regression Tree (CART) Model

  • Chung, Hyun-Song;Huckvale, Mark
    • 음성과학
    • /
    • Vol. 8, No. 1
    • /
    • pp.77-91
    • /
    • 2001
  • This paper investigates the timing of Korean spoken in a news-reading speech style in order to improve the naturalness of durations used in Korean speech synthesis. Each segment in a corpus of 671 read sentences was annotated with 69 segmental and prosodic features so that the measured duration could be correlated with the context in which it occurred. A CART model based on the features showed a correlation coefficient of 0.79 with an RMSE (root mean squared prediction error) of 23 ms between actual and predicted durations in reserved test data. These results are comparable with recently published results in Korean and similar to results found in other languages. An analysis of the classification tree shows that phrasal structure has the greatest effect on the segment duration, followed by syllable structure and the manner features of surrounding segments. The place features of surrounding segments only have small effects. The model has application in Korean speech synthesis systems.
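
The two evaluation measures quoted in this abstract, the correlation coefficient and the RMSE between actual and predicted durations, can be computed as in the sketch below. It assumes the Pearson correlation coefficient, which the abstract does not name explicitly, and the durations are made-up values in milliseconds, not data from the paper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def pearson_r(x, y):
    """Pearson correlation coefficient (assumed to be the measure reported)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical segment durations in ms: measured vs. predicted by a model.
actual_ms = [50, 80, 120, 60]
predicted_ms = [55, 75, 110, 70]

error_ms = rmse(actual_ms, predicted_ms)
correlation = pearson_r(actual_ms, predicted_ms)
```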


A GA-based Binary Classification Method for Bankruptcy Prediction

  • 민재형;정철우
    • 한국경영과학회지
    • /
    • Vol. 33, No. 2
    • /
    • pp.1-16
    • /
    • 2008
  • The purpose of this paper is to propose a new binary classification method for predicting corporate failure based on a genetic algorithm, and to validate its prediction power through empirical analysis. Establishing virtual companies representing bankrupt companies and non-bankrupt ones respectively, the proposed method measures the similarity between the virtual companies and the subject for prediction, and classifies the subject as either bankrupt or non-bankrupt. The values of the classification variables of the virtual companies and the weights of the variables are determined by the proper model to maximize the hit ratio on the training data set using a genetic algorithm. In order to test the validity of the proposed method, we compare its prediction accuracy with that of other existing methods such as multi-discriminant analysis, logistic regression, decision tree, and artificial neural network, and it is shown that the binary classification method we propose in this paper can serve as a promising alternative to the existing methods for bankruptcy prediction.
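
The classification step described in this abstract, comparing a company with one virtual bankrupt company and one virtual non-bankrupt company, can be sketched as follows. The abstract does not define the similarity measure, so a weighted Euclidean distance is assumed, and the genetic algorithm search for prototype values and weights is omitted; all variable values and weights below are hypothetical.

```python
import math

def weighted_distance(x, prototype, weights):
    """Weighted Euclidean distance (an assumed stand-in for the paper's similarity)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, prototype)))

def classify(x, bankrupt_proto, healthy_proto, weights):
    """Assign x to whichever virtual company it is closer to."""
    d_bankrupt = weighted_distance(x, bankrupt_proto, weights)
    d_healthy = weighted_distance(x, healthy_proto, weights)
    return "bankrupt" if d_bankrupt < d_healthy else "non-bankrupt"

# Hypothetical prototypes over two financial ratios, e.g. debt ratio and return on assets.
bankrupt_proto = [0.90, -0.10]
healthy_proto = [0.40, 0.08]
weights = [0.6, 0.4]  # in the paper these would come from the genetic algorithm

label_a = classify([0.85, -0.05], bankrupt_proto, healthy_proto, weights)
label_b = classify([0.45, 0.10], bankrupt_proto, healthy_proto, weights)
```

In the method as described, a genetic algorithm would tune both prototypes and the weights to maximize the hit ratio on the training data; here they are simply fixed by hand.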

Performance Improvement of Classification Between Pathological and Normal Voice Using HOS Parameter

  • 이지연;정상배;최흥식;한민수
    • 대한음성학회지:말소리
    • /
    • No. 66
    • /
    • pp.61-72
    • /
    • 2008
  • This paper proposes a method to improve pathological and normal voice classification performance by combining multiple features such as auditory-based and higher-order features. Their performances are measured by Gaussian mixture models (GMMs) and linear discriminant analysis (LDA). The combination of multiple features proposed by the frame-based LDA method is shown to be an effective method for pathological and normal voice classification, with an 87.0% classification rate. This is a noticeable improvement of 17.72% compared to the MFCC-based GMM algorithm in terms of error reduction.
