• Title/Summary/Keyword: 정분류율 (correct classification rate)

A Study on Predicting Bankruptcy Discriminant Model for Small-Sized Venture Firms using Technology Evaluation Data (기술력평가 자료를 이용한 중소벤처기업 파산예측 판별모형에 관한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society, v.9 no.2, pp.304-324, 2006
  • Considerable research has been conducted in finance to identify business ratios that predict corporate bankruptcy. However, such financial ratios usually lack a theoretical justification for predicting the bankruptcy of technology-oriented small venture firms. This study proposes a bankruptcy-prediction discriminant model that uses technology evaluation data instead of financial data, and evaluates model fit with the correct classification rate, cross-validation and the M-P-P method. The results indicate that the linear discriminant model is more appropriate than the logistic discriminant model: 69% of the original grouped data were correctly classified, and 67% of future data are expected to be classified correctly.
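A minimal sketch of why the rate on the original data (69%) can be higher than the rate expected on future data (67%): the apparent rate scores the same firms the rule was fitted on, while leave-one-out cross-validation scores each firm with a rule fitted without it. It assumes a single hypothetical technology-evaluation score, equal priors and a pooled variance, under which the linear discriminant rule reduces to "assign to the nearer class mean". The data and function names are illustrative, not the study's.

```python
from statistics import mean


def fit_lda_1d(xs, ys):
    """Class means of one predictor. With equal priors and a pooled variance,
    the linear discriminant rule assigns x to the class with the nearer mean."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label)
            for label in sorted(set(ys))}


def predict(means, x):
    return min(means, key=lambda label: abs(x - means[label]))


def correct_classification_rate(xs, ys):
    """Apparent rate: the same observations fit and evaluate the rule."""
    means = fit_lda_1d(xs, ys)
    return sum(predict(means, x) == y for x, y in zip(xs, ys)) / len(xs)


def loo_correct_classification_rate(xs, ys):
    """Leave-one-out rate: each observation is classified by a rule fitted
    on the remaining observations only."""
    hits = 0
    for i in range(len(xs)):
        means = fit_lda_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(means, xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical technology-evaluation scores for 10 firms.
scores = [55, 60, 62, 67, 58, 72, 75, 80, 69, 85]
status = ["bankrupt"] * 5 + ["survived"] * 5
print(correct_classification_rate(scores, status))      # 1.0
print(loo_correct_classification_rate(scores, status))  # 0.9
```

On this toy data the firm scoring 69 is classified correctly only when it helped fit the rule, so the apparent rate (1.0) exceeds the leave-one-out rate (0.9). That gap is the optimism cross-validation is meant to expose.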

Improving the Effectiveness of Customer Classification Models: A Pre-segmentation Approach (사전 세분화를 통한 고객 분류모형의 효과성 제고에 관한 연구)

  • Chang, Nam-Sik
    • Information Systems Review, v.7 no.2, pp.23-40, 2005
  • Discovering customers' behavioral patterns from large data sets and providing them with corresponding services or products are critical components of managing a business today. However, the diversity of customer needs, coupled with limited resources, suggests that companies should make more effort to understand and manage specific groups of customers rather than the customer base as a whole. The key premise of this paper is the observation that behavioral patterns extracted from specific customer groups differ from those extracted from all customers. This paper therefore proposes pre-segmenting customers before developing customer classification models. We collected demographic and transactional data sets from three Korean companies (a credit card company, a telecommunications company and an insurance company) and segmented the customers by major variables. Churn prediction models were then developed separately for each segment and for the whole data set using decision tree induction, and were compared in terms of hit ratio and the simplicity of the generated rules.
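The pre-segmentation idea can be illustrated with a toy sketch. As a stand-in for full decision tree induction, it fits a one-split decision tree (a "stump") on a single hypothetical usage variable, once on the whole data and once per segment, and compares hit ratios. The data, segment names and functions are hypothetical. Hit ratios here are measured on the training data only; a fair comparison would use held-out data.

```python
def fit_stump(rows):
    """rows: list of (x, churned) pairs. Choose the threshold t and direction
    that maximize the training hit ratio of a one-split decision tree."""
    best = None
    for t in sorted({x for x, _ in rows}):
        for churn_if_above in (True, False):
            hits = sum(((x > t) == churn_if_above) == churned for x, churned in rows)
            if best is None or hits > best[0]:
                best = (hits, t, churn_if_above)
    _, t, churn_if_above = best
    return lambda x: (x > t) == churn_if_above


def count_hits(model, rows):
    return sum(model(x) == churned for x, churned in rows)


def compare_hit_ratios(data):
    """data: list of (segment, x, churned). Returns the hit ratio of one model
    fitted on all customers and of separate models fitted per segment."""
    rows = [(x, c) for _, x, c in data]
    whole = count_hits(fit_stump(rows), rows) / len(rows)
    segmented_hits = 0
    for seg in sorted({s for s, _, _ in data}):
        seg_rows = [(x, c) for s, x, c in data if s == seg]
        segmented_hits += count_hits(fit_stump(seg_rows), seg_rows)
    return whole, segmented_hits / len(data)


# Hypothetical: low usage predicts churn for "young" customers,
# high usage predicts churn for "senior" customers.
data = [("young", 1, True), ("young", 2, True), ("young", 3, False), ("young", 4, False),
        ("senior", 1, False), ("senior", 2, False), ("senior", 3, True), ("senior", 4, True)]
print(compare_hit_ratios(data))  # (0.5, 1.0)
```

Pooling the two segments hides their opposite patterns, so no single split beats 50%. Each segment-specific split, by contrast, classifies its own segment perfectly.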

Comparison of data mining methods with daily lens data (데일리 렌즈 데이터를 사용한 데이터마이닝 기법 비교)

  • Seok, Kyungha; Lee, Taewoo
    • Journal of the Korean Data and Information Science Society, v.24 no.6, pp.1341-1348, 2013
  • Various data mining techniques have been applied to classification problems in database marketing, credit scoring and market forecasting. In this paper, we compare techniques such as bagging, boosting, LASSO, random forest and support vector machine (SVM) on daily lens transaction data. The classical techniques, decision tree and logistic regression, are also used. The experiment shows that the random forest has a slightly smaller misclassification rate and standard error than the other methods. The SVM performs well in terms of misclassification rate but poorly in terms of standard error. Taking model interpretability and computing time into consideration, we conclude that LASSO gives the best result.
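Comparing methods by misclassification rate and standard error usually means repeating the train/test split several times and summarizing the per-replication error rates. The sketch below assumes that reading. Because the abstract does not define the standard error, it returns both the sample standard deviation across replications and the standard error of the mean. The error rates are hypothetical.

```python
import math
from statistics import mean, stdev


def misclassification_rate(y_true, y_pred):
    """Share of test cases whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def summarize(error_rates):
    """Mean, sample standard deviation and standard error of the mean of
    misclassification rates collected over repeated splits."""
    sd = stdev(error_rates)
    return mean(error_rates), sd, sd / math.sqrt(len(error_rates))


# Hypothetical error rates of two methods over five repeated splits.
results = {
    "random_forest": [0.18, 0.20, 0.19, 0.21, 0.17],
    "svm": [0.15, 0.25, 0.19, 0.23, 0.13],
}
for method, rates in results.items():
    m, sd, se = summarize(rates)
    print(f"{method}: mean={m:.3f} sd={sd:.3f} se={se:.3f}")
```

In this made-up example both methods share the same mean error, but the SVM's rates vary far more between splits. This is the kind of "good mean, poor standard error" pattern the abstract describes.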

Study on prediction for a film success using text mining (텍스트 마이닝을 활용한 영화흥행 예측 연구)

  • Lee, Sanghun; Cho, Jangsik; Kang, Changwan; Choi, Seungbae
    • Journal of the Korean Data and Information Science Society, v.26 no.6, pp.1259-1269, 2015
  • Big data has recently become a keyword in academic circles, and its usefulness extends beyond academia to government, local public bodies and enterprises, all of which are working to obtain useful information from it. This research analyzes the box-office success or failure of films using text mining. The data used are film review data from the portal site 'D', together with grade point averages and the number of screens obtained from the Korean Film Commission. The purpose of this paper is to propose a model that uses these data to predict whether a film will succeed. The correct classification rate of the proposed prediction model was 95.74%.

Identification and classification of fresh lubricants and used engine oils by GC/MS and bayesian model (GC/MS 분석과 베이지안 분류 모형을 이용한 새 윤활유와 사용 엔진 오일의 동일성 추적과 분류)

  • Kim, Nam Yee; Nam, Geum Mun; Kim, Yuna; Lee, Dong-Kye; Park, Seh Youn; Lee, Kyoungjae; Lee, Jaeyong
    • Analytical Science and Technology, v.27 no.1, pp.41-59, 2014
  • The aims of this work were to identify and classify fresh lubricants and the used engine oils of vehicles for application in forensic science. Eighty kinds of fresh lubricants were purchased, and 86 used engine oil samples were collected from 24 kinds of diesel and gasoline vehicles driven under different conditions. The lubricants and used engine oils were analyzed by GC/MS, and a Bayesian model technique was developed for classification and identification. Both wavelet fitting and principal component analysis (PCA) were applied for data dimension reduction. In fresh lubricant classification, the matching rates of the Bayesian model technique with wavelet fitting and with PCA were 97.5% and 96.7%, respectively. Because the Bayesian model technique with wavelet fitting classified lubricants better than the version using PCA, we selected it for lubricant classification. In a second experiment, we analyzed used engine oils collected from vehicles at several mileages up to 5,000 km after an oil change; 86 used engine oil samples were collected in total. In vehicle classification (24 classes in total), the matching rate of the Bayesian model with wavelet fitting was 86.4%. By contrast, in fuel-type classification (gasoline or diesel, only 2 classes), the matching rate was 99.6%, and in used engine oil brand classification (6 classes), it was 97.3%.
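Wavelet fitting as a dimension-reduction step can be sketched with the simplest wavelet, the Haar wavelet. Repeatedly replacing adjacent pairs of intensities by their average keeps the coarse shape of a chromatogram in far fewer numbers. The abstract does not state which wavelet basis or decomposition level was used, so the Haar choice, the unnormalized averaging and the toy profiles are assumptions. The matching step below is a plain nearest-reference rule, a stand-in that is NOT the paper's Bayesian model.

```python
import math


def haar_approximation(signal, levels):
    """Haar approximation coefficients after `levels` halvings, using pairwise
    averages (an unnormalized Haar transform)."""
    coeffs = [float(v) for v in signal]
    for _ in range(levels):
        if len(coeffs) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / 2 for i in range(0, len(coeffs), 2)]
    return coeffs


def nearest_reference(profile, library):
    """Name of the reference profile closest in Euclidean distance."""
    return min(library, key=lambda name: math.dist(profile, library[name]))


# Hypothetical reduced reference profiles of two lubricants.
library = {"lubricant_A": [2.0, 6.0], "lubricant_B": [6.0, 2.0]}
query = haar_approximation([1, 3, 6, 6], levels=1)  # [2.0, 6.0]
print(nearest_reference(query, library))  # lubricant_A
```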

A polychotomous regression model with tensor product splines and direct sums (연속형의 텐서곱과 범주형의 직합을 사용한 다항 로지스틱 회귀모형)

  • Sim, Songyong; Kang, Heemo
    • Journal of the Korean Data and Information Science Society, v.25 no.1, pp.19-26, 2014
  • In this paper, we propose a polychotomous regression model for the case in which the independent variables include both categorical and numerical variables. Direct sums are used for categorical independent variables, and tensor product splines are used for continuous independent variables. BIC is used as the variable selection criterion. We implemented the algorithm and applied it to real data. The model using direct sums and tensor products outperformed the usual multinomial logistic regression model.
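A design row for such a model can be sketched as follows. The sketch reads "direct sum" as stacking an indicator block for each categorical variable next to the spline blocks, and "tensor product spline" as all pairwise products of two univariate spline bases. For brevity it uses a linear truncated-power spline basis rather than whatever spline basis the authors used, and the knots, variable names and values are hypothetical. It also shows the multinomial (softmax) link and the BIC formula, BIC = -2 log L + k log n, used to compare candidate variable sets.

```python
import math


def truncated_linear_basis(x, knots):
    """Linear spline basis [x, (x - k1)+, (x - k2)+, ...] in truncated-power form."""
    return [x] + [max(x - k, 0.0) for k in knots]


def tensor_product(basis_1, basis_2):
    """All pairwise products of two univariate bases (a bivariate surface)."""
    return [u * v for u in basis_1 for v in basis_2]


def indicators(level, levels):
    """Indicator coding of a categorical value; levels[0] is the baseline."""
    return [1.0 if level == lv else 0.0 for lv in levels[1:]]


def design_row(x1, x2, knots1, knots2, category, levels):
    b1 = truncated_linear_basis(x1, knots1)
    b2 = truncated_linear_basis(x2, knots2)
    # Intercept, main effects, interaction surface, categorical block.
    return [1.0] + b1 + b2 + tensor_product(b1, b2) + indicators(category, levels)


def softmax(scores):
    """Class probabilities of a multinomial logit from one score per class."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bic(log_likelihood, n_params, n_obs):
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


row = design_row(3.0, 2.0, knots1=[1.0], knots2=[1.0], category="c", levels=["a", "b", "c"])
print(len(row))  # 11 = 1 + 2 + 2 + 4 + 2
```

A fitted model would multiply such rows by one coefficient vector per response class and pass the resulting scores through `softmax`. Candidate variable sets would then be ranked by `bic`, lower being better.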

A Study on the Optimal Discriminant Model Predicting the likelihood of Insolvency for Technology Financing (기술금융을 위한 부실 가능성 예측 최적 판별모형에 대한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society, v.10 no.2, pp.183-205, 2007
  • An investigation was undertaken to find the optimal discriminant model for predicting, in advance, the likelihood of insolvency of medium-sized firms on the basis of technology evaluation. The explanatory variables in the discriminant model were selected by both factor analysis and stepwise discriminant analysis. Five explanatory variables were selected by factor analysis on the basis of explanatory ratio and communality, and six were selected by stepwise discriminant analysis. The effectiveness of the linear and logistic discriminant models was assessed by the criteria of critical probability and correct classification rate. Both models had similar correct classification rates, but the linear discriminant model was preferred to the logistic discriminant model under the critical-probability criterion. For the linear discriminant model with a critical probability of 0.5, the total-group correct classification rate was 70.4%, and the correct classification rates of the insolvent and solvent groups were 73.4% and 69.5%, respectively. The correct classification rate estimates the probability that the estimated discriminant function will correctly classify the present sample, whereas the actual correct classification rate estimates the probability that it will correctly classify a future observation. Unfortunately, the correct classification rate overestimates the actual correct classification rate, because the data set used to estimate the discriminant function is also used to evaluate it. The cross-validation method was used to estimate the bias of the correct classification rate: the estimated bias was 2.9%, and the predicted actual correct classification rate was 67.5%. A threshold value was also set to establish an in-doubt category.
Results of the linear discriminant model can be applied by technology financing banks to evaluate the possibility of insolvency and to rank applicant firms.
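Two steps of this abstract can be made concrete. First, the reported figures are consistent with the predicted actual rate being the apparent rate minus the cross-validation bias estimate: 70.4% - 2.9% = 67.5%. Second, an in-doubt category can be created by withholding a decision when the estimated insolvency probability lies close to the 0.5 critical probability. The band half-width of 0.1 is a hypothetical choice; the abstract does not give the study's threshold value.

```python
def bias_corrected_rate(apparent_rate, estimated_bias):
    """Predicted actual correct classification rate (percent)."""
    return apparent_rate - estimated_bias


def classify_with_doubt(p_insolvent, critical=0.5, margin=0.1):
    """Assign a firm from its estimated probability of insolvency; firms
    within `margin` of the critical probability are left 'in-doubt'.
    `margin` is a hypothetical setting, not the study's value."""
    if p_insolvent >= critical + margin:
        return "insolvent"
    if p_insolvent <= critical - margin:
        return "solvent"
    return "in-doubt"


print(bias_corrected_rate(70.4, 2.9))  # about 67.5
for p in (0.85, 0.55, 0.20):
    print(p, classify_with_doubt(p))
```

Widening `margin` sends more firms to the in-doubt category for manual review. In exchange, the firms that do receive an automatic decision are classified with higher confidence.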

Development of Advanced TB Case Classification Model Using NHI Claims Data (국민건강보험 청구자료 기반의 결핵환자 분류 고도화 모형 개발)

  • Park, Il-Su; Kim, Yoo-Mi; Choi, Youn-Hee; Kim, Sung-Soo; Kim, Eun-Ju; Won, Si-Yeon; Kang, Sung-Hong
    • Journal of Digital Convergence, v.11 no.9, pp.289-299, 2013
  • The aim of this study was to improve the KCDC (Korea Centers for Disease Control and Prevention) rule for classifying tuberculosis (TB) cases from NHI claims data, to support a more effective TB surveillance system. A sample of 8,118 cases (10% of the 81,199 TB cases in the 2009 NHI claims data) was subjected to a medical record survey to determine whether they were real TB patients. The final study population was the 7,132 cases whose medical records were surveyed. The decision tree model was evaluated as the best TB patient detection model. Its main independent variables were age, the number of anti-tuberculosis drugs, type of medical institution, tuberculosis tests, prescription days and type of TB. This model had a sensitivity of 90.6%, a PPV of 96.1% and a correct classification rate of 93.8%, outperforming KCDC's TB detection model based on two or more NHI claims for TB and TB drugs (sensitivity 82.6%, PPV 95%, correct classification rate 80%).
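The three reported measures all come from the 2x2 table of model decisions against the medical record survey. The following minimal sketch uses hypothetical counts rather than the study's:

```python
def detection_metrics(tp, fp, fn, tn):
    """tp: true TB cases flagged; fp: non-cases flagged;
    fn: true cases missed; tn: non-cases not flagged."""
    return {
        "sensitivity": tp / (tp + fn),  # share of real TB patients detected
        "ppv": tp / (tp + fp),          # share of flagged cases that are real
        "correct_classification_rate": (tp + tn) / (tp + fp + fn + tn),
    }


print(detection_metrics(tp=90, fp=5, fn=10, tn=95))
```

Sensitivity and PPV ignore the true negatives, while the correct classification rate counts them. This is why a rule can have a high PPV but a noticeably lower correct classification rate, as with the KCDC rule above.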

Standard Criterion of VUS for ROC Surface (ROC 곡면에서 VUS의 판단기준)

  • Hong, C.S.; Jung, E.S.; Jung, D.G.
    • The Korean Journal of Applied Statistics, v.26 no.6, pp.977-985, 2013
  • Many real-world situations involve more than two categories. In this work, we consider the ROC surface and the VUS (volume under the surface), which are graphical representation methods for classification models with three categories. The standard criteria of the AUC for the probability of default under Basel II are extended to the VUS of the ROC surface, yielding standardized VUS criteria for classification models. The ranges of the AUC, K-S and mean difference statistics corresponding to the VUS values for each class of the standard criteria are obtained. The standard criteria of the VUS for the ROC surface can be established by exploring the relationships among these statistics.
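For intuition, the empirical AUC and VUS can be computed by direct counting. The AUC is the share of (class 1, class 2) score pairs that are correctly ordered. The VUS, for three ordered classes, is the share of triples with one score from each class that satisfy X1 < X2 < X3. The following brute-force sketch uses hypothetical scores. It counts tied pairs as 1/2 for the AUC but, for simplicity, counts tied triples as failures for the VUS.

```python
from itertools import product


def empirical_auc(scores_1, scores_2):
    """Estimate of P(X1 < X2), ties counted as 1/2 (Mann-Whitney form)."""
    total = 0.0
    for a, b in product(scores_1, scores_2):
        total += 1.0 if a < b else 0.5 if a == b else 0.0
    return total / (len(scores_1) * len(scores_2))


def empirical_vus(scores_1, scores_2, scores_3):
    """Estimate of P(X1 < X2 < X3) over all triples, one score per class."""
    hits = sum(1 for a, b, c in product(scores_1, scores_2, scores_3) if a < b < c)
    return hits / (len(scores_1) * len(scores_2) * len(scores_3))


print(empirical_auc([1, 4], [2, 5]))          # 0.75
print(empirical_vus([1, 2], [3, 4], [5, 6]))  # 1.0 (perfect separation)
print(empirical_vus([1, 4], [2, 5], [3, 6]))  # 0.5
```

For three classes with no discriminating power, all six orderings of (X1, X2, X3) are equally likely, so the VUS is 1/6 rather than the 1/2 of a useless two-class AUC. This is why AUC-based grading criteria cannot simply be reused for the VUS.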

A Comparison Study for Ordination Methods in Ecology (생태학의 통계적 서열화 방법 비교에 관한 연구)

  • Ko, Hyeon-Seok; Jhun, Myoungshic; Jeong, Hyeong Chul
    • The Korean Journal of Applied Statistics, v.28 no.1, pp.49-60, 2015
  • Various ordination methods, such as correspondence analysis and canonical correspondence analysis, are used in community ecology to visualize relationships among species, sites and environmental variables. Ter Braak (1986), Jackson and Somers (1991) and Parmer (1993) compared ordination methods using eigenvalues and distance graphs. However, these comparisons did not show the relationship between the population and the biplot, because they were based only on surveyed data. In this paper, we introduce a method that measures the extent to which a biplot represents population information, in order to compare ordination methods objectively.
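As background for the correspondence analysis mentioned in this abstract: CA starts from a species-by-site abundance table, forms the matrix of standardized residuals from independence and takes its singular value decomposition. The squared singular values (eigenvalues) partition the total inertia, which equals the chi-square statistic divided by the table total. The following sketch computes the residual matrix and total inertia in plain Python for a hypothetical table and leaves the SVD itself to a linear-algebra library such as `numpy.linalg.svd`. It assumes every row and column total is positive.

```python
import math


def standardized_residuals(table):
    """s_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j), with p = table / grand total,
    r the row masses and c the column masses."""
    n = sum(sum(row) for row in table)
    p = [[v / n for v in row] for row in table]
    r = [sum(row) for row in p]
    c = [sum(col) for col in zip(*p)]
    return [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
            for i in range(len(r))]


def total_inertia(table):
    """Sum of squared standardized residuals, i.e. chi-square / grand total."""
    return sum(s * s for row in standardized_residuals(table) for s in row)


# Hypothetical counts: rows are species, columns are sites.
counts = [[10, 0, 2], [1, 8, 3], [0, 2, 9]]
print(total_inertia(counts))
```

A table whose rows are proportional to one another (pure independence) has zero inertia and nothing to ordinate. Strong species-site associations, as in the hypothetical `counts`, give large inertia concentrated in the first few axes.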