• Title/Summary/Keyword: Statistical decision


Attribute Selection in Mixed Databases for Data Mining

  • Cha, Un-Ok;Heo, Mun-Yeol
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.103-108 / 2003
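  • Attribute selection is widely used to reduce large databases for data mining. In this paper, we use three attribute selection methods to reduce the number of condition attributes by more than 60%, apply the reduced data to decision trees and logistic regression models, and compare their efficiency. The three attribute selection methods are MDI, information gain, and ReliefF. The decision tree methods are QUEST, CART, and C4.5. The classification accuracy of the attribute selection methods was evaluated by 10-fold cross-validation on the Credit Approval and German Credit databases from the UCI repository.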
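
Of the three attribute selection methods named in this abstract, information gain is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes categorical attributes, and the toy rows, the attribute names `employed` and `region`, and the helper `rank_attributes` are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Label entropy minus the expected entropy after splitting on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_attributes(rows, labels):
    """Return attribute names sorted by decreasing information gain, plus the gains."""
    gains = {a: information_gain([r[a] for r in rows], labels) for a in rows[0]}
    return sorted(gains, key=gains.get, reverse=True), gains

# Hypothetical toy data: 'employed' separates the classes, 'region' does not.
rows = [
    {"employed": "yes", "region": "north"},
    {"employed": "yes", "region": "south"},
    {"employed": "no", "region": "north"},
    {"employed": "no", "region": "south"},
]
labels = ["approve", "approve", "reject", "reject"]
order, gains = rank_attributes(rows, labels)
```

Keeping only the top-ranked attributes in `order` is one way to obtain a reduced attribute set of the kind the abstract describes; the 60% reduction threshold itself is the authors' choice and is not derived here.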


Development of S-QUEST and an Early Diagnosis System for Intrauterine Growth Restriction (IUGR)

  • Cha, Gyeong-Jun;Park, Mun-Il;Choe, Hang-Seok;Sin, Yeong-Jae
    • Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.171-176 / 2003
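  • Data mining is the process of discovering the information needed for decision making in large volumes of data. In this study, as a branch of bioinformatics, we implement in S-PLUS the QUEST algorithm (hereafter S-QUEST), one of the decision tree algorithms that provide statistical decision systems for the medical field, and use it to analyze intrauterine growth restriction (IUGR) data.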


Empirical Bayes Estimation of the Binomial and Normal Parameters

  • Hong, Jee-Chang;Inha Jung
    • Communications for Statistical Applications and Methods / v.8 no.1 / pp.87-96 / 2001
  • We consider empirical Bayes estimation problems with binomial and normal components when the prior distributions are unknown but are assumed to belong to certain families. These may be the family of all distributions on the parameter space or subfamilies such as the parametric families of conjugate priors. We treat both cases and establish asymptotic optimality for the corresponding decision procedures.
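
As an illustration of the parametric case with a conjugate prior, the sketch below fits a Beta prior to binomial counts and returns posterior-mean estimates. It is a minimal example under stated assumptions, not the procedure from the paper: the method-of-moments fit, the common number of trials `n`, and the sample counts are all hypothetical choices made here.

```python
from statistics import mean, variance

def fit_beta_prior(successes, n):
    """Method-of-moments estimate of Beta(a, b) from counts x_i ~ Binomial(n, theta_i)."""
    props = [x / n for x in successes]
    m = mean(props)
    v = variance(props)  # sample variance of the observed proportions
    base = m * (1 - m)
    # The moment equations have a positive solution only in this range.
    if not (base / n < v < base):
        raise ValueError("moment estimate of the Beta prior is undefined for these data")
    s = n * (base - v) / (n * v - base)  # estimate of a + b
    return m * s, (1 - m) * s

def empirical_bayes_estimates(successes, n):
    """Posterior means (x_i + a) / (n + a + b) under the fitted Beta prior."""
    a, b = fit_beta_prior(successes, n)
    return [(x + a) / (n + a + b) for x in successes]

# Hypothetical data: eight components, 20 trials each.
n_trials = 20
counts = [2, 5, 8, 10, 12, 15, 18, 4]
a_hat, b_hat = fit_beta_prior(counts, n_trials)
estimates = empirical_bayes_estimates(counts, n_trials)
```

Each estimate is a weighted average of the raw proportion `x / n` and the overall mean proportion, so extreme components are pulled toward the center.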


Robust Variable Selection in Classification Tree

  • Jang Jeong Yee;Jeong Kwang Mo
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.89-94 / 2001
  • In this study we focus on variable selection when growing a decision tree. Several splitting rules and variable selection algorithms are discussed. We propose a competitive variable selection method based on the Kruskal-Wallis test, a nonparametric version of the ANOVA F-test. A Monte Carlo study shows that CART has a serious variable selection bias towards categorical variables with many values, and that QUEST, which uses the F-test, has low power to select informative variables under heavy-tailed distributions.
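
The idea of ranking candidate split variables by a Kruskal-Wallis statistic can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes numeric predictors, omits the tie correction, and compares raw H statistics instead of p-values (which is reasonable only because every variable here is tested against the same classes). The function names and toy data are hypothetical.

```python
from collections import defaultdict

def average_ranks(xs):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(x, classes):
    """Kruskal-Wallis H statistic (no tie correction) of x across class labels."""
    n = len(x)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for r, c in zip(average_ranks(x), classes):
        rank_sums[c] += r
        counts[c] += 1
    s = sum(rank_sums[c] ** 2 / counts[c] for c in counts)
    return 12.0 / (n * (n + 1)) * s - 3 * (n + 1)

def select_split_variable(columns, classes):
    """Pick the variable with the largest H statistic."""
    scores = {name: kruskal_wallis_h(col, classes) for name, col in columns.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: x1 separates the classes, x2 does not.
classes = ["a", "a", "a", "b", "b", "b"]
columns = {"x1": [1, 2, 3, 4, 5, 6], "x2": [1, 6, 2, 5, 3, 4]}
best, scores = select_split_variable(columns, classes)
```

Because the statistic depends only on ranks, a single extreme observation cannot dominate it, which is the property that makes a rank-based test attractive under heavy-tailed distributions.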


Empirical Bayes Problems with Dependent and Nonidentical Components

  • Inha Jung;Jee-Chang Hong;Kang Sup Lee
    • Communications for Statistical Applications and Methods / v.2 no.1 / pp.145-154 / 1995
  • An empirical Bayes approach is applied to the estimation of the binomial parameter when there is a cost for observations. Both the sample size and the decision rule for estimating the parameter are determined stochastically by the data, making the result more useful in applications. Our empirical Bayes problems with non-iid components are compared to the usual empirical Bayes problems with iid components. An asymptotically optimal procedure is given, together with a computer simulation.


A Sequence of Improvement over the Lindley Type Estimator with the Cases of Unknown Covariance Matrices

  • Kim, Byung-Hwee;Baek, Hoh-Yoo
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.463-472 / 2005
  • In this paper, the problem of estimating a $p$-variate ($p \ge 4$) normal mean vector is considered in a decision-theoretic setup. Using a simple property of the noncentral chi-square distribution, a sequence of estimators dominating the Lindley type estimator in the cases of unknown covariance matrices is produced, with each improved estimator better than the previous one.
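
For orientation, the Lindley type estimator is usually written for the simpler case of a known identity covariance matrix. The form below is that textbook case, given as background and not as the unknown-covariance estimator studied in the paper. With $X \sim N_p(\theta, I_p)$, $\bar{X} = p^{-1}\sum_{i=1}^{p} X_i$, and $\mathbf{1}$ the vector of ones:

$$\delta^{L}(X) = \bar{X}\mathbf{1} + \left(1 - \frac{p-3}{\lVert X - \bar{X}\mathbf{1}\rVert^{2}}\right)\left(X - \bar{X}\mathbf{1}\right)$$

This estimator shrinks each coordinate toward the grand mean and dominates $X$ under squared error loss when $p \ge 4$, which matches the dimension condition stated in the abstract.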

Optimal Selection of Populations for Units in a System

  • Kim, Woo-Chul
    • Journal of the Korean Statistical Society / v.9 no.2 / pp.135-144 / 1980
  • The problem of choosing units for the series system and the 1-out-of-2 system from k available brands is treated from a decision-theoretic point of view. It is assumed that units from each brand have exponentially distributed life lengths, and that the loss functions are inversely proportional to the reliability of the system. For the series system, the 'natural' rule is shown to be optimal. For the 1-out-of-2 system, the Bayes rule with respect to the natural conjugate prior is derived, and the constants needed to implement the Bayes rule are given.


Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.89-97 / 2021
  • Due to boundedness and the sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are then fed into traditional binary classification or discriminant analysis. However, it may be problematic to apply traditional multivariate approaches directly to the transformed data, because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial on the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that flexible approaches outperform traditional multivariate classification or discriminant analysis.
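
The abstract does not say which logratio transformation is used, so the sketch below shows two common ones, the centered logratio (clr) and the additive logratio (alr), as an assumption for illustration. Both require strictly positive parts; the example composition is hypothetical.

```python
import math

def clr(composition):
    """Centered logratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def alr(composition):
    """Additive logratio: log of each part relative to the last part."""
    reference = composition[-1]
    return [math.log(p / reference) for p in composition[:-1]]

# Hypothetical three-part composition that sums to 1.
parts = [0.2, 0.3, 0.5]
clr_parts = clr(parts)
alr_parts = alr(parts)
```

Both transformations depend only on ratios between parts, so rescaling the composition (for example from proportions to percentages) leaves the result unchanged. The transformed vectors can then be passed to any classifier.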

An Investigation of Factors Affecting Management Efficiency in Korean General Hospitals Using DEA Model (Measuring the Efficiency of General Hospitals Using the DEA Model and Its Influencing Factors)

  • Ahn, In-Whan;Yang, Dong-Hyun
    • Korea Journal of Hospital Management / v.10 no.1 / pp.71-92 / 2005
  • This study analyzes the management efficiency of general hospitals and investigates the major factors that affect it. The management of each hospital is evaluated using Data Envelopment Analysis (DEA), a nonparametric method for measuring efficiency, and the influencing factors are then investigated with a decision-tree model and Tobit regression. The target hospitals were those with between 200 and 500 beds among a total of 276 general hospitals. Financial indicator data were collected from 48 hospitals and analyzed using two statistical models. For Model I, three input variables and two output variables were used for the efficiency evaluation. The input variables were the number of medical doctors, the number of paramedical personnel, and the bed size; the output variables were the numbers of inpatients and outpatients per year, adjusted by bed size. The DEA results showed that only seven of the 48 hospitals (15%) were efficient. The decision-tree analysis identified six significant influencing factors for Model I: Bed Occupancy Rate, Cost per Adjusted Inpatient, New Visit Ratio of Outpatients, Retired Ratio, Net Profit to Gross Revenues, and Net Profit to Total Assets. In addition, the Tobit regression on the independent variables derived from the decision-tree analysis showed that management efficiency increases as profit and patient-induced indicators increase and cost-related indicators decrease. This study may contribute to the development of analytic methodology for hospital management efficiency in that it proposes synthetic measures based on the DEA model rather than simple ratio-analysis results.


Development of Diagnostic System for FHR Monitoring by Using Neural Networks

  • Cha Kyung-Joon;Park Moon-Il;Oh Jae-Eung;Han Hyun-Ju;Lee Hae-Jin;Park Young-Sun
    • Communications for Statistical Applications and Methods / v.13 no.1 / pp.73-88 / 2006
  • In this paper, we construct a database of fetal heart rate (FHR) data and develop HYFM-III, an FHR monitoring system for fetal diagnosis. For the diagnostic system, several statistical decision-making mechanisms are adopted, such as approximate entropy, neural networks, and logistic discrimination. Since FHR data are highly chaotic, we mostly adopt nonlinear statistical methods. On the basis of this system, we expect to develop an expert system for the early detection of abnormal fetuses.
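
Approximate entropy, one of the measures named in this abstract, quantifies how irregular a time series is. The sketch below follows the standard definition with embedding dimension `m` and an absolute tolerance `r`; it is an illustration only, and the parameter values and the two test series are assumptions, not settings from HYFM-III.

```python
import math
import random

def approximate_entropy(series, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) with Chebyshev distance and self-matches counted."""
    n = len(series)

    def phi(dim):
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Number of templates within tolerance r of template a (includes a itself).
            count = sum(
                1 for b in templates
                if max(abs(u - v) for u, v in zip(a, b)) <= r
            )
            total += math.log(count / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

# Hypothetical series: a perfectly regular signal and uniform pseudo-random noise.
regular = [float(i % 2) for i in range(200)]
rng = random.Random(0)
noisy = [rng.random() for _ in range(200)]

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
```

A regular signal gives a value near zero, while an irregular one gives a clearly larger value. In practice `r` is often set relative to the standard deviation of the series instead of as a fixed number.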