• Title/Summary/Keyword: Selection Bias

Conditional Mutual Information-Based Feature Selection Analyzing for Synergy and Redundancy

  • Cheng, Hongrong;Qin, Zhiguang;Feng, Chaosheng;Wang, Yong;Li, Fagen
    • ETRI Journal / v.33 no.2 / pp.210-218 / 2011
  • Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features act in combination. In addition, they estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS takes account of both redundancy and synergy interactions among features and identifies discriminative features. CMIFS also combines feature redundancy evaluation with the classification task, which can decrease the probability of mistaking important features for redundant ones during the search process. The experimental results show that CMIFS can achieve a higher best classification accuracy than MIFS and its variants, using the same number of features or fewer (nearly 50%).
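
The synergy that the abstract above refers to can be illustrated with a toy case. The following is a minimal sketch, not the CMIFS algorithm itself: it assumes discrete features and a plug-in (count-based) estimator, and the function names and the XOR data are hypothetical. A feature that carries no information about the label on its own becomes fully informative once a second feature is conditioned on.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(xs)
    n_xyz = Counter(zip(xs, ys, zs))
    n_xz = Counter(zip(xs, zs))
    n_yz = Counter(zip(ys, zs))
    n_z = Counter(zs)
    total = 0.0
    for (x, y, z), count in n_xyz.items():
        # p(x,y,z) * log2[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        total += (count / n) * log2(count * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return total


def mutual_information(xs, ys):
    """I(X; Y), obtained by conditioning on a constant variable."""
    return conditional_mutual_information(xs, ys, [0] * len(xs))


# Hypothetical toy data: the label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

mi_alone = mutual_information(f1, label)                         # f1 by itself
cmi_given_f2 = conditional_mutual_information(f1, label, f2)     # f1 given f2
print(mi_alone, cmi_given_f2)
```

A criterion that scores `f1` only by its individual relevance would discard it here, whereas a conditional criterion would keep it.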

A new sample selection model for overdispersed count data (과대산포 가산자료의 새로운 표본선택모형)

  • Jo, Sung Eun;Zhao, Jun;Kim, Hyoung-Moon
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.733-749 / 2018
  • Sample selection arises as a result of the partial observability of the outcome of interest in a study. Heckman introduced a sample selection model to analyze such data and proposed a full maximum likelihood estimation method under the assumption of normality. Recently, sample selection models for binomial and Poisson response variables have been proposed. Based on the theory of symmetry-modulated distributions, we extend these to a model for overdispersed count data. Data of this type without sample selection are often modeled using the negative binomial distribution. Hence we propose a sample selection model for overdispersed count data using the negative binomial distribution. A real data application is presented. Simulation studies reveal that our estimation method based on the profile log-likelihood is stable.

A selection of optimal method for bias-correction in Global Seasonal Forecast System version 5 (GloSea5) (전지구 계절예측시스템 GloSea5의 최적 편의보정기법 선정)

  • Son, Chanyoung;Song, Junghyun;Kim, Sejin;Cho, Younghyun
    • Journal of Korea Water Resources Association / v.50 no.8 / pp.551-562 / 2017
  • To use the precipitation forecasts (up to 6 months ahead) of the Global Seasonal Forecast System version 5 (GloSea5), which the KMA (Korea Meteorological Administration) has provided since 2014, for water resources management and other applications, the forecast model's quantitative bias against observations must be corrected. This study evaluated the applicability of bias correction to GloSea5 and selected an optimal method among 11 techniques, including probability distribution type based, parametric, and non-parametric bias correction, to fix GloSea5's bias in precipitation forecasts. Non-parametric bias correction produced the results most similar to observed data in hindcasts of past events, yet generated some discrepancies in the forecast period. In contrast, parametric bias correction produced the most reliable results in both the hindcast and forecast periods. The results of this study are expected to be applicable to various uses of seasonal forecast models, such as water resources operation and management, hydropower, and agriculture.

A Bayesian Method to Semiparametric Hierarchical Selection Models (준모수적 계층적 선택모형에 대한 베이지안 방법)

  • 정윤식;장정훈
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.161-175 / 2001
  • Meta-analysis refers to quantitative methods for combining results from independent studies in order to draw overall conclusions. Hierarchical models including selection models are introduced and shown to be useful in such Bayesian meta-analysis. Semiparametric hierarchical models are proposed using the Dirichlet process prior. This rich class of models combines the information of independent studies, allowing investigation of variability both between and within studies, and of the weight function. Here we investigate the sensitivity of results to unobserved studies by considering a hierarchical selection model that includes an unknown weight function, and use Markov chain Monte Carlo methods to develop inference for the parameters of interest. Using the Bayesian method, this model is applied to a meta-analysis of twelve studies comparing the effectiveness of two different types of fluoride in preventing cavities. A clinically informative prior is assumed. Summaries and plots of model parameters are analyzed to address questions of interest.

Parameter estimation and assessment of bias in genetic evaluation of carcass traits in Hanwoo cattle using real and simulated data

  • Mohammed Bedhane;Julius van der Werf;Sara de las Heras-Saldana;Leland Ackerson IV;Dajeong Lim;Byoungho Park;Mi Na Park;Seunghee Roh;Samuel Clark
    • Journal of Animal Science and Technology / v.65 no.6 / pp.1180-1193 / 2023
  • Most carcass and meat quality traits are moderately to highly heritable, indicating that they can be improved through selection. Genetic evaluation for these types of traits is performed using performance data obtained from commercial and progeny testing evaluation. Performance data from commercial farms are available in large volume; however, some drawbacks have been observed. The drawback of the commercial data is mainly due to the sorting of animals based on live weight prior to slaughter, which could lead to bias in the genetic evaluation of later measured traits such as carcass traits. The current study has two components to address this drawback. The first component aimed to estimate genetic parameters for carcass and meat quality traits in Korean Hanwoo cattle using a large sample of industry-based carcass performance records (n = 469,002). The second component aimed to describe the impact of sorting animals into different contemporary groups based on an early measured trait and then examine the effect on the genetic evaluation of subsequently measured traits. To meet these objectives, we used real performance data to estimate genetic parameters and simulated data to assess the bias in genetic evaluation. The results of the first component showed that commercial data obtained from slaughterhouses are a potential source of carcass performance data and are useful for genetic evaluation of carcass traits to improve beef cattle performance. However, we observed a harvesting effect that leads to bias in the genetic evaluation of carcass traits. This is mainly due to the selection of animals based on their body weight before arrival at the slaughterhouse. Overall, the non-random allocation of animals into a contemporary group leads to biased estimated breeding values in genetic evaluation, and the severity of the bias increases when the evaluated traits are highly correlated.

Poststroke Depression : A Review of the Latest Oriental Medicine Articles (뇌졸중 후 우울증 : 한방 치료에 대한 국내외 최신 문헌 고찰)

  • Lee, Je-Won;Lee, Bo-Mae-Na;Jang, Woo-Seok;Hwang, Ha-Yeon;Baek, Kyung-Min
    • The Journal of Internal Korean Medicine / v.33 no.4 / pp.448-464 / 2012
  • Objectives : This study reviews the latest articles from Korea and other countries on oriental medicine treatment of poststroke depression (PSD). Methods : Korean articles were retrieved from the 9 major Korean web article search engines. Foreign articles were retrieved from PubMed. Publication dates ranged from 2000 to September 2012. There were no restrictions on the type of publication, but articles not available in full text were excluded. Methodological quality was assessed according to Cochrane's assessment of risk of bias and the Newcastle-Ottawa quality assessment scale. Results : Twenty-two articles were included in this study. Eleven articles were published in Korea; the rest were published in China. Nine articles were randomized controlled trials (RCT), one article was a non-randomized study (NRS), four articles were case reports, three articles were cross-sectional studies, and two articles were comparative studies. In the RCT articles, the risks of selection bias and performance bias were generally high, and the risk of detection bias was unclear. The NRS article received four stars in the Newcastle-Ottawa quality assessment. Comparison of Hamilton Rating Scale for Depression scores between the oriental medicine treated group and the western medicine treated group revealed no remarkable difference in mean score changes after treatment for PSD. Conclusions : The results of this study suggest that oriental medicine treatment is as effective as western medicine treatment for PSD. In the future, more rigorous studies of oriental medicine treatment should be conducted.

Constructing Database and Probabilistic Analysis for Ultimate Bearing Capacity of Aggregate Pier (쇄석다짐말뚝의 극한지지력 데이터베이스 구축 및 통계학적 분석)

  • Park, Joon-Mo;Kim, Bum-Joo;Jang, Yeon-Soo
    • Journal of the Korean Geotechnical Society / v.30 no.8 / pp.25-37 / 2014
  • In the load and resistance factor design (LRFD) method, resistance factors are typically calibrated using resistance bias factors obtained either from only the data within $\pm 2\sigma$ or from the data excluding the tail values of an assumed probability distribution, in order to increase the reliability of the database. However, this data selection approach has the shortcoming that low-quality data inadvertently included in the database may not be removed. In this study, a data quality evaluation method, developed based on the quality of static load test results, the engineering characteristics of the in-situ soil, and the dimensions of aggregate piers, is proposed for use in constructing a database. To evaluate the method, a total of 65 static load test results collected from the literature, including static load test reports, were analyzed. A comparison of bias factors, coefficients of variation, and resistance factors across databases of different quality showed that the uncertainty in estimating bias factors can be reduced by using the proposed data quality evaluation method when constructing a database.
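
The conventional screening step that the abstract above contrasts with its quality-based method can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the bias factor is the ratio of measured to predicted resistance, applies a single pass of the $\pm 2\sigma$ rule, and uses made-up load test values and hypothetical function names.

```python
from statistics import mean, stdev


def bias_statistics(measured, predicted):
    """Return (mean bias, coefficient of variation, number removed)
    after discarding bias factors outside mean +/- 2 standard deviations."""
    biases = [m / p for m, p in zip(measured, predicted)]
    mu, sigma = mean(biases), stdev(biases)
    kept = [b for b in biases if abs(b - mu) <= 2 * sigma]
    mu_kept = mean(kept)
    cov_kept = stdev(kept) / mu_kept
    return mu_kept, cov_kept, len(biases) - len(kept)


# Hypothetical capacities (arbitrary units); the last test is an outlier.
predicted = [100.0] * 10
measured = [100.0, 110.0, 90.0, 100.0, 105.0, 95.0, 100.0, 110.0, 90.0, 300.0]

mean_bias, cov, n_removed = bias_statistics(measured, predicted)
print(mean_bias, cov, n_removed)
```

A purely statistical filter of this kind removes only values that are numerically extreme; a poor-quality test whose bias factor happens to fall near the mean would pass through, which is the shortcoming the abstract describes.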

Systematic Review and Meta-analysis of Chuna Therapy for Sciatica (좌골신경통에 적용한 추나 치료에 대한 체계적 문헌 고찰 및 메타 분석)

  • Hong, Su Min;Oh, Seung Joon;Lee, Eun Jung
    • Journal of Physiology & Pathology in Korean Medicine / v.34 no.6 / pp.299-308 / 2020
  • This study aimed to evaluate the effects of Chuna therapy for sciatica. We searched the following 16 online databases without a language restriction (PubMed, Cochrane, Embase, CINAHL, Ovid, Kmbase, RISS, NDSL, OASIS, KISS, KNAL, KTKP, DBpia, CNKI, Wanfang, J-stage) to find randomized controlled clinical trials that used Chuna therapy for sciatica. The methodological quality of the randomized controlled clinical trials (RCTs) was assessed using the Cochrane risk of bias tool, and a meta-analysis was performed. Of the 496 articles retrieved, 15 RCTs were finally selected for systematic review. Fourteen studies showed that Chuna therapy has a positive effect on sciatica. Two studies noted side effects, and the difference between the intervention group and the control group was not statistically significant. One study reported no side effects, and the remaining studies did not mention side effects. Meta-analysis showed positive results for Chuna therapy alone in terms of efficiency rate compared with painkillers and herbal medicine, but not compared with acupuncture. When Chuna therapy plus acupuncture was compared with acupuncture alone, the combination had a more positive result in terms of efficiency rate. Under the Cochrane Risk of Bias (RoB) evaluation, the selection, performance, detection, and reporting bias of most studies were unclear. The studies showed that Chuna therapy can be significantly effective for sciatica. However, the risk of bias of most studies included in the analysis was not low enough. In the future, more high-quality studies will be needed to establish the level of evidence for Chuna therapy.

A Study on Sampling for Estimating Tobacco Disease Incidences

  • Park, Hong-Nai
    • Journal of the Korean Statistical Society / v.8 no.2 / pp.165-168 / 1979
  • For crops that are planted in a lattice layout, sampling designs can be made to take advantage of this regular arrangement. To select which tobacco plants to examine in a survey estimating disease loss in tobacco, a method of so-called bent plots was devised, based on the regularity of plantings in tobacco fields. We first describe this sample selection and measurement method and then provide estimators and their bias and variance properties.

A Comparative Study of Restricted Randomization Methods in Clinical Trials

  • Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / v.14 no.1 / pp.48-55 / 1985
  • In clinical trials, subjects become available sequentially and must be assigned to treatments immediately. A completely randomized procedure for allocating treatments to subjects may result in severe imbalance in the number of subjects across treatment groups, especially for small experiments or interim analyses of large experiments. In this study, restricted randomization methods such as biased coin designs (Efron, 1971), the permuted block design, and the truncated binomial design are compared to the completely randomized design in the presence of selection and/or accidental bias by Monte Carlo simulations.
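
The imbalance problem described in the abstract above can be illustrated with a small Monte Carlo sketch. It is not the paper's simulation: it covers only the final imbalance between two arms (not selection or accidental bias), uses the favoring probability 2/3 commonly associated with Efron's biased coin, and the trial size, replication count, seed, and function names are assumptions chosen for illustration.

```python
import random


def final_imbalance(n_subjects, p_favor_smaller, rng):
    """Absolute difference in arm sizes after sequential assignment.

    p_favor_smaller = 0.5 gives complete randomization; a value above 0.5
    gives a biased coin that favors the arm with fewer subjects so far.
    """
    diff = 0  # (subjects in arm A) - (subjects in arm B)
    for _ in range(n_subjects):
        if diff == 0:
            p_a = 0.5
        elif diff < 0:
            p_a = p_favor_smaller        # arm A is behind, so favor it
        else:
            p_a = 1.0 - p_favor_smaller  # arm B is behind, so favor it
        diff += 1 if rng.random() < p_a else -1
    return abs(diff)


def mean_imbalance(n_subjects, p_favor_smaller, n_trials, seed):
    """Average final imbalance over repeated simulated trials."""
    rng = random.Random(seed)
    total = sum(final_imbalance(n_subjects, p_favor_smaller, rng)
                for _ in range(n_trials))
    return total / n_trials


complete = mean_imbalance(20, 0.5, 2000, seed=1)
biased_coin = mean_imbalance(20, 2 / 3, 2000, seed=1)
print(complete, biased_coin)
```

The biased coin trades some unpredictability of the next assignment for tighter balance, which is why comparisons such as the one in this paper also consider selection bias.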
