• Title/Abstract/Keywords: Selection Bias

331 search results (processing time: 0.022 s)

Conditional Mutual Information-Based Feature Selection Analyzing for Synergy and Redundancy

  • Cheng, Hongrong;Qin, Zhiguang;Feng, Chaosheng;Wang, Yong;Li, Fagen
    • ETRI Journal / Vol. 33, No. 2 / pp. 210-218 / 2011
  • Battiti's mutual information feature selector (MIFS) and its variants are used in many classification applications. Because they ignore feature synergy, MIFS and its variants may introduce a large bias when features act in combination. Moreover, they estimate feature redundancy without regard to the corresponding classification task. In this paper, we propose an automated greedy feature selection algorithm called conditional mutual information-based feature selection (CMIFS). Based on the link between interaction information and conditional mutual information, CMIFS takes account of both redundancy and synergy interactions among features and identifies discriminative features. In addition, CMIFS combines feature redundancy evaluation with the classification task, which decreases the probability of mistaking important features for redundant ones during the search process. The experimental results show that CMIFS can achieve a higher best classification accuracy than MIFS and its variants with the same number of features or fewer (nearly 50%).
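
The synergy effect mentioned in this abstract can be illustrated with a toy example. The sketch below is not the CMIFS algorithm; it only assumes a hypothetical XOR dataset and plug-in (frequency-based) estimators, and shows that a feature can carry zero mutual information with the label on its own while its conditional mutual information given a second feature is a full bit. All function and variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) = sum over z of p(z) * I(X;Y | Z=z)."""
    n = len(zs)
    total = 0.0
    for z, cz in Counter(zs).items():
        idx = [i for i in range(n) if zs[i] == z]
        total += (cz / n) * mutual_information(
            [xs[i] for i in idx], [ys[i] for i in idx]
        )
    return total


# Hypothetical toy data: each (x1, x2) pair once, label y = x1 XOR x2.
rows = list(product([0, 1], repeat=2))
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

mi_x1 = mutual_information(x1, y)                           # x1 alone
cmi_x1_given_x2 = conditional_mutual_information(x1, y, x2)  # x1 given x2
print(mi_x1, cmi_x1_given_x2)
```

A selector that scores features only by pairwise mutual information would rank `x1` as uninformative here, which is the kind of bias the abstract attributes to ignoring synergy.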

A new sample selection model for overdispersed count data

  • 조성은;조준;김형문
    • 응용통계연구 / Vol. 31, No. 6 / pp. 733-749 / 2018
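  • A sample selection problem arises when the observations of interest in a study are only partially observable. To analyze such data, Heckman developed the sample selection model and estimated its parameters by maximum likelihood under the assumption of a bivariate normal distribution. Sample selection models for binomial data and Poisson data have recently been proposed. We extend these, on the basis of distribution adjustment, to a model for overdispersed data. Overdispersed data without sample selection are commonly analyzed with the negative binomial distribution. We therefore propose a new model for overdispersed data that uses the negative binomial distribution and introduces distribution adjustment. The model is applied to real data. Simulation results show that parameter estimates based on the profile likelihood function are stable.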
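
The sample selection problem described above can be made concrete with a small simulation. This is only a sketch under stated assumptions: it uses a continuous outcome with bivariate normal errors rather than the count model of the paper, it does not implement Heckman's estimator, and the function name, correlation `rho`, sample size, and seed are all hypothetical choices.

```python
import random
from statistics import mean


def simulate(n=20000, rho=0.8, seed=0):
    """Return (mean of all outcomes, mean of outcomes that pass selection)."""
    rng = random.Random(seed)
    all_y, observed_y = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # error in the selection equation
        # Outcome error constructed so that corr(u, e) = rho.
        e = rho * u + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        y = 1.0 + e              # true population mean is 1.0
        all_y.append(y)
        if u > 0:                # the outcome is observed only when selected
            observed_y.append(y)
    return mean(all_y), mean(observed_y)


full_mean, selected_mean = simulate()
print(f"all units: {full_mean:.3f}  selected units only: {selected_mean:.3f}")
```

Because selection depends on an error that is correlated with the outcome error, the mean of the observed outcomes overstates the population mean, which is the bias that sample selection models are designed to correct.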

A selection of optimal method for bias-correction in Global Seasonal Forecast System version 5 (GloSea5)

  • 손찬영;송정현;김세진;조영현
    • 한국수자원학회논문집 / Vol. 50, No. 8 / pp. 551-562 / 2017
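  • GloSea5 is the global seasonal forecast system that the Korea Meteorological Administration has run operationally since 2014. To use its precipitation forecasts of up to six months in water resources and other application fields, the quantitative bias of the forecast model relative to observations must be corrected. To correct the bias in GloSea5 precipitation forecasts, this study applied a total of 11 techniques, including bias correction methods based on probability distributions and parametric and nonparametric bias correction methods, in order to assess the applicability of the seasonal forecast model and select the optimal bias correction method. For the hindcast period, the nonparametric bias correction methods produced corrections closest to the observations, but they generated relatively many outliers for the forecast period. In contrast, the parametric bias correction methods gave stable results for both the hindcast and forecast periods. The results are expected to be applicable to fields that use seasonal forecast models, such as water resources operation and management, hydropower, and agriculture.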

A Bayesian Method to Semiparametric Hierarchical Selection Models

  • 정윤식;장정훈
    • 응용통계연구 / Vol. 14, No. 1 / pp. 161-175 / 2001
  • Meta-analysis is a statistical method for combining the results of independently conducted studies into a single overall result. Hierarchical models, including selection models, are used for this purpose and are known to be useful for Bayesian meta-analysis. However, meta-analysis data generally suffer from publication bias; to overcome this, the distribution function is redefined using a weight function. Recently, Silliman (1997) defined a hierarchical selection model, which adds a weight function to the hierarchical model, and presented a parametric Bayesian method. This study introduces semiparametric hierarchical selection models that place a Dirichlet process prior on the unobserved study effects. Markov chain Monte Carlo methods are used to estimate the proposed models in a Bayesian framework. To apply the method, a meta-analysis is performed on real data (Johnson, 1993) from 12 studies that compared the effects of two preventive agents against tooth decay.


Parameter estimation and assessment of bias in genetic evaluation of carcass traits in Hanwoo cattle using real and simulated data

  • Mohammed Bedhane;Julius van der Werf;Sara de las Heras-Saldana;Leland Ackerson IV;Dajeong Lim;Byoungho Park;Mi Na Park;Seunghee Roh;Samuel Clark
    • Journal of Animal Science and Technology / Vol. 65, No. 6 / pp. 1180-1193 / 2023
  • Most carcass and meat quality traits are moderately to highly heritable, indicating that they can be improved through selection. Genetic evaluation for these types of traits is performed using performance data obtained from commercial and progeny-testing evaluations. Performance data from commercial farms are available in large volume; however, some drawbacks have been observed. The drawbacks of the commercial data are mainly due to the sorting of animals based on live weight prior to slaughter, which could lead to bias in the genetic evaluation of later-measured traits such as carcass traits. The current study has two components to address the drawbacks of the commercial data. The first component aimed to estimate genetic parameters for carcass and meat quality traits in Korean Hanwoo cattle using a large sample of industry-based carcass performance records (n = 469,002). The second component aimed to describe the impact of sorting animals into different contemporary groups based on an early-measured trait and then to examine the effect on the genetic evaluation of subsequently measured traits. To meet these objectives, we used real performance data to estimate genetic parameters and simulated data to assess the bias in genetic evaluation. The results of the first component showed that commercial data obtained from slaughterhouses are a potential source of carcass performance data and are useful for genetic evaluation of carcass traits to improve beef cattle performance. However, we observed a harvesting effect that leads to bias in the genetic evaluation of carcass traits. This is mainly due to the selection of animals based on their body weight before arrival at the slaughterhouse. Overall, the non-random allocation of animals into contemporary groups leads to biased estimated breeding values in genetic evaluation, and the severity of the bias increases when the evaluated traits are highly correlated.

Poststroke Depression : A Review of the Latest Oriental Medicine Articles

  • 이제원;이보매나;장우석;황하연;백경민
    • 대한한방내과학회지 / Vol. 33, No. 4 / pp. 448-464 / 2012
  • Objectives : This study reviews the latest articles from Korea and other countries on oriental medicine treatment of poststroke depression (PSD). Methods : Korean articles were retrieved from the 9 major Korean web article search engines. Foreign articles were retrieved from PubMed. Publication dates ranged from 2000 to September 2012. There were no restrictions on the type of publication, but articles not available in full text were excluded. Methodological quality was assessed according to Cochrane's assessment of risk of bias and the Newcastle-Ottawa quality assessment scale. Results : Twenty-two articles were included in this study. Eleven articles were published in Korea, and the rest were published in China. Nine articles were randomized controlled trials (RCTs), one was a non-randomized study (NRS), four were case reports, three were cross-sectional studies, and two were comparative studies. In the RCT articles, the risks of selection bias and performance bias were generally high, and the risk of detection bias was unclear. The NRS article received four stars in the Newcastle-Ottawa quality assessment. A comparison of Hamilton Rating Scale for Depression scores between the oriental medicine-treated group and the western medicine-treated group revealed no remarkable difference in mean score changes after treatment for PSD. Conclusions : The results of this study suggest that oriental medicine treatment is as effective as western medicine treatment for PSD. In the future, more rigorous studies of oriental medicine treatment should be conducted.

Constructing Database and Probabilistic Analysis for Ultimate Bearing Capacity of Aggregate Pier

  • 박준모;김범주;장연수
    • 한국지반공학회논문집 / Vol. 30, No. 8 / pp. 25-37 / 2014
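  • When resistance factors are calibrated for load and resistance factor design (LRFD) in Korea and abroad, the reliability of the collected database is improved at the stage of computing the resistance bias factor, either by selecting only data within the ${\pm}2{\sigma}$ range of the resistance bias values or by removing data in the tail so that the assumed probability distribution passes a goodness-of-fit test. However, these methods cannot identify low-quality data that happen to be included in the database. This study proposes a quality assessment method that can be carried out at the database construction stage, using quality criteria such as the quality of the static load test, the engineering characteristics of the in-situ ground, and the specifications of the aggregate pier. Static load test data from 65 sites were collected from domestic and international literature and static load test reports, and the database was constructed and its quality assessed. A comparison of resistance bias factors, coefficients of variation, and resistance factors according to the quality assessment status of the database indicates that combining the existing database processing procedure with the quality assessment method reduces the uncertainty of the resistance bias factor and is effective for calibrating reliable LRFD resistance factors.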

Systematic Review and Meta-analysis of Chuna Therapy for Sciatica

  • 홍수민;오승준;이은정
    • 동의생리병리학회지 / Vol. 34, No. 6 / pp. 299-308 / 2020
  • This study aimed to evaluate the effects of Chuna therapy for sciatica. We searched the following 16 online databases without language restriction (PubMed, Cochrane, Embase, CINAHL, Ovid, Kmbase, RISS, NDSL, OASIS, KISS, KNAL, KTKP, DBpia, CNKI, Wanfang, J-stage) to find randomized controlled clinical trials that used Chuna therapy for sciatica. The methodological quality of the randomized controlled clinical trials (RCTs) was assessed using the Cochrane risk of bias tool, and a meta-analysis was performed. Of the 496 articles retrieved, 15 RCTs were finally selected for systematic review. Fourteen studies showed that Chuna therapy has a positive effect on sciatica. Two studies reported side effects, and the difference between the intervention group and the control group was not statistically significant. One study reported no side effects, and the remaining studies did not mention side effects. The meta-analysis showed positive results for Chuna therapy alone in terms of efficiency rate compared with painkillers and herbal medicine, with the exception of acupuncture. When Chuna therapy plus acupuncture was compared with acupuncture alone, the combination had a more positive result in terms of efficiency rate. Under the Cochrane Risk of Bias (RoB) evaluation, the risks of selection, performance, detection, and reporting bias were unclear in most of the studies. The studies showed that Chuna therapy can be significantly effective for sciatica. However, the risk of bias in most of the studies included in the analysis was not low enough. In the future, more high-quality studies will be needed to establish the level of evidence for Chuna therapy.

A Study on Sampling for Estimating Tobacco Disease Incidences

  • Park, Hong-Nai
    • Journal of the Korean Statistical Society / Vol. 8, No. 2 / pp. 165-168 / 1979
  • For crops that are planted in a lattice layout, sampling designs can take advantage of this regular arrangement. To select which tobacco plants to examine in a survey estimating disease loss in tobacco, a method of so-called bent plots was devised based on the regularity of plantings in tobacco fields. We first describe this sample selection and measurement method and then provide estimators and their bias and variance properties.


A Comparative Study of Restricted Randomization Methods in Clinical Trials

  • Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / Vol. 14, No. 1 / pp. 48-55 / 1985
  • In clinical trials, subjects become available sequentially and must be assigned to treatments immediately. A completely randomized procedure for allocating treatments to subjects may result in severe imbalance in the number of subjects across treatment groups, especially in small experiments or in interim analyses of large experiments. In this study, restricted randomization methods such as biased coin designs (Efron, 1971), the permuted block design, and the truncated binomial design are compared with the completely randomized design in the presence of selection and/or accidental bias by Monte Carlo simulation.
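
The imbalance issue raised in this abstract can be sketched with a short Monte Carlo comparison. The code below is an illustration only, not the simulation design of the paper: it compares complete randomization with a two-arm biased coin rule using p = 2/3, and the trial size, number of replications, seed, and function names are hypothetical. It measures only final imbalance, not selection or accidental bias.

```python
import random


def complete_randomization(n, rng):
    """Assign each subject to arm A or B by a fair coin; return |nA - nB|."""
    diff = 0  # running value of nA - nB
    for _ in range(n):
        diff += 1 if rng.random() < 0.5 else -1
    return abs(diff)


def biased_coin(n, rng, p=2 / 3):
    """Biased coin rule: favor the arm that is currently behind with prob. p."""
    diff = 0  # running value of nA - nB
    for _ in range(n):
        if diff == 0:
            prob_a = 0.5      # arms balanced: fair coin
        elif diff < 0:
            prob_a = p        # A is behind: favor A
        else:
            prob_a = 1 - p    # B is behind: favor B
        diff += 1 if rng.random() < prob_a else -1
    return abs(diff)


def mean_imbalance(design, n=20, reps=5000, seed=1):
    """Average final imbalance over repeated simulated trials of n subjects."""
    rng = random.Random(seed)
    return sum(design(n, rng) for _ in range(reps)) / reps


cr = mean_imbalance(complete_randomization)
bcd = mean_imbalance(biased_coin)
print(f"complete randomization: {cr:.2f}  biased coin: {bcd:.2f}")
```

The biased coin keeps group sizes closer together, at the cost of making the next assignment partly predictable, which is why such designs are also evaluated for selection bias.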
