• Title/Summary/Keyword: Data Bias


Adjusting sampling bias in case-control genetic association studies

  • Seo, Geum Chu;Park, Taesung
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1127-1135 / 2014
  • Genome-wide association studies (GWAS) are designed to discover genetic variants, such as single nucleotide polymorphisms (SNPs), that are associated with complex human traits. Although interest in applying GWAS methodologies to population-based cohorts is growing, many published GWAS have adopted a case-control design, which raises the issue of sampling bias in both case and control samples. Because cases and controls are selected with unequal probabilities, the samples are not representative of the population they are purported to represent. Non-random sampling in case-control studies can therefore lead to inconsistent and biased estimates of SNP-trait associations. In this paper, we propose inverse-probability-of-sampling weights based on disease prevalence to eliminate case-control sampling bias when estimating and testing associations between SNPs and quantitative traits. We apply the proposed method to data from the Korea Association Resource project and show that standard estimators applied to the weighted data yield unbiased estimates.
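
A minimal Python sketch of prevalence-based inverse-probability-of-sampling weights follows. It assumes the common form in which each case is weighted by K divided by the sample case fraction and each control by (1 - K) divided by the sample control fraction, where K is the population disease prevalence, and it uses a single-SNP weighted least-squares slope as the "standard estimator". Function names and toy data are hypothetical, not the authors' code.

```python
from typing import List, Sequence


def sampling_weights(is_case: Sequence[bool], prevalence: float) -> List[float]:
    """Assumed IPW form: reweight so that cases carry `prevalence` of the total weight."""
    n = len(is_case)
    n_case = sum(1 for c in is_case if c)
    n_control = n - n_case
    if n_case == 0 or n_control == 0:
        raise ValueError("need at least one case and one control")
    w_case = prevalence / (n_case / n)
    w_control = (1.0 - prevalence) / (n_control / n)
    return [w_case if c else w_control for c in is_case]


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted least-squares slope of y on x (e.g. trait on minor-allele count)."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# Hypothetical toy data: 4 cases, 4 controls, prevalence K = 0.1.
is_case = [True] * 4 + [False] * 4
weights = sampling_weights(is_case, prevalence=0.1)
genotype = [0, 1, 2, 1, 0, 2, 1, 0]               # minor-allele counts
trait = [1.0, 3.0, 5.0, 3.0, 1.0, 5.0, 3.0, 1.0]  # made-up values: trait = 2 * genotype + 1
print(weights)
print(weighted_slope(genotype, trait, weights))
```

Here each case gets weight 0.1 / 0.5 = 0.2 and each control 0.9 / 0.5 = 1.8, so cases carry 10% of the total weight, matching K; the weights also sum to the sample size.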

Nonuniformity Correction Scheme Based on 3-dimensional Visualization of MRI Images (MRI 영상의 3차원 가시화를 통한 영상 불균일성 보정 기법)

  • Kim, Hyoung-Jin;Seo, Kwang-Deok
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.948-958 / 2010
  • Human body signals collected by an MRI system are very weak, so they are easily affected by external noise or system instability during imaging. This paper therefore analyzes the nonuniformity caused by the design of the RF receiving coil in a low-magnetic-field MRI system and proposes an efficient method to improve image uniformity. Among the various methods for correcting such nonuniformity in MRI images, the proposed method acquires 3D bias volume data from phantom data, which makes it possible to correct images of various sizes. Simulations show that images obtained by various imaging methods can be effectively corrected using a single set of bias data.
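
The abstract does not give the correction formula. A common approach, assumed here, treats coil nonuniformity as a multiplicative bias field: a uniform-phantom scan is normalized to unit mean to obtain the bias map, which is resampled to the target image's grid and divided out. For brevity the sketch works on a 2D slice with nearest-neighbour resampling, whereas the paper builds a 3D bias volume; all function names and values are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def normalize_bias(phantom: Grid) -> Grid:
    """Scale a uniform-phantom image to unit mean; deviations from 1 model coil nonuniformity."""
    values = [v for row in phantom for v in row]
    mean = sum(values) / len(values)
    return [[v / mean for v in row] for row in phantom]


def resample_nearest(field: Grid, rows: int, cols: int) -> Grid:
    """Nearest-neighbour resampling so one bias map can serve images of other sizes."""
    src_rows, src_cols = len(field), len(field[0])
    return [
        [field[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def correct_image(image: Grid, bias: Grid, eps: float = 1e-6) -> Grid:
    """Divide out the bias, assuming observed = true * bias (multiplicative model)."""
    b = resample_nearest(bias, len(image), len(image[0]))
    return [
        [p / max(bv, eps) for p, bv in zip(img_row, b_row)]
        for img_row, b_row in zip(image, b)
    ]


# Hypothetical 2x2 phantom slice and a 4x4 image of a uniform object (true value 10).
bias = normalize_bias([[1.0, 2.0], [2.0, 1.0]])
observed = [[10.0 * v for v in row] for row in resample_nearest(bias, 4, 4)]
print(correct_image(observed, bias))
```

Because the bias map is stored once and resampled on demand, the same phantom measurement can correct images acquired at different matrix sizes, which mirrors the "various-sized images" claim under the stated assumptions.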

BERT-Based Logits Ensemble Model for Gender Bias and Hate Speech Detection

  • Sanggeon Yun;Seungshik Kang;Hyeokman Kim
    • Journal of Information Processing Systems / v.19 no.5 / pp.641-651 / 2023
  • Malicious hate speech and gender-biased comments are common in online communities and cause social problems. Gender bias and hate speech detection have been investigated, but detection remains difficult because such content can be expressed in diverse ways. To address this problem, we attempted to detect malicious comments in a Korean hate speech dataset constructed in 2020. We explored deep learning models based on bidirectional encoder representations from transformers (BERT), using hyperparameter tuning, data sampling, and logits ensembles with a label distribution. We evaluated our models in Kaggle competitions for gender bias, general bias, and hate speech detection. For gender bias detection, an ensemble of the Soongsil-BERT and KcELECTRA models achieved an F1-score of 0.7711. For the general bias task, which includes the gender bias task, the ensemble model achieved the best F1-score of 0.7166.
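
Logit-level ensembling can be sketched as averaging each model's raw logits for an example and applying softmax to the result. This sketch assumes a plain, optionally weighted, average; the abstract's use of a label distribution is not reproduced, and the logits below are made up rather than real Soongsil-BERT or KcELECTRA outputs.

```python
import math
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(
    per_model: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted average of each model's raw logits for one example (uniform by default)."""
    if weights is None:
        weights = [1.0 / len(per_model)] * len(per_model)
    n_classes = len(per_model[0])
    return [sum(w * logits[c] for w, logits in zip(weights, per_model)) for c in range(n_classes)]


def predict(
    per_model: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> Tuple[int, List[float]]:
    """Return the argmax class and the softmax probabilities of the averaged logits."""
    probs = softmax(ensemble_logits(per_model, weights))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical logits for one comment from two fine-tuned models (0 = not biased, 1 = biased).
model_a = [2.0, 0.0]
model_b = [0.0, 1.0]
print(predict([model_a, model_b]))
```

Averaging logits rather than hard votes lets a confident model outweigh an uncertain one; per-model weights could be tuned on validation F1, though that tuning is an assumption here.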

Re-Considering Aggregated Data Bias by Extending "Koyck Model" of Advertising Effect (광고 효과 확장 코익 모델을 이용한 Aggregated data bias의 재조명)

  • Song, Tea-Ho;Yuan, Xina;Kim, Ji-Yoon
    • Journal of the Korean Operations Research and Management Science Society / v.34 no.2 / pp.91-100 / 2009
  • "How does advertising affect sales?" is a fundamental question in modern advertising research. An important issue in estimating the carryover effects of advertising on sales is that data aggregation biases estimates of the duration of the advertising effect (aggregated data bias). This research proposes an extension of the Koyck model (Koyck 1954), which is employed for micro-level data, to estimate advertising effects from aggregated data, and empirically demonstrates the aggregated data bias. When applied to actual aggregate-level advertising data, our model is more appropriate than the basic Koyck model for micro-data. The results show that it is important to consider the level of data disaggregation when analyzing dynamic effects of advertising such as carryover effects.
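
The basic Koyck geometric-lag model, on which the paper builds, can be sketched as below. The recursion, the long-run effect beta / (1 - lam), and the p-duration interval follow from the geometric lag weights beta * lam**k; the `aggregate` helper only illustrates how coarser data sum consecutive periods. The paper's extended model for aggregated data is not reproduced, and all parameter values are hypothetical.

```python
import math
from typing import List, Sequence


def koyck_sales(ads: Sequence[float], alpha: float, beta: float, lam: float, s0: float = 0.0) -> List[float]:
    """Koyck recursion S_t = alpha + beta * A_t + lam * S_{t-1} (error term omitted)."""
    sales, prev = [], s0
    for a in ads:
        prev = alpha + beta * a + lam * prev
        sales.append(prev)
    return sales


def long_run_effect(beta: float, lam: float) -> float:
    """Total carryover effect of one unit of advertising: beta / (1 - lam)."""
    return beta / (1.0 - lam)


def duration_interval(lam: float, p: float = 0.9) -> int:
    """Periods until fraction p of the total effect has occurred: smallest n with 1 - lam**n >= p."""
    return math.ceil(math.log(1.0 - p) / math.log(lam))


def aggregate(series: Sequence[float], m: int) -> List[float]:
    """Sum consecutive blocks of m periods (e.g. weeks to months); drops a partial last block."""
    return [sum(series[i:i + m]) for i in range(0, len(series) - len(series) % m, m)]


# Hypothetical single advertising pulse with beta = 2 and lam = 0.5.
weekly = koyck_sales([1, 0, 0, 0], alpha=0.0, beta=2.0, lam=0.5)
print(weekly)                                            # → [2.0, 1.0, 0.5, 0.25]
print(aggregate(weekly, 2))                              # → [3.0, 0.75]
print(long_run_effect(2.0, 0.5), duration_interval(0.5)) # → 4.0 4
```

With lam = 0.5, 90% of the total effect is realized within 4 micro periods; once the same response is observed only as 2-period sums, the decay between aggregated periods no longer matches lam directly, which is the kind of distortion the paper calls aggregated data bias.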

Parameter Extraction of HEMT Small-Signal Equivalent Circuits Using Multi-Bias Extraction Technique (다중 바이어스 추출 기법을 이용한 HEMT 소신호 파라미터 추출)

  • 강보술;전만영;정윤하
    • Proceedings of the IEEK Conference / 2000.11a / pp.353-356 / 2000
  • This paper presents a multi-bias parameter extraction technique for HEMT small-signal equivalent circuits. The technique uses S-parameters measured at various bias points in the active region to construct a single optimization problem whose vector of unknowns contains only a set of bias-independent elements. Tests were performed on measured S-parameters of a pHEMT at 30 bias points. The results indicate that the calculated S-parameters are similar to the measured data.


A Study on the Bias Reduction in Split Variable Selection in CART

  • Song, Hyo-Im;Song, Eun-Tae;Song, Moon Sup
    • Communications for Statistical Applications and Methods / v.11 no.3 / pp.553-562 / 2004
  • In this short communication we discuss the bias problems of CART in split variable selection and suggest a method to reduce the variable selection bias. Penalties proportional to the number of categories or distinct values are applied to the splitting criteria of CART. The results of empirical comparisons show that the proposed modification of CART reduces the bias in variable selection.
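
The abstract states that the penalty is proportional to the number of categories or distinct values but gives no formula. The sketch below assumes a Gini-gain criterion penalized linearly, `gain - penalty_rate * n_distinct`, and handles ordered variables only (categorical subset splits are omitted); `penalty_rate` and the toy data are hypothetical. It shows how a many-valued variable can win the unpenalized comparison through a lucky threshold and lose once the penalty applies.

```python
from typing import Dict, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gain(x: Sequence[float], y: Sequence[int]) -> float:
    """Largest Gini decrease over all binary splits x <= t of an ordered variable."""
    parent, n, best = gini(y), len(y), 0.0
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def select_split_variable(
    columns: Dict[str, Sequence[float]], y: Sequence[int], penalty_rate: float = 0.0
) -> Tuple[str, Dict[str, float]]:
    """Pick the variable with the highest penalized gain: gain - penalty_rate * (#distinct values)."""
    scores = {name: best_gain(x, y) - penalty_rate * len(set(x)) for name, x in columns.items()}
    return max(scores, key=scores.get), scores


# Hypothetical data: x1 is binary, x2 has eight distinct values.
y = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "x1": [0, 0, 0, 1, 1, 1, 1, 0],
    "x2": [1, 2, 5, 8, 3, 6, 7, 4],
}
print(select_split_variable(columns, y))                     # unpenalized
print(select_split_variable(columns, y, penalty_rate=0.01))  # penalized
```

Unpenalized, x2 wins (gain 1/6 versus 0.125 for x1) simply because its eight values offer seven candidate thresholds; with `penalty_rate=0.01` the scores become about 0.087 for x2 and 0.105 for x1, so x1 is selected.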

The Assessment of risk of bias in randomized controlled trials published in the Korean Journal of Physical Therapy: A 2018~2022 review (한국 물리치료 학술지에 무작위대조연구의 비뚤림 위험 평가: 2018~2022년 검토)

  • Jae Hyun Lim;Chi Bok Park;Byeong Geun Kim
    • Journal of Korean Physical Therapy Science / v.30 no.4 / pp.82-91 / 2023
  • Background: Randomized controlled trials (RCTs) provide evidence on the effectiveness and safety of interventions and inform systematic reviews and the preparation of clinical guidelines. However, many RCTs have methodological flaws, and Cochrane's risk of bias tool version 2 (RoB2) can be used to evaluate the risk of bias (RoB) of RCTs. The RoB of physical therapy RCTs in Korea, however, has not been assessed. The purpose of this study was therefore to evaluate, using RoB2, the RoB of RCTs published in Korean physical therapy journals. Design: Review. Methods: The RCTs evaluated were those published in 11 physical therapy journals in Korea from 2018 to 2022. RoB2 covers five domains: bias arising from the randomization process, bias due to deviations from intended interventions, bias due to missing outcome data, bias in measurement of the outcome, and bias in selection of the reported result. Results: A total of 616 RCTs were evaluated. For bias arising from the randomization process, high risk was the most common rating at 555 (90.1%), followed by low risk at 41 (6.7%) and some concerns at 20 (3.2%). For bias due to deviations from intended interventions, some concerns was the most common at 390 (63.3%), followed by high risk at 218 (35.4%) and low risk at 8 (1.3%). For bias due to missing outcome data, low risk was the most common at 399 (64.8%), followed by high risk at 159 (25.8%) and some concerns at 58 (9.4%). For bias in measurement of the outcome, high risk was the most common at 294 (47.7%), followed by low risk at 224 (36.4%) and some concerns at 98 (15.9%). For bias in selection of the reported result, high risk was the most common at 610 (99%), followed by low risk at 4 (0.7%) and some concerns at 2 (0.3%). Conclusion: Most RoB ratings for RCTs published in Korean physical therapy journals were high risk, and the methodological quality of these RCTs needs to be improved.

Learning Method of Data Bias employing MachineLearningforKids: Case of AI Baseball Umpire (머신러닝포키즈를 활용한 데이터 편향 인식 학습: AI야구심판 사례)

  • Kim, Hyo-eun
    • Journal of The Korean Association of Information Education / v.26 no.4 / pp.273-284 / 2022
  • This paper proposes using machine learning platforms in education to train learners to recognize data bias. Learners can thereby cultivate the ability to recognize bias when they work with AI data and systems, which helps prevent damage caused by data bias. Specifically, this paper presents a method of data bias education using MachineLearningforKids, focusing on the case of an AI baseball umpire. Learners select a specific topic, review prior research, input biased and unbiased data into a machine learning platform, compose test data, compare the machine learning results, and present implications. Through this process, learners can learn why AI data bias should be minimized and how data collection and selection affect society. This learning method is significant in that it facilitates problem-based, self-directed learning, can be combined with coding education, and connects humanities and social topics with artificial intelligence literacy.

Bias Compensation Algorithm of Acceleration Sensor on Galloping Measurement System

  • Kim, Hwan-Seong;Byung, Gi-Sig;So, Sang-Gyun
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2001.10a / pp.127.6-127 / 2001
  • In this paper, we present two bias compensation algorithms for an acceleration sensor used to measure galloping on power transmission lines. First, a block diagram of the galloping measurement system and a galloping model are presented. Second, two compensation algorithms, simple compensation and period compensation, are proposed. The simple compensation algorithm uses the drifts of velocity and distance at fixed periods, so it is useful when the bias is constant. The period compensation algorithm can compensate for a periodic bias; it uses previously measured data and compensated data over a constant period, where the period is obtained by the FFT. Finally, the effectiveness of the proposed algorithms is verified by comparing the two in simulation, and the characteristics and bias error bound of each are shown.
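
The abstract describes the simple compensation only as using velocity and distance drifts over fixed periods. One plausible reading, assumed here, is that a constant accelerometer bias b makes the integrated velocity drift linearly at rate b, so b can be estimated as the velocity drift over a window spanning whole galloping periods and then subtracted. The sketch covers only this velocity-drift step; displacement-drift handling and the FFT-based period compensation are not reproduced, and the signal is synthetic.

```python
import math
from typing import List, Sequence, Tuple


def integrate(samples: Sequence[float], dt: float) -> List[float]:
    """Cumulative trapezoidal integral, starting at 0."""
    out = [0.0]
    for a0, a1 in zip(samples, samples[1:]):
        out.append(out[-1] + 0.5 * (a0 + a1) * dt)
    return out


def estimate_constant_bias(accel: Sequence[float], dt: float) -> float:
    """Bias = velocity drift / window length.

    Assumes the window covers whole oscillation periods, so the true velocity
    returns to its starting value and all residual drift comes from the bias.
    """
    velocity = integrate(accel, dt)
    return velocity[-1] / (dt * (len(accel) - 1))


def compensate(accel: Sequence[float], dt: float) -> Tuple[List[float], float]:
    """Subtract the estimated constant bias from every sample."""
    bias = estimate_constant_bias(accel, dt)
    return [a - bias for a in accel], bias


# Hypothetical sensor: 1 Hz galloping acceleration plus a constant 0.3 bias,
# sampled at 100 Hz over exactly two periods (201 samples including both ends).
dt = 0.01
t = [i * dt for i in range(201)]
raw = [math.sin(2 * math.pi * ti) + 0.3 for ti in t]
corrected, bias = compensate(raw, dt)
print(round(bias, 6))                                        # estimated bias
print(integrate(raw, dt)[-1], integrate(corrected, dt)[-1])  # velocity drift before/after
```

If the window does not span whole periods, part of the true oscillation leaks into the estimate, which is one reason a period estimate (the abstract obtains it by FFT) matters in practice.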


Evolution of Bias-corrected Satellite Rainfall Estimation for Drought Monitoring System in South Korea (한반도지역 가뭄 모니터링 활용을 위한 위성강우 편의보정)

  • Park, Jihoon;Jung, Imgook;Park, Kyungwon
    • Korean Journal of Remote Sensing / v.34 no.6_1 / pp.997-1007 / 2018
  • Drought monitoring is an important tool for responding to disasters caused by climate change, and it requires precipitation measurements based on satellite rainfall estimation. The dataset developed in this study provides two kinds of satellite data: raw satellite data and bias-corrected satellite data. The spatial resolution of the satellite data is 10 km and the temporal resolution is 1 day. South Korea was selected as the target area; the raw satellite dataset was constructed and the bias-correction method was validated. The raw satellite data were constructed from TRMM TMPA and GPM IMERG products, and GRA-IDW was selected as the bias-correction method. The correlation coefficient of 0.775 for 1998 to 2017 is relatively high, and the correlation coefficients of the TRMM TMPA and GPM IMERG 10 km daily rainfall are 0.776 and 0.753, respectively. The BIAS values showed that the raw satellite data overestimate rainfall relative to the observed data. The technique developed in this study makes it possible to provide reliable drought monitoring for watersheds on the Korean peninsula, and it can also serve as baseline data for overseas projects, including ungauged regions. It is expected to provide reliable gridded data to end users in drought management.
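
The two evaluation statistics and the correction step can be sketched as follows. BIAS is taken here as total satellite rainfall divided by total gauge rainfall, so values above 1 indicate overestimation. GRA-IDW is assumed to mean computing gauge-to-satellite ratios at gauge locations (geographical ratio analysis) and spreading them over the grid by inverse distance weighting; the power of 2, the fallback ratio of 1 where the satellite reports no rain, and all data are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bias_ratio(satellite: Sequence[float], observed: Sequence[float]) -> float:
    """Total satellite rainfall over total gauge rainfall; > 1 means overestimation."""
    return sum(satellite) / sum(observed)


def idw(target: Point, points: Sequence[Point], values: Sequence[float], power: float = 2.0) -> float:
    """Inverse-distance-weighted interpolation of `values` at `target`."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(target[0] - px, target[1] - py)
        if d == 0.0:
            return v  # target coincides with a gauge
        w = d ** -power
        num += w * v
        den += w
    return num / den


def ratio_idw_correct(sat_cell: float, cell: Point, gauges: Sequence[Point],
                      gauge_obs: Sequence[float], sat_at_gauges: Sequence[float]) -> float:
    """Scale a satellite cell by the IDW-interpolated gauge/satellite ratio (assumed GRA-IDW form)."""
    ratios = [o / s if s > 0 else 1.0 for o, s in zip(gauge_obs, sat_at_gauges)]
    return sat_cell * idw(cell, gauges, ratios)


# Hypothetical daily rainfall (mm) at two gauges and the satellite cells above them.
gauges = [(0.0, 0.0), (2.0, 0.0)]
obs = [8.0, 5.0]
sat = [10.0, 5.0]
print(bias_ratio(sat, obs))                                   # > 1: satellite overestimates
print(ratio_idw_correct(10.0, (1.0, 0.0), gauges, obs, sat))  # cell midway between the gauges
```

The midway cell is equidistant from both gauges, so it receives the average ratio (0.8 + 1.0) / 2 = 0.9 and its 10 mm estimate is scaled to 9 mm.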