• Title/Summary/Keyword: Cost-Sensitive Learning


Human Papillomavirus Risk Classification by Cost-Sensitive Learning (비용 의존 학습에 의한 인유두종 바이러스의 분류)

  • 황소현;박성배;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.401-403 / 2003
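  • Human papillomavirus (HPV) is a DNA virus that infects epidermal cells and is the most significant cause of cervical cancer. About 100 types are known to date, and they are divided into risk groups according to their potential to cause malignant tumors; the key requirement is to minimize cases in which a high-risk type is misclassified as low-risk. In this paper, we use documents about HPV as the classification data and, as the machine learning method, cost-sensitive learning, which can take classification costs into account. Experimental results show that taking costs into account yields better performance than ignoring them.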

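
The abstract does not specify the classifier or the cost values. As a minimal sketch of what "taking classification costs into account" can mean for two risk groups, the standard minimum-expected-cost decision rule below predicts high risk whenever the expected cost of calling a type low-risk exceeds the expected cost of calling it high-risk. This is equivalent to thresholding the predicted probability at C_FP / (C_FP + C_FN). The function names and cost values are hypothetical, correct predictions are assumed to cost nothing, and this is not presented as the authors' method.

```python
from typing import Sequence


def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability of 'high risk' above which predicting high risk has the
    lower expected cost (correct predictions are assumed to cost 0)."""
    return cost_fp / (cost_fp + cost_fn)


def predict_high_risk(p_high: float, cost_fp: float, cost_fn: float) -> bool:
    # Expected cost of predicting "low risk":  p_high * cost_fn  (a missed high-risk type)
    # Expected cost of predicting "high risk": (1 - p_high) * cost_fp  (a false alarm)
    return p_high * cost_fn > (1.0 - p_high) * cost_fp


def total_cost(y_true: Sequence[int], y_pred: Sequence[int],
               cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost; labels are 1 = high risk, 0 = low risk."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += cost_fn
        elif t == 0 and p == 1:
            cost += cost_fp
    return cost


if __name__ == "__main__":
    # Hypothetical costs: missing a high-risk type is 5x worse than a false alarm.
    print(cost_threshold(1.0, 5.0))
    print(predict_high_risk(0.3, 1.0, 5.0))  # True: 0.3 * 5 > 0.7 * 1
    print(predict_high_risk(0.3, 1.0, 1.0))  # False with equal costs
```

With equal costs the threshold is 0.5. Raising the cost of a missed high-risk type lowers the threshold, which trades extra false alarms for fewer dangerous misses.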

Image Classification using Class-Balanced Loss (Class-Balanced Loss를 이용한 이미지 분류)

  • Jihee Park;Wonjun Hwang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.164-166 / 2022
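  • The long-tail problem refers to the degradation in performance caused by differences in the number of samples per class. In this paper, we aim to solve the long-tail problem by improving performance with Class-Balanced Loss, one of the cost-sensitive learning methods. First, we examine the performance difference between a balanced dataset and an imbalanced dataset. We then apply three versions of Class-Balanced Loss and measure and analyze their performance.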

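
The abstract does not state the loss formula. The sketch below assumes the effective-number weighting from "Class-Balanced Loss Based on Effective Number of Samples" (Cui et al., CVPR 2019). Under that scheme, a class with n samples gets a weight proportional to (1 - beta) / (1 - beta^n), and these weights scale a standard softmax cross-entropy. The value of beta, the class counts and all function names are illustrative assumptions. The three versions evaluated in the paper are not reproduced here.

```python
import math
from typing import List, Sequence


def class_balanced_weights(counts: Sequence[int], beta: float) -> List[float]:
    """Weight each class by 1 / E_n, where E_n = (1 - beta**n) / (1 - beta) is the
    effective number of samples; weights are rescaled to sum to the number of
    classes. Every count must be >= 1 and 0 <= beta < 1."""
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


def weighted_softmax_ce(logits: Sequence[float], label: int,
                        weights: Sequence[float]) -> float:
    """Softmax cross-entropy for one sample, multiplied by its class weight."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


if __name__ == "__main__":
    counts = [5000, 500, 50]  # hypothetical long-tailed class counts
    w = class_balanced_weights(counts, beta=0.999)
    print([round(x, 3) for x in w])
    print(weighted_softmax_ce([2.0, 0.5, -1.0], label=2, weights=w))
```

With beta = 0 every class gets weight 1, so no reweighting takes place. As beta approaches 1, the weights approach inverse class frequency, so beta controls how strongly tail classes are emphasized.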

Cost-Sensitive Case Based Reasoning using Genetic Algorithm: Application to Diagnose for Diabetes

  • Park Yoon-Joo;Kim Byung-Chun
    • Proceedings of the Korea Intelligent Information System Society Conference / 2006.06a / pp.327-335 / 2006
  • Case Based Reasoning (CBR) has come to be considered an appropriate technique for diagnosis, prognosis and prescription in medicine. However, conventional CBR cannot incorporate asymmetric misclassification costs. It assumes that Type I and Type II errors have the same cost, so it cannot be adjusted to the error cost of each type. This limitation is a major disincentive to applying conventional CBR to the many real-world cases in which different types of error carry different costs; medical diagnosis is an important example. In this paper we propose a new knowledge extraction technique called Cost-Sensitive Case Based Reasoning (CSCBR) that can incorporate unequal misclassification costs. The main idea is to dynamically adapt both the optimal classification boundary point and the number of neighbors so as to minimize the total misclassification cost given the error costs. Our technique uses a genetic algorithm (GA) to find these two feature vectors of CSCBR. We apply this new method to diabetes datasets and compare the results with those of the cost-sensitive methods C5.0 and CART. The results show that the proposed technique outperforms the other methods and overcomes the limitation of conventional CBR.

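
CSCBR adapts two quantities: the classification boundary point and the number of neighbors k. Both can be illustrated with a cost-sensitive k-nearest-neighbor rule. A case is labeled positive when the fraction of positive cases among its k nearest neighbors reaches the boundary point. The pair (k, boundary) is then chosen to minimize total misclassification cost on held-out cases. This sketch replaces the paper's genetic algorithm with a plain exhaustive grid search and assumes Euclidean distance. Its function names and data are hypothetical.

```python
import math
from itertools import product
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def positive_fraction(train_x: List[Point], train_y: List[int], query: Point, k: int) -> float:
    """Share of positive (label 1) cases among the k nearest training cases."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k


def misclassification_cost(train_x, train_y, val_x, val_y, k: int, boundary: float,
                           cost_fp: float, cost_fn: float) -> float:
    """Total cost on validation cases when predicting positive iff the
    positive fraction among the k neighbors is >= boundary."""
    cost = 0.0
    for x, y in zip(val_x, val_y):
        pred = 1 if positive_fraction(train_x, train_y, x, k) >= boundary else 0
        if y == 1 and pred == 0:
            cost += cost_fn  # missed positive case
        elif y == 0 and pred == 1:
            cost += cost_fp  # false alarm
    return cost


def tune(train_x, train_y, val_x, val_y, ks: Sequence[int], boundaries: Sequence[float],
         cost_fp: float, cost_fn: float) -> Optional[Tuple[int, float, float]]:
    """Exhaustive search over (k, boundary), standing in for the paper's GA.
    Returns (k, boundary, total cost) with the lowest cost, or None if empty."""
    best = None
    for k, b in product(ks, boundaries):
        c = misclassification_cost(train_x, train_y, val_x, val_y, k, b, cost_fp, cost_fn)
        if best is None or c < best[2]:
            best = (k, b, c)
    return best


if __name__ == "__main__":
    # Hypothetical one-feature data; errors on positives cost 5x more.
    train_x = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
    train_y = [0, 0, 0, 1, 1, 1]
    val_x, val_y = [(1.5,), (10.5,)], [0, 1]
    print(tune(train_x, train_y, val_x, val_y, [1, 3, 5], [0.2, 0.4, 0.6], 1.0, 5.0))
```

A grid search only covers the candidates listed. The paper instead searches this space with a GA.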

Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction (교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.3 / pp.77-90 / 2018
  • Software Defect Prediction (SDP) is a field of study that identifies defective modules. When local data are insufficient, a company can use Cross-Project Defect Prediction (CPDP), which builds a classifier from datasets collected from other companies. Most machine learning algorithms used for SDP have more than one parameter whose value significantly affects prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using a Harmony Search algorithm (HS), our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods under class imbalance. Previous studies showed either a high probability of false alarm (PF) or a low probability of detection (PD). In contrast, our approach achieves acceptably high PD and low PF while maintaining high overall performance. It also performs comparably to WPDP.
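
Harmony search itself is a simple population-based metaheuristic. The sketch below is a basic version that minimizes a black-box objective over box bounds. It uses four parameters: harmony memory size `hms`, memory considering rate `hmcr`, pitch adjusting rate `par`, and `bandwidth` as a fraction of each range. In the setting described above, the objective would be a prediction-performance measure of cost-sensitive boosting on the target project. Here a toy function stands in for it. The default values, the omission of the paper's constraint rules between parameters, and all names are assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def harmony_search(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    hms: int = 10,
    hmcr: float = 0.9,
    par: float = 0.3,
    bandwidth: float = 0.05,
    iterations: int = 2000,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `objective` over a box with basic harmony search.
    Returns (best vector, best objective value)."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(h) for h in memory]

    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    # pitch adjustment: small random step around the remembered value
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                value = rng.uniform(lo, hi)  # random selection from the range
            new.append(min(max(value, lo), hi))  # clip to the bounds
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:  # replace the worst harmony if the new one is better
            memory[worst], scores[worst] = new, score

    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


if __name__ == "__main__":
    # Toy objective standing in for a cross-project performance measure.
    f = lambda v: (v[0] - 0.3) ** 2 + (v[1] - 2.0) ** 2
    print(harmony_search(f, [(0.0, 1.0), (0.0, 5.0)]))
```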

Topic Classification for Suicidology

  • Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
    • Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
  • Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from simple bag-of-words and n-grams over stems to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.
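
The sketch below illustrates one-versus-all training combined with cost-sensitive learning. It builds one binary task per label and attaches a cost factor that makes errors on the (usually rarer) positive class more expensive. It then uses that factor in a weighted hinge loss of the kind a linear SVM minimizes. The ratio n_neg / n_pos is a common heuristic assumed here, not necessarily the paper's setting. The labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def one_vs_all_tasks(labels: Sequence[str]) -> Dict[str, Tuple[List[int], float]]:
    """One binary task per label: targets are +1 for that label and -1 otherwise.
    The float is a cost factor for errors on the positive class (n_neg / n_pos)."""
    counts = Counter(labels)
    tasks = {}
    for label, n_pos in counts.items():
        targets = [1 if y == label else -1 for y in labels]
        n_neg = len(labels) - n_pos
        tasks[label] = (targets, n_neg / n_pos)  # rarer labels get larger cost factors
    return tasks


def weighted_hinge(scores: Sequence[float], targets: Sequence[int], pos_cost: float) -> float:
    """Hinge loss in which each positive example's loss is multiplied by pos_cost."""
    total = 0.0
    for s, y in zip(scores, targets):
        loss = max(0.0, 1.0 - y * s)
        total += loss * (pos_cost if y == 1 else 1.0)
    return total


if __name__ == "__main__":
    labels = ["information"] * 6 + ["instructions"] * 3 + ["guilt"]  # hypothetical
    for label, (targets, cost) in one_vs_all_tasks(labels).items():
        print(label, sum(t == 1 for t in targets), round(cost, 3))
```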

Analysis on learning curves of end-use appliances for the establishment of price-sensitivity load model in competitive electricity market (전력산업 경쟁 환경에서의 요금부하모델 수립을 위한 부하기기의 학습곡선 분석)

  • Hwang, Sung-Wook;Kim, Jung-Hoon;Song, Kyung-Bin;Choi, Joon-Young
    • Proceedings of the KIEE Conference / 2001.07a / pp.386-388 / 2001
  • The introduction of competition into the electricity market shifts electricity charges from a cost basis to a price basis. This lets consumers choose among a variety of charge schemes and causes a portion of the load to be affected by the change. In addition, an index is required that captures the price volatility on the power exchange caused by gaming and strategic bidding by suppliers seeking to increase profits. Therefore, a price-sensitive load model is needed to describe mathematically the loads that respond to price. Advances in state-of-the-art technologies also affect electricity prices, so the diffusion of high-efficiency end-use appliances and their prices affect load patterns. This paper presents an analysis of learning-curve algorithms used to investigate the correlation between end-use appliance prices and load patterns.

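
The abstract does not give the form of the learning curve. A common assumption is the one-factor experience curve, price = a * x^(-b), where x is cumulative production (or installed units). Under this model, 2^(-b) is the progress ratio: the fraction of the price that remains each time x doubles. The sketch fits a and b by least squares on log-transformed data. The model choice, the data and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple


def fit_learning_curve(cum_units: Sequence[float], prices: Sequence[float]) -> Tuple[float, float]:
    """Fit price = a * cum_units ** (-b) by least squares in log-log space.
    Returns (a, b). All inputs must be positive."""
    xs = [math.log(x) for x in cum_units]
    ys = [math.log(p) for p in prices]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log(price) vs. log(cumulative units) equals -b
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


def progress_ratio(b: float) -> float:
    """Fraction of the price that remains after cumulative production doubles."""
    return 2.0 ** (-b)


if __name__ == "__main__":
    units = [1, 2, 4, 8, 16]                    # hypothetical cumulative units
    prices = [100, 80.1, 64.3, 51.0, 41.2]       # hypothetical appliance prices
    a, b = fit_learning_curve(units, prices)
    print(round(a, 2), round(b, 3), round(progress_ratio(b), 3))
```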

The Influence of Introducing New Technologies and DSM Strategies on End-Use Learning Curves (신기술 보급 및 DSM 정책이 부하기기 학습곡선에 미치는 영향)

  • Hwang, Sung-Wook;Kim, Jung-Hoon
    • Proceedings of the KIEE Conference / 2001.11b / pp.435-437 / 2001
  • The introduction of competition into the electricity market shifts electricity charges from a cost basis to a price basis. This lets consumers choose among a variety of charge schemes and causes a portion of the load to be affected by the change. In addition, an index is required that captures the price volatility on the power exchange caused by gaming and strategic bidding by suppliers seeking to increase profits. Therefore, a price-sensitive load model is needed to describe mathematically the loads that respond to price. Advances in state-of-the-art technologies also affect electricity prices, so the diffusion of high-efficiency end-use appliances and their prices affect load patterns. This paper presents an analysis of learning-curve algorithms used to investigate the correlation between end-use appliance prices and load patterns.


A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors (태깅 오류 간 중요도 차별화에 기반한 비용 의존 품사 태깅)

  • Son, Jeong-Woo;Noh, Tae-Gil;Park, Seong-Bae;Go, Jun-Ho
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.236-239 / 2011
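  • In part-of-speech tagging, all errors have traditionally been treated as having the same weight. However, tagging errors should be weighted differently according to how much they can affect the other natural language processing techniques that use the tagging results. Serious errors cause a large drop in the performance of those downstream techniques, whereas minor errors cause little or no degradation. This paper aims to reduce serious errors in POS tagging while maintaining overall performance. To this end, we propose two gradual loss functions (gradient loss functions). The proposed loss functions give serious errors more weight than minor errors, so that the tagging model concentrates on serious errors when optimizing its performance. In experiments, tagging models using the proposed loss functions not only reduced serious errors more effectively than existing methods but also achieved higher overall accuracy.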
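
The abstract does not define its two loss functions. The sketch below shows the general idea using a hypothetical three-tag set and a hand-set cost matrix. The loss for a token is the expected misclassification cost under the model's softmax distribution. As a result, probability mass placed on a serious confusion (here NOUN vs. VERB) is penalized more than mass placed on a minor one. This is a generic cost-sensitive loss, not the paper's gradient loss functions.

```python
import math
from typing import Dict, List, Sequence

TAGS = ["NOUN", "VERB", "ADJ"]  # hypothetical tag set
# COST[gold][predicted]: 0 on the diagonal, larger values for "serious" confusions.
COST: Dict[str, Dict[str, float]] = {
    "NOUN": {"NOUN": 0.0, "VERB": 5.0, "ADJ": 1.0},
    "VERB": {"NOUN": 5.0, "VERB": 0.0, "ADJ": 1.0},
    "ADJ": {"NOUN": 1.0, "VERB": 1.0, "ADJ": 0.0},
}


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_cost_loss(scores: Sequence[float], gold: str) -> float:
    """Expected misclassification cost of one token; scores are ordered as TAGS."""
    probs = softmax(scores)
    return sum(p * COST[gold][tag] for p, tag in zip(probs, TAGS))


if __name__ == "__main__":
    # A confident NOUN->VERB mistake costs more than a confident NOUN->ADJ mistake.
    print(expected_cost_loss([0.0, 10.0, 0.0], "NOUN"))
    print(expected_cost_loss([0.0, 0.0, 10.0], "NOUN"))
```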

Classification of Class-Imbalanced Data: Effect of Over-sampling and Under-sampling of Training Data (계급불균형자료의 분류: 훈련표본 구성방법에 따른 효과)

  • 김지현;정종빈
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.445-457 / 2004
  • Given class-imbalanced data in a two-class classification problem, we often over-sample and/or under-sample the training data to balance it. We investigate the validity of this practice and study the effect of such sampling on boosting of classification trees. Experiments on twelve real datasets show that keeping the natural distribution of the training data is the best approach when boosting methods are to be applied to class-imbalanced data.
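
For reference, the two sampling practices examined in the paper can be sketched in their simplest random forms. Random over-sampling duplicates randomly chosen minority examples, and random under-sampling discards randomly chosen majority examples, until the classes are equal in size. The function names are hypothetical and exactly two classes are assumed. In line with the paper's finding, the sketch is shown only to clarify the practice being evaluated, not as a recommended step before boosting.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(xs: Sequence, ys: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Duplicate random minority-class examples until both classes are equal in size."""
    rng = random.Random(seed)
    (_, n_major), (minor, n_minor) = Counter(ys).most_common(2)  # assumes two classes
    minority = [x for x, y in zip(xs, ys) if y == minor]
    extra = [rng.choice(minority) for _ in range(n_major - n_minor)]
    return list(xs) + extra, list(ys) + [minor] * len(extra)


def random_undersample(xs: Sequence, ys: Sequence[int], seed: int = 0) -> Tuple[List, List[int]]:
    """Keep a random majority-class subset as large as the minority class."""
    rng = random.Random(seed)
    (major, _), (minor, n_minor) = Counter(ys).most_common(2)  # assumes two classes
    majority = [x for x, y in zip(xs, ys) if y == major]
    minority = [x for x, y in zip(xs, ys) if y == minor]
    kept = rng.sample(majority, n_minor)
    return kept + minority, [major] * n_minor + [minor] * len(minority)


if __name__ == "__main__":
    xs, ys = list(range(10)), [0] * 8 + [1] * 2  # hypothetical 8:2 imbalance
    print(Counter(random_oversample(xs, ys)[1]), Counter(random_undersample(xs, ys)[1]))
```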

Case Analysis of Applications of Seismic Data Denoising Methods using Deep-Learning Techniques (심층 학습 기법을 이용한 탄성파 자료 잡음 제거 적용사례 분석)

  • Jo, Jun Hyeon;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.23 no.2 / pp.72-88 / 2020
  • Recent rapid advances in computer hardware performance have made computation relatively inexpensive, increasing the number of applications of machine-learning techniques to geophysical problems. In particular, deep-learning techniques are gaining popularity as the number of cases in which they successfully solve complex, nonlinear problems has gradually increased. In this paper, applications of seismic data denoising methods using deep-learning techniques are introduced and investigated. These studies are grouped by the type of noise attenuated: coherent noise, random noise, and a combination of the two. We then investigate the deep-learning techniques used to remove each type of noise. Unlike conventional methods for attenuating seismic noise, deep neural networks, a typical deep-learning technique, learn the characteristics of the noise on their own and optimize their parameters automatically. Therefore, such methods are less sensitive to generalized problems than conventional methods and can reduce labor costs. Several studies have also demonstrated that deep-learning techniques perform well in terms of both computational cost and denoising performance. Based on the results of the applications covered in this paper, the pros and cons of the deep-learning techniques used to remove seismic noise are analyzed and discussed.