• Title/Summary/Keyword: cost-sensitive learning (비용민감학습)

Search Result 12, Processing Time 0.022 seconds

Cost-Sensitive Learning for Cardio-Cerebrovascular Disease Risk Prediction (심혈관질환 위험 예측을 위한 비용민감 학습 모델)

  • Yu Na Lee;Kyung-Hee Lee;Wan-Sup Cho
    • The Journal of Bigdata / v.6 no.2 / pp.161-168 / 2021
  • This study proposes a cardiovascular disease prediction model based on machine learning. First, a multidimensional analysis of the differences between the normal and patient groups is performed and the results are visualized. In particular, we propose a predictive model using cost-sensitive learning, which can improve sensitivity when there is a high class imbalance between the normal and patient groups, as is typical of disease data. Predictive models are developed using CART and XGBoost, two representative machine learning techniques, and their predictions and performance are compared on cardiovascular disease patient data. According to the results, CART showed higher accuracy and specificity than XGBoost, with accuracy of about 70% to 74%.
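
As a rough illustration of why cost-sensitive decisions can raise sensitivity under class imbalance, the sketch below lowers the decision threshold according to assumed misclassification costs. The cost values, the predicted probabilities, and the function names are hypothetical. The abstract above does not say how the costs were set, and this is not the authors' CART or XGBoost pipeline.

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost for a binary
    decision, assuming correct decisions cost nothing."""
    return cost_fp / (cost_fp + cost_fn)


def sensitivity(y_true, y_pred):
    """Fraction of actual positives that are predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


# Hypothetical labels (1 = patient) and predicted patient probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p_hat = [0.60, 0.35, 0.25, 0.40, 0.10, 0.05, 0.30, 0.15, 0.08, 0.02]

# Cost-insensitive rule: flag a patient only when p >= 0.5.
default_pred = [int(p >= 0.5) for p in p_hat]

# Assumed costs: a missed patient is 4x as costly as a false alarm.
threshold = cost_threshold(cost_fp=1.0, cost_fn=4.0)
cs_pred = [int(p >= threshold) for p in p_hat]

print(sensitivity(y_true, default_pred), sensitivity(y_true, cs_pred))
```

With these made-up numbers the lower threshold recovers all three patients at the price of two extra false alarms, which is the usual trade-off between sensitivity and specificity.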

Cost-sensitive Learning for Credit Card Fraud Detection (신용카드 사기 검출을 위한 비용 기반 학습에 관한 연구)

  • Park Lae-Jeong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.5 / pp.545-551 / 2005
  • The main objective of fraud detection is to minimize the costs or losses incurred through fraudulent transactions. However, because the problem involves highly skewed, overlapping class distributions and non-uniform misclassification costs, it is practically difficult to generate a classifier that is near-optimal in terms of classification cost over a desired operating range of rejection rates. This paper defines a performance measure that reflects a classifier's costs over a specific operating range and offers a cost-sensitive learning approach that trains classifiers suitable for real-world credit card fraud detection by directly optimizing this measure with evolutionary programming. The experimental results demonstrate that, compared to other training methods, the proposed approach provides an effective way of training cost-sensitive classifiers for fraud detection.

A Study on the Improvement of Image Classification Performance in the Defense Field through Cost-Sensitive Learning of Imbalanced Data (불균형데이터의 비용민감학습을 통한 국방분야 이미지 분류 성능 향상에 관한 연구)

  • Jeong, Miae;Ma, Jungmok
    • Journal of the Korea Institute of Military Science and Technology / v.24 no.3 / pp.281-292 / 2021
  • With the development of deep learning technology, researchers and practitioners keep attempting to apply deep learning in various industrial and academic fields, including defense. Most of these attempts assume that the data are balanced. In reality, much of the data is imbalanced, so the classifier is not properly built and the model's performance can be low. Therefore, this study proposes cost-sensitive learning as a solution to the imbalanced data problem in image classification in the defense field. In the proposed model, cost-sensitive learning assigns a higher weight to the minority class in the cost function. Across 160 experiments using a submarine/non-submarine dataset and a warship/non-warship dataset, the test F1-score is higher when cost-sensitive learning is applied than with general learning. Furthermore, statistical tests are conducted, and the results are shown to be statistically significant.
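
The idea of weighting the minority class in the cost function can be sketched with a weighted binary cross-entropy. This is a minimal pure-Python illustration, not the authors' network or loss. The function name `weighted_bce`, the weight of 3.0, and the sample probabilities are assumptions chosen for the example.

```python
import math


def weighted_bce(y_true, p_pred, w_minority=1.0, eps=1e-12):
    """Mean binary cross-entropy in which errors on the minority class
    (label 1) are multiplied by w_minority."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if y == 1:
            total += -w_minority * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical batch: one minority-class image among four.
y_true = [1, 0, 0, 0]
p_pred = [0.3, 0.1, 0.2, 0.1]

plain = weighted_bce(y_true, p_pred)                     # ordinary loss
weighted = weighted_bce(y_true, p_pred, w_minority=3.0)  # cost-sensitive loss

print(plain, weighted)
```

Because the poorly classified minority sample now contributes three times as much to the loss, gradient-based training is pushed harder to fix it. A common heuristic, assumed here, sets the weight to the ratio of majority to minority samples.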

Classification of Export Container Weight Groups Using Machine Learning (기계학습을 이용한 수출 컨테이너의 무게그룹 분류)

  • Gang, Jae-Ho;Gang, Byeong-Ho;Ryu, Gwang-Ryeol;Kim, Gap-Hwan
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.77-86 / 2005
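  • In container terminals, export containers brought into the yard are divided into several weight groups and stacked together by weight group. This is done so that, during loading onto a vessel, containers in heavier weight groups can be retrieved from the yard first and placed toward the bottom of the ship for the sake of vessel stability. However, the weight information received from transport companies, which is used to determine the weight group of an incoming container, is often inaccurate, so containers belonging to different weight groups end up mixed within a single stack. As a result, when a container in a heavier weight group is retrieved, lighter containers stacked on top of it must be temporarily moved, which is called rehandling (or reshuffling). Frequent rehandling in the yard during loading delays the operation, so rehandling should be reduced as much as possible to improve terminal productivity. This paper proposes applying machine learning techniques to estimate the weight group of incoming containers more accurately. It also introduces a method that uses search to adjust the cost matrix involved in building the classifier, producing a classifier that can reduce rehandling. Experimental results suggest that the proposed method can be expected to reduce rehandling by about 5 to 7%.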
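
To make the role of a cost matrix concrete, the sketch below picks the weight group with the lowest expected misclassification cost rather than the most probable one. The three groups, the class probabilities, and the cost values are hypothetical. The abstract above does not give the paper's matrix or describe its search procedure, so neither is reproduced here.

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the predicted class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=lambda j: expected[j])


# Hypothetical weight groups: 0 = light, 1 = medium, 2 = heavy.
probs = [0.5, 0.3, 0.2]

# Assumed asymmetric costs (rows = true group, columns = predicted group).
cost = [
    [0, 1, 2],
    [2, 0, 1],
    [4, 2, 0],
]

most_probable = max(range(len(probs)), key=lambda i: probs[i])
cost_sensitive = min_expected_cost_class(probs, cost)

print(most_probable, cost_sensitive)
```

Here the most probable group is "light", but the expected costs are 1.4, 0.9, and 1.3, so the cost-sensitive rule chooses "medium". Searching over the matrix entries, as the abstract describes, would change which of these decisions flip.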


Case Analysis of Applications of Seismic Data Denoising Methods using Deep-Learning Techniques (심층 학습 기법을 이용한 탄성파 자료 잡음 제거 적용사례 분석)

  • Jo, Jun Hyeon;Ha, Wansoo
    • Geophysics and Geophysical Exploration / v.23 no.2 / pp.72-88 / 2020
  • Recent rapid advances in computer hardware performance have led to relatively low computational costs, increasing the number of applications of machine-learning techniques to geophysical problems. In particular, deep-learning techniques are gaining in popularity as the number of cases successfully solving complex and nonlinear problems has gradually increased. In this paper, applications of seismic data denoising methods using deep-learning techniques are introduced and investigated. Depending on the type of attenuated noise, these studies are grouped into denoising applications of coherent noise, random noise, and the combination of these two types of noise. Then, we investigate the deep-learning techniques used to remove the corresponding noise. Unlike conventional methods used to attenuate seismic noise, deep neural networks, a typical deep-learning technique, learn the characteristics of the noise independently and then automatically optimize the parameters. Therefore, such methods are less sensitive to generalized problems than conventional methods and can reduce labor costs. Several studies have also demonstrated that deep-learning techniques perform well in terms of computational cost and denoising performance. Based on the results of the applications covered in this paper, the pros and cons of the deep-learning techniques used to remove seismic noise are analyzed and discussed.

Effective Harmony Search-Based Optimization of Cost-Sensitive Boosting for Improving the Performance of Cross-Project Defect Prediction (교차 프로젝트 결함 예측 성능 향상을 위한 효과적인 하모니 검색 기반 비용 민감 부스팅 최적화)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.3 / pp.77-90 / 2018
  • Software Defect Prediction (SDP) is a field of study that identifies defective modules. When local data are insufficient, a company can use Cross-Project Defect Prediction (CPDP), which builds a classifier from datasets collected from other companies. Most machine learning algorithms for SDP have more than one parameter whose values significantly affect prediction performance. The objective of this study is to propose a parameter selection technique that enhances the performance of CPDP. Using a Harmony Search (HS) algorithm, our approach tunes the parameters of cost-sensitive boosting, a method for tackling the class imbalance that makes prediction difficult. Parameter ranges and constraint rules between parameters are defined according to distributional characteristics and applied to HS. The proposed approach is compared with three CPDP methods and a Within-Project Defect Prediction (WPDP) method over fifteen target projects. The experimental results indicate that the proposed model outperforms the other CPDP methods in the context of class imbalance. Unlike previous studies, which showed a high probability of false alarm (PF) or a low probability of detection (PD), our approach achieves acceptably high PD and low PF along with high overall performance. Its performance is also similar to that of WPDP.

Computational Method for Searching Human miRNA Precursors (인간 miRNA 전구체 탐색을 위한 계산학적 방법)

  • Nam, Jin-Wu;Joung, Je-Gun;Lee, Wha-Jin;Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.288-297 / 2003
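  • This paper introduces an algorithm for discovering miRNA genes using genetic programming, one of the techniques of evolutionary algorithms. miRNAs are one of the groups of small RNAs that directly regulate gene expression by halting the transcription of genes within the cell. Identifying miRNAs in genomic data is therefore of considerable biological importance. Moreover, an algorithm that identifies miRNAs in genomic data can substantially reduce the time and cost of biological experiments and relieve many of the difficulties of identifying miRNAs biologically. Computationally, however, miRNA identification is difficult to approach with existing gene prediction algorithms because the primary sequence lacks statistical significance. This study therefore builds on the observation that miRNAs share more similarity in their secondary structure than in their sequence: it finds common structures within the secondary structure and uses that information to identify miRNAs. To evaluate the algorithm, we computed the specificity (= 34/38) and sensitivity (= 38/67) of the trained model on a test set. The evaluation shows that the algorithm has higher specificity than existing miRNA prediction programs and a similar level of sensitivity.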


Practical Concerns in Enforcing Ethereum Smart Contracts as a Rewarding Platform in Decentralized Learning (연합학습의 인센티브 플랫폼으로써 이더리움 스마트 컨트랙트를 시행하는 경우의 실무적 고려사항)

  • Rahmadika, Sandi;Firdaus, Muhammad;Jang, Seolah;Rhee, Kyung-Hyune
    • KIPS Transactions on Computer and Communication Systems / v.9 no.12 / pp.321-332 / 2020
  • Decentralized approaches are extensively researched by academia and industry in order to address the flaws of existing systems in terms of data privacy. Blockchain and decentralized learning are prominent representatives of such approaches. Blockchain is secure by design, since its data records are irrevocable and tamper-resistant, decisions are made by consensus, and overall transaction costs are low. Decentralized learning, in turn, enables a number of devices to collectively improve a deep learning model without exposing their datasets publicly. To motivate participants to contribute their resources to building models, a fair and proportional incentive system is a necessity. A centralized incentive mechanism is ill-suited to decentralized learning, since it relies on a middleman that still suffers from bottleneck issues. Therefore, we design an incentive model for decentralized learning applications by leveraging Ethereum smart contracts. The simulation results satisfy the design goals. We also outline privacy and data leakage concerns in implementing the presented scheme for sensitive data.

A Study on the Self-Strengthening Smart IoT Hub Based on Strengthening Learning (강화 학습 기반의 독립형 스마트 IoT 허브 연구)

  • Lee, Yerin;Kim, Hyun;Lee, Innjie;Chai, Jihee
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.288-290 / 2019
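  • Every year, more IoT products for building smart homes are released, and control devices such as IoT hubs (gateways) are increasingly needed to manage them in an integrated way. Google's 'Google home' and Amazon's 'Echo' are representative examples. However, because these control devices operate on the cloud, they incur costs and raise various problems concerning how sensitive personal data generated by individuals is stored. Our team aimed to protect personal information and to control various IoT devices easily and conveniently by developing a standalone smart IoT hub. We implemented real-time monitoring and analysis of the sensors connected to the IoT devices using reinforcement learning, an artificial intelligence technique. By analyzing various communication values of the IoT devices, such as network disconnections and failures, stable and efficient control became possible. The IoT devices were built with Arduino and the smart IoT hub with Raspberry Pi, resulting in the design and implementation of a standalone IoT hub that protects personal information more securely and can monitor and control a variety of IoT devices.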

Road Surface Damage Detection Based on Semi-supervised Learning Using Pseudo Labels (수도 레이블을 활용한 준지도 학습 기반의 도로노면 파손 탐지)

  • Chun, Chanjun;Ryu, Seung-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.18 no.4 / pp.71-79 / 2019
  • Road surface damage detection has been studied using convolutional neural networks (CNNs) based on semantic segmentation. To build such a CNN model, it is essential to collect input images and the corresponding labeled images. Unfortunately, collecting such paired datasets requires a great deal of time and cost. In this paper, we propose a road surface damage detection technique based on semi-supervised learning using pseudo labels to mitigate this problem. The model is updated by properly mixing labeled and unlabeled datasets, and its performance is compared against an existing model that uses only the labeled dataset. As a subjective result, it was confirmed that recall was slightly degraded but precision was considerably improved. In addition, the $F_1$-score was also high.
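
The reported trade-off (slightly lower recall, considerably higher precision, high $F_1$-score) follows from the $F_1$-score being the harmonic mean of precision and recall. The short sketch below works through that arithmetic. The precision and recall values are invented for illustration and are not results from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: the supervised-only baseline versus a model
# that also trains on pseudo-labeled images.
baseline = f1_score(precision=0.60, recall=0.80)
semi_supervised = f1_score(precision=0.80, recall=0.75)

print(round(baseline, 3), round(semi_supervised, 3))
```

With these assumed values, a small drop in recall is more than offset by the gain in precision, so the $F_1$-score rises.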