• Title/Summary/Keyword: 준지도학습

Search Result 66, Processing Time 0.024 seconds

Semi-Supervised Learning to Predict Default Risk for P2P Lending (준지도학습 기반의 P2P 대출 부도 위험 예측에 대한 연구)

  • Kim, Hyun-jung
    • Journal of Digital Convergence
    • /
    • v.20 no.4
    • /
    • pp.185-192
    • /
    • 2022
  • This study investigates the effect of the semi-supervised learning(SSL) method on predicting default risk of peer-to-peer(P2P) loans. Despite its proven performance, the supervised learning(SL) method requires labeled data, which may require a lot of effort and resources to collect. With the rapid growth of P2P platforms, the number of loans issued annually that have no clear final resolution is continuously increasing leading to abundance in unlabeled data. The research data of P2P loans used in this study were collected on the LendingClub platform. This is why an SSL model is needed to predict the default risk by using not only information from labeled loans(fully paid or defaulted) but also information from unlabeled loans. The results showed that in terms of default risk prediction and despite the use of a small number of labeled data, the SSL method achieved a much better default risk prediction performance than the SL method trained using a much larger set of labeled data.

Semi-supervised learning based malware detection technique (준지도 학습 기반의 멀웨어 탐지 기법)

  • Yu-Ran Jeon;Hye Yeon Shim;Il-Gu Lee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2024.05a
    • /
    • pp.254-257
    • /
    • 2024
  • 5G 통신과 인공지능 기술이 발전하고, 사물인터넷 기기의 수가 증가함에 따라 종래의 정보보호체계를 우회하는 지능적인 사이버 공격이 증가하고 있다. 그러나, 종래의 기계학습 기반 멀웨어 탐지 방식은 이미 알려진 멀웨어만 탐지할 수 있으며, 새로운 멀웨어는 탐지가 어렵거나, 기존의 알려진 멀웨어로 잘못 분류되는 문제가 있다. 본 연구에서는 비지도학습을 사용하여 알려지지 않은 멀웨어를 탐지하고, 새롭게 탐지된 멀웨어를 새로운 라벨로 분류하여 재학습하는 준지도 학습 기반의 멀웨어 탐지 기법을 제안한다. 다양한 데이터 환경에서 알려지지 않은 멀웨어 데이터가 탐지 모델로 입력될 때 제안한 방식의 성능을 평가했다. 실험 결과에 따르면 제안한 준지도 학습 기반의 멀웨어 탐지 방법은 종래의 방식 대비 정확도를 약 16% 개선했다.

A Study on the Prediction of Ship Collision Based on Semi-Supervised Learning (준지도 학습 기반 선박충돌 예측에 대한 연구)

  • Ho-June Seok;Seung Sim;Jeong-Hun Woo;Jun-Rae Cho;Deuk-Jae Cho;Jong-Hwa Baek;Jaeyong Jung
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2023.05a
    • /
    • pp.204-205
    • /
    • 2023
  • This study studied a prediction model for sending collision alarms for small fishing boats based on semi-supervised learning(SSL). The supervised learning (SL) method requires a large number of labeled data, but the labeling process takes a lot of resources and time. This study used service data collected through a data pipeline linked to 'intelligent maritime traffic information service' and data collected from real-sea experiment. The model accuracy was improved as a result of learning not only real-sea experiment data with labeling determined based on actual user satisfaction but also service data without label determined together.

  • PDF

Semi-supervised learning of speech recognizers based on variational autoencoder and unsupervised data augmentation (변분 오토인코더와 비교사 데이터 증강을 이용한 음성인식기 준지도 학습)

  • Jo, Hyeon Ho;Kang, Byung Ok;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.6
    • /
    • pp.578-586
    • /
    • 2021
  • We propose a semi-supervised learning method based on Variational AutoEncoder (VAE) and Unsupervised Data Augmentation (UDA) to improve the performance of an end-to-end speech recognizer. In the proposed method, first, the VAE-based augmentation model and the baseline end-to-end speech recognizer are trained using the original speech data. Then, the baseline end-to-end speech recognizer is trained again using data augmented from the learned augmentation model. Finally, the learned augmentation model and end-to-end speech recognizer are re-learned using the UDA-based semi-supervised learning method. As a result of the computer simulation, the augmentation model is shown to improve the Word Error Rate (WER) of the baseline end-to-end speech recognizer, and further improve its performance by combining it with the UDA-based learning method.

Semi-Supervised Learning by Gaussian Mixtures (정규 혼합분포를 이용한 준지도 학습)

  • Choi, Byoung-Jeong;Chae, Youn-Seok;Choi, Woo-Young;Park, Chang-Yi;Koo, Ja-Yong
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.5
    • /
    • pp.825-833
    • /
    • 2008
  • Discriminant analysis based on Gaussian mixture models, an useful tool for multi-class classifications, can be extended to semi-supervised learning. We consider a model selection problem for a Gaussian mixture model in semi-supervised learning. More specifically, we adopt Bayesian information criterion to determine the number of subclasses in the mixture model. Through simulations, we illustrate the usefulness of the criterion.

Smoothing parameter selection in semi-supervised learning (준지도 학습의 모수 선택에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.4
    • /
    • pp.993-1000
    • /
    • 2016
  • Semi-supervised learning makes it easy to use an unlabeled data in the supervised learning such as classification. Applying the semi-supervised learning on the regression analysis, we propose two methods for a better regression function estimation. The proposed methods have been assumed different marginal densities of independent variables and different smoothing parameters in unlabeled and labeled data. We shows that the overfitted pilot estimator should be used to achieve the fastest convergence rate and unlabeled data may help to improve the convergence rate with well estimated smoothing parameters. We also find the conditions of smoothing parameters to achieve optimal convergence rate.

Semi-Automatic Scoring for Short Korean Free-Text Responses Using Semi-Supervised Learning (준지도학습 방법을 이용한 한국어 서답형 문항 반자동 채점)

  • Cheon, Min-Ah;Seo, Hyeong-Won;Kim, Jae-Hoon;Noh, Eun-Hee;Sung, Kyung-Hee;Lim, EunYoung
    • Korean Journal of Cognitive Science
    • /
    • v.26 no.2
    • /
    • pp.147-165
    • /
    • 2015
  • Through short-answer questions, we can reflect the depth of students' understanding and higher-order thinking skills. Scoring for short-answer questions may take long time and may be an issue on consistency of grading. To alleviate such the suffering, automated scoring systems are widely used in Europe and America, but are in the initial stage in research in Korea. In this paper, we propose a semi-automatic scoring system for short Korean free-text responses using semi-supervised learning. First of all, based on the similarity score between students' answers and model answers, the proposed system grades students' answers and the scored answers with high reliability have been included in the model answers through the thorough test. This process repeats until all answers are scored. The proposed system is used experimentally in Korean and social studies in Nationwide Scholastic Achievement Test. We have confirmed that the processing time and the consistency of grades are promisingly improved. Using the system, various assessment methods have got to be developed and comparative studies need to be performed before applying to school fields.

Response Modeling with Semi-Supervised Support Vector Regression (준지도 지지 벡터 회귀 모델을 이용한 반응 모델링)

  • Kim, Dong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.9
    • /
    • pp.125-139
    • /
    • 2014
  • In this paper, I propose a response modeling with a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. In order to increase the accuracy and profit of response modeling, unlabeled data in the customer dataset are used with the labeled data during training. The proposed SS-SVR algorithm is designed to be a batch learning to reduce the training complexity. The label distributions of unlabeled data are estimated in order to consider the uncertainty of labeling. Then, multiple training data are generated from the unlabeled data and their estimated label distributions with oversampling to construct the training dataset with the labeled data. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to reduce the training complexity. The experimental results conducted on a real-world marketing dataset showed that the proposed response modeling method trained efficiently, and improved the accuracy and the expected profit.

Efficient Hangul Word Processor (HWP) Malware Detection Using Semi-Supervised Learning with Augmented Data Utility Valuation (효율적인 HWP 악성코드 탐지를 위한 데이터 유용성 검증 및 확보 기반 준지도학습 기법)

  • JinHyuk Son;Gihyuk Ko;Ho-Mook Cho;Young-Kuk Kim
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.1
    • /
    • pp.71-82
    • /
    • 2024
  • With the advancement of information and communication technology (ICT), the use of electronic document types such as PDF, MS Office, and HWP files has increased. Such trend has led the cyber attackers increasingly try to spread malicious documents through e-mails and messengers. To counter such attacks, AI-based methodologies have been actively employed in order to detect malicious document files. The main challenge in detecting malicious HWP(Hangul Word Processor) files is the lack of quality dataset due to its usage is limited in Korea, compared to PDF and MS-Office files that are highly being utilized worldwide. To address this limitation, data augmentation have been proposed to diversify training data by transforming existing dataset, but as the usefulness of the augmented data is not evaluated, augmented data could end up harming model's performance. In this paper, we propose an effective semi-supervised learning technique in detecting malicious HWP document files, which improves overall AI model performance via quantifying the utility of augmented data and filtering out useless training data.

Automatic Text Categorization based on Semi-Supervised Learning (준지도 학습 기반의 자동 문서 범주화)

  • Ko, Young-Joong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.5
    • /
    • pp.325-334
    • /
    • 2008
  • The goal of text categorization is to classify documents into a certain number of pre-defined categories. The previous studies in this area have used a large number of labeled training documents for supervised learning. One problem is that it is difficult to create the labeled training documents. While it is easy to collect the unlabeled documents, it is not so easy to manually categorize them for creating training documents. In this paper, we propose a new text categorization method based on semi-supervised learning. The proposed method uses only unlabeled documents and keywords of each category, and it automatically constructs training data from them. Then a text classifier learns with them and classifies text documents. The proposed method shows a similar degree of performance, compared with the traditional supervised teaming methods. Therefore, this method can be used in the areas where low-cost text categorization is needed. It can also be used for creating labeled training documents.