• Title/Summary/Keyword: Unlabeled

Search Result 154, Processing Time 0.029 seconds

HR-evaluation sentence multi-classification and Analysis post-training effect using unlabeled data (HR-평가 문장 Multi-classification 및 Unlabeled data 를 활용한 Post-training 효과 분석)

  • Choi, Cheol;Lim, HeuiSeok
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.424-427 / 2022
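  • This study addresses the problem of classifying HR evaluation sentences, which have strong domain-specific characteristics, into four classes using BERT pre-trained language models (PLMs). By comparing model performance across various PLMs and training-data sizes, we identify the criteria needed to apply a language model to a specific domain. We also explore how HR-BERT, a BERT model post-trained on an unlabeled HR-domain corpus, affects the accuracy of PLM-based analysis models. Through this research, we aim to lay the groundwork for making use of text data, the largest data asset in HR, and to offer a guide for applying PLMs to specialized domains.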

Performance Analysis of IP Packet over MPLS Domain (MPLS 영역에서 IP Packet의 성능 분석)

  • 박상준;박우출;이병호
    • Proceedings of the IEEK Conference / 2000.11a / pp.29-32 / 2000
  • MPLS stands for “Multi-protocol Label Switching”. It is a layer 3 switching technology aimed at greatly improving the packet-forwarding performance of backbone routers in the Internet and other large networks. In this paper, we compare the performance of labeled and unlabeled IP packet traffic in IP over MPLS by varying the packet rate and measuring the throughput of the flows.

Semi-supervised regression based on support vector machine

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.25 no.2 / pp.447-454 / 2014
  • In many practical machine learning and data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have attracted much attention. However, previous research has mainly focused on classification problems. In this paper, a semi-supervised regression method based on the support vector regression (SVR) formulation is proposed. The estimator is easily obtained via the dual formulation of the optimization problem. Experimental results with simulated and real data suggest that the proposed method performs better than standard SVR.

Semi-Supervised Learning Using Kernel Estimation

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.629-636 / 2007
  • A kernel-type semi-supervised estimate is proposed. The proposed estimate is based on the penalized least squares loss and the principle of the Gaussian Random Fields Model. As a result, unlike existing transductive semi-supervised learning methods, we can estimate the label of new unlabeled data without re-running the algorithm. Our estimate can also be viewed as a general form of the Gaussian Random Fields Model. We give experimental evidence suggesting that our estimate uses unlabeled data effectively and yields good classification performance.

A Semi-supervised Dimension Reduction Method Using Ensemble Approach (앙상블 접근법을 이용한 반감독 차원 감소 방법)

  • Park, Cheong-Hee
    • The KIPS Transactions:PartD / v.19D no.2 / pp.147-150 / 2012
  • LDA is a supervised dimension reduction method that finds projective directions maximizing the separability between classes, but its performance degrades severely when the number of labeled data is small. Recently, semi-supervised dimension reduction methods have been proposed that utilize abundant unlabeled data to overcome the shortage of labeled data. However, the matrix computation usually required by statistical dimension reduction methods makes it difficult to use a large number of unlabeled data, and too much information from unlabeled data may not be helpful enough to justify the increase in processing time. To solve these problems, we propose an ensemble approach for semi-supervised dimension reduction. Extensive experimental results in text classification demonstrate the effectiveness of the proposed method.

Co-amplification at Lower Denaturation-temperature PCR Combined with Unlabeled-probe High-resolution Melting to Detect KRAS Codon 12 and 13 Mutations in Plasma-circulating DNA of Pancreatic Adenocarcinoma Cases

  • Wu, Jiong;Zhou, Yan;Zhang, Chun-Yan;Song, Bin-Bin;Wang, Bei-Li;Pan, Bai-Shen;Lou, Wen-Hui;Guo, Wei
    • Asian Pacific Journal of Cancer Prevention / v.15 no.24 / pp.10647-10652 / 2015
  • Background: The aim of our study was to establish COLD-PCR combined with an unlabeled-probe HRM approach for detecting KRAS codon 12 and 13 mutations in plasma-circulating DNA of pancreatic adenocarcinoma (PA) cases as a novel and effective diagnostic technique. Materials and Methods: We tested the sensitivity and specificity of this approach with dilutions of known mutated cell lines. We screened plasma-circulating DNA samples from 36 PA cases, 24 disease controls, and 25 healthy individuals, which were subsequently sequenced to confirm mutations. Simultaneously, we tested the specimens using conventional PCR followed by HRM and then used target-DNA cloning and sequencing for verification. The ROC curves and respective AUCs were calculated for KRAS mutations and/or serum CA 19-9. Results: The sensitivity of Sanger sequencing reached 0.5% with COLD-PCR, compared with 20% after conventional PCR; COLD-PCR based on unlabeled-probe HRM reached 0.1%. KRAS mutations were identified in 26 of 36 PA cases (72.2%), while none were detected in the disease control or healthy groups. In these 26 PA cases, KRAS mutations were identified in both tissue and plasma samples. The AUC of COLD-PCR based on unlabeled-probe HRM was 0.861, which increased to 0.934 when combined with CA 19-9. Conclusions: COLD-PCR with unlabeled-probe HRM can be a sensitive and accurate screening technique to detect KRAS codon 12 and 13 mutations in plasma-circulating DNA for diagnosing and treating PA.

An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

  • Munkhdalai, Tsendsuren;Li, Meijing;Yun, Unil;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.8 no.4 / pp.575-588 / 2012
  • Exploiting unlabeled text data with a relatively small labeled corpus has been an active and challenging research topic in text mining, owing to the recent growth in the amount of biomedical literature. Biomedical named-entity recognition is an essential prerequisite task before effective text mining of biomedical literature can begin. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers based on two different feature sets iteratively learn from informative examples that have been queried from the unlabeled data. We design a new classification problem to measure the informativeness of an example in unlabeled data. In this classification problem, the examples are classified, based on a joint view of the feature sets, as informative or non-informative to both classifiers. To form the training data for the classification problem, we adopt a query-by-committee method. In ACT, both classifiers are therefore treated as one committee, which is used on the labeled data to give an informativeness label to each example. The ACT method outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
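
The query-by-committee idea mentioned in this abstract can be illustrated with a generic sketch. This is not the ACT algorithm itself: the abstract does not give its details, so the vote-entropy score, the function names `vote_entropy` and `select_informative`, and the two toy classifiers are all assumptions made for illustration. The sketch ranks unlabeled examples by how strongly a committee of classifiers disagrees on them.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes; 0 when all members agree."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_informative(unlabeled, committee, k):
    """Return the k unlabeled examples on which the committee disagrees most.

    `committee` is a list of callables mapping an example to a label.
    Ties are broken by original position so the result is deterministic.
    """
    scored = [
        (vote_entropy([clf(x) for clf in committee]), i)
        for i, x in enumerate(unlabeled)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [unlabeled[i] for _, i in scored[:k]]


# Hypothetical toy classifiers, each looking at a different feature of a token.
def by_capitalization(token):
    return "ENTITY" if token[0].isupper() else "O"


def by_suffix(token):
    return "ENTITY" if token.endswith("ase") else "O"


tokens = ["kinase", "BRCA1", "the", "Protease"]
queried = select_informative(tokens, [by_capitalization, by_suffix], k=2)
```

Here the two classifiers disagree on "kinase" and "BRCA1" and agree on the other two tokens, so those two are the ones queried.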

A study on the performance improvement of learning based on consistency regularization and unlabeled data augmentation (일치성규칙과 목표값이 없는 데이터 증대를 이용하는 학습의 성능 향상 방법에 관한 연구)

  • Kim, Hyunwoong;Seok, Kyungha
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.167-175 / 2021
  • Semi-supervised learning uses both labeled and unlabeled data. Consistency regularization has recently become very popular in semi-supervised learning. Unsupervised data augmentation (UDA), which augments unlabeled data, is also based on consistency regularization. In UDA learning, the Kullback-Leibler divergence is used for the loss on unlabeled data and cross-entropy for the loss on labeled data. UDA uses techniques such as training signal annealing (TSA) and confidence-based masking to improve performance. In this study, we propose using the Jensen-Shannon divergence instead of the Kullback-Leibler divergence, using reverse-TSA, and omitting confidence-based masking to improve performance. Through experiments, we show that the proposed technique yields better performance than UDA.
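
As a rough illustration of the two divergences named in this abstract, the sketch below computes a consistency loss on unlabeled data with either the Kullback-Leibler or the Jensen-Shannon divergence and adds it to a cross-entropy loss on labeled data. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `weight` parameter, and the plain-list representation of predicted probabilities are hypothetical, and TSA, reverse-TSA, and confidence-based masking are left out.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cross_entropy(predicted, label, eps=1e-12):
    """Negative log-probability assigned to the true class index."""
    return -math.log(predicted[label] + eps)


def semi_supervised_loss(labeled, unlabeled, divergence, weight=1.0):
    """Supervised cross-entropy plus a weighted consistency term.

    labeled:   list of (predicted_probs, true_label_index)
    unlabeled: list of (probs_on_original, probs_on_augmented)
    """
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    consistency = sum(divergence(p, q) for p, q in unlabeled) / len(unlabeled)
    return supervised + weight * consistency


# Predictions for one unlabeled example before and after augmentation.
original, augmented = [0.9, 0.1], [0.5, 0.5]
loss_kl = semi_supervised_loss([(original, 0)], [(original, augmented)], kl_divergence)
loss_js = semi_supervised_loss([(original, 0)], [(original, augmented)], js_divergence)
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence gives the same value when its two arguments are swapped and never exceeds log 2.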

A study on semi-supervised kernel ridge regression estimation (준지도 커널능형회귀모형에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.24 no.2 / pp.341-353 / 2013
  • In many practical machine learning and data mining applications, unlabeled data are inexpensive and easy to obtain. Semi-supervised learning tries to use such data to improve prediction performance. In this paper, a semi-supervised regression method, semi-supervised kernel ridge regression estimation, is proposed on the basis of the kernel ridge regression model. The proposed method does not require a pilot estimation of the labels of the unlabeled data. As a result, the proposed method has advantages including fewer parameters, easy computation, and good generalization ability. Experiments show that the proposed method can effectively utilize unlabeled data to improve regression estimation.

Semi-supervised Learning for the Positioning of a Smartphone-based Robot (스마트폰 로봇의 위치 인식을 위한 준 지도식 학습 기법)

  • Yoo, Jaehyun;Kim, H. Jin
    • Journal of Institute of Control, Robotics and Systems / v.21 no.6 / pp.565-570 / 2015
  • Supervised machine learning has become popular for discovering context descriptions from sensor data. However, collecting enough labeled training data to guarantee good performance requires a great deal of expense and time. For this reason, semi-supervised learning has recently been developed, as it performs well even with only a small amount of labeled data. In existing semi-supervised learning algorithms, unlabeled data are used to build a graph Laplacian that represents the intrinsic data geometry. In this paper, we represent the unlabeled data as a spatio-temporal dataset by considering objects that move smoothly over time and space. The developed algorithm is evaluated for position estimation of a smartphone-based robot. In comparison with other state-of-the-art semi-supervised learning methods, our algorithm produces more accurate location estimates.
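
The graph Laplacian that this abstract attributes to existing semi-supervised algorithms can be sketched as follows. This is an illustrative sketch only, not the method of the paper: the Gaussian similarity weights, the `sigma` parameter, and the function names are assumptions. It builds L = D - W from pairwise similarities and evaluates the smoothness penalty f^T L f, which is small when nearby points receive similar values.

```python
import math


def gaussian_weights(points, sigma=1.0):
    """Symmetric similarity matrix W with zero diagonal."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return w


def graph_laplacian(w):
    """Unnormalized Laplacian L = D - W, where D holds the row sums of W."""
    n = len(w)
    return [
        [(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness(lap, f):
    """Quadratic form f^T L f = 1/2 * sum_ij W_ij * (f_i - f_j)^2."""
    n = len(f)
    return sum(f[i] * lap[i][j] * f[j] for i in range(n) for j in range(n))


# Two nearby points and one distant point (labeled and unlabeled alike).
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
lap = graph_laplacian(gaussian_weights(points))
penalty_smooth = smoothness(lap, [1.0, 1.0, 4.0])  # nearby points agree
penalty_rough = smoothness(lap, [1.0, 4.0, 4.0])   # nearby points disagree
```

Assigning different values to the two nearby points gives a much larger penalty than assigning them the same value, which is how unlabeled points shape the estimate in graph-based methods.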