• Title/Summary/Keyword: Labeled Data


Variational Auto-Encoder Based Semi-supervised Learning Scheme for Learner Classification in Intelligent Tutoring System (지능형 교육 시스템의 학습자 분류를 위한 Variational Auto-Encoder 기반 준지도학습 기법)

  • Jung, Seungwon;Son, Minjae;Hwang, Eenjun
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.11
    • /
    • pp.1251-1258
    • /
    • 2019
  • An intelligent tutoring system helps users learn effectively by applying various artificial intelligence techniques. For instance, it can recommend a suitable curriculum or learning method to individual users based on their learning history. To do this effectively, users' characteristics need to be analyzed and classified along aspects such as interest, learning ability, and personality. Although data labeled with these characteristics are required for more accurate classification, it is not easy to acquire a sufficient amount of labeled data because of the labeling cost. Unlabeled data, on the other hand, require no labeling process, so they can be collected and utilized in large quantities. In this paper, we propose a semi-supervised learning method based on a feedback variational auto-encoder (FVAE), which uses both labeled and unlabeled data. FVAE is a variant of the variational auto-encoder (VAE) in which a multi-layer perceptron is added to provide feedback. We train the FVAE on unlabeled data and take its encoder. We then use the encoder to extract features from the labeled data and train classifiers on the extracted features. In the experiments, FVAE-based semi-supervised learning outperformed the VAE-based method in terms of accuracy and F1 score.

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice
    • /
    • v.1 no.1
    • /
    • pp.10-26
    • /
    • 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially a challenge for tasks such as opinion classification, which often has to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training in dealing with the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when labeled data are limited, self-training is a promising approach for opinion classification, although the contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.

A Constraint-based Semi-supervised Clustering Through Initial Prediction of Unlabeled Data (비분류표시 데이터의 초기예측을 통한 제약기반 부분-지도 군집분석)

  • Kim, Eung-Gu;Jeon, Chi-Hyeok
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2007.11a
    • /
    • pp.383-387
    • /
    • 2007
  • Traditional clustering is regarded as unsupervised learning that analyzes unlabeled data. Semi-supervised clustering uses a small amount of labeled data to predict the labels of unlabeled data as well as to improve clustering performance. Previous methods use constraints generated from the available labeled data in the clustering process. We propose a new constraint-based semi-supervised clustering method that also reflects initially predicted labels of the unlabeled data. We evaluate and compare the performance of the proposed method in terms of classification error through numerical experiments with blinded labeled data.
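
As a rough illustration of how constraints can be generated from a few labeled points, the sketch below derives pairwise constraints from known labels. The must-link/cannot-link formulation, the function name `constraints_from_labels`, and the toy labels are assumptions made for illustration; the abstract does not specify the form of the constraints used.

```python
from itertools import combinations

def constraints_from_labels(labels):
    """Build pairwise constraints from a {point_index: label} mapping.

    Two labeled points with the same label yield a must-link pair;
    two labeled points with different labels yield a cannot-link pair.
    """
    must_link, cannot_link = [], []
    for (i, yi), (j, yj) in combinations(labels.items(), 2):
        if yi == yj:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link

# Hypothetical example: points 0 and 3 share a label, point 5 differs.
labeled = {0: "a", 3: "a", 5: "b"}
must_link, cannot_link = constraints_from_labels(labeled)
print(must_link)    # [(0, 3)]
print(cannot_link)  # [(0, 5), (3, 5)]
```

A constrained clustering algorithm would then penalize or forbid assignments that violate these pairs.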


Labeling Big Spatial Data: A Case Study of New York Taxi Limousine Dataset

  • AlBatati, Fawaz;Alarabi, Louai
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.6
    • /
    • pp.207-212
    • /
    • 2021
  • Clustering unlabeled spatial datasets to convert them into labeled spatial datasets is a challenging task, especially for geographical information systems. In this study, we investigated the NYC Taxi Limousine Commission dataset and found that all of its spatio-temporal trajectories are unlabeled, which makes them unsuitable for data mining tasks such as classification and regression. It is therefore necessary to convert the unlabeled spatial datasets into labeled ones, and we use a clustering technique to do so for all of the trajectory datasets. A key difficulty in applying machine learning classification algorithms in many applications is that they require large labeled datasets, and labeling big data is in many cases a costly process. In this paper, we show the effectiveness of using a clustering technique to label spatial data, which leads to a high-accuracy classifier.
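
The idea of labeling by clustering can be sketched as follows: cluster the unlabeled points, treat each cluster id as a label, and then train a classifier on the resulting pairs. The abstract does not name the clustering algorithm or the classifier, so the plain k-means and 1-nearest-neighbor classifier here are stand-ins, and the coordinates are made-up pickup locations rather than values from the dataset.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iters=20):
    """Plain k-means; initial centroids are the first k points (an assumption)."""
    centroids = [points[i] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign

def classify_1nn(train, point):
    """Label a new point with the label of its nearest training point."""
    return min(train, key=lambda pair: dist2(pair[0], point))[1]

# Hypothetical (latitude, longitude) pickup points forming two groups.
pickups = [(40.75, -73.99), (40.64, -73.78), (40.76, -73.98),
           (40.65, -73.79), (40.74, -73.99), (40.64, -73.77)]

_, cluster_ids = kmeans(pickups, k=2)
train = list(zip(pickups, cluster_ids))  # cluster ids now act as labels

print(cluster_ids)                             # [0, 1, 0, 1, 0, 1]
print(classify_1nn(train, (40.745, -73.985)))  # 0
```

The quality of the resulting classifier depends on how well the clusters match meaningful categories, which this toy example does not attempt to assess.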

Semi-Supervised Learning to Predict Default Risk for P2P Lending (준지도학습 기반의 P2P 대출 부도 위험 예측에 대한 연구)

  • Kim, Hyun-jung
    • Journal of Digital Convergence
    • /
    • v.20 no.4
    • /
    • pp.185-192
    • /
    • 2022
  • This study investigates the effect of the semi-supervised learning (SSL) method on predicting the default risk of peer-to-peer (P2P) loans. Despite its proven performance, the supervised learning (SL) method requires labeled data, which may take considerable effort and resources to collect. With the rapid growth of P2P platforms, the number of loans issued annually that have no clear final resolution is continuously increasing, leading to an abundance of unlabeled data. An SSL model is therefore needed that predicts default risk using not only information from labeled loans (fully paid or defaulted) but also information from unlabeled loans. The P2P loan data used in this study were collected from the LendingClub platform. The results showed that, despite using only a small amount of labeled data, the SSL method achieved much better default risk prediction performance than the SL method trained on a much larger set of labeled data.

A FRET Assay for Celiac Disease

  • Lee, Sae A;Cho, Chul Min;Jang, Il Ho;Kang, Jung Sook
    • Biomedical Science Letters
    • /
    • v.22 no.4
    • /
    • pp.160-166
    • /
    • 2016
  • To provide a basis for a homogeneous fluorescence resonance energy transfer (FRET) immunoassay for celiac disease, we carried out a FRET experiment using guinea pig tissue transglutaminase (tTG) and antibodies to tTG (anti-tTG) purified from rat serum. Fluorescein was used as the probe, and a nonfluorescent dye, QSY 7, served as the quencher. We labeled anti-tTG and tTG with fluorescein isothiocyanate and QSY 7 succinimidyl ester, respectively. Fluorescein-labeled anti-tTG was the donor, and QSY 7-labeled tTG was the acceptor in the FRET experiment. When we titrated fluorescein-labeled anti-tTG with QSY 7-labeled tTG, we observed a large decrease in the steady-state fluorescence intensity, which was due to strong FRET from fluorescein-labeled anti-tTG to QSY 7-labeled tTG. Using time-resolved fluorescence spectroscopy, we also observed a decrease in the fluorescence lifetime, which confirms the steady-state data. We expect that these results may be useful in developing a novel fluorescence immunoassay for easy screening and follow-up of celiac patients.

Recommendations for the Selective Labeling of [$^{15}$N]-Labeled Amino Acids without Using Auxotrophic Strains

  • Chae, Young-Kee
    • Journal of the Korean Magnetic Resonance Society
    • /
    • v.4 no.2
    • /
    • pp.133-139
    • /
    • 2000
  • A strategy for incorporating [$^{15}$N]-labeled amino acids is discussed. Instead of using specific auxotrophic strains for selective labeling, the prototrophic strain BL21(DE3) was used with the plasmid pLysS and was found to be very effective for several amino acids, including alanine, lysine, leucine, and threonine. Isoleucine, valine, glutamine, and tyrosine were also labeled effectively despite some diffusion into other amino acids. An interesting result was obtained when [$^{15}$N]-labeled glycine was tried: only glycines were labeled when an amino acid mixture was added to the growth medium, and serines were co-labeled when the amino acids were omitted. These results can serve as a guideline when a selective labeling strategy is considered and when the resulting data are interpreted.


An Efficient Detection Method for Rail Surface Defect using Limited Label Data (한정된 레이블 데이터를 이용한 효율적인 철도 표면 결함 감지 방법)

  • Seokmin Han
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.24 no.1
    • /
    • pp.83-88
    • /
    • 2024
  • In this research, we propose a semi-supervised learning-based method for detecting railroad surface defects. A ResNet50 model pretrained on ImageNet was used for training. Unlabeled data are randomly selected and then labeled to train the ResNet50 model. The trained model is used to predict the classes of the remaining unlabeled training data. Predictions whose values exceed a certain threshold are selected, sorted in descending order, and added to the training data. During this process, each pseudo-label is the class with the highest predicted probability. An experiment was conducted to assess the overall classification performance as a function of the initial number of labeled data. The results showed an accuracy of up to 98% when less than 10% of the overall training data were labeled.
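
The pseudo-labeling loop described in this abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: a nearest-centroid classifier with softmax scores stands in for ResNet50, and the threshold of 0.9, the function names, and the 2-D toy data are all assumptions.

```python
import math

def fit_centroids(X, y):
    """Stand-in classifier: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for p, lab in zip(X, y):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def predict_proba(centroids, p):
    """Class probabilities: softmax over negative squared distances."""
    scores = {lab: -sum((a - b) ** 2 for a, b in zip(p, c))
              for lab, c in centroids.items()}
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    total = sum(exps.values())
    return {lab: e / total for lab, e in exps.items()}

def pseudo_label(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Repeatedly add confident predictions to the training set."""
    X = [p for p, _ in labeled]
    y = [lab for _, lab in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(X, y)
        scored = []
        for idx, p in enumerate(pool):
            proba = predict_proba(centroids, p)
            best = max(proba, key=proba.get)  # class with highest probability
            scored.append((proba[best], idx, best))
        # Keep predictions above the threshold, most confident first.
        confident = sorted((s for s in scored if s[0] > threshold), reverse=True)
        if not confident:
            break
        for _, idx, lab in confident:
            X.append(pool[idx])
            y.append(lab)
        taken = {idx for _, idx, _ in confident}
        pool = [p for i, p in enumerate(pool) if i not in taken]
    return list(zip(X, y)), pool

# Hypothetical data: two labeled samples and five unlabeled ones.
labeled = [((0.0, 0.0), "normal"), ((5.0, 5.0), "defect")]
unlabeled = [(0.2, 0.1), (4.8, 5.1), (0.5, -0.3), (5.2, 4.7), (2.5, 2.5)]

train, remaining = pseudo_label(labeled, unlabeled)
print(len(train))  # 6
print(remaining)   # [(2.5, 2.5)]
```

The ambiguous point midway between the two classes never crosses the threshold, so it stays unlabeled rather than receiving a possibly wrong pseudo-label.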

Semi-supervised Multi-view Manifold Discriminant Intact Space Learning

  • Han, Lu;Wu, Fei;Jing, Xiao-Yuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.9
    • /
    • pp.4317-4335
    • /
    • 2018
  • Semi-supervised multi-view latent space learning has recently gained considerable popularity in many machine learning applications because label information is costly and difficult to obtain in large amounts. Although some semi-supervised multi-view latent space learning methods have been presented, there is still much room for improvement: 1) how to learn latent discriminant intact feature representations by employing data of multiple views; 2) how to effectively exploit the manifold structure of both labeled and unlabeled points in the learned latent intact space. To address these issues, we propose an approach called semi-supervised multi-view manifold discriminant intact space learning ($SM^2DIS$) for image classification. $SM^2DIS$ seeks a manifold discriminant intact space for data of different views by making use of both the discriminant information of labeled data and the manifold structure of both labeled and unlabeled data. Experimental results on the MNIST, COIL-20, Multi-PIE, and Caltech-101 databases demonstrate the effectiveness and robustness of the proposed approach.

Semi-supervised Model for Fault Prediction using Tree Methods (트리 기법을 사용하는 세미감독형 결함 예측 모델)

  • Hong, Euyseok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.107-113
    • /
    • 2020
  • A number of studies have been conducted on predicting software faults, but most of them have used supervised models trained on labeled data. Very few studies have addressed unsupervised models, which use only unlabeled data, or semi-supervised models, which use abundant unlabeled data and a small amount of labeled data. In this paper, we built new semi-supervised models by using tree algorithms within the self-training technique. In the model performance evaluation experiment, the newly created tree models performed better than the existing models, and CollectiveWoods in particular outperformed the other models. In addition, it showed very stable performance even with very few labeled data.