• Title/Summary/Keyword: Unlabeled


Response Modeling with Semi-Supervised Support Vector Regression (준지도 지지 벡터 회귀 모델을 이용한 반응 모델링)

  • Kim, Dong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.9
    • /
    • pp.125-139
    • /
    • 2014
  • In this paper, I propose a response modeling method based on a Semi-Supervised Support Vector Regression (SS-SVR) algorithm. To increase the accuracy and profit of response modeling, unlabeled data in the customer dataset are used together with the labeled data during training. The proposed SS-SVR algorithm is designed as a batch learning algorithm to reduce training complexity. The label distributions of the unlabeled data are estimated in order to account for labeling uncertainty. Then, multiple training examples are generated from the unlabeled data and their estimated label distributions by oversampling, and these are combined with the labeled data to form the training dataset. Finally, a data selection algorithm, Expected Margin based Pattern Selection (EMPS), is employed to further reduce training complexity. Experiments on a real-world marketing dataset showed that the proposed response modeling method trained efficiently and improved both accuracy and expected profit.
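The data-generation step described above (estimate a label distribution for each unlabeled point, then oversample several labeled copies from it) can be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: each label distribution is modeled as a Gaussian whose mean and standard deviation come from the targets of the k nearest labeled points, and the names `knn_label_distribution`, `oversample_unlabeled`, `k`, and `copies` are hypothetical. EMPS pattern selection and the SVR training itself are not shown.

```python
import math
import random
from statistics import mean, pstdev


def knn_label_distribution(x, labeled, k=3):
    """Estimate a Gaussian label distribution (mean, std) for point x.

    Hypothetical estimator: uses the targets of the k nearest labeled points.
    `labeled` is a list of (feature_tuple, target) pairs.
    """
    nearest = sorted(labeled, key=lambda pair: math.dist(x, pair[0]))[:k]
    targets = [y for _, y in nearest]
    return mean(targets), pstdev(targets)


def oversample_unlabeled(labeled, unlabeled, k=3, copies=5, seed=0):
    """Return the labeled pairs plus `copies` sampled (x, y) pairs per unlabeled x."""
    rng = random.Random(seed)
    training = list(labeled)
    for x in unlabeled:
        mu, sigma = knn_label_distribution(x, labeled, k)
        for _ in range(copies):
            # Each copy gets a label drawn from the estimated distribution;
            # sigma == 0 (neighbors agree) reproduces the neighbor mean exactly.
            training.append((x, rng.gauss(mu, sigma)))
    return training


labeled = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 10.0)]
unlabeled = [(0.5,), (9.0,)]
train = oversample_unlabeled(labeled, unlabeled, k=2, copies=4)
print(len(train))  # 4 labeled + 2 * 4 generated = 12
```

The returned list of `(x, y)` pairs could then be passed to any SVR implementation, for example scikit-learn's `sklearn.svm.SVR`. A point whose neighbors disagree yields widely spread sampled targets, which is one way to carry labeling uncertainty into training.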

Smoothing parameter selection in semi-supervised learning (준지도 학습의 모수 선택에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.4
    • /
    • pp.993-1000
    • /
    • 2016
  • Semi-supervised learning makes it possible to use unlabeled data in supervised learning tasks such as classification. Applying semi-supervised learning to regression analysis, we propose two methods for better estimation of the regression function. The proposed methods assume that the independent variables have different marginal densities in the unlabeled and labeled data, and that different smoothing parameters are used for each. We show that an overfitted pilot estimator should be used to achieve the fastest convergence rate, and that unlabeled data may help to improve the convergence rate when the smoothing parameters are well estimated. We also identify the conditions on the smoothing parameters under which the optimal convergence rate is achieved.
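The abstract does not give the estimators, so the following is only an illustrative sketch of the general idea under stated assumptions: Nadaraya-Watson kernel regression with a Gaussian kernel, a pilot estimator that imputes responses at the unlabeled points, and separate smoothing parameters (bandwidths) for the pilot and the final fit. The function names and the two-stage scheme are hypothetical and are not the paper's proposed methods.

```python
import math


def nadaraya_watson(x, xs, ys, h):
    """Gaussian-kernel Nadaraya-Watson estimate of E[Y | X = x] with bandwidth h."""
    weights = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


def semi_supervised_fit(x_lab, y_lab, x_unl, h_pilot, h_final):
    """Two-stage estimator (hypothetical scheme).

    1. Impute responses at the unlabeled x with a pilot fit on the labeled data,
       using a small (undersmoothed, i.e. overfitted) bandwidth h_pilot.
    2. Smooth the pooled labeled + imputed sample with its own bandwidth h_final.
    """
    y_imputed = [nadaraya_watson(x, x_lab, y_lab, h_pilot) for x in x_unl]
    xs = list(x_lab) + list(x_unl)
    ys = list(y_lab) + y_imputed
    return lambda x: nadaraya_watson(x, xs, ys, h_final)


f = semi_supervised_fit([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.5, 1.5], h_pilot=0.1, h_final=0.5)
print(round(f(1.0), 6))  # data are symmetric around x = 1, so the estimate is 1.0
```

Keeping `h_pilot` and `h_final` as separate arguments mirrors the abstract's point that the labeled and unlabeled stages may need different smoothing parameters.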

An active learning method with difficulty learning mechanism for crack detection

  • Shu, Jiangpeng;Li, Jun;Zhang, Jiawei;Zhao, Weijian;Duan, Yuanfeng;Zhang, Zhicheng
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.195-206
    • /
    • 2022
  • Crack detection is essential for the inspection of existing structures, and deep-learning-based crack segmentation is an important solution. However, building the dataset is usually one of the key issues: when creating a new dataset for deep learning, the laborious and time-consuming annotation of a large number of crack images is an obstacle. The aim of this study is to develop an approach that automatically selects a small portion of the most informative crack images from a large pool for annotation, rather than labeling every crack image. An active learning method with a difficulty learning mechanism for crack segmentation tasks is proposed. Experiments are carried out on a crack image dataset of a steel box girder containing 320×320 images: 500 for training, 100 for validation, and 190 for testing. In the active learning experiments, the 500 training images are treated as unlabeled images. The acquisition function in our method is compared with traditional acquisition functions, i.e., Query-By-Committee (QBC), Entropy, and Core-set. Further, comparisons are made on four common segmentation networks: U-Net, DeepLabV3, Feature Pyramid Network (FPN), and PSPNet. The results show that when trained on the 200 (40%) most informative crack images selected by our method, the four segmentation networks achieve 92% to 95% of the performance obtained when training on all 500 (100%) crack images. At most active learning stages, the acquisition function in our method measures the informativeness of unlabeled crack images more accurately than the traditional acquisition functions. Our method can automatically and accurately select the most informative images for annotation from a large set of unlabeled crack images. Additionally, a dataset built from 40% of all crack images can support crack segmentation networks that reach more than 92% of the performance achieved when all the images are used.
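For reference, the Entropy baseline mentioned above can be sketched as scoring each unlabeled image by the mean per-pixel entropy of a segmentation network's predicted crack probabilities, then sending the highest-scoring images for annotation. This sketches that generic baseline only, not the proposed difficulty learning acquisition function; the probability maps are assumed to come from a trained binary segmentation network, and the function names are hypothetical.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) pixel prediction, clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_uncertainty(prob_map):
    """Mean pixel entropy of one predicted crack-probability map (list of rows)."""
    pixels = [p for row in prob_map for p in row]
    return sum(binary_entropy(p) for p in pixels) / len(pixels)


def select_for_annotation(pool, budget):
    """Return the ids of the `budget` most uncertain images in `pool` (id -> prob map)."""
    ranked = sorted(pool, key=lambda img_id: image_uncertainty(pool[img_id]), reverse=True)
    return ranked[:budget]


pool = {
    "img_a": [[0.5, 0.5], [0.5, 0.5]],      # maximally uncertain
    "img_b": [[0.99, 0.01], [0.02, 0.98]],  # confident prediction
    "img_c": [[0.6, 0.4], [0.3, 0.7]],      # moderately uncertain
}
print(select_for_annotation(pool, budget=2))  # ['img_a', 'img_c']
```

In an active learning loop, the selected images would be annotated, added to the training set, and the network retrained before the next round of scoring.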

F_MixBERT: Sentiment Analysis Model using Focal Loss for Imbalanced E-commerce Reviews

  • Fengqian Pang;Xi Chen;Letong Li;Xin Xu;Zhiqiang Xing
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.2
    • /
    • pp.263-283
    • /
    • 2024
  • Users' comments after online shopping are critical to product reputation and business improvement. These comments, known as e-commerce reviews, influence other customers' purchasing decisions. To handle large volumes of e-commerce reviews, automatic analysis based on machine learning and deep learning has drawn increasing attention, and a core task therein is sentiment analysis. However, e-commerce reviews exhibit the following characteristics: (1) inconsistency between the comment content and the star rating; (2) a large amount of unlabeled data, i.e., comments without a star rating; and (3) data imbalance caused by sparse negative comments. This paper employs Bidirectional Encoder Representations from Transformers (BERT), one of the best natural language processing models, as the base model. Based on the above data characteristics, we propose the F_MixBERT framework to make more effective use of inconsistent low-quality data and unlabeled data and to resolve the problem of data imbalance. In the framework, the proposed MixBERT incorporates the MixMatch approach into BERT's high-dimensional vectors to train on the unlabeled and low-quality data with generated pseudo-labels. Meanwhile, data imbalance is addressed with focal loss, which reduces the contribution of abundant (majority-class) data and easily identifiable data to the total loss. Comparative experiments demonstrate that the proposed framework outperforms BERT and MixBERT for sentiment analysis of e-commerce comments.
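Focal loss, as referenced above, scales the cross-entropy of each example by (1 - p_t)^γ so that well-classified examples contribute little, optionally with a per-class weight α; MixMatch, in turn, sharpens averaged pseudo-label distributions with a temperature T. The following minimal sketch shows both formulas on plain Python lists. The defaults γ = 2 and T = 0.5 are common choices from the original focal loss and MixMatch papers and are assumed here; the values used by F_MixBERT are not given in the abstract, and the function names are illustrative.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=None):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted class probabilities (e.g. a softmax output)
    target -- index of the true class
    alpha  -- optional per-class weights; None means weight 1 for every class
    """
    p_t = max(probs[target], 1e-12)  # clip to avoid log(0)
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def sharpen(probs, temperature=0.5):
    """MixMatch-style sharpening: raise to 1/T and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


easy = focal_loss([0.9, 0.1], target=0)  # well-classified example
hard = focal_loss([0.2, 0.8], target=0)  # misclassified example
print(easy < hard)  # True
print([round(p, 3) for p in sharpen([0.6, 0.4])])  # [0.692, 0.308]
```

With γ = 0 and no α, `focal_loss` reduces to ordinary cross-entropy. In a full pipeline these operations would run on batched tensors (for example in PyTorch) rather than Python lists.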

Semi-supervised classification with LS-SVM formulation (최소제곱 서포터벡터기계 형태의 준지도분류)

  • Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.3
    • /
    • pp.461-470
    • /
    • 2010
  • Semi-supervised classification, which uses both labeled and unlabeled data, has received considerable attention in recent years. Among various methods, graph-based manifold regularization has proven to be an attractive approach. The least squares support vector machine (LS-SVM) is gaining popularity for analyzing nonlinear data. We propose a semi-supervised classification algorithm using the least squares support vector machine. The proposed algorithm is based on manifold regularization. In this paper, we show that the proposed method can use unlabeled data efficiently.
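The abstract does not state the objective. For orientation, the standard manifold-regularized least squares problem (Laplacian regularized least squares), on which LS-SVM-based manifold regularization builds, is shown below. Here the first l of the l + u points are labeled, K is the kernel matrix over all points, W is a graph weight matrix (for example Gaussian weights between neighboring points), and γ_A and γ_I weight the ambient and intrinsic (manifold) penalties. The paper's LS-SVM formulation may differ, for instance by including a bias term or equality constraints, so this is a generic sketch rather than the proposed algorithm.

```latex
\min_{f \in \mathcal{H}_K}\;
  \frac{1}{l}\sum_{i=1}^{l}\bigl(y_i - f(x_i)\bigr)^2
  + \gamma_A \lVert f \rVert_K^2
  + \frac{\gamma_I}{(l+u)^2}\,\mathbf{f}^\top L\,\mathbf{f},
\qquad
\mathbf{f} = \bigl(f(x_1), \dots, f(x_{l+u})\bigr)^\top,\quad
L = D - W,\quad D_{ii} = \sum_{j} W_{ij}

f^*(x) = \sum_{i=1}^{l+u} \alpha_i K(x, x_i),
\qquad
\boldsymbol{\alpha} = \Bigl(JK + \gamma_A\, l\, I + \frac{\gamma_I\, l}{(l+u)^2}\, L K\Bigr)^{-1} Y

J = \operatorname{diag}(\underbrace{1,\dots,1}_{l},\underbrace{0,\dots,0}_{u}),
\qquad
Y = (y_1, \dots, y_l, 0, \dots, 0)^\top
```

The second line follows from the representer theorem: substituting f = Kα into the objective and setting its gradient with respect to α to zero gives the linear system shown. For binary classification with y_i ∈ {-1, +1}, the sign of f^*(x) gives the predicted class.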

An Improved Co-training Method without Feature Split (속성분할이 없는 향상된 협력학습 방법)

  • 이창환;이소민
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.10
    • /
    • pp.1259-1265
    • /
    • 2004
  • In many applications, producing labeled data is costly and time-consuming, while an enormous amount of unlabeled data is available at little cost. It is therefore natural to ask whether we can take advantage of these unlabeled data in classification learning. In the machine learning literature, the co-training method has been widely used for this purpose. However, the current co-training method requires the full feature set to be split into two independent subsets. In this paper, we therefore improve the current co-training method in several ways and propose a new co-training method that does not need the feature split. Experimental results show that the proposed method can significantly improve the performance of the current co-training algorithm.
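The abstract does not detail the authors' improvements, so the following is only a sketch of one well-known way to avoid the feature split: train two different learners on the same full feature set and let each one label its most confident unlabeled example for the other. The two learners (nearest centroid and k-nearest neighbors), their confidence measures, and all function names are assumptions for illustration, not the method proposed in the paper.

```python
import math
from collections import Counter


def centroid_predict(train, x):
    """Nearest-centroid learner: returns (label, confidence in [0, 1])."""
    groups = {}
    for xi, yi in train:
        groups.setdefault(yi, []).append(xi)
    centroids = {y: [sum(c) / len(pts) for c in zip(*pts)] for y, pts in groups.items()}
    dists = {y: math.dist(x, c) for y, c in centroids.items()}
    label = min(dists, key=dists.get)
    inv = {y: 1.0 / (d + 1e-9) for y, d in dists.items()}  # closer -> larger score
    return label, inv[label] / sum(inv.values())


def knn_predict(train, x, k=3):
    """k-nearest-neighbor learner: returns (majority label, vote fraction)."""
    nearest = sorted(train, key=lambda pair: math.dist(x, pair[0]))[:k]
    label, votes = Counter(y for _, y in nearest).most_common(1)[0]
    return label, votes / len(nearest)


def co_train(labeled, unlabeled, rounds=10):
    """Co-training on the full feature set: each learner teaches the other."""
    pool_a, pool_b = list(labeled), list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        if not remaining:
            break
        # Learner A (centroid) labels its most confident point for learner B's pool.
        scored = [(centroid_predict(pool_a, x), x) for x in remaining]
        (label, _), x = max(scored, key=lambda s: s[0][1])
        pool_b.append((x, label))
        remaining.remove(x)
        if not remaining:
            break
        # Learner B (k-NN) labels its most confident point for learner A's pool.
        scored = [(knn_predict(pool_b, x), x) for x in remaining]
        (label, _), x = max(scored, key=lambda s: s[0][1])
        pool_a.append((x, label))
        remaining.remove(x)
    return pool_a, pool_b


labeled = [((0.0, 0.0), "neg"), ((0.2, 0.1), "neg"), ((5.0, 5.0), "pos"), ((5.1, 4.9), "pos")]
unlabeled = [(0.1, 0.2), (4.8, 5.2), (0.3, 0.0), (5.2, 5.1)]
pool_a, pool_b = co_train(labeled, unlabeled)
print(len(pool_a), len(pool_b))  # 6 6
```

Using two learners with different inductive biases plays the role that the two independent feature views play in the original co-training setting: each learner's confident mistakes are less likely to be shared by the other.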

Ultrastructural Localization of GABAergic Neuronal Components in the Dog Basilar Pons (개의 교핵내 GABA성 신경세포 성분의 미세구조적 위치관찰)

  • Lee, Hyun-Sook
    • Applied Microscopy
    • /
    • v.25 no.1
    • /
    • pp.65-74
    • /
    • 1995
  • An immunocytochemical study of GABA-positive neuronal elements was performed at the electron microscopic level to examine the subcellular distribution of this inhibitory neurotransmitter in the dog basilar pons. Electron-dense reaction product was observed in neuronal somata and dendritic processes. One or more unlabeled axon terminals made asymmetric synaptic contacts with these GABAergic somatic and dendritic profiles. A large number of GABA-positive axon terminals were also observed; they made symmetric as well as asymmetric synaptic contacts with unlabeled dendritic profiles. In axo-axonic synapses, the postsynaptic axon-like processes were consistently GABA-immunoreactive. These observations suggest that inhibitory local circuit neurons in the dog basilar pons play a major role in cerebro-ponto-cerebellar circuitry by integrating various afferent inputs and conveying them to the cerebellar cortex and the deep cerebellar nuclei.


Semi-supervised Multi-view Manifold Discriminant Intact Space Learning

  • Han, Lu;Wu, Fei;Jing, Xiao-Yuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.9
    • /
    • pp.4317-4335
    • /
    • 2018
  • Semi-supervised multi-view latent space learning has recently gained considerable popularity in many machine learning applications, because obtaining a large amount of label information is costly and difficult. Although some semi-supervised multi-view latent space learning methods have been presented, there is still much room for improvement: 1) how to learn latent discriminant intact feature representations by employing data from multiple views; and 2) how to effectively exploit the manifold structure of both labeled and unlabeled points in the learned latent intact space. To address these issues, we propose an approach called semi-supervised multi-view manifold discriminant intact space learning ($SM^2DIS$) for image classification. $SM^2DIS$ seeks a manifold discriminant intact space for data of different views by making use of both the discriminant information of labeled data and the manifold structure of both labeled and unlabeled data. Experimental results on the MNIST, COIL-20, Multi-PIE, and Caltech-101 databases demonstrate the effectiveness and robustness of the proposed approach.

A Label Inference Algorithm Considering Vertex Importance in Semi-Supervised Learning (준지도 학습에서 꼭지점 중요도를 고려한 레이블 추론)

  • Oh, Byonghwa;Yang, Jihoon;Lee, Hyun-Jin
    • Journal of KIISE
    • /
    • v.42 no.12
    • /
    • pp.1561-1567
    • /
    • 2015
  • Semi-supervised learning is an area of machine learning that uses both labeled and unlabeled data to train a model, and it has the potential to improve prediction performance compared with supervised learning. Graph-based semi-supervised learning has recently come into focus; it consists of two phases: graph construction, which converts the input data into a graph, and label inference, which predicts appropriate labels for unlabeled data using the constructed graph. The inference relies on the smoothness assumption of semi-supervised learning. In this study, we propose an enhanced label inference algorithm that incorporates the importance of each vertex. In addition, we prove the convergence of the proposed algorithm and verify its effectiveness.
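As a point of comparison, the label inference phase can be sketched with standard label spreading, which iterates F <- αSF + (1 - α)Y with S = D^{-1/2} W D^{-1/2} and, for 0 < α < 1, converges to (1 - α)(I - αS)^{-1} Y. This baseline treats every vertex as equally important; the vertex-importance weighting proposed in the paper is not reproduced here. The function name and the defaults α = 0.9 and 300 iterations are illustrative assumptions.

```python
import math


def label_spreading(W, seeds, n_classes, alpha=0.9, iters=300):
    """Iterative label spreading F <- alpha * S F + (1 - alpha) * Y on a weighted graph.

    W         -- symmetric n x n list of edge weights (0 for no edge)
    seeds     -- dict {vertex index: class index} for the labeled vertices
    n_classes -- number of classes
    Returns the predicted class index for every vertex.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2} (zero where no edge).
    S = [[W[i][j] / math.sqrt(degree[i] * degree[j]) if W[i][j] else 0.0
          for j in range(n)] for i in range(n)]
    # One-hot initial labels; unlabeled vertices start with all zeros.
    Y = [[1.0 if seeds.get(i) == c else 0.0 for c in range(n_classes)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(n_classes)] for i in range(n)]
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Path graph 0-1-2-3-4-5 with one seed at each end.
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(6)] for i in range(6)]
print(label_spreading(W, seeds={0: 0, 5: 1}, n_classes=2))  # [0, 0, 0, 1, 1, 1]
```

Each unlabeled vertex takes the class whose seed is closer along the graph, which is the smoothness assumption at work; a vertex-importance variant would change how strongly each vertex's scores propagate.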