Class Specific Autoencoders Enhance Sample Diversity

  • Kumar, Teerath;Park, Jinbae;Ali, Muhammad Salman;Uddin, AFM Shahab;Bae, Sung-Ho
    • Journal of Broadcast Engineering / v.26 no.7 / pp.844-854 / 2021
  • Semi-supervised learning (SSL) and few-shot learning (FSL) have shown impressive performance even when the volume of labeled data is very limited. However, SSL and FSL can suffer significant performance degradation if the diversity gap between the labeled and unlabeled data is large. To reduce this diversity gap, we propose a novel scheme that relies on an autoencoder to generate pseudo examples. Specifically, an autoencoder is trained on a specific class using the available labeled data, and the decoder of the trained autoencoder is then used to generate N samples of that class from N random noise vectors sampled from a standard normal distribution. This process is repeated for every class. The generated data reduces the diversity gap and enhances model performance. Extensive experiments on the MNIST and FashionMNIST datasets for SSL and FSL verify the effectiveness of the proposed approach in terms of classification accuracy and robustness against adversarial attacks.
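
The per-class generation loop described in this abstract can be illustrated with a small sketch. It is not the authors' model: a one-dimensional linear autoencoder (class mean plus top principal direction, fitted by power iteration) stands in for the trained autoencoder, and the names `fit_linear_autoencoder`, `decode`, and `generate_per_class`, as well as scaling the noise by the standard deviation of the training codes, are assumptions made for illustration.

```python
import random

def fit_linear_autoencoder(samples, iters=100, seed=0):
    """Fit a 1-D linear autoencoder for one class (assumed stand-in model)."""
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    mean = [sum(x[j] for x in samples) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in samples]
    # Random start direction, normalised to unit length.
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    for _ in range(iters):
        # Power iteration on the unnormalised covariance: w <- X^T (X w).
        proj = [sum(c[j] * w[j] for j in range(d)) for c in centered]
        new_w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(v * v for v in new_w) ** 0.5
        if norm == 0.0:
            break
        w = [v / norm for v in new_w]
    # Standard deviation of the encoded training samples along w.
    codes = [sum(c[j] * w[j] for j in range(d)) for c in centered]
    scale = (sum(z * z for z in codes) / n) ** 0.5
    return mean, w, scale

def decode(z, mean, w, scale):
    """Decoder: map a scalar latent code back to input space."""
    return [m + z * scale * wj for m, wj in zip(mean, w)]

def generate_per_class(labeled, n_per_class, seed=0):
    """Train one autoencoder per class, then decode N standard-normal noise draws."""
    rng = random.Random(seed)
    generated = {}
    for label, samples in labeled.items():
        mean, w, scale = fit_linear_autoencoder(samples, seed=seed)
        generated[label] = [decode(rng.gauss(0.0, 1.0), mean, w, scale)
                            for _ in range(n_per_class)]
    return generated
```

Calling `generate_per_class({0: class0_samples, 1: class1_samples}, n_per_class=4)` returns four synthetic vectors per class, each lying along that class's dominant direction of variation.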

Tri-training algorithm based on cross entropy and K-nearest neighbors for network intrusion detection

  • Zhao, Jia;Li, Song;Wu, Runxiu;Zhang, Yiying;Zhang, Bo;Han, Longzhe
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3889-3903 / 2022
  • To address the low detection accuracy caused by training noise from mislabeling when Tri-training is applied to network intrusion detection (NID), we propose a Tri-training algorithm based on cross entropy and K-nearest neighbors (TCK). The proposed algorithm replaces the classification error rate with cross-entropy to better capture the difference between the actual and predicted distributions of the model and to reduce the prediction bias that mislabeled data introduces on unlabeled data; K-nearest neighbors is used to remove mislabeled data and reduce its amount. To verify the effectiveness of the proposed algorithm, experiments were conducted on 12 UCI datasets and the NSL-KDD network intrusion datasets, and four metrics (accuracy, recall, F-measure, and precision) were used for comparison. The experimental results show that TCK outperforms the conventional Tri-training algorithm and Tri-training variants that use only the cross-entropy or the K-nearest neighbor strategy.
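
The two ingredients named in this abstract, cross-entropy as the quality measure and K-nearest neighbors as a filter for mislabeled pseudo-labels, can be sketched in isolation. This is a minimal illustration, not the TCK algorithm itself: the function names, the majority-vote rule, and `k=3` are assumptions, and the Tri-training loop around them is omitted.

```python
import math

def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean cross-entropy between true labels and predicted class probabilities."""
    total = 0.0
    for y, probs in zip(true_labels, predicted_probs):
        # eps guards against log(0) for a confidently wrong prediction.
        total -= math.log(max(probs[y], eps))
    return total / len(true_labels)

def knn_filter(pseudo_labeled, labeled, k=3):
    """Keep pseudo-labeled points whose label matches the majority
    label of their k nearest labeled neighbors (assumed filtering rule)."""
    kept = []
    for x, y in pseudo_labeled:
        neighbors = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
        votes = [label for _, label in neighbors]
        majority = max(set(votes), key=votes.count)
        if majority == y:
            kept.append((x, y))
    return kept
```

Unlike the error rate, cross-entropy distinguishes a confidently wrong prediction from a hesitant one, which is one plausible reason to prefer it when deciding whether a classifier's pseudo-labels should be trusted.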

Real-time Ball Detection and Tracking with P-N Learning in Soccer Game (P-N 러닝을 이용한 실시간 축구공 검출 및 추적)

  • Huang, Shuai-Jie;Li, Gen;Lee, Yill-Byung
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.447-450 / 2011
  • This paper applies the P-N Learning [4] method to soccer ball detection and improves it to increase processing speed. In P-N learning, the learning process is guided by positive (P) and negative (N) constraints, which restrict the labeling of the unlabeled data, identify examples that have been classified in contradiction with structural constraints, and augment the training set with the corrected samples in an iterative process. However, for the long view in a soccer game, P-N learning produces so many ferns that it takes more time than other methods. We propose constructing a color histogram of each frame to remove unnecessary details and thereby decrease the number of feature points. We use a mask to eliminate the gallery region, use the Line Hough Transform to remove the lines, and adjust the P-N learning parameters to optimize accuracy and speed.

Pretext Task Analysis for Self-Supervised Learning Application of Medical Data (의료 데이터의 자기지도학습 적용을 위한 pretext task 분석)

  • Kong, Heesan;Park, Jaehun;Kim, Kwangsu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.38-40 / 2021
  • The medical domain has a massive number of data records without response values. Self-supervised learning is a suitable method for medical data because it learns from a pretext task and its supervision, so the model can learn the semantic representation of the data without response values. However, since self-supervised learning performance depends on the representation learned by the pretext task, it is necessary to define an appropriate pretext task that takes the data features into account. In this paper, to actively exploit unlabeled medical data in artificial intelligence research, we experimentally identify pretext tasks that are suitable for medical data and analyze the results. We use an X-ray image dataset, which is effectively utilizable in the medical domain.

Anomaly Detection via Pattern Dictionary Method and Atypicality in Application (패턴사전과 비정형성을 통한 이상치 탐지방법 적용)

  • Sehong Oh;Jongsung Park;Youngsam Yoon
    • Journal of Sensor Science and Technology / v.32 no.6 / pp.481-486 / 2023
  • Anomaly detection holds paramount significance across diverse fields, encompassing fraud detection, risk mitigation, and sensor evaluation tests. Its pertinence extends notably to the military, particularly within the Warrior Platform, a comprehensive combat equipment system with wearable sensors. Hence, we propose a data-compression-based anomaly detection approach tailored to unlabeled time series and sequence data. This method entailed the construction of two distinctive features, typicality and atypicality, to discern anomalies effectively. The typicality of a test sequence was determined by evaluating the compression efficacy achieved through the pattern dictionary. This dictionary was established based on the frequency of all patterns identified in a training sequence generated for each sensor within Warrior Platform. The resulting typicality served as an anomaly score, facilitating the identification of anomalous data using a predetermined threshold. To improve the performance of the pattern dictionary method, we leveraged atypicality to discern sequences that could undergo compression independently without relying on the pattern dictionary. Consequently, our refined approach integrated both typicality and atypicality, augmenting the effectiveness of the pattern dictionary method. Our proposed method exhibited heightened capability in detecting a spectrum of unpredictable anomalies, fortifying the stability of wearable sensors prevalent in military equipment, including the Army TIGER 4.0 system.
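
The idea of scoring a test sequence by how well a pattern dictionary compresses it can be sketched as follows. This is a simplified stand-in rather than the paper's method: the dictionary keeps every substring of up to `max_len` symbols seen at least `min_count` times, compression is approximated by greedy longest-match parsing, and all function names, parameters, and the threshold value are assumptions. The atypicality feature is not modeled.

```python
from collections import Counter

def build_pattern_dictionary(train, max_len=4, min_count=2):
    """Collect substrings of the training sequence that occur at least min_count times."""
    counts = Counter(
        train[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(train) - n + 1)
    )
    return {pattern for pattern, count in counts.items() if count >= min_count}

def typicality_score(test, patterns, max_len=4):
    """Tokens emitted per symbol under greedy longest-match parsing.
    Lower values mean the dictionary compresses the sequence better."""
    i, tokens = 0, 0
    while i < len(test):
        step = 1  # fall back to emitting a single literal symbol
        for n in range(min(max_len, len(test) - i), 1, -1):
            if test[i:i + n] in patterns:
                step = n
                break
        i += step
        tokens += 1
    return tokens / len(test) if test else 0.0

def is_anomalous(test, patterns, threshold=0.5, max_len=4):
    """Flag a sequence whose score exceeds a predetermined threshold."""
    return typicality_score(test, patterns, max_len) > threshold
```

With a dictionary built from a repeating training signal such as `"abcd" * 20`, a test sequence that follows the same pattern parses into a few long tokens, while an unfamiliar sequence falls back to one token per symbol and crosses the threshold.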

Crop Leaf Disease Identification Using Deep Transfer Learning

  • Changjian Zhou;Yutong Zhang;Wenzhong Zhao
    • Journal of Information Processing Systems / v.20 no.2 / pp.149-158 / 2024
  • Traditional manual identification of crop leaf diseases is challenging. Owing to the limitations in manpower and resources, it is challenging to explore crop diseases on a large scale. The emergence of artificial intelligence technologies, particularly the extensive application of deep learning technologies, is expected to overcome these challenges and greatly improve the accuracy and efficiency of crop disease identification. Crop leaf disease identification models have been designed and trained using large-scale training data, enabling them to predict different categories of diseases from unlabeled crop leaves. However, these models, which possess strong feature representation capabilities, require substantial training data, and there is often a shortage of such datasets in practical farming scenarios. To address this issue and improve the feature learning abilities of models, this study proposes a deep transfer learning adaptation strategy. The novel proposed method aims to transfer the weights and parameters from pre-trained models in similar large-scale training datasets, such as ImageNet. ImageNet pre-trained weights are adopted and fine-tuned with the features of crop leaf diseases to improve prediction ability. In this study, we collected 16,060 crop leaf disease images, spanning 12 categories, for training. The experimental results demonstrate that an impressive accuracy of 98% is achieved using the proposed method on the transferred ResNet-50 model, thereby confirming the effectiveness of our transfer learning approach.

Semi-Supervised SAR Image Classification via Adaptive Threshold Selection (선별적인 임계값 선택을 이용한 준지도 학습의 SAR 분류 기술)

  • Jaejun Do;Minjung Yoo;Jaeseok Lee;Hyoi Moon;Sunok Kim
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.3 / pp.319-328 / 2024
  • Semi-supervised learning is a good way to train a classification model using a small amount of labeled data and a large amount of unlabeled data. We applied semi-supervised learning to a synthetic aperture radar (SAR) image classification model, for which datasets are limited and difficult to create. To address this difficulty, semi-supervised learning uses a model trained with a small amount of labeled data to generate pseudo labels and learn from them. Many papers use a single fixed threshold to create pseudo labels. In this paper, we present a semi-supervised SAR image classification method that applies a different threshold to each class, instead of all classes sharing one fixed threshold, to improve SAR classification performance with a small amount of labeled data.
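
The difference between a shared fixed threshold and per-class thresholds for pseudo-labeling can be shown with a short sketch. The abstract does not say how the per-class thresholds are chosen, so the values below are placeholders, and the function name `pseudo_label` is an assumption.

```python
def pseudo_label(prob_rows, class_thresholds):
    """Return (sample index, class) pairs for predictions whose top
    probability meets the threshold assigned to that class."""
    selected = []
    for idx, probs in enumerate(prob_rows):
        cls = max(range(len(probs)), key=probs.__getitem__)
        if probs[cls] >= class_thresholds[cls]:
            selected.append((idx, cls))
    return selected

# Predicted class probabilities for three unlabeled samples (placeholder values).
probs = [[0.9, 0.1], [0.3, 0.7], [0.45, 0.55]]

fixed = pseudo_label(probs, [0.8, 0.8])        # one threshold shared by all classes
per_class = pseudo_label(probs, [0.95, 0.6])   # stricter for class 0, looser for class 1
```

Here the shared threshold accepts only the first sample, while the per-class setting rejects it and accepts the second instead, which shows how class-specific thresholds change which unlabeled samples enter training.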

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

  • William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
    • Asia pacific journal of information systems / v.29 no.4 / pp.789-816 / 2019
  • Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on what types of words are contained in the document corpus and what types of features are generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to modify the use of data. Classifier performance is usually affected by the quality of the training data on which the classifier is built. We assume that data from different domains might have different noise characteristics, which can be utilized in the process of learning the classifier. Therefore, we attempt to enhance the robustness of the classifier by artificially injecting heterogeneous data into the learning process in order to improve classification accuracy. A semi-supervised approach was applied to utilize the heterogeneous data in learning the document classifier. However, the performance of the document classifier might be degraded by the unlabeled data. Therefore, we further propose an algorithm that extracts only the documents that contribute to improving the accuracy of the classifier.

Presence of Pituitary Specific Transcription Factor Pit-1 in the Rat Brain: Intracerebroventricular Administration of Antisense Pit-1 Oligodeoxynucleotide Decreases Brain Prolactin mRNA Level

  • Tae Woo Kim;Hyun-Ju Kim;Byung Ju Lee
    • Animal cells and systems / v.3 no.3 / pp.311-317 / 1999
  • Prolactin (PRL) was reported to be locally synthesized in many brain areas including the hypothalamus, thalamus (TH) and hippocampus (HIP). In the pituitary lactotrophs, PRL synthesis is dependent upon a pituitary-specific transcription factor, Pit-1. In the present study, we attempted to identify Pit-1 or Pit-1-like protein in brain areas known as the synthetic sites of PRL. Reverse transcription-polymerase chain reaction (RT-PCR) and Northern blot analysis showed the same Pit-1 transcripts in brain areas such as the medial basal hypothalamus (MBH), preoptic area (POA), TH, and HIP with the Pit-1 transcripts in the anterior pituitary (AP). Electrophoretic mobility shift assay (EMSA) was run with nuclear protein extracts from brain tissues using a double strand oligomer probe containing a putative Pit-1 binding domain. Shifted bands were found in EMSA results with nuclear proteins from MBH, POA, TH and HIP. Specific binding of the Pit-1-like protein was further confirmed by competition with an unlabeled cold probe. Antisense Pit-1 oligodeoxynucleotide (Pit-1 ODN), which was designed to bind to the Pit-1 translation initiation site and block Pit-1 biosynthesis, was used to test Pit-1 dependent brain PRL transcription. Two nmol of Pit-1 ODN was introduced into the lateral ventricle of a 60-day old male rat brain. RNA blot hybridization and in situ hybridization indicated a decrease of PRL mRNA signals by the treatment of Pit-1 ODN. Taken together, the present study suggests that Pit-1 may play an important role in the transcriptional regulation of local PRL synthesis in the brain.

A study on the waveform-based end-to-end deep convolutional neural network for weakly supervised sound event detection (약지도 음향 이벤트 검출을 위한 파형 기반의 종단간 심층 콘볼루션 신경망에 대한 연구)

  • Lee, Seokjin;Kim, Minhan;Jeong, Youngho
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.24-31 / 2020
  • In this paper, a deep convolutional neural network for sound event detection is studied. In particular, an end-to-end neural network, which generates the detection results from the input audio waveform, is studied for the weakly supervised problem, which includes weakly labeled and unlabeled datasets. The proposed system is based on a network structure that consists of deeply stacked 1-dimensional convolutional neural networks and is enhanced by skip connections and a gating mechanism. Additionally, the proposed system is enhanced by sound event detection and post-processing, and a training step using the mean-teacher model is added to deal with the weakly supervised data. The proposed system was evaluated on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset, and the results show that the proposed system achieves F1-scores of 54 % (segment-based) and 32 % (event-based).
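
The mean-teacher training step mentioned in this abstract rests on two small operations: the teacher's weights track an exponential moving average of the student's weights, and a consistency term penalizes disagreement between their predictions on unlabeled data. The sketch below shows only those two operations on plain lists; the function names, the smoothing factor `alpha=0.99`, and the squared-error form of the consistency term are assumptions, and the paper's network is not modeled.

```python
def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Move each teacher weight toward the student weight by an exponential moving average."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]

def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    return sum((s - t) ** 2
               for s, t in zip(student_probs, teacher_probs)) / len(student_probs)
```

In a training loop, the student would be updated by gradient descent on the supervised loss plus the consistency term, and `ema_update` would then be applied after each step so that the teacher changes more smoothly than the student.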