• Title/Summary/Keyword: label noise


A Comparative Study of Classification Methods Using Data with Label Noise (레이블 노이즈가 존재하는 자료의 판별분석 방법 비교연구)

  • Kwon, So Young;Kim, Kyoung Hee
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2853-2864 / 2018
  • Discriminant analysis predicts the class label of a new, unlabeled observation using information from existing labeled data. Observed labels therefore play a critical role in the analysis, and they are usually assumed to be correct. When observed labels contain errors, the data are said to have label noise. Label noise occurs frequently in real data and can degrade classification performance. To examine this effect, a comparative study was carried out using simulated data with label noise. We considered four classification techniques: LDA (linear discriminant analysis), QDA (quadratic discriminant analysis), KNN (k-nearest neighbours), and SVM (support vector machine). We evaluated each method by its average accuracy on data generated under various scenarios. The effect of label noise was investigated through its occurrence rate and type (noise location). We confirmed that label noise is a significant factor influencing classification performance.
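
The kind of experiment this abstract describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' protocol: the two 1-D Gaussian classes, the `sep` parameter, the sample sizes, and all function names are hypothetical, and only a plain KNN classifier is shown (LDA, QDA, and SVM are omitted). Labels are flipped in the training set only, and accuracy is measured on clean test data.

```python
import random

def make_data(n, sep, rng):
    """Draw n points from two 1-D Gaussian classes centred at 0 and sep
    (a hypothetical scenario, not the one used in the paper)."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        data.append((rng.gauss(sep * y, 1.0), y))
    return data

def flip_labels(data, rate, rng):
    """Flip each binary label independently with probability `rate`."""
    return [(x, 1 - y if rng.random() < rate else y) for x, y in data]

def knn_predict(train, x, k=5):
    """Majority vote among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > k else 0

def accuracy_under_noise(rate, sep=2.0, n_train=200, n_test=200, k=5, seed=0):
    """Train on noisy labels, evaluate on clean labels."""
    rng = random.Random(seed)
    train = flip_labels(make_data(n_train, sep, rng), rate, rng)
    test = make_data(n_test, sep, rng)
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / n_test

for rate in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"noise rate {rate:.1f}: accuracy {accuracy_under_noise(rate):.3f}")
```

Averaging over several seeds, rather than using a single one, would be closer to the "average accuracy" the abstract mentions.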

A Novel Classification Model for Efficient Patent Information Research (효율적인 특허정보 조사를 위한 분류 모형)

  • Kim, Youngho;Park, Sangsung;Jang, Dongsik
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.103-110 / 2019
  • A patent contains detailed information about a developed technology and is disclosed to the public. Patents can therefore be used to overcome the limitations of traditional technology trend research and prediction techniques. Recently, owing to the advantages of patent analysis methodology, IP R&D has been carried out worldwide. Patents are big data: they are large in volume, span various domains, and combine structured and unstructured data. For this reason, collecting and researching patent information is difficult. Patent research generally involves writing a search formula to collect patent documents from a database. The collected documents contain some noise patents that are irrelevant to the purpose of the analysis, so these are removed. However, eliminating noise patents is a manual task of reading and classifying the technology, which is time-consuming and expensive. In this study, we propose a model that automatically classifies noise patents for efficient patent information research. The proposed method performs patent embedding using Word2Vec and generates noise seed labels. Noise patent classification is then performed using a random forest. The experimental data are patents published and registered with the USPTO that relate to Ocean Surveillance & Tracking Network technology. In experiments, the proposed model showed 73% accuracy against the labels given by experts.

High Representation based GAN defense for Adversarial Attack

  • Sutanto, Richard Evan;Lee, Suk Ho
    • International journal of advanced smart convergence / v.8 no.1 / pp.141-146 / 2019
  • Many applications now use neural networks as part of their systems. At the same time, adversarial examples have become an important issue concerning the security of neural networks: a neural network classifier can be fooled into misclassifying an input by an adversarial example. Much research counters adversarial examples with denoising methods, and some of it uses a GAN (Generative Adversarial Network) to remove adversarial noise from input images. By producing an image from the generator network that is close enough to the original clean image, the effect of adversarial examples can be reduced. However, adversarial noise may survive the approximation process because it does not behave like ordinary noise. To address this, we propose an approach that utilizes the high-level representation in the classifier by combining a GAN with a trained U-Net. The approach minimizes a loss on high-level representation terms, so as to minimize the difference between the high-level representation of the clean data and that of the approximated output of the noisy data in the training dataset. Furthermore, the generated output is checked for whether it yields minimum error with respect to the true label. The U-Net is trained with the true labels to ensure that the generated output gives minimum error in the end. Finally, the adversarial noise that remains after the low-level approximation can be removed by the U-Net, owing to the minimization on high-level representation terms.

Food Detection by Fine-Tuning Pre-trained Convolutional Neural Network Using Noisy Labels

  • Alshomrani, Shroog;Aljoudi, Lina;Aljabri, Banan;Al-Shareef, Sarah
    • International Journal of Computer Science & Network Security / v.21 no.7 / pp.182-190 / 2021
  • Deep learning is an advanced technology for large-scale data analysis, with numerous promising applications such as image processing and object detection. It has become customary to use transfer learning and fine-tune a pre-trained CNN model for most image recognition tasks. Having people take photos and tag them themselves provides a valuable resource of in-data. However, these tags and labels might be noisy, as the people who annotate these images might not be experts. This paper explores the impact of noisy labels on fine-tuning pre-trained CNN models. The effect is measured on a food recognition task using Food101 as a benchmark. Four pre-trained CNN models are included in this study: InceptionV3, VGG19, MobileNetV2 and DenseNet121. Symmetric label noise is added at different ratios. In all cases, models based on DenseNet121 outperformed the other models. When noisy labels were introduced to the data, the performance of all models degraded almost linearly with the amount of added noise.
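
Symmetric label noise, as mentioned in this abstract, is commonly injected as in the sketch below. The abstract does not give the exact procedure, so this is an assumption: each label is replaced, with probability `noise_rate`, by a different class chosen uniformly at random (another common convention also allows the original class to be redrawn). The function name and the toy label list are hypothetical; 101 is the number of classes in Food101.

```python
import random

def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Return a copy of `labels` in which each entry is, with probability
    `noise_rate`, replaced by a different class drawn uniformly at random."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Draw from the other num_classes - 1 classes, skipping y itself.
            other = rng.randrange(num_classes - 1)
            noisy.append(other if other < y else other + 1)
        else:
            noisy.append(y)
    return noisy

# Toy stand-in for a label vector over 101 classes.
clean = [i % 101 for i in range(10100)]
for rate in (0.0, 0.2, 0.4):
    noisy = add_symmetric_noise(clean, num_classes=101, noise_rate=rate)
    observed = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
    print(f"requested {rate:.1f}, observed {observed:.3f}")
```

The noisy labels would then replace the clean ones in the fine-tuning set only, while evaluation keeps the clean labels.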

Towards Improved Performance on Plant Disease Recognition with Symptoms Specific Annotation

  • Dong, Jiuqing;Fuentes, Alvaro;Yoon, Sook;Kim, Taehyun;Park, Dong Sun
    • Smart Media Journal / v.11 no.4 / pp.38-45 / 2022
  • Object detection models have become the current tool of choice for plant disease detection in precision agriculture. Most existing research improves the performance by ameliorating networks and optimizing the loss function. However, the data-centric part of a whole project also needs more investigation. In this paper, we proposed a systematic strategy with three different annotation methods for plant disease detection: local, semi-global, and global label. Experimental results on our paprika disease dataset show that a single class annotation with semi-global boxes may improve accuracy. In addition, we also studied the noise factor during the labeling process. An ablation study shows that annotation noise within 10% is acceptable for keeping good performance. Overall, this data-centric numerical analysis helps us to understand the significance of annotation methods, which provides practitioners a way to obtain higher performance and reduce annotation costs on plant disease detection tasks. Our work encourages researchers to pay more attention to label quality and the essential issues of labeling methods.

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when?" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods for labeling overlapped speech and optimizing diarization models. Most deep neural network-based end-to-end speaker diarization systems solve a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of a multi-label model varies greatly depending on the threshold setting. This paper studies a speaker diarization system using single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates labels from the model output. To account for speaker label permutations in training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, how to add residual connection structures to the model is studied for effective training of deep speaker diarization models. The experiments used simulated noisy data for two speakers generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method can label without a threshold and improves performance by about 20.7%.
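
One way to picture the multi-label to single-label conversion is the sketch below. The abstract does not specify the mapping, so this assumes a simple binary (powerset-style) encoding in which each combination of active speakers becomes one class; all function names are hypothetical. With two speakers there are four classes (silence, speaker 1 only, speaker 2 only, overlap), and decoding takes the highest-scoring class, so no per-speaker threshold is needed.

```python
def to_single_label(frame_activity):
    """Encode per-frame speaker activity bits, e.g. (1, 0), as one class index."""
    return [sum(bit << i for i, bit in enumerate(frame)) for frame in frame_activity]

def to_multi_label(class_ids, num_speakers):
    """Decode class indices back into per-speaker activity bits."""
    return [tuple((c >> i) & 1 for i in range(num_speakers)) for c in class_ids]

def decode_frame(scores):
    """Pick the class with the highest score (argmax), without a threshold."""
    return max(range(len(scores)), key=lambda c: scores[c])

# Two speakers: silence, speaker 1 only, overlap, speaker 2 only.
frames = [(0, 0), (1, 0), (1, 1), (0, 1)]
classes = to_single_label(frames)
print(classes)                      # [0, 1, 3, 2]
print(to_multi_label(classes, 2))   # [(0, 0), (1, 0), (1, 1), (0, 1)]

# Hypothetical model scores for one frame over the four classes.
print(to_multi_label([decode_frame([0.1, 0.2, 0.1, 0.6])], 2))  # [(1, 1)]
```

The number of classes grows as 2 to the power of the number of speakers, which is manageable for the two-speaker setting described here.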

Robust Video-Based Barcode Recognition via Online Sequential Filtering

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.14 no.1 / pp.8-16 / 2014
  • We consider the visual barcode recognition problem in a noisy video data setup. Unlike most existing single-frame recognizers that require considerable user effort to acquire clean, motionless and blur-free barcode signals, we eliminate such extra human efforts by proposing a robust video-based barcode recognition algorithm. We deal with a sequence of noisy blurred barcode image frames by posing it as an online filtering problem. In the proposed dynamic recognition model, at each frame we infer the blur level of the frame as well as the digit class label. In contrast to a frame-by-frame based approach with heuristic majority voting scheme, the class labels and frame-wise noise levels are propagated along the frame sequences in our model, and hence we exploit all cues from noisy frames that are potentially useful for predicting the barcode label in a probabilistically reasonable sense. We also suggest a visual barcode tracking approach that efficiently localizes barcode areas in video frames. The effectiveness of the proposed approaches is demonstrated empirically on both synthetic and real data setup.

The Improved Watershed Algorithm using Adaptive Local Threshold (적응적 지역 임계치를 이용한 개선된 워터쉐드 알고리즘)

  • Lee Seok-Hee;Kwon Dong-Jin;Kwak Nae-Joung;Ahn Jae-Hyeong
    • Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.891-894 / 2004
  • This paper proposes an improved watershed-based image segmentation algorithm that applies a local adaptive threshold to the local minima search and a fixed threshold to label allocation. The conventional watershed algorithm suffers from over-segmentation, which produces inaccurate boundaries around objects. To solve these problems, we quantize the input color image by vector quantization, remove noise, and compute the gradient image. We then sort the local minima found by applying the local adaptive threshold to the local minima search of the input color image. The simulation results show that the proposed algorithm controls over-segmentation and produces fine boundaries around segmented regions by applying the fixed threshold, based on the sorted local minima, to label allocation.


Label-free Femtomolar Detection of Cancer Biomarker by Reduced Graphene Oxide Field-effect Transistor

  • Kim, Duck-Jin;Sohn, Il-Yung;Jung, Jin-Heak;Yoon, Ok-Ja;Lee, N.E.;Park, Joon-Shik
    • Proceedings of the Korean Vacuum Society Conference / 2012.02a / pp.549-549 / 2012
  • Early detection of cancer biomarkers in the blood is of vital importance for reducing mortality and morbidity in a number of cancers. From this point of view, immunosensors based on nanowire (NW) and carbon nanotube (CNT) field-effect transistors (FETs), which allow ultra-sensitive, highly specific, and label-free electrical detection of biomarkers, have received much attention. Although 1D nano-FET biosensors have shown high performance, several challenges remain to be resolved for uncomplicated, reproducible, low-cost, and high-throughput nanofabrication. Recently, two-dimensional (2D) graphene and reduced graphene oxide (RGO) nanosheets or films have found widespread applications such as clean energy storage and conversion devices, optical detectors, field-effect transistors, electromechanical resonators, and chemical and biological sensors. In particular, graphene- and RGO-FET devices are very promising for sensing applications because of advantages including a large detection area, a low noise level in solution, ease of fabrication, and a high sensitivity to ions and biomolecules comparable to that of 1D nano-FETs. Even though a limited number of biosensor applications have been reported, including chemical vapor deposition (CVD) grown graphene film for DNA detection, single-layer graphene for protein detection, and single-layer graphene or solution-processed RGO film for cell monitoring, facile fabrication methods and a full understanding of the sensing mechanism are still lacking. Furthermore, there have been no reports demonstrating ultrasensitive electrical detection of a cancer biomarker using a graphene- or RGO-FET.
Here we describe the scalable and facile fabrication of a reduced graphene oxide FET (RGO-FET) capable of label-free, ultrasensitive electrical detection of a cancer biomarker, the prostate specific antigen/α1-antichymotrypsin (PSA-ACT) complex, in which the ultrathin RGO channel was formed by uniform self-assembly of two-dimensional RGO nanosheets. We also discuss the immunosensing mechanism.
