• Title/Abstract/Keywords: Supervised Data

659 search results, processing time 0.023 s

Breast Cancer Classification in Ultrasound Images using Semi-supervised method based on Pseudo-labeling

  • Seokmin Han
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 16, No. 1
    • /
    • pp.124-131
    • /
    • 2024
  • Breast cancer classification using ultrasound, while widely employed, faces challenges: its predictive value is relatively low because the characteristics of benign and malignant lesions overlap significantly, and it is operator-dependent. To alleviate these challenges and reduce dependency on radiologist interpretation, automatic breast cancer classification in ultrasound images can be helpful. To address this problem, we propose a semi-supervised deep learning framework for breast cancer classification. With the proposed method, we achieved reasonable performance using less than 50% of the training data for supervised learning, compared with training on a 100% labeled dataset. Though it requires further modification, this methodology may alleviate the time-consuming annotation burden on radiologists by reducing the number of annotations, contributing to a more efficient and effective breast cancer detection process in ultrasound images.
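
The pseudo-labeling idea named in the title can be sketched roughly as follows. This is a minimal illustration, not the paper's framework: the deep network is replaced by a toy nearest-centroid classifier on one-dimensional features, and the function names, the confidence `threshold`, and the number of `rounds` are all assumptions made for the example.

```python
import math

def fit_centroids(xs, ys):
    """Return the mean feature value per class (toy stand-in for a deep model)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def predict_proba(centroids, x):
    """Normalized exp(-squared distance) to each class centroid."""
    scores = {c: math.exp(-(x - m) ** 2) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def pseudo_label(labeled_x, labeled_y, unlabeled_x, threshold=0.9, rounds=3):
    """Iteratively move confidently predicted unlabeled samples into the training set."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(rounds):
        centroids = fit_centroids(xs, ys)
        remaining = []
        for x in pool:
            proba = predict_proba(centroids, x)
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                # Hypothetical rule: accept the prediction as a pseudo-label.
                xs.append(x)
                ys.append(label)
            else:
                remaining.append(x)
        if len(remaining) == len(pool):
            break  # no new pseudo-labels were added, so stop early
        pool = remaining
    return fit_centroids(xs, ys), pool
```

Samples that never reach the threshold stay in the returned pool, which is where additional expert annotation would still be needed.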

SVM을 이용한 고속철도 궤도틀림 식별에 관한 연구 (A Study on Identification of Track Irregularity of High Speed Railway Track Using an SVM)

  • 김기동;황순현
    • 산업기술연구
    • /
    • Vol. 33, No. A
    • /
    • pp.31-39
    • /
    • 2013
  • There are two methods for identifying deterioration of high-speed railway track. In one, an administrator checks each attribute value of the track induction data, represented as a graph, and determines whether maintenance is needed. In the other, an administrator checks the monthly trend of the attribute values for the corresponding section and determines whether maintenance is needed. Both methods have the weakness that decisions take longer as the amount of track induction data increases. Having a computer identify deterioration of high-speed railway track automatically, as an application of artificial intelligence, relies on machine learning. Machine learning algorithms are classified into four types: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. This research uses supervised learning, which infers a separating function from training data. The proposed method uses an SVM classifier, a major type of supervised learning that is highly efficient on binary classification problems; it captures the difference between two groups of data and identifies deterioration of high-speed railway track.

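To make the idea of learning a separating function for a binary problem concrete, here is a minimal sketch of a linear soft-margin SVM trained by subgradient descent on the hinge loss. It is an assumption-laden illustration: the abstract does not state the kernel, solver, or features used, and the function names and hyperparameters (`lr`, `lam`, `epochs`) are hypothetical.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=20):
    """Subgradient descent on the L2-regularized hinge loss; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the boundary.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: apply only the regularization shrink.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1
```

In a track-maintenance setting, the two labels might stand for "maintenance needed" and "no maintenance needed", with each point holding measured attribute values for one track section; that mapping is an assumption for illustration only.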

딥 러닝에서 Labeling 부담을 줄이기 위한 연구분석 (An Analysis of the methods to alleviate the cost of data labeling in Deep learning)

  • 한석민
    • 문화기술의 융합
    • /
    • Vol. 8, No. 1
    • /
    • pp.545-550
    • /
    • 2022
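  • It is well known that deep learning requires large amounts of data, which are used to train the numerous parameters of the neural networks involved. In most cases, training requires not only the data but also a label entered by an expert for each sample, and obtaining these labels consumes considerable time and resources. To alleviate this problem, few-shot learning, self-supervised learning, weakly supervised learning, and related approaches have been studied. This paper surveys research trends on performing labeling with relatively little effort and suggests directions for future improvement.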

Semi-Supervised Learning Based Anomaly Detection for License Plate OCR in Real Time Video

  • Kim, Bada;Heo, Junyoung
    • International journal of advanced smart convergence
    • /
    • Vol. 9, No. 1
    • /
    • pp.113-120
    • /
    • 2020
  • Recently, license plate OCR systems have been commercialized in a variety of fields, and low-cost embedded systems that use only cameras are preferred. Such a system has a high recognition rate of about 98% or more in environments such as parking lots, where non-vehicle objects are restricted; however, in environments where non-vehicle objects are not restricted, the recognition rate is about 50% to 70%. This low performance arises because non-vehicle objects change the environment in real-time situations and produce anomaly data that resemble license plates. In this paper, we implement anomaly detection based on semi-supervised learning for a license plate OCR system in a real-time environment where the appearance of non-vehicle objects is not restricted. In the experiment, we compare the systems from preceding research, in which anomaly detection is not implemented, with the system proposed in this paper. The systems without anomaly detection had a recognition rate of 77%, whereas the system with anomaly detection based on semi-supervised learning had a recognition rate of 88%. Anomaly detection based on semi-supervised learning was effective in detecting anomaly data and helped improve the recognition rate in real-time situations.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 11
    • /
    • pp.4028-4042
    • /
    • 2021
  • To address the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, a semi-supervised software defect prediction model is proposed that combines feature normalization, oversampling, and the Tri-training algorithm. First, the feature normalization method is used to smooth the feature data and eliminate the influence of excessively large or small feature values on the model's classification performance. Second, the oversampling method is used to expand the data by sampling, which resolves the class imbalance of the labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling techniques, and the Tri-training algorithm to solve both the shortage of labelled samples and the class imbalance problem. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
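
The first two steps described above (feature normalization, then oversampling) might look roughly like the sketch below. The abstract does not say which normalization or oversampling technique the authors used, so min-max scaling and random duplication of minority-class rows are assumptions chosen for simplicity, and the function names are hypothetical. The Tri-training step itself is not shown.

```python
import random

def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

The balanced, normalized rows would then be handed to whichever semi-supervised learner is used; here that would be the Tri-training stage.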

The Classifications using by the Merged Imagery from SPOT and LANDSAT

  • Kang, In-Joon;Choi, Hyun;Kim, Hong-Tae;Lee, Jun-Seok;Choi, Chul-Ung
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회, 1999 Proceedings of International Symposium on Remote Sensing
    • /
    • pp.262-266
    • /
    • 1999
  • Several commercial companies that plan to provide improved panchromatic and/or multi-spectral remote sensor data in the near future suggest that merged datasets will be of significant value. This study evaluated the utility of one major merging process: principal components analysis and its inverse. The 6 bands of 30$\times$30m Landsat TM data and the 10$\times$10m SPOT panchromatic data were used to create a new 10$\times$10m merged data file. For image classification, six bands (bands 1, 2, 3, 4, 5, and 7) may be used with supervised classification algorithms; band 6 is excluded. Band 6, one of the seven TM bands, records thermal IR energy and is rarely used outside thermal mapping because of its coarse spatial resolution (120m). Because SPOT panchromatic data have high resolution, the 10$\times$10m SPOT panchromatic data can be used for detailed classification. SPOT, like Landsat, has acquired hundreds of thousands of images in digital format that are commercially available and used by scientists in different fields. After merging, classification was performed with supervised classification and a neural network. Supervised classification methods include parallelepiped, minimum distance, and MLC (Maximum Likelihood Classification); back-propagation in the multi-layer perceptron is one neural network method. The methods used in this paper are MLC for supervised classification and back-propagation for the neural network. Later in this research, SPOT systems and images are compared using these classifications. A comparative analysis of the classifications from the TM and merged SPOT/TM datasets leads to some conclusions.

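Of the supervised classification methods named in the abstract above, minimum distance is the simplest to illustrate: each pixel is assigned to the class whose mean band vector is nearest. The sketch below is only an illustration of that one method, not of the MLC or back-propagation classifiers the paper actually used; the function names and the two-band example values are hypothetical.

```python
def class_means(training):
    """training maps a class name to a list of band vectors (training pixels)."""
    return {
        name: [sum(band) / len(pixels) for band in zip(*pixels)]
        for name, pixels in training.items()
    }

def classify_min_distance(pixel, means):
    """Assign the pixel to the class with the nearest mean (squared Euclidean distance)."""
    def sq_dist(mean):
        return sum((p - m) ** 2 for p, m in zip(pixel, mean))
    return min(means, key=lambda name: sq_dist(means[name]))

# Hypothetical two-band training pixels for two land-cover classes.
training = {
    "water": [[10, 5], [12, 7]],
    "forest": [[40, 60], [44, 64]],
}
means = class_means(training)
print(classify_min_distance([15, 10], means))  # nearest mean is "water"
```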

연합학습의 의료분야 적용을 위한 자기지도 메타러닝 (Self-supervised Meta-learning for the Application of Federated Learning on the Medical Domain)

  • 공희산;김광수
    • 지능정보연구
    • /
    • Vol. 28, No. 4
    • /
    • pp.27-40
    • /
    • 2022
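  • Medical artificial intelligence, which has advanced considerably in recent years, plays an important role, for example by helping doctors make diagnoses and decisions. Chest X-ray imaging in particular has attracted much attention because of its accessibility, its usefulness for detecting chest diseases, and the recent COVID-19 situation. However, although a large amount of data exists, labeled data are scarce, which limits the construction of effective AI models. Studies applying federated learning to chest X-ray data have appeared as a way to alleviate this problem, but they still have the following issues: 1) they do not consider problems that can arise in a Non-IID environment; 2) even in a federated learning environment, clients still lack labeled data. We propose a method that addresses these issues by using a self-supervised learning model as the Global model in federated learning. To this end, we experimentally explore self-supervised learning methodologies suited to federated learning with chest X-ray data and verify the advantages gained by using a self-supervised learning model in federated learning.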

클래스간의 거리를 고려한 학습법칙을 사용한 퍼지 신경회로망 모델 (Fuzzy Neural Network Model Using A Learning Rule Considering the Distances Between Classes)

  • 김용수;백용선;이세열
    • 한국지능시스템학회논문지
    • /
    • Vol. 16, No. 4
    • /
    • pp.460-465
    • /
    • 2006
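  • This paper proposes a new fuzzy learning rule that uses the Euclidean distances between the input vector and the prototypes of the classes. This new fuzzy learning rule is applied to the supervised IAFC (Integrated Adaptive Fuzzy Clustering) neural network 4. This neural network retains flexibility while maintaining stability. In tests using the iris data, the supervised IAFC neural network 4 outperformed the backpropagation neural network and the LVQ algorithm.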

Combining Geostatistical Indicator Kriging with Bayesian Approach for Supervised Classification

  • Park, No-Wook;Chi, Kwang-Hoon;Moon, Wooil-M.;Kwon, Byung-Doo
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회, 2002 Proceedings of International Symposium on Remote Sensing
    • /
    • pp.382-387
    • /
    • 2002
  • In this paper, we propose a geostatistical approach incorporated into the Bayesian data fusion technique for supervised classification of multi-sensor remote sensing data. Traditional spectral-based classification cannot account for spatial information and may produce unrealistic classification results. To obtain accurate spatial/contextual information, indicator kriging, which allows one to estimate the probability of occurrence of classes on the basis of surrounding observations, is incorporated into the Bayesian framework. This approach has the merit of incorporating both spectral and spatial information, and it improves the confidence level in the final data fusion task. To illustrate the proposed scheme, supervised classification of a multi-sensor test remote sensing data set was carried out.

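One simple way to picture how a spatial class probability can enter a Bayesian classifier is to treat it as the prior and multiply it by the spectral likelihood, as sketched below. Whether the paper combines the two in exactly this way is not stated in the abstract, so this is an assumption; the function name and the example probabilities are hypothetical, and the kriging step that would produce the spatial probabilities is not shown.

```python
def fuse_posterior(spectral_likelihood, spatial_prior):
    """Bayes rule per pixel: posterior(c) is proportional to likelihood(c) * prior(c)."""
    joint = {c: spectral_likelihood[c] * spatial_prior[c] for c in spectral_likelihood}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("all classes have zero joint probability")
    return {c: v / total for c, v in joint.items()}

# Hypothetical values for one pixel: the spectral evidence slightly favors water,
# but the surrounding observations (spatial prior) strongly favor urban.
posterior = fuse_posterior(
    {"urban": 0.4, "water": 0.6},
    {"urban": 0.8, "water": 0.2},
)
best_class = max(posterior, key=posterior.get)
```

In this example the spatial information overturns the purely spectral decision, which is the kind of effect the abstract attributes to adding contextual information.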

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

  • Frigui, Hichem;Bchir, Ouiem;Baili, Naouel
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 13, No. 4
    • /
    • pp.254-268
    • /
    • 2013
  • For real-world clustering tasks, the input data is typically not easily separable, either because the data structure is highly complex or because clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partitioning such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.
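
For reference, the standard fuzzy C-means objective and the kernel-induced distance that kernelized variants typically substitute into it can be written as below. The notation ($u_{ij}$ for the membership of point $x_j$ in cluster $i$, $v_i$ for the cluster prototype, $m$ for the fuzzifier, $\phi$ for the feature map, $K$ for the kernel) is assumed here rather than taken from the overview, and the individual algorithms it surveys may differ in detail.

```latex
J_m(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \, \lVert x_j - v_i \rVert^{2},
\qquad \text{subject to } \sum_{i=1}^{C} u_{ij} = 1 \ \text{for every } j, \quad m > 1 .

\lVert \phi(x_j) - \phi(v_i) \rVert^{2} = K(x_j, x_j) - 2 K(x_j, v_i) + K(v_i, v_i) .
```

The second identity follows from expanding the squared norm in feature space with $K(a, b) = \langle \phi(a), \phi(b) \rangle$, so distances can be evaluated without computing $\phi$ explicitly.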