• Title/Abstract/Keywords: Supervised machine learning

256 search results (processing time: 0.028 s)

A Novel Feature Selection Approach to Classify Breast Cancer Drug using Optimized Grey Wolf Algorithm

  • Shobana, G.;Priya, N.
    • International Journal of Computer Science & Network Security / Vol. 22, No. 9 / pp.258-270 / 2022
  • Cancer has become a common disease across the globe over the past two decades, and its incidence among women has increased significantly. Breast and ovarian cancers are more prevalent among women. The majority of patients approach physicians only in the final stage of the disease. Early diagnosis of cancer remains a great challenge for researchers. Although new drugs are synthesized frequently, their multiple benefits are less investigated. With millions of drugs synthesized and their data accessible through open repositories, drug repurposing can be done using machine learning techniques. In this paper, we propose a novel feature selection technique that generates multiple populations for the grey wolf algorithm and classifies breast cancer drugs efficiently. A leukemia drug dataset is also investigated, on which the Multilayer Perceptron achieved 96% prediction accuracy. Three supervised machine learning algorithms, namely the Random Forest classifier, Multilayer Perceptron, and Support Vector Machine, were applied, and the Multilayer Perceptron had the highest accuracy, 97.7%, for breast cancer drug classification.

A Label Inference Algorithm Considering Vertex Importance in Semi-Supervised Learning

  • 오병화;양지훈;이현진
    • 정보과학회 논문지 / Vol. 42, No. 12 / pp.1561-1567 / 2015
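  • Semi-supervised learning is a field of machine learning that trains a model on both labeled and unlabeled data and can thereby achieve higher prediction accuracy than supervised learning. Graph-based semi-supervised learning, which has recently attracted attention, consists of a graph construction step that converts the input data into a graph and a label inference step that uses the graph to predict the labels of the unlabeled data. This inference is based on the smoothness assumption of semi-supervised learning. In this study, we propose an improved label inference algorithm that additionally incorporates the importance of each vertex. We also prove the convergence of the algorithm and verify its superiority through experiments.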
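The label inference step in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it is plain iterative label propagation under the smoothness assumption, and the optional `importance` weights are a hypothetical way to let some vertices count more than others. The graph, the seed labels, and all function names are assumptions made for the example.

```python
def propagate_labels(edges, seeds, importance=None, iters=100):
    """Label propagation sketch: each unlabeled vertex repeatedly takes the
    weighted average of its neighbours' scores; seed vertices stay clamped."""
    nodes = sorted({v for u, w, _ in edges for v in (u, w)})
    nbrs = {v: [] for v in nodes}
    for u, v, weight in edges:            # undirected weighted graph
        nbrs[u].append((v, weight))
        nbrs[v].append((u, weight))
    # Hypothetical vertex importance: defaults to 1.0 for every vertex.
    imp = importance or {v: 1.0 for v in nodes}
    score = {v: seeds.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            if v in seeds:                # labeled vertices keep their label
                new[v] = seeds[v]
                continue
            total = sum(w * imp[u] for u, w in nbrs[v])
            new[v] = (sum(w * imp[u] * score[u] for u, w in nbrs[v]) / total
                      if total else 0.0)
        score = new
    return score

# Made-up chain graph a - b - c - d; a is labeled +1 and d is labeled -1.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)]
seeds = {"a": 1.0, "d": -1.0}
scores = propagate_labels(edges, seeds)
labels = {v: (1 if s > 0 else -1) for v, s in scores.items()}
```

On this chain the unlabeled vertices settle between their neighbours, so `b` takes the label of `a` and `c` takes the label of `d`. Passing a non-uniform `importance` dictionary pulls the scores toward the more important neighbours.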

A Study on Automatic Classification of Record Text Using Machine Learning

  • 김해찬솔;안대진;임진희;이해영
    • 정보관리학회지 / Vol. 34, No. 4 / pp.321-344 / 2017
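  • Research on the automatic classification of records and documents began long ago. Recently, as artificial intelligence technology has advanced, it has evolved into research that incorporates machine learning and deep learning. This study first reviews how automatic document classification and the learning methods of artificial intelligence have developed. It then examines, through the characteristics of supervised learning in particular and a variety of cases, why artificial intelligence technology needs to be applied in the records management field. Finally, using supervised learning, approval documents of the Seoul Metropolitan Government were automatically classified into the government function classification system through ETRI's Exobrain. From this work, the considerations for each step of automatically classifying records into various classification schemes were derived.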

Asymmetric Semi-Supervised Boosting Scheme for Interactive Image Retrieval

  • Wu, Jun;Lu, Ming-Yu
    • ETRI Journal / Vol. 32, No. 5 / pp.766-773 / 2010
  • Support vector machine (SVM) active learning plays a key role in the interactive content-based image retrieval (CBIR) community. However, regular SVM active learning is challenged by what we call "the small example problem" and "the asymmetric distribution problem." This paper attempts to integrate the merits of semi-supervised learning, ensemble learning, and active learning into interactive CBIR. Concretely, unlabeled images are exploited to facilitate boosting by helping augment the diversity among base SVM classifiers, and then the learned ensemble model is used to identify the most informative images for active learning. In particular, a bias-weighting mechanism is developed to guide the ensemble model to pay more attention to positive images than to negative images. Experiments on 5000 Corel images show that the proposed method improves mean average precision by 0.16 over regular SVM active learning and is more effective than some existing improved variants of SVM active learning.

A Hybrid Selection Method of Helpful Unlabeled Data Applicable for Semi-Supervised Learning Algorithm

  • Le, Thanh-Binh;Kim, Sang-Woon
    • IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 4 / pp.234-239 / 2014
  • This paper presents an empirical study on selecting a small amount of useful unlabeled data to improve the classification accuracy of semi-supervised learning algorithms. In particular, a hybrid method of unifying the simply recycled selection method and the incrementally-reinforced selection method was considered and evaluated empirically. The experimental results, which were obtained from well-known benchmark data sets using semi-supervised support vector machines, demonstrated that the hybrid method works better than the traditional ones in terms of the classification accuracy.

Supervised learning-based DDoS attacks detection: Tuning hyperparameters

  • Kim, Meejoung
    • ETRI Journal / Vol. 41, No. 5 / pp.560-573 / 2019
  • Two supervised learning algorithms, a basic neural network and a long short-term memory recurrent neural network, are applied to traffic that includes DDoS attacks. The joint effects of preprocessing methods and machine learning hyperparameters on performance are investigated. Values representing attack characteristics are extracted from datasets and preprocessed by two methods. Binary classification and two optimizers are used. Some hyperparameters are obtained by exhaustive search for fast and accurate detection, while others are fixed at constant values to account for performance and data characteristics. An experiment is performed via TensorFlow on three traffic datasets. Three scenarios are considered to investigate the effects of learning former traffic on sequential traffic analysis and the effects of learning one dataset on application to another dataset, and to determine whether the algorithms can be used for recent attack traffic. Experimental results show that the preprocessing methods, neural network architectures, hyperparameters, and optimizers used are appropriate for DDoS attack detection. The obtained results provide a criterion for the detection accuracy of attacks.

Malicious Codes Re-grouping Methods using Fuzzy Clustering based on Native API Frequency

  • 권오철;배성재;조재익;문종섭
    • 정보보호학회논문지 / Vol. 18, No. 6A / pp.115-127 / 2008
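  • The Native API (Application Programming Interfaces) is a kind of system call executed with administrator privileges, and it is used to detect the many kinds of malicious code that attack by acquiring administrator privileges. Detection methods based on Native API features have accordingly been proposed, and many of them use supervised machine learning. However, because the classification criteria of anti-virus vendors do not reflect Native API features, they do not provide training sets suitable for detection based on supervised learning. Research on classification criteria suited to Native API-based detection is therefore needed. To classify malicious code quantitatively, this paper presents a method that regroups malicious code by fuzzy clustering based on the Native API. The suitability of the proposed regrouping method is verified by comparing the detection performance of machine learning against the results obtained with the existing classification.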
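As a rough illustration of fuzzy clustering on frequency vectors, the sketch below implements standard fuzzy c-means. The abstract does not say which fuzzy clustering variant the paper uses, so fuzzy c-means is an assumption here, and the two-dimensional "API call frequency" vectors, the starting centres, and the function names are all made up for the example.

```python
import math

def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of every point in every cluster (rows sum to 1)."""
    exp = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:                      # point sits exactly on a centre
            hits = d.count(0.0)
            rows.append([1.0 / hits if x == 0.0 else 0.0 for x in d])
        else:
            rows.append([1.0 / sum((dj / dk) ** exp for dk in d) for dj in d])
    return rows

def update_centers(points, u, m=2.0):
    """Each centre is the mean of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        wts = [row[j] ** m for row in u]
        total = sum(wts)
        centers.append(tuple(sum(w * p[a] for w, p in zip(wts, points)) / total
                             for a in range(dim)))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Made-up frequency vectors: (calls to API group 1, calls to API group 2).
samples = [(9.0, 1.0), (8.0, 2.0), (1.0, 9.0), (2.0, 8.0), (5.0, 5.0)]
centers, u = fuzzy_c_means(samples, centers=[(9.0, 1.0), (1.0, 9.0)])
```

Unlike a hard clustering, each sample keeps a graded membership in every group. In this symmetric toy data the last sample lies midway between the two groups, so it belongs to both about equally, while the first sample belongs almost entirely to the first group.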

Semi-supervised Multi-view Manifold Discriminant Intact Space Learning

  • Han, Lu;Wu, Fei;Jing, Xiao-Yuan
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 9 / pp.4317-4335 / 2018
  • Semi-supervised multi-view latent space learning has recently gained considerable popularity in many machine learning applications because of the high cost and difficulty of obtaining large amounts of label information. Although some semi-supervised multi-view latent space learning methods have been presented, there is still much room for improvement: 1) how to learn latent discriminant intact feature representations by employing data of multiple views; 2) how to effectively exploit the manifold structure of both labeled and unlabeled points in the learned latent intact space. To address these issues, we propose an approach called semi-supervised multi-view manifold discriminant intact space learning ($SM^2DIS$) for image classification. $SM^2DIS$ aims to seek a manifold discriminant intact space for data of different views by making use of both the discriminant information of labeled data and the manifold structure of both labeled and unlabeled data. Experimental results on the MNIST, COIL-20, Multi-PIE, and Caltech-101 databases demonstrate the effectiveness and robustness of the proposed approach.

The use of support vector machines in semi-supervised classification

  • Bae, Hyunjoo;Kim, Hyungwoo;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / Vol. 29, No. 2 / pp.193-202 / 2022
  • Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in the machine learning community. The idea is simple. First, we apply dimension reduction to the unlabeled observations and cluster them in the reduced space to assign labels. SVM is then applied to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend the method to its nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising for semi-supervised classification.
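The cluster-then-classify idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' procedure: the dimension reduction step is skipped because the toy data is already two-dimensional, the clustering is 2-means seeded with the labeled class means, and the SVM is a linear one trained by subgradient descent on the regularized hinge loss. All data and function names are invented for the example.

```python
def mean(points):
    n = len(points)
    return tuple(sum(p[a] for p in points) / n for a in range(len(points[0])))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def cluster_then_label(unlabeled, start_centers, iters=10):
    """k-means on the unlabeled points; each cluster inherits the label
    of the class mean it was seeded with."""
    centers = dict(start_centers)
    assign = []
    for _ in range(iters):
        assign = [min(centers, key=lambda y: sq_dist(p, centers[y]))
                  for p in unlabeled]
        for y in centers:
            members = [p for p, a in zip(unlabeled, assign) if a == y]
            if members:
                centers[y] = mean(members)
    return assign

def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]      # regularization step
            if margin < 1.0:                             # hinge loss is active
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Made-up data: two labeled points and four unlabeled ones.
labeled = [((0.0, 0.0), -1), ((4.0, 4.0), 1)]
unlabeled = [(0.5, 0.2), (0.1, 0.8), (3.8, 4.1), (4.3, 3.6)]
starts = {t: mean([x for x, u in labeled if u == t]) for t in (-1, 1)}
pseudo = cluster_then_label(unlabeled, starts)           # pseudo-labels
X = [x for x, _ in labeled] + unlabeled
y = [t for _, t in labeled] + pseudo
w, b = train_linear_svm(X, y)
```

A call such as `predict(w, b, (0.2, 0.3))` then classifies a new point with the rule learned from both the labeled points and the pseudo-labeled ones. A kernel SVM would replace the linear trainer in the nonlinear case.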

IRSML: An intelligent routing algorithm based on machine learning in software defined wireless networking

  • Duong, Thuy-Van T.;Binh, Le Huu
    • ETRI Journal / Vol. 44, No. 5 / pp.733-745 / 2022
  • In software-defined wireless networking (SDWN), optimal routing is one of the effective ways to improve network performance. Such routing is carried out by many different methods, with the most common using integer linear programming (ILP) and building optimal routing metrics. These methods often focus on only one routing objective, such as minimizing the packet blocking probability, minimizing end-to-end delay (EED), or maximizing network throughput. It is difficult to consider multiple objectives concurrently in a routing algorithm. In this paper, we investigate the application of machine learning to control routing in SDWN. An intelligent routing algorithm based on machine learning is then proposed to improve network performance. The proposed algorithm can optimize multiple routing objectives. Our idea is to combine supervised learning (SL) and reinforcement learning (RL) methods to discover new routes. SL is used to predict the performance metrics of the links, including EED, quality of transmission (QoT), and packet blocking probability (PBP). Routing is done by the RL method. We use the Q-value in the fundamental equation of RL to store the PBP, which is used for route selection. Concurrently, the learning rate coefficient is flexibly changed to determine the constraints of routing during learning. These constraints include QoT and EED. Our performance evaluations based on OMNeT++ show that the proposed algorithm significantly improves network performance in terms of QoT, EED, packet delivery ratio, and network throughput compared with other well-known routing algorithms.
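To make the role of the Q-value concrete, here is a minimal sketch of the standard tabular Q-learning update applied to next-hop selection. It is not the IRSML algorithm: the learning rate is fixed rather than adapted to QoT and EED constraints, the link costs are made-up numbers standing in for a predicted per-link metric, and the topology and function names are assumptions.

```python
def q_routing(links, dest, alpha=0.5, gamma=1.0, sweeps=50):
    """Tabular Q-learning over (node, next_hop) pairs.
    The reward for using a link is the negative of its cost."""
    q = {(u, v): 0.0 for u in links for v in links[u]}
    for _ in range(sweeps):
        for u in links:
            for v, cost in links[u].items():
                # Best value obtainable from the next hop onward.
                future = 0.0 if v == dest else max(q[(v, n)] for n in links[v])
                # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max Q(s',.) - Q(s,a))
                q[(u, v)] += alpha * (-cost + gamma * future - q[(u, v)])
    return q

def best_route(q, links, src, dest):
    """Follow the next hop with the highest Q-value until the destination."""
    route = [src]
    while route[-1] != dest:
        u = route[-1]
        route.append(max(links[u], key=lambda v: q[(u, v)]))
    return route

# Hypothetical topology: node -> {next_hop: link cost}.
links = {
    "A": {"B": 0.1, "C": 0.4},
    "B": {"D": 0.5},
    "C": {"D": 0.1},
    "D": {},
}
q = q_routing(links, dest="D")
route = best_route(q, links, "A", "D")
```

A purely greedy choice at `A` would take the cheaper first link to `B`, but the learned Q-values account for the rest of the path, so the selected route goes through `C`, whose total cost is lower.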