• Title/Summary/Keyword: classification algorithm


Document Classification of Small Size Documents Using Extended Relief-F Algorithm (확장된 Relief-F 알고리즘을 이용한 소규모 크기 문서의 자동분류)

  • Park, Heum
    • The KIPS Transactions:PartB
    • /
    • v.16B no.3
    • /
    • pp.233-238
    • /
    • 2009
  • This paper presents an approach to classifying small documents using the instance-based feature filtering algorithm Relief-F. Classification performance is often poor for small documents that contain only a few features: the total number of features in the document set is large, but the feature count of each document is relatively very small, so the similarities between documents are very low under general similarity measures and classifiers. Performance is especially poor when classifying web documents in a directory service and when classifying sectors that cannot be linked to their original file after hard-disk recovery. We therefore propose the Extended Relief-F (ERelief-F) algorithm, which builds on the instance-based feature filtering algorithm Relief-F and addresses its problems, as a preprocessing step for classification. For the performance comparison, we tested information gain, odds ratio, and Relief-F for feature filtering and for obtaining feature values, and used kNN and SVM classifiers. In the experiments, ERelief-F performed best on all of the datasets compared with the other methods and removed many irrelevant features from the document sets.
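To make the idea of instance-based feature filtering concrete, here is a minimal sketch of the basic two-class Relief weight update, on which Relief-F builds. It is not the paper's ERelief-F: the function name `relief_weights`, the choice to visit every instance once instead of sampling, and the range-scaled distance are all assumptions made for illustration.

```python
def relief_weights(X, y):
    """Basic two-class Relief (illustrative sketch, not ERelief-F).

    A feature gains weight when it differs on the nearest instance of the
    other class (nearest miss) and loses weight when it differs on the
    nearest instance of the same class (nearest hit).
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    # Assumption: visit every instance once (deterministic) instead of sampling.
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return w


if __name__ == "__main__":
    # Feature 0 separates the classes; feature 1 is noise.
    X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [0.9, 0.4], [1.0, 0.8], [0.8, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(relief_weights(X, y))
```

Features with low or negative weight are the candidates for removal before a classifier such as kNN or SVM is trained.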

Study on Classification Algorithm based on Weight of Support and Confidence Degree (지지도와 신뢰도의 가중치에 기반한 분류알고리즘에 관한 연구)

  • Kim, Keun-Hyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.13 no.4
    • /
    • pp.700-713
    • /
    • 2009
  • Most existing classification algorithms in data mining have focused on efficiency, that is, on generating a decision tree more rapidly while using fewer computing resources. In this paper, we focus on efficiency as well as effectiveness, meaning the ability to generate more meaningful classification rules in application areas such as automatic ontology generation and business environments. To this end, we propose a novel function based on the weights of support and confidence and analyze the characteristics of the weighted function from a theoretical viewpoint. Furthermore, we propose a novel classification algorithm based on the weighted function and its characteristics. The evaluation shows that the proposed algorithm generates more classification rules of significance, and does so more rapidly.
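As a reminder of the two quantities being weighted, the sketch below computes the support and confidence of a rule over a list of transactions. The abstract does not give the paper's weighted function, so `weighted_score` and its `alpha` parameter are a purely hypothetical linear blend, included only to show where such a weight would enter.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Transactions containing the antecedent, and those containing both sides.
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


def weighted_score(support, confidence, alpha=0.5):
    # Hypothetical blend, NOT the paper's function: alpha weights support,
    # (1 - alpha) weights confidence.
    return alpha * support + (1 - alpha) * confidence


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
    s, c = support_confidence(data, {"a"}, {"b"})
    print(s, c, weighted_score(s, c))
```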

A Radiomics-based Unread Cervical Imaging Classification Algorithm (자궁경부 영상에서의 라디오믹스 기반 판독 불가 영상 분류 알고리즘 연구)

  • Kim, Go Eun;Kim, Young Jae;Ju, Woong;Nam, Kyehyun;Kim, Soonyung;Kim, Kwang Gi
    • Journal of Biomedical Engineering Research
    • /
    • v.42 no.5
    • /
    • pp.241-249
    • /
    • 2021
  • Recently, artificial intelligence for diagnosis systems of obstetric diseases has been actively studied. Artificial intelligence diagnostic assist systems, which support medical diagnosis with benefits in efficiency and accuracy, may suffer from poor learning accuracy and reliability when inappropriate images are used as the model's input data. For this reason, we propose an algorithm that excludes unread cervical imaging, that is, images that cannot be interpreted, before learning. The study used 2,000 images of read cervical imaging and 257 images of unread cervical imaging. Experiments based on the statistical method Radiomics extracted feature values from the entire images in order to separate unread images from the full set and to obtain a range of read threshold values. The degree to which brightness, blur, and the cervical region were adequately captured in the image served as the classification indicators. We compared classification performance by training a deep learning classification model on read cervical imaging classified by the proposed algorithm and on unread cervical imaging, and evaluated the algorithm's classification accuracy for unread cervical imaging from that comparison. Images processed by the algorithm showed a higher accuracy, 91.6% on average. The proposed algorithm is expected to improve reliability by effectively excluding unread cervical imaging and ultimately reducing errors in artificial intelligence diagnosis.

The Efficiency of Boosting on SVM

  • Seok, Kyung-Ha;Ryu, Tae-Wook
    • Journal of the Korean Data and Information Science Society
    • /
    • v.13 no.2
    • /
    • pp.55-64
    • /
    • 2002
  • In this paper, we introduce the support vector machine (SVM), which was developed to address the generalization problem of neural networks. We also introduce the boosting algorithm, a general method for improving the accuracy of a given learning algorithm. We propose a new algorithm that combines SVM and boosting to solve classification problems. In experiments with real and simulated data sets, the proposed algorithm achieved better performance.

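The boosting idea mentioned in the abstract above can be sketched with plain AdaBoost. To keep the example dependency-free, one-feature threshold classifiers (decision stumps) are swapped in for the SVM base learner, so this is not the paper's combined algorithm. The names `train_stump`, `adaboost`, and `predict` are illustrative, and labels are assumed to be +1 or -1.

```python
import math


def stump_predict(row, j, thr, pol):
    # Predict pol when feature j is at or above the threshold, otherwise -pol.
    return pol if row[j] >= thr else -pol


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds=3):
    n = len(X)
    w = [1.0 / n] * n  # start from uniform instance weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        # Increase the weight of misclassified instances, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # No single threshold separates this 1-D data set.
    X = [[0], [1], [2], [3], [4], [5]]
    y = [1, 1, -1, -1, 1, 1]
    model = adaboost(X, y, rounds=3)
    print([predict(model, row) for row in X])
```

Each round reweights the training set so that the next weak learner concentrates on the instances the previous ones got wrong; the final prediction is a weighted vote.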

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.3
    • /
    • pp.1348-1375
    • /
    • 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, they are validated on 20 other datasets provided by the UCI data repository. Statistical analysis and experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed, and that the wrapper-based techniques improved SVM predictive accuracy in most cases.

Application of Multi-Class AdaBoost Algorithm to Terrain Classification of Satellite Images

  • Nguyen, Ngoc-Hoa;Woo, Dong-Min
    • Journal of IKEEE
    • /
    • v.18 no.4
    • /
    • pp.536-543
    • /
    • 2014
  • Terrain classification is still a challenging issue in image processing, especially with high-resolution satellite images. The well-known obstacles include low accuracy in the detection of targets, especially man-made structures such as buildings and roads. In this paper, we present an efficient approach to classifying and detecting building footprints, foliage, grass, and roads from high-resolution grayscale satellite images. Our contribution is to build a strong classifier using AdaBoost based on a combination of co-occurrence and Haar-like features. We expect the inclusion of Haar-like features to improve the classification of man-made structures, since Haar-like features are extracted from corner and rectangle features. Also, the AdaBoost algorithm selects only critical features and generates an extremely efficient classifier. Experimental results indicate that the classification accuracy of the AdaBoost classifier is much higher than that of the conventional classifier using the back-propagation algorithm, and that the inclusion of Haar-like features significantly improves the classification accuracy. The accuracy of the proposed method is 98.4% for target detection and 92.8% for classification on high-resolution satellite images.
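For readers unfamiliar with Haar-like features, the sketch below computes a two-rectangle feature on a grayscale image using an integral image (summed-area table), the usual way such features are evaluated in constant time. It is a generic illustration, not the paper's feature set; the function names are illustrative and the image is assumed to be a list of rows of numbers.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half; responds strongly to vertical edges."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


if __name__ == "__main__":
    # Bright left half, dark right half: a vertical edge in the middle.
    img = [[10, 10, 0, 0] for _ in range(4)]
    ii = integral_image(img)
    print(haar_two_rect_horizontal(ii, 0, 0, 4, 4))
```

A boosted classifier such as AdaBoost can then pick, from many such rectangle positions and sizes, the few features that best discriminate the target class.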

Application of Random Forest Algorithm for the Decision Support System of Medical Diagnosis with the Selection of Significant Clinical Test (의료진단 및 중요 검사 항목 결정 지원 시스템을 위한 랜덤 포레스트 알고리즘 적용)

  • Yun, Tae-Gyun;Yi, Gwan-Su
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.57 no.6
    • /
    • pp.1058-1062
    • /
    • 2008
  • In a clinical decision support system (CDSS), unlike the rule-based expert method, an appropriate data-driven machine learning method can easily provide information on individual features (clinical tests) for disease classification. However, currently developed methods focus on improving the classification accuracy for diagnosis. By analyzing feature importance in classification, one may infer novel clinical test sets that strongly differentiate specific diseases or disease states. Against this background, we introduce a novel CDSS that integrates a classifier and a feature selection module. The random forest algorithm is applied as both the classifier and the feature importance measure. The system selects the significant clinical tests that discriminate the diseases by examining the classification error during backward elimination of the features. The superior performance of the random forest algorithm in clinical classification was assessed against artificial neural network and decision tree algorithms using breast cancer, diabetes, and heart disease data from the UCI Machine Learning Repository. Tests with the same data sets show that the proposed system can successfully select the significant clinical test set for each disease.

Nearest-Neighbors Based Weighted Method for the BOVW Applied to Image Classification

  • Xu, Mengxi;Sun, Quansen;Lu, Yingshu;Shen, Chenming
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.4
    • /
    • pp.1877-1885
    • /
    • 2015
  • This paper presents a new Nearest-Neighbors based weighted representation for images and a weighted K-Nearest-Neighbors (WKNN) classifier to improve the precision of image classification using Bag of Visual Words (BOVW) based models. Scale-invariant feature transform (SIFT) features are first extracted from images. Then, the K-means++ algorithm is adopted in place of the conventional K-means algorithm to generate a more effective visual dictionary. Furthermore, the histogram of visual words becomes more expressive through the proposed weighted vector quantization (WVQ). Finally, the WKNN classifier is applied to improve classification between images in which similar levels of background noise are present. Average precision and absolute change degree are calculated to assess the classification performance and the stability of the K-means++ algorithm, respectively. Experimental results on three diverse datasets (Caltech-101, Caltech-256, and PASCAL VOC 2011) show that the proposed WVQ and WKNN methods further improve classification performance.
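The general idea of a weighted nearest-neighbor vote can be sketched as follows. The abstract does not specify the paper's WKNN weighting, so inverse-distance weights are assumed here purely for illustration; `weighted_knn_predict` and the `eps` smoothing term are hypothetical names.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Distance-weighted kNN: closer neighbors cast larger votes.

    Assumption: weight = 1 / (distance + eps). This is a common choice,
    not necessarily the weighting used in the paper.
    """
    # Keep the k training points closest to the query.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for d, label in neighbors:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train_X = [(1.1, 1.0), (4.0, 4.0), (4.0, 5.0)]
    train_y = ["a", "b", "b"]
    # An unweighted majority vote with k=3 would answer "b";
    # the single very close "a" neighbor dominates the weighted vote.
    print(weighted_knn_predict(train_X, train_y, (1.0, 1.0), k=3))
```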

Electromyography Pattern Recognition and Classification using Circular Structure Algorithm (원형 구조 알고리즘을 이용한 근전도 패턴 인식 및 분류)

  • Choi, Yuna;Sung, Minchang;Lee, Seulah;Choi, Youngjin
    • The Journal of Korea Robotics Society
    • /
    • v.15 no.1
    • /
    • pp.62-69
    • /
    • 2020
  • This paper proposes a pattern recognition and classification algorithm based on a circular structure that can reflect the characteristics of the sEMG (surface electromyogram) signal measured in the arm without restricting electrode placement. In order to recognize the same pattern at all times regardless of electrode locations, data acquisition with a circular structure is proposed so that all sEMG channels can be connected to one another. Several experiments were conducted to verify the performance of sEMG pattern recognition and classification with the developed algorithm. First, although there are no differences in the sEMG signals themselves, similar patterns are identified much better with the circular structure algorithm than with conventional linear ones. Second, a comparative analysis is presented with supervised learning schemes such as MLP, CNN, and LSTM. In the results, the classification recognition accuracy of the circular structure is above 98% in all postures, much higher than the results obtained with the linear structure. The recognition difference between the circular and linear structures was largest, at about 4%, when the MLP network was used.

Development of Age Classification Deep Learning Algorithm Using Korean Speech (한국어 음성을 이용한 연령 분류 딥러닝 알고리즘 기술 개발)

  • So, Soonwon;You, Sung Min;Kim, Joo Young;An, Hyun Jun;Cho, Baek Hwan;Yook, Sunhyun;Kim, In Young
    • Journal of Biomedical Engineering Research
    • /
    • v.39 no.2
    • /
    • pp.63-68
    • /
    • 2018
  • In modern society, speech recognition technology is emerging as an important technology for identification in electronic commerce, forensics, law enforcement, and other systems. In this study, we aim to develop an age classification algorithm that extracts only the MFCC (Mel Frequency Cepstral Coefficient) features expressing the characteristics of Korean speech and applies them to deep learning. The algorithm extracts 13th-order MFCCs from Korean data to construct a data set and uses a deep artificial neural network to classify males in their 20s, 30s, and 50s and females in their 20s, 40s, and 50s. Finally, our model achieved classification accuracies of 78.6% and 71.9% for males and females, respectively.
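MFCC extraction rests on the mel frequency scale. The sketch below shows one widely used Hz-to-mel conversion (the 2595 and 700 constants) and how filter center frequencies are spaced evenly in mel rather than in Hz. The abstract does not describe the paper's exact MFCC pipeline, so the formula variant, the function names, and the parameters are assumptions.

```python
import math


def hz_to_mel(f_hz):
    # One common variant of the mel scale; other variants exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters bands spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    # Skip the two end points; they are band edges, not centers.
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]


if __name__ == "__main__":
    for f in mel_filter_centers(5, 0.0, 8000.0):
        print(round(f, 1))
```

Because the spacing is even in mel, the centers crowd together at low frequencies and spread out at high frequencies, which mirrors how pitch is perceived.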