• Title/Summary/Keyword: k-NN classification

Search Result 188, Processing Time 0.025 seconds

Feature Analysis of Multi-Channel Time Series EEG Based on Incremental Model (점진적 모델에 기반한 다채널 시계열 데이터 EEG의 특징 분석)

  • Kim, Sun-Hee;Yang, Hyung-Jeong;Ng, Kam Swee;Jeong, Jong-Mun
    • The KIPS Transactions:PartB
    • /
    • v.16B no.1
    • /
    • pp.63-70
    • /
    • 2009
  • BCI technology is to control communication systems or machines by brain signal among biological signals followed by signal processing. For the implementation of BCI systems, it is required that the characteristics of brain signal are learned and analyzed in real-time and the learned characteristics are applied. In this paper, we detect feature vector of EEG signal on left and right hand movements based on incremental approach and dimension reduction using the detected feature vector. In addition, we show that the reduced dimension can improve the classification performance by removing unnecessary features. The processed data including sufficient features of input data can reduce the time of processing and boost performance of classification by removing unwanted features. Our experiments using K-NN classifier show the proposed approach 5% outperforms the PCA based dimension reduction.

Classification Performance Improvement of UNSW-NB15 Dataset Based on Feature Selection (특징선택 기법에 기반한 UNSW-NB15 데이터셋의 분류 성능 개선)

  • Lee, Dae-Bum;Seo, Jae-Hyun
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.5
    • /
    • pp.35-42
    • /
    • 2019
  • Recently, as the Internet and various wearable devices have appeared, Internet technology has contributed to obtaining more convenient information and doing business. However, as the internet is used in various parts, the attack surface points that are exposed to attacks are increasing, Attempts to invade networks aimed at taking unfair advantage, such as cyber terrorism, are also increasing. In this paper, we propose a feature selection method to improve the classification performance of the class to classify the abnormal behavior in the network traffic. The UNSW-NB15 dataset has a rare class imbalance problem with relatively few instances compared to other classes, and an undersampling method is used to eliminate it. We use the SVM, k-NN, and decision tree algorithms and extract a subset of combinations with superior detection accuracy and RMSE through training and verification. The subset has recall values of more than 98% through the wrapper based experiments and the DT_PSO showed the best performance.

A Study on Robust Speech Emotion Feature Extraction Under the Mobile Communication Environment (이동통신 환경에서 강인한 음성 감성특징 추출에 대한 연구)

  • Cho Youn-Ho;Park Kyu-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.6
    • /
    • pp.269-276
    • /
    • 2006
  • In this paper, we propose an emotion recognition system that can discriminate human emotional state into neutral or anger from the speech captured by a cellular-phone in real time. In general. the speech through the mobile network contains environment noise and network noise, thus it can causes serious System performance degradation due to the distortion in emotional features of the query speech. In order to minimize the effect of these noise and so improve the system performance, we adopt a simple MA (Moving Average) filter which has relatively simple structure and low computational complexity, to alleviate the distortion in the emotional feature vector. Then a SFS (Sequential Forward Selection) feature optimization method is implemented to further improve and stabilize the system performance. Two pattern recognition method such as k-NN and SVM is compared for emotional state classification. The experimental results indicate that the proposed method provides very stable and successful emotional classification performance such as 86.5%. so that it will be very useful in application areas such as customer call-center.

Quality Control of Two Dimensions Using Digital Image Processing and Neural Networks (디지털 영상처리와 신경망을 이용한 2차원 평면 물체 품질 제어)

  • Kim, Jin-Hwan;Seo, Bo-Hyeok;Park, Seong-Wook
    • Proceedings of the KIEE Conference
    • /
    • 2004.07d
    • /
    • pp.2580-2582
    • /
    • 2004
  • In this paper, a Neural Network(NN) based approach for classification of two dimensions images. The proposed algorithm is able to apply in the actual industry. The described diagnostic algorithm is presented to defect surface failures on tiles. A way to get data for a digital image process is several kinds of it. The tiles are scanned and the digital images are preprocessed and classified using neural networks. It is important to reduce the amount of input data with problem specific preprocessing. The auto-associative neural network is used for feature generation and selection while the probabilistic neural network is used for classification. The proposed algorithm is evaluated experimentally using one hundred of the real tile images. Sample image data to preprocess have histogram. The histogram is used as input value of probabilistic neural network. Auto-associative neural network compress input data and compressed data is classified using probabilistic neural network. Classified sample images are determined by human state. So it is intervened human subjectivity. But digital image processing and neural network are better than human classification ability. Therefore it is very useful of quality control improvement.

  • PDF

Simultaneous Motion Recognition Framework using Data Augmentation based on Muscle Activation Model (근육 활성화 모델 기반의 데이터 증강을 활용한 동시 동작 인식 프레임워크)

  • Sejin Kim;Wan Kyun Chung
    • The Journal of Korea Robotics Society
    • /
    • v.19 no.2
    • /
    • pp.203-212
    • /
    • 2024
  • Simultaneous motion is essential in the activities of daily living (ADL). For motion intention recognition, surface electromyogram (sEMG) and corresponding motion label is necessary. However, this process is time-consuming and it may increase the burden of the user. Therefore, we propose a simultaneous motion recognition framework using data augmentation based on muscle activation model. The model consists of multiple point sources to be optimized while the number of point sources and their initial parameters are automatically determined. From the experimental results, it is shown that the framework has generated the data which are similar to the real one. This aspect is quantified with the following two metrics: structural similarity index measure (SSIM) and mean squared error (MSE). Furthermore, with k-nearest neighbor (k-NN) or support vector machine (SVM), the classification accuracy is also enhanced with the proposed framework. From these results, it can be concluded that the generalization property of the training data is enhanced and the classification accuracy is increased accordingly. We expect that this framework reduces the burden of the user from the excessive and time-consuming data acquisition.

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.17 no.2
    • /
    • pp.835-838
    • /
    • 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in lung cancer type or malignant growth determination lead to degraded treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We have made an attempt to evaluate effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was to execute a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was to make a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), naive Bayes classifier with assumption of both a normal distribution of attributes and a distribution through histograms, support vector machine and C4.5 decision tree. Effectiveness of machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed best results among data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness in the University of Michigan data set. However, the C4.5 decision tree showed best results for the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.

A New Memory-based Learning using Dynamic Partition Averaging (동적 분할 평균을 이용한 새로운 메모리 기반 학습기법)

  • Yih, Hyeong-Il
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.456-462
    • /
    • 2008
  • The classification is that a new data is classified into one of given classes and is one of the most generally used data mining techniques. Memory-Based Reasoning (MBR) is a reasoning method for classification problem. MBR simply keeps many patterns which are represented by original vector form of features in memory without rules for reasoning, and uses a distance function to classify a test pattern. If training patterns grows in MBR, as well as size of memory great the calculation amount for reasoning much have. NGE, FPA, and RPA methods are well-known MBR algorithms, which are proven to show satisfactory performance, but those have serious problems for memory usage and lengthy computation. In this paper, we propose DPA (Dynamic Partition Averaging) algorithm. it chooses partition points by calculating GINI-Index in the entire pattern space, and partitions the entire pattern space dynamically. If classes that are included to a partition are unique, it generates a representative pattern from partition, unless partitions relevant partitions repeatedly by same method. The proposed method has been successfully shown to exhibit comparable performance to k-NN with a lot less number of patterns and better result than EACH system which implements the NGE theory and FPA, and RPA.

The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance (학습문헌집합에 기 부여된 범주의 정확성과 문헌 범주화 성능)

  • Shim, Kyung;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.2
    • /
    • pp.265-285
    • /
    • 2006
  • In text categorization a certain level of correctness of labels assigned to training documents is assumed without solid knowledge on that of real-world collections. Our research attempts to explore the quality of pre-assigned subject categories in a real-world collection, and to identify the relationship between the quality of category assignment in training set and text categorization performance. Particularly, we are interested in to what extent the performance can be improved by enhancing the quality (i.e., correctness) of category assignment in training documents. A collection of 1,150 abstracts in computer science is re-classified by an expert group, and divided into 907 training documents and 227 test documents (15 duplicates are removed). The performances of before and after re-classification groups, called Initial set and Recat-1/Recat-2 sets respectively, are compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and the categorization performance with the Initial set shows 17% in $F_1$ value. On the other hand, the Recat-1 set scores $F_1$ value of 61%, which is 3.6 times higher than that of the Initial set.

Document Classification of Small Size Documents Using Extended Relief-F Algorithm (확장된 Relief-F 알고리즘을 이용한 소규모 크기 문서의 자동분류)

  • Park, Heum
    • The KIPS Transactions:PartB
    • /
    • v.16B no.3
    • /
    • pp.233-238
    • /
    • 2009
  • This paper presents an approach to the classifications of small size document using the instance-based feature filtering Relief-F algorithm. In the document classifications, we have not always good classification performances of small size document included a few features. Because total number of feature in the document set is large, but feature count of each document is very small relatively, so the similarities between documents are very low when we use general assessment of similarity and classifiers. Specially, in the cases of the classification of web document in the directory service and the classification of the sectors that cannot connect with the original file after recovery hard-disk, we have not good classification performances. Thus, we propose the Extended Relief-F(ERelief-F) algorithm using instance-based feature filtering algorithm Relief-F to solve problems of Relief-F as preprocess of classification. For the performance comparison, we tested information gain, odds ratio and Relief-F for feature filtering and getting those feature values, and used kNN and SVM classifiers. In the experimental results, the Extended Relief-F(ERelief-F) algorithm, compared with the others, performed best for all of the datasets and reduced many irrelevant features from document sets.

Performance Evaluation of Car Model Recognition System Using HOG and Artificial Neural Network (HOG와 인공신경망을 이용한 자동차 모델 인식 시스템 성능 분석)

  • Park, Ki-Wan;Bang, Ji-Sung;Kim, Byeong-Man
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.21 no.5
    • /
    • pp.1-10
    • /
    • 2016
  • In this paper, a car model recognition system using image processing and machine learning is proposed and it's performance is also evaluated. The system recognizes the front of car because the front of car is different for every car model and manufacturer, and difficult to remodel. The proposed method extracts HOG features from training data set, then builds classification model by the HOG features. If user takes photo of the front of car, then HOG features are extracted from the photo image and are used to determine the model of car based on the trained classification model. Experimental results show a high average recognition rate of 98%.