• Title/Summary/Keyword: K-NN(K-Nearest Neighbor)


The Optimized Detection Range of RFID-based Positioning System using k-Nearest Neighbor Algorithm

  • Kim, Jung-Hwan;Heo, Joon;Han, Soo-Hee;Kim, Sang-Min
    • Proceedings of the Korean Association of Geographic Information Studies Conference
    • /
    • 2008.10a
    • /
    • pp.297-302
    • /
    • 2008
  • Positioning technology for moving objects is an essential component of ubiquitous computing environments and applications, and Radio Frequency Identification (RFID) has been considered a core technology for it. An RFID-based positioning system calculates the position of a moving object with the k-nearest neighbor (k-NN) algorithm, using the k detected tags whose coordinates are known; k is determined by the detection range of the RFID system. In this paper, the RFID-based positioning system determines the position of a moving object without a weight factor that depends on received signal strength, assuming instead that tags within the detection range always operate and have the same weight, because such a system is much more economical than one that uses signal-strength weights. The tag geometries were chosen with large buildings such as office buildings, shopping malls, and warehouses in mind: a line in 1-dimensional space and a square in 2-dimensional space. In 1-dimensional space, the optimal detection range is determined to be 125% of the tag spacing through both an analytical and a numerical approach. Here, the analytical approach is a mathematical proof and the numerical approach is a simulation in MATLAB. Because the analytical approach is very difficult in 2-dimensional space, the optimal detection range there is determined numerically to be 134% of the tag spacing. This result can serve as a fundamental study for designing RFID-based positioning systems.
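
The equal-weight estimate described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' MATLAB code: the tag layout (a line of tags at integer multiples of the spacing), the RMS error criterion, and the names `estimate_position` and `rms_error` are all hypothetical. It shows how the estimate is the plain mean of the detected tag coordinates and how the error can be swept over candidate detection ranges.

```python
import numpy as np

def estimate_position(x, tags, detection_range):
    """Equal-weight k-NN estimate: the mean coordinate of all detected tags."""
    detected = tags[np.abs(tags - x) <= detection_range]
    if detected.size == 0:
        return None  # no tag in range, so no position fix
    return float(detected.mean())

def rms_error(detection_range, spacing=1.0, n_samples=2001):
    """RMS positioning error for true positions spread over one tag interval."""
    # Assumed layout: tags on a line at integer multiples of `spacing`.
    tags = np.arange(-10, 11) * spacing
    xs = np.linspace(0.0, spacing, n_samples)
    estimates = [estimate_position(x, tags, detection_range) for x in xs]
    if any(e is None for e in estimates):
        return float("inf")  # some positions see no tag at all
    return float(np.sqrt(np.mean((np.array(estimates) - xs) ** 2)))

if __name__ == "__main__":
    # Sweep the detection range, expressed as a multiple of the tag spacing.
    for ratio in (0.6, 0.75, 1.0, 1.25, 1.5):
        print(f"range = {ratio:.2f} x spacing -> RMS error {rms_error(ratio):.4f}")
```

The error criterion used in the paper is not stated in the abstract, so the sweep above should be read as a way to explore the trade-off rather than to reproduce the reported 125% figure.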


Dynamic Emotion Classification through Facial Recognition (얼굴 인식을 통한 동적 감정 분류)

  • Han, Wuri;Lee, Yong-Hwan;Park, Jeho;Kim, Youngseop
    • Journal of the Semiconductor & Display Technology
    • /
    • v.12 no.3
    • /
    • pp.53-57
    • /
    • 2013
  • Human emotions are expressed in various ways: through language, facial expressions, and gestures. Facial expressions in particular carry a great deal of information about human emotion. These vague human emotions appear not as a single emotion but as a combination of several. This paper proposes an emotion expression algorithm using the Active Appearance Model (AAM) and a Fuzzy k-Nearest Neighbor classifier, which represents facial expressions in a way that resembles vague human emotion. The Mahalanobis distance to the center class is used to determine the inclusion level between the center class and each class, and the intensity of each emotion is then expressed according to that inclusion level. The resulting emotion recognition system can recognize complex emotions using the Fuzzy k-NN classifier.
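
To make the idea of graded class membership concrete, here is a minimal sketch of a fuzzy k-NN membership computation. It assumes the commonly used inverse-distance formulation with a fuzzifier `m` and plain Euclidean distance; the abstract's method uses Mahalanobis distance and AAM features, which are not reproduced here. The function name and parameters are hypothetical.

```python
import numpy as np

def fuzzy_knn_memberships(x, X_train, y_train, n_classes, k=3, m=2.0):
    """Return one membership degree per class for sample x; they sum to 1."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + 1e-12)
    memberships = np.zeros(n_classes)
    for idx, w in zip(nearest, weights):
        memberships[y_train[idx]] += w  # crisp training labels are assumed
    return memberships / memberships.sum()

if __name__ == "__main__":
    X_train = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
    y_train = np.array([0, 0, 1, 1])  # two hypothetical emotion classes
    print(fuzzy_knn_memberships(np.array([1.0, 1.0]), X_train, y_train, n_classes=2))
```

Instead of a single label, the output is a vector of degrees, which is what allows a mixed expression to be reported as partly one emotion and partly another.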

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.12
    • /
    • pp.6145-6158
    • /
    • 2019
  • Responding to the large number of malicious codes distributed indiscriminately, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In addition, in the malicious code analysis domain, the k-NN algorithm makes it easy to classify malicious codes based on previously analyzed ones. For example, malicious code families can be classified, or malicious code variants analyzed, through similarity analysis against existing malicious codes. However, the main disadvantage of the k-NN algorithm is that the search time increases as the training data grows. We propose a fast k-NN algorithm that addresses this computation speed problem while retaining the benefits of the k-NN algorithm. In the test environment, the fast k-NN algorithm required an average of only 19.71 similarity comparisons for 6.25 million malicious codes. Given the way the algorithm works, the fast k-NN algorithm can be used to search any data that can be vectorized, not only malware and SSDEEP. In the future, where a k-NN approach is needed and central nodes can be selected effectively for clustering large amounts of data in various environments, it is expected that sophisticated machine learning based systems can be designed.
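
The abstract does not describe the fast k-NN algorithm itself, only that it relies on central nodes and clustering to cut the number of comparisons. The sketch below shows one generic way such pruning can work and is not the authors' method: it picks random vectors as central nodes, assigns every vector to its closest node, and searches only the cluster whose node is closest to the query. Euclidean distance stands in for the similarity measure, the result is approximate, and all names are hypothetical.

```python
import numpy as np

def build_index(vectors, n_centers, seed=0):
    """Pick random central nodes and assign each vector to its closest one."""
    rng = np.random.default_rng(seed)
    center_ids = rng.choice(len(vectors), size=n_centers, replace=False)
    centers = vectors[center_ids]
    d = np.linalg.norm(vectors[:, None, :] - centers[None, :, :], axis=2)
    return centers, d.argmin(axis=1)

def pruned_nearest(query, vectors, centers, assignment):
    """Return (index of nearest vector found, number of distance comparisons)."""
    # Step 1: compare the query with the few central nodes only.
    best_center = np.linalg.norm(centers - query, axis=1).argmin()
    # Step 2: exhaustive search restricted to that node's members.
    members = np.flatnonzero(assignment == best_center)
    d = np.linalg.norm(vectors[members] - query, axis=1)
    return int(members[d.argmin()]), len(centers) + len(members)

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(size=(500, 8))
    centers, assignment = build_index(data, n_centers=20)
    idx, n_cmp = pruned_nearest(data[123], data, centers, assignment)
    print(idx, n_cmp)  # n_cmp is well below the 500 comparisons of a full scan
```

Because only one cluster is searched, a true nearest neighbor that lies just across a cluster boundary can be missed; that accuracy-for-speed trade-off is inherent to this kind of pruning.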

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.7
    • /
    • pp.2397-2404
    • /
    • 2011
  • A variable selection k-nearest neighbor QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against a human lung tumor cell line (A-549). All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original data set and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (17 to 12 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models were obtained, characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2{\leq}0.26$ and $R^2{\leq}0.22$ for training and test sets, respectively. The twelve best models (those with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
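
As an illustration of the leave-one-out statistic mentioned in this abstract, the sketch below computes $q^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$ for an unweighted k-NN regressor, where $\hat{y}_i$ is predicted with compound $i$ held out. It is a simplified stand-in: the variable selection step and the MolconnZ descriptors are not modeled, and the function name `loo_q2` is hypothetical.

```python
import numpy as np

def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated R^2 (q^2) for unweighted k-NN regression."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the held-out compound itself
        neighbors = np.argsort(d)[:k]
        preds[i] = y[neighbors].mean()
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Toy descriptors and activities, purely for demonstration.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 2.0, 3.0]
    print(loo_q2(X, y, k=2))
```

A Y-randomization check in the same spirit would shuffle `y` (for example with `numpy.random.default_rng().permutation`) and recompute `loo_q2`; a robust model should score much lower on the shuffled activities.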

An Exploratory Study on Survey Data Categorization using DDI metadata (메타데이터를 활용한 조사자료의 문서범주화에 관한 연구)

  • Park, Ja-Hyun;Song, Min
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2012.08a
    • /
    • pp.73-76
    • /
    • 2012
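  • This study aims to design a systematic and efficient classification process for survey data by conducting document categorization experiments with inductive (supervised) learning models on DDI metadata. Specifically, using the DDI metadata of survey data, we compared the performance of Naive Bayes, kNN (k-nearest neighbor), and decision tree classifiers under simple TF weighting, TF-IDF weighting, and Okapi TF weighting. Naive Bayes showed the best overall performance. Simple TF and TF-IDF weighting yielded identical performance for the Naive Bayes, kNN, and decision tree classifiers, whereas Okapi TF weighting performed best with Naive Bayes.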
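
The three term-weighting schemes compared in this abstract can be sketched as follows. The exact formulas used in the study are not given, so this assumes common textbook forms: raw term counts for TF, count times log(N/df) for TF-IDF, and tf / (tf + 0.5 + 1.5 * dl / avdl) for Okapi TF. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter

def term_weights(docs, scheme="tf"):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for d in docs for t in set(d))
    weighted = []
    for d in docs:
        tf = Counter(d)
        if scheme == "tf":
            w = {t: float(c) for t, c in tf.items()}
        elif scheme == "tfidf":
            w = {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        elif scheme == "okapi":
            # Length-normalized, saturating term frequency.
            w = {t: c / (c + 0.5 + 1.5 * len(d) / avg_len) for t, c in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(w)
    return weighted

if __name__ == "__main__":
    docs = [["survey", "income", "income"], ["survey", "health", "age"]]
    for scheme in ("tf", "tfidf", "okapi"):
        print(scheme, term_weights(docs, scheme)[0])
```

The resulting weight dictionaries would then be turned into feature vectors for whichever classifier is being compared.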


Comparison of Machine Learning Classification Models for the Development of Simulators for General X-ray Examination Education (일반엑스선검사 교육용 시뮬레이터 개발을 위한 기계학습 분류모델 비교)

  • Lee, In-Ja;Park, Chae-Yeon;Lee, Jun-Ho
    • Journal of radiological science and technology
    • /
    • v.45 no.2
    • /
    • pp.111-116
    • /
    • 2022
  • This study evaluates the applicability of machine learning to the development of a simulator for general X-ray examination education. To this end, k-nearest neighbor (kNN), support vector machine (SVM), and neural network (NN) classification models are analyzed and their results compared to identify the most suitable model. Image data were obtained by taking 100 photos each of the Posterior anterior (PA), Posterior anterior oblique (Obl), Lateral (Lat), and Fan lateral (Fan lat) positions. Of the 400 acquired images, 70% were used as the training set for the machine learning models and 30% as the test set for evaluation, and a prediction model was constructed to classify right-handed PA, Obl, Lat, and Fan lat images. After classification models were built on this data set with kNN, SVM, and NN, the models were compared through an error matrix. In the evaluation, kNN achieved an accuracy of 0.967 and an area under the curve (AUC) of 0.993, SVM an accuracy of 0.992 and an AUC of 1.000, and NN an accuracy of 0.992 and an AUC of 0.999. kNN scored slightly lower, but all three models recorded high accuracy and AUC. SVM and NN had the same accuracy of 0.992 and similar AUCs of 1.000 and 0.999, indicating that both models have high predictive power and are applicable to educational simulators.

Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for Information Management
    • /
    • v.36 no.4
    • /
    • pp.83-105
    • /
    • 2019
  • This study aims to suggest an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. For this purpose, large-scale news data were collected from the Web and time series composed of 120 time segments were built. To create a training data set for testing the performance of the proposed model, 440 representative keywords were manually classified into 8 types of trend. This study introduces the Dynamic Time Warping (DTW) method, which has been commonly used in the field of time series analytics, and proposes an application model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with a k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW reached maximum micro-averaged F1 scores of 48.2% and 66.6% respectively, whereas the proposed model achieved the best micro-averaged F1 score of 74.3%. In all of the comprehensive experiments, the suggested model outperformed ED and DTW.
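
The combination of moving-average smoothing, DTW, and nearest-neighbor classification can be sketched as below. How the paper's MA-DTW model actually combines the moving average with DTW is not specified in the abstract; this sketch simply smooths each series first and then applies classic DTW with an absolute-difference cost, followed by a 1-NN decision. The function names, window size, and toy series are hypothetical.

```python
import numpy as np

def moving_average(series, window):
    """Simple moving average; the output is shorter by window - 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def classify_1nn(query, train_series, train_labels, window=3):
    """Label of the training series closest to the query after smoothing."""
    q = moving_average(query, window)
    dists = [dtw_distance(q, moving_average(s, window)) for s in train_series]
    return train_labels[int(np.argmin(dists))]

if __name__ == "__main__":
    train = [[0, 1, 2, 3, 4, 5], [5, 4, 3, 2, 1, 0]]
    labels = ["rising", "falling"]
    print(classify_1nn([0, 0, 1, 2, 4, 5], train, labels))
```

Smoothing before warping reduces the influence of short spikes, so the distance reflects the overall trend shape rather than point-level noise.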

Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation
    • /
    • v.20 no.2
    • /
    • pp.77-85
    • /
    • 2011
  • It is believed that anomalies or diseases of human organs can be identified by analyzing their patterns. This paper proposes a new classification technique for identifying cancer using the proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the new method, three classification methods, namely support vector machine (SVM), multi-layer perceptron (MLP), and k-nearest neighbor (k-NN), are extended by a multi-boosting method into an array of subclassifiers, and the results of the subclassifiers are merged by an ensemble method. A genetic algorithm was applied to obtain the optimal feature set for each subclassifier. We applied our method to an empirical data set from cancer research, and it showed better accuracy and more stable performance than a single classifier.

Research on the Emotion Recognition System based on Electrocardiograph and Pulse Signals (심전도 및 맥파신호 기반의 감정인식 시스템에 관한 연구)

  • Hong, Yoon-Jung;Hwang, Yun-Kyung;Shin, Dong-Kyoo;Kim, Dong-Hyun;Shin, Dong-Il
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.05a
    • /
    • pp.175-178
    • /
    • 2008
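  • This paper presents the results of a study on a system that analyzes human emotion by acquiring, in real time, electrocardiogram and pulse wave signals, which are among the easiest biosignals to collect, and applying the SVM (Support Vector Machine) algorithm, a machine learning technique, and the k-NN (k-Nearest Neighbor) algorithm, a clustering technique.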

Forest Thematic Maps and Forest Statistics Using the k-Nearest Neighbor Technique for Pyeongchang-Gun, Gangwon-Do (kNN 기법을 이용한 강원도 평창군의 산림 주제도 작성과 산림통계량 추정)

  • Yim, Jong-Su;Kong, Gee Su;Kim, Sung Ho;Shin, Man Yong
    • Journal of Korean Society of Forest Science
    • /
    • v.96 no.3
    • /
    • pp.259-268
    • /
    • 2007
  • This study was conducted to produce forest thematic maps and estimate forest statistics for Pyeongchang-Gun using the kNN technique, which is applied in forest inventories to produce thematic maps of variables of interest, including at unobserved locations, by combining field plot data, remotely sensed data, and other digital map data. The estimation errors for three horizontal reference areas (HRAs), with radii of 20, 40, and 60 km, were compared. Although the precision for the 40 km radius was lower than that for the 60 km radius, the 40 km radius was found to be an efficient HRA because the difference in precision was modest. With k=5 nearest neighbors for the selected HRA, the overall accuracy was high. Accordingly, thematic maps of the number of trees, basal area, and growing stock per hectare were generated using k=5 neighbors within the 40 km HRA. Compared with the forest statistics based on field sample plots, the means of each parameter estimated from the produced maps were lower.
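
The role of the horizontal reference area in kNN estimation can be sketched as follows. This is a simplified illustration under assumptions, not the study's procedure: only field plots within the HRA radius of the target pixel are candidates, neighbors are then chosen by Euclidean distance in a feature space (for example spectral bands), and the estimate is the unweighted mean of the k neighbors' values. The function name, feature layout, and toy data are hypothetical.

```python
import numpy as np

def knn_estimate(pixel_xy, pixel_features, plot_xy, plot_features, plot_values,
                 k=5, hra_radius_km=40.0):
    """Estimate a forest variable for one pixel from field plots inside the HRA."""
    # Keep only field plots inside the horizontal reference area.
    in_hra = np.linalg.norm(plot_xy - pixel_xy, axis=1) <= hra_radius_km
    if in_hra.sum() < k:
        raise ValueError("fewer than k field plots inside the HRA")
    feats, values = plot_features[in_hra], plot_values[in_hra]
    # Nearest neighbors are chosen in feature space, not on the map.
    d = np.linalg.norm(feats - pixel_features, axis=1)
    nearest = np.argsort(d)[:k]
    return float(values[nearest].mean())

if __name__ == "__main__":
    # Coordinates in km; the last plot lies outside a 40 km HRA around (0, 0).
    plot_xy = np.array([[0, 0], [10, 0], [0, 10], [20, 20], [-15, 5], [100, 0]], dtype=float)
    plot_features = np.array([[1], [2], [3], [4], [5], [0]], dtype=float)
    plot_values = np.array([10, 20, 30, 40, 50, 1000], dtype=float)
    print(knn_estimate(np.array([0.0, 0.0]), np.array([0.0]),
                       plot_xy, plot_features, plot_values))
```

In the toy data the distant plot has the most similar features but is excluded by the HRA, which shows how the radius limits the influence of plots from ecologically different regions.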