• Title/Summary/Keyword: 최근접 데이터 선택 (nearest data selection)

Hyper-Rectangle Based Prototype Selection Algorithm Preserving Class Regions (클래스 영역을 보존하는 초월 사각형에 의한 프로토타입 선택 알고리즘)

  • Baek, Byunghyun;Euh, Seongyul;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.9 no.3 / pp.83-90 / 2020
  • Prototype selection keeps learning time and storage requirements low by selecting the minimum set of data that represents the partitions within each class of the training data. This paper designs a new training data generation method using hyper-rectangles that can be applied to general classification algorithms. Hyper-rectangular regions do not contain data of different classes and divide the space of the same class. The median value of the data within a hyper-rectangle is selected as a prototype to form new training data, and the size of the hyper-rectangle is adjusted to reflect the data distribution in the class area. A set cover optimization algorithm is proposed to select the minimum prototype set that represents the whole training data. The proposed method reduces the polynomial time complexity of the set cover optimization by using a greedy algorithm and a distance equation without multiplication. In experimental comparisons with hyper-sphere prototype selection, the proposed method is superior in terms of prototype rate and generalization performance.
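The greedy set cover step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `contains` and `greedy_cover`, the box coordinates, and the toy points are all hypothetical, and the sketch assumes the candidate hyper-rectangles for one class are already given. The containment test uses only comparisons, which is one way a region check can avoid multiplication.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner) of a hyper-rectangle


def contains(box: Box, p: Point) -> bool:
    """Membership test that needs only comparisons, no multiplication."""
    lower, upper = box
    return all(lo <= v <= hi for lo, v, hi in zip(lower, p, upper))


def greedy_cover(points: List[Point], boxes: Dict[str, Box]) -> List[str]:
    """Greedily pick boxes until every point lies in a chosen box."""
    covers: Dict[str, Set[int]] = {
        name: {i for i, p in enumerate(points) if contains(box, p)}
        for name, box in boxes.items()
    }
    uncovered = set(range(len(points)))
    chosen: List[str] = []
    while uncovered:
        # Take the box that covers the most still-uncovered points;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda name: len(covers[name] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            raise ValueError("some points lie in no candidate box")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy data: six same-class points and four candidate boxes.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0), (6.0, 6.0), (7.0, 7.0)]
boxes = {
    "A": ((0.0, 0.0), (2.0, 2.0)),
    "B": ((2.0, 2.0), (5.0, 5.0)),
    "C": ((5.0, 5.0), (7.0, 7.0)),
    "D": ((7.0, 7.0), (7.0, 7.0)),
}
selected = greedy_cover(points, boxes)
print(selected)  # ['A', 'C']
```

In this toy case two boxes cover all six points, so two prototypes (for example, the median of each chosen box) would stand in for the whole class.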

A Hierarchical Bitmap-based Spatial Index use k-Nearest Neighbor Query Processing on the Wireless Broadcast Environment (무선방송환경에서 계층적 비트맵 기반 공간 색인을 이용한 k-최근접 질의처리)

  • Song, Doo-Hee;Park, Kwang-Jin
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.203-209 / 2012
  • Recently, k-nearest neighbor (k-NN) query methods for wireless broadcasting environments have been actively studied. The advantage of a wireless broadcasting environment is its scalability, which enables collective query processing for unspecified users connected to the server. However, when existing k-NN queries are applied in a wireless broadcasting environment, backtracking may occur and the query processing time consequently increases. This paper proposes a hierarchical bitmap-based spatial index (HBI) to process k-NN queries efficiently in a wireless broadcasting environment. HBI reduces the bitmap size by using bitmap information and a tree structure. As a result, the shorter broadcast cycle reduces the client's tuning time and query processing time. In addition, since the locations of all objects can be detected using the bitmap information, clients can tune selectively to the necessary data. HBI was implemented for k-NN queries, and a performance evaluation showed the proposed technique to be excellent.

A Comparison of Distance Metric Learning Methods for Face Recognition (얼굴인식을 위한 거리척도학습 방법 비교)

  • Suvdaa, Batsuri;Ko, Jae-Pil
    • Journal of Korea Multimedia Society / v.14 no.6 / pp.711-718 / 2011
  • The k-nearest neighbor (kNN) classifier, which does not require a training phase, is appropriate for problems with a variable number of classes, such as face recognition. Recently, distance metric learning methods that are trained with a given data set have reported significant improvements to the kNN classifier. However, the performance of a distance metric learning method varies with each application. In this paper, we focus on face recognition and compare the performance of state-of-the-art distance metric learning methods. Our experimental results on public face databases demonstrate that the Mahalanobis distance metric based on PCA is still competitive with respect to both performance and time complexity in face recognition.

Optimal Band Selection Techniques for Hyperspectral Image Pixel Classification using Pooling Operations & PSNR (초분광 이미지 픽셀 분류를 위한 풀링 연산과 PSNR을 이용한 최적 밴드 선택 기법)

  • Chang, Duhyeuk;Jung, Byeonghyeon;Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.5 / pp.141-147 / 2021
  • This paper applies a band selection algorithm to each subset in order to improve the utilization of feature information in large-capacity hyperspectral data by reducing the dimension of neural network inputs, and thus the complex computations, in embedded systems. Among feature extraction and feature selection techniques, feature selection aims to find the optimal number of bands suitable for a dataset, regardless of wavelength range, and to improve time and performance more than other algorithms. In the experiment, the time required was reduced by 1/3 to 1/9 times compared with other band selection techniques, while performance measured with the K-neighbor classifier improved by more than 4%. Although real-time hyperspectral data analysis is still difficult at present, the results confirm the possibility of improvement.

Face Recognition based on Hybrid Classifiers with Virtual Samples (가상 데이터와 융합 분류기에 기반한 얼굴인식)

  • Ryu, Yeon-Sik (류연식);Oh, Se-Young (오세영)
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.1 / pp.19-29 / 2003
  • This paper presents a novel hybrid classifier for face recognition with artificially generated virtual training samples. We utilize both the nearest neighbor approach in feature angle space and a connectionist model to obtain a synergy effect by combining the results of two heterogeneous classifiers. First, a classifier called the nearest feature angle (NFA), based on angular information, finds the most similar feature to the query from a given training set. Second, a classifier has been developed based on the recall of the stored frontal projection of the query feature. It uses a frontal recall network (FRN) that finds the most similar frontal one among the stored frontal feature set. For FRN, we used an ensemble neural network consisting of multiple multilayer perceptrons (MLPs), each of which is trained independently to enhance generalization capability. Further, both classifiers used the virtual training set generated adaptively, according to the spatial distribution of each person's training samples. Finally, the results of the two classifiers are combined to determine the best matching class, and a corresponding similarity measure is used to make the final decision. The proposed classifier achieved an average classification rate of 96.33% against a large group of different test sets of images, and its average error rate is 61.5% that of the nearest feature line (NFL) method, giving a more robust classification performance.

Recommendation system for supporting self-directed learning on e-learning marketplace (이러닝 마켓플레이스에서 자기주도학습지원을 위한 추천시스템)

  • Kwon, Byung-Il;Moon, Nam-Mee
    • Journal of the Korea Society of Computer and Information / v.15 no.2 / pp.135-146 / 2010
  • In this paper, we propose a recommendation system for supporting self-directed learning on an e-learning marketplace. The key idea is a recommendation system that uses revised collaborative filtering to support the marketplace. The existing collaborative filtering method consists of three stages: preparing raw data, building a group of similar customers by selecting nearest neighbors, and creating a recommendation list. This study designs a recommendation system that supports self-directed learning by extending collaborative filtering with nearest-neighbor learning courses that take industry and learning level into account. This service helps learners in industry select the right learning course. A recommendation system can be built by many methods, and recommending service content, including its explicit properties, with the revised collaborative filtering method can overcome limitations of existing content recommendation.
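The three stages of conventional collaborative filtering named in the abstract above (prepare the data, select nearest neighbors, build a recommendation list) can be sketched as follows. This is plain user-based filtering with cosine similarity, not the revised method of the paper, whose handling of industry and learning level is not specified here. The function names, learner names, course names, and ratings are all hypothetical.

```python
from math import sqrt
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # learner -> {course: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    dot = sum(a[c] * b[c] for c in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, user: str, k: int = 2, top_n: int = 3) -> List[str]:
    # Stage 2: keep the k learners most similar to the target learner.
    neighbors = sorted(
        ((cosine(ratings[user], r), other) for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    # Stage 3: score unseen courses by the similarity-weighted mean rating.
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for sim, other in neighbors:
        if sim <= 0:
            continue
        for course, rating in ratings[other].items():
            if course not in ratings[user]:
                totals[course] = totals.get(course, 0.0) + sim * rating
                weights[course] = weights.get(course, 0.0) + sim
    ranked = sorted(totals, key=lambda c: (-totals[c] / weights[c], c))
    return ranked[:top_n]


# Stage 1: hypothetical raw data, already arranged as learner -> course ratings.
ratings: Ratings = {
    "ann": {"python": 5, "sql": 4},
    "bob": {"python": 5, "sql": 4, "statistics": 5, "design": 2},
    "cho": {"python": 1, "design": 5, "marketing": 4},
}
one_neighbor = recommend(ratings, "ann", k=1)
two_neighbors = recommend(ratings, "ann", k=2)
print(one_neighbor)
print(two_neighbors)
```

With `k=1` only the most similar learner contributes, so the list contains just that learner's unseen courses; raising `k` pulls in courses from less similar learners at lower weight.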

Prediction of golf scores on the PGA tour using statistical models (PGA 투어의 골프 스코어 예측 및 분석)

  • Lim, Jungeun;Lim, Youngin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.30 no.1 / pp.41-55 / 2017
  • This study predicts the average scores of the top 150 PGA golf players in 132 PGA Tour tournaments (2013-2015) using data mining techniques and statistical analysis. This study also aims to predict the Top 10 and Top 25 best players in 4 different playoffs. Linear and nonlinear regression methods were used to predict average scores. Stepwise regression, all best subset, LASSO, ridge regression and principal component regression were used for the linear regression method. Tree, bagging, gradient boosting, neural network, random forests and KNN were used for the nonlinear regression method. We found that the average score increases as fairway firmness, green height, or average maximum wind speed increases. We also found that the average score decreases as the number of one-putts, the scrambling variable, or the longest driving distance increases. All 11 models have low prediction error when predicting the average scores of PGA tournaments in 2015, which are not included in the training set. However, the Bagging and Random Forest models perform best among all models, and these two models have the highest prediction accuracy when predicting the Top 10 and Top 25 best players in 4 different playoffs.

Personalized Size Recommender System for Online Apparel Shopping: A Collaborative Filtering Approach

  • Dongwon Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.39-48 / 2023
  • This study was conducted to provide a solution to the problem of sizing errors occurring in online purchases due to discrepancies and non-standardization in clothing sizes. This paper discusses an implementation approach for a machine learning-based recommender system capable of providing personalized sizes to online consumers. We trained multiple validated collaborative filtering algorithms including Non-Negative Matrix Factorization (NMF), Singular Value Decomposition (SVD), k-Nearest Neighbors (KNN), and Co-Clustering using purchasing data derived from online commerce and compared their performance. As a result of the study, we were able to confirm that the NMF algorithm showed superior performance compared to other algorithms. Despite the characteristic of purchase data that includes multiple buyers using the same account, the proposed model demonstrated sufficient accuracy. The findings of this study are expected to contribute to reducing the return rate due to sizing errors and improving the customer experience on e-commerce platforms.

Prediction of arrhythmia using multivariate time series data (다변량 시계열 자료를 이용한 부정맥 예측)

  • Lee, Minhai;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.32 no.5 / pp.671-681 / 2019
  • Studies on predicting arrhythmia using machine learning have been actively conducted as the number of arrhythmia patients increases. Existing studies have predicted arrhythmia based on multivariate data of feature variables extracted from RR interval data at a specific time point. In this study, we consider that the pattern of how the heart state changes over time can be important information for arrhythmia prediction. Therefore, we investigate the usefulness of predicting arrhythmia with multivariate time series data obtained by extracting and accumulating the multivariate vectors of the feature variables at various time points. Using the 1-nearest neighbor classification method and its ensemble for comparison, we confirm that the method based on multivariate time series data can achieve better classification performance than the method based on multivariate data if an appropriate time series distance function is selected.
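As an illustration of 1-nearest neighbor classification with a time series distance function, the sketch below uses dynamic time warping (DTW). The abstract does not name the distance functions that were compared, so DTW is an assumption chosen for illustration, and the function names, feature values, and labels are hypothetical.

```python
from typing import List, Sequence, Tuple

Series = Sequence[Tuple[float, ...]]  # one feature vector per time point


def dtw(a: Series, b: Series) -> float:
    """Dynamic time warping distance with an L1 cost between feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = sum(abs(x - y) for x, y in zip(a[i - 1], b[j - 1]))
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def predict_1nn(train: List[Tuple[Series, str]], query: Series) -> str:
    """Return the label of the training series closest to the query under DTW."""
    return min(train, key=lambda pair: dtw(pair[0], query))[1]


# Hypothetical training series: two features observed at four time points.
train = [
    ([(0.8, 0.1), (0.8, 0.1), (0.8, 0.1), (0.8, 0.1)], "normal"),
    ([(0.8, 0.1), (0.5, 0.4), (1.1, 0.5), (0.6, 0.4)], "arrhythmia"),
]
# The query repeats the first time point, so it is a time-shifted copy
# of the second training series.
query = [(0.8, 0.1), (0.8, 0.1), (0.5, 0.4), (1.1, 0.5), (0.6, 0.4)]
label = predict_1nn(train, query)
print(label)  # arrhythmia
```

Because DTW allows one time point to match several, the shifted query still has zero distance to its source series, which a point-by-point distance on fixed time indices would not give.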

A Study on the Development of Tracking Algorithm for Shipborne Automatic Tracking Aids (선박자동추적장치(ATA)의 목표물 추적 알고리즘 개발에 관한 연구)

  • Kim Seok Jae;Koo Ja Yun;Yoon Su Weon
    • Proceedings of KOSOMES biannual meeting / 2003.11a / pp.13-21 / 2003
  • Ships of 500 gross tonnage and upwards constructed on or after 1 July 2002 shall have an automatic tracking aid (ATA) according to SOLAS V/19, but existing ships of less than 10,000 gross tonnage constructed before 1 July 2002 have potential collision risks due to the lack of automatic plotting devices such as an ATA. This paper aims to provide a homemade ATA by developing the tracking algorithm for ATA, and to prevent collision incidents by distributing the ATA system to coasters.
