• Title/Summary/Keyword: k-nearest neighbor method (kNN)

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques such as artificial neural networks, SVM, and genetic algorithms have been widely used in stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it searches for a fixed number of neighbors in the observation space, always selecting the same number of neighbors rather than the most similar neighbors for the target case. As a result, it may take more cases into account even when fewer cases are applicable to the target. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted with the learning data divided into two datasets. To predict the next day's closing price, we used four variables: opening price, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data were from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with that of the random walk model using the two learning datasets. When the learning data was small, the mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN. When the learning data was large, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN. These results show that predictive power is higher when more learning data are used. This paper also shows that k-NN generally predicts better than the random walk model with larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find its neighbors using a second-step filtering method that considers fundamental economic variables, together with a sufficient amount of learning data.
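
The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the price rows are made-up values (not Samsung Electronics data), the helper names `mape` and `knn_predict` are hypothetical, and plain Euclidean distance with an unweighted average of the neighbors is assumed because the abstract does not specify the distance or the combination rule.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training rows closest to `query` (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return sum(train_y[i] for i in order[:k]) / k

# Hypothetical (open, high, low, close) rows, invented for illustration only.
days = [
    (100.0, 103.0, 99.0, 102.0),
    (102.0, 104.0, 101.0, 103.0),
    (103.0, 105.0, 100.0, 101.0),
    (101.0, 102.0, 98.0, 99.0),
    (99.0, 103.0, 98.0, 102.0),
    (102.0, 106.0, 101.0, 105.0),
    (105.0, 107.0, 103.0, 104.0),
    (104.0, 105.0, 101.0, 102.0),
]
closes = [d[3] for d in days]

# Feature = today's four prices, target = next day's close.
features, targets = days[:-1], closes[1:]
split = 5  # assumed train/test boundary for this toy example
train_x, train_y = features[:split], targets[:split]
test_x, test_y = features[split:], targets[split:]

knn_pred = [knn_predict(train_x, train_y, x, k=2) for x in test_x]
rw_pred = [x[3] for x in test_x]  # random walk: tomorrow's close = today's close

print("k-NN MAPE:", round(mape(test_y, knn_pred), 4))
print("random walk MAPE:", round(mape(test_y, rw_pred), 4))
```

Changing `split` (the amount of learning data) and `k` is the kind of variation the study reports on; the toy numbers here say nothing about which model wins on real data.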

Monitoring Continuous k-Nearest Neighbor Queries, using c-MBR

  • Jung Ha-Rim;Kang Sang-Won;Song Moon-Bae;Im Seok-Jin;Kim Jong-Wan;Hwang Chong-Sun
    • Proceedings of the Korean Information Science Society Conference / 2006.06c / pp.46-48 / 2006
  • This paper addresses the problem of monitoring continuous k-nearest neighbor (k-NN) queries. Given a set of moving (or static) objects and a set of moving (or static) query points, a continuous k-NN monitoring query retrieves and continually updates the k objects closest to a query point. To support location-based services (LBSs) in highly dynamic environments, where objects and/or queries move frequently, monitoring continuous queries requires results that are updated in real time whenever objects and/or queries change their locations. Thus, it is important to minimize the time delay in keeping the results up to date. In this paper, we present a monitoring method that shortens the time delay for updating continuous k-NN queries, based on the notion of a result region and on the minimum bounding rectangle enclosing all objects in each cell of the grid index structure, referred to as the c-MBR. Simulations are conducted to show the efficiency of the proposed method.
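
To make the c-MBR idea concrete, here is a minimal sketch of a static k-NN search over a grid in which each cell stores the bounding rectangle of its own objects. It is an assumption-laden illustration, not the paper's monitoring algorithm: the result region and incremental updates are not modeled, and the function names and `cell_size` parameter are hypothetical.

```python
import math
from collections import defaultdict

def build_cells(objects, cell_size):
    """Bucket points into square grid cells keyed by (column, row)."""
    cells = defaultdict(list)
    for x, y in objects:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    return cells

def cell_mbr(points):
    """Tight bounding rectangle (min_x, min_y, max_x, max_y) of a cell's objects."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def min_dist_to_mbr(q, mbr):
    """Smallest possible distance from point q to any point inside the rectangle."""
    min_x, min_y, max_x, max_y = mbr
    dx = max(min_x - q[0], 0.0, q[0] - max_x)
    dy = max(min_y - q[1], 0.0, q[1] - max_y)
    return math.hypot(dx, dy)

def knn_with_cell_pruning(q, objects, k, cell_size):
    """Visit cells in order of MBR distance; stop once no cell can improve the result."""
    cells = build_cells(objects, cell_size)
    bound = {c: min_dist_to_mbr(q, cell_mbr(pts)) for c, pts in cells.items()}
    best = []  # sorted (distance, point) pairs, at most k entries
    for cell in sorted(cells, key=bound.get):
        if len(best) == k and bound[cell] > best[-1][0]:
            break  # every remaining cell is at least this far away
        best.extend((math.dist(q, p), p) for p in cells[cell])
        best = sorted(best)[:k]
    return [p for _, p in best]
```

Because the rectangle is fitted to the objects rather than to the whole cell, its lower bound is never looser than the cell boundary's, so more cells can be skipped.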

Reverse k-Nearest Neighbor Query Processing Method for Continuous Query Processing in Bigdata Environments (빅데이터 환경에서 연속 질의 처리를 위한 리버스 k-최근접 질의 처리 기법)

  • Lim, Jongtae;Park, Sunyong;Seo, Kiwon;Lee, Minho;Bok, Kyoungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.14 no.10 / pp.454-462 / 2014
  • With the development of location-aware technologies and mobile devices, location-based services have been widely studied. To provide location-based services, many researchers have proposed methods for processing various query types with MapReduce (MR). One of the proposed methods is a reverse k-nearest neighbor (RkNN) query processing method with MR. However, the existing methods incur too much cost when processing continuous RkNN queries. In this paper, we propose an efficient continuous RkNN query processing method with MR to resolve the problems of the existing methods. The proposed method uses the 60-degree-pruning method. It does not need to reprocess the query for continuous query processing because it draws and monitors a monitoring area that includes the candidate objects of an RkNN query. To show the superiority of the proposed method, we compare its query processing performance with that of the existing method.
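
The 60-degree pruning step can be illustrated on a single machine. The sketch below assumes the usual six-sector formulation: the plane around the query is divided into six 60-degree sectors, and only the k objects nearest to the query in each sector can be reverse k-nearest neighbors, so at most 6k candidates need verification. MapReduce, the monitoring area, and continuous processing are not modeled, all function names are hypothetical, and objects are assumed to have distinct coordinates.

```python
import math

def sector_of(q, p):
    """Index (0-5) of the 60-degree sector around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(5, int(angle / (math.pi / 3)))

def is_rknn(q, p, points, k):
    """True if fewer than k other objects are strictly closer to p than q is."""
    d_pq = math.dist(p, q)
    closer = sum(1 for o in points if o != p and math.dist(p, o) < d_pq)
    return closer < k

def rknn_brute_force(q, points, k):
    """Reference answer: verify every object."""
    return sorted(p for p in points if is_rknn(q, p, points, k))

def rknn_sector_pruning(q, points, k):
    """Keep only the k objects nearest to q in each sector, then verify them."""
    sectors = [[] for _ in range(6)]
    for p in points:
        sectors[sector_of(q, p)].append(p)
    candidates = []
    for members in sectors:
        members.sort(key=lambda p: math.dist(q, p))
        candidates.extend(members[:k])  # at most 6k candidates in total
    return sorted(p for p in candidates if is_rknn(q, p, points, k))
```

The pruning is safe because two objects in the same sector subtend less than 60 degrees at the query, so the farther one is always closer to the nearer one than to the query.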

A Proposal of Remaining Useful Life Prediction Model for Turbofan Engine based on k-Nearest Neighbor (k-NN을 활용한 터보팬 엔진의 잔여 유효 수명 예측 모델 제안)

  • Kim, Jung-Tae;Seo, Yang-Woo;Lee, Seung-Sang;Kim, So-Jung;Kim, Yong-Geun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.4 / pp.611-620 / 2021
  • The maintenance industry has progressed from corrective maintenance and preventive maintenance toward condition-based maintenance. In condition-based maintenance, maintenance is performed at the optimum time based on the condition of the equipment. To find the optimal maintenance point, it is important to accurately understand the condition of the equipment, especially its remaining useful life (RUL). Thus, a model is proposed to predict the remaining useful life of a turbofan engine using simulation data (C-MAPSS). In the modeling process, the C-MAPSS dataset was preprocessed and transformed, and predictions were then made. Data preprocessing was performed through piecewise RUL, moving average filters, and standardization. The remaining useful life was predicted using principal component analysis and the k-NN method. To derive the optimal performance, the number of principal components and the number of neighbors for the k-NN method were determined through 5-fold cross validation. The validity of the prediction results was analyzed through a scoring function that considers the usefulness of early predictions and the unacceptability of late predictions. In addition, the usefulness of the RUL prediction model was demonstrated through comparison with the prediction performance of neural network-based algorithms.
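
The three preprocessing steps named above can be sketched as follows. This is a rough illustration only: the cap `max_rul`, the `window` length, and the function names are hypothetical choices (the abstract gives no values), and the PCA and k-NN stages are left out.

```python
from statistics import mean, pstdev

def piecewise_rul(cycles, max_rul):
    """Linear RUL (last cycle minus current cycle), capped at `max_rul` early in life."""
    last = max(cycles)
    return [min(max_rul, last - c) for c in cycles]

def moving_average(values, window):
    """Trailing moving average; early points average over the samples seen so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def standardize(values):
    """Zero-mean, unit-variance scaling; assumes the sensor signal is not constant."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up engine run of 10 cycles with one noisy sensor channel.
cycles = list(range(1, 11))
sensor = [641.8, 642.1, 642.0, 642.4, 642.3, 642.9, 643.1, 643.0, 643.6, 643.9]

labels = piecewise_rul(cycles, max_rul=5)
features = standardize(moving_average(sensor, window=3))
print(labels)
```

Capping the label reflects the assumption that degradation is not observable early in an engine's life, so the target stays flat until wear sets in.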

Performance of Indoor Positioning using Visible Light Communication System (가시광 통신을 이용한 실내 사용자 단말 탐지 시스템)

  • Park, Young-Sik;Hwang, Yu-Min;Song, Yu-Chan;Kim, Jin-Young
    • Journal of Digital Contents Society / v.15 no.1 / pp.129-136 / 2014
  • The Wi-Fi fingerprinting system is a very popular positioning method used in indoor spaces. The system depends on the Wi-Fi Received Signal Strength (RSS) from Access Points (APs). However, the Wi-Fi RSS varies because of multipath fading and interference from walls, obstacles, and people. Therefore, the Wi-Fi fingerprinting system produces low positioning accuracy. Also, because Wi-Fi signals pass through walls, the existing system cannot distinguish which floor a user is on. To solve these problems, this paper proposes an LED fingerprinting system for accurate indoor positioning. The proposed system uses the optical power received from LEDs and the LED-Identification (LED-ID) instead of the Wi-Fi RSS. In the training phase, we record LED fingerprints in a database at each place. In the serving phase, we adopt a K-Nearest Neighbor (K-NN) algorithm to compare the existing data with newly received user data. We evaluate our technique in terms of the CDF through computer simulation. The simulation results show that the proposed system improves positioning accuracy by 8.6% on average.

Face Recognition using Fisherface Method with Fuzzy Membership Degree (퍼지 소속도를 갖는 Fisherface 방법을 이용한 얼굴인식)

  • 곽근창;고현주;전명근
    • Journal of KIISE:Software and Applications / v.31 no.6 / pp.784-791 / 2004
  • In this study, we deal with face recognition using a fuzzy-based Fisherface method. The well-known Fisherface method is less sensitive to large variations in light direction, face pose, and facial expression than the Principal Component Analysis (PCA) method. Usually, the various methods of face recognition, including the Fisherface method, assign equal importance in determining the face to be recognized, regardless of typicality. The main point here is that the proposed method assigns a feature vector transformed by PCA to fuzzy memberships rather than assigning the vector to a particular class. In this method, fuzzy membership degrees are obtained from FKNN (Fuzzy K-Nearest Neighbor) initialization. Experimental results show better recognition performance than other methods on the ORL and Yale face databases.
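
A fuzzy k-NN initialization of this kind is often written in the style of Keller et al.: a training vector receives membership 0.51 + 0.49·(n_j/k) in its own class and 0.49·(n_j/k) in every other class, where n_j counts its k nearest neighbors that belong to class j. The sketch below assumes that rule, Euclidean distance, and exclusion of the vector itself from its neighbor set; the abstract does not confirm any of these details, and the function name is hypothetical.

```python
import math
from collections import Counter

def fknn_init_memberships(samples, labels, k):
    """Assign each labeled sample a fuzzy membership degree in every class."""
    classes = sorted(set(labels))
    memberships = []
    for i, x in enumerate(samples):
        # k nearest training samples to x, excluding x itself (assumption)
        others = [j for j in range(len(samples)) if j != i]
        others.sort(key=lambda j: math.dist(x, samples[j]))
        counts = Counter(labels[j] for j in others[:k])
        row = {}
        for c in classes:
            share = 0.49 * counts[c] / k
            row[c] = share + 0.51 if c == labels[i] else share
        memberships.append(row)
    return memberships

# Made-up 1-D feature vectors: the sample at 1.0 sits near the class boundary.
u = fknn_init_memberships([(0.2,), (1.0,), (1.5,), (2.0,)], ["a", "a", "b", "b"], k=2)
print(u[1])
```

A sample surrounded by its own class keeps a membership close to 1, while a sample near the class boundary shares part of its membership with the neighboring class, which is how typicality enters the later Fisherface computation.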

Short-term Traffic States Prediction Using k-Nearest Neighbor Algorithm: Focused on Urban Expressway in Seoul (k-NN 알고리즘을 활용한 단기 교통상황 예측: 서울시 도시고속도로 사례)

  • KIM, Hyungjoo;PARK, Shin Hyoung;JANG, Kitae
    • Journal of Korean Society of Transportation / v.34 no.2 / pp.158-167 / 2016
  • This study evaluates potential sources of error in the k-NN (k-nearest neighbor) algorithm, such as procedures, variables, and input data. Previous research was thoroughly reviewed to understand the fundamentals of the k-NN algorithm, which has been widely used for short-term traffic state prediction. The framework of this algorithm commonly includes historical data smoothing, a pattern database, a similarity measure, the k-value, and the prediction horizon. The outcomes of this study suggest that: i) historical data smoothing is recommended to reduce random noise in measured traffic data; ii) the historical database should contain traffic state information on both normal and event conditions; and iii) a trial-and-error method can improve prediction accuracy by better searching for the optimum input time series and k-value. The study results also demonstrate that prediction error increases with the length of the prediction horizon and with rapidly changing traffic states.

HD-Tree: High performance Lock-Free Nearest Neighbor Search KD-Tree (HD-Tree: 고성능 Lock-Free NNS KD-Tree)

  • Lee, Sang-gi;Jung, NaiHoon
    • Journal of Korea Game Society / v.20 no.5 / pp.53-64 / 2020
  • Supporting nearest neighbor search (NNS) in the KD-Tree algorithm is essential in multidimensional data applications. In this paper, we propose HD-Tree, a high-performance lock-free KD-Tree that supports NNS in situations where reads and writes occur concurrently. HD-Tree reduces the number of synchronization nodes used in NNS and requires fewer atomic operations during lock-free method execution. Compared with existing algorithms on a multi-core system with 8 cores and 16 threads, HD-Tree improves performance by up to 95% on NNS and 15% on modification in oversubscription situations.
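
For readers unfamiliar with NNS on a KD-Tree, the sketch below shows the plain single-threaded version of the search. It is a textbook baseline under assumed names (`build_kdtree`, `nearest`) and does not attempt the lock-free synchronization that HD-Tree contributes.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) of the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Cross the splitting plane only if a closer point could lie on the other side.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

The backtracking step is the part that makes concurrent writes difficult: a search may revisit a subtree that another thread is modifying, which is the kind of situation a lock-free design has to handle.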

A Method for Continuous k Nearest Neighbor Search With Partial Order (부분순위 연속 k 최근접 객체 탐색 기법)

  • Kim, Jin-Deog
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.1 / pp.126-132 / 2011
  • In the application areas of LBS (Location Based Service) and ITS (Intelligent Transportation System), the continuous k-nearest neighbor (CkNN) query, defined as a query that finds the nearest points of interest (POIs) to all points on a given path, is widely used. In these applications, a method must return results quickly, be applicable to spatial network databases, and cope with frequent updates of POI objects. This paper proposes a new method to search for the nearest POIs for moving query objects on spatial networks. The method produces a set of split points and their corresponding k-POIs as results, with a partial order among the k-POIs. The results of experiments with a real dataset show that the proposed method outperforms the existing methods. The proposed method achieves a very short processing time (15%) compared with the existing method.

Comparison of the Tracking Methods for Multiple Maneuvering Targets (다중 기동 표적에 대한 추적 방식의 비교)

  • Lim, Sang Seok
    • Journal of Advanced Navigation Technology / v.1 no.1 / pp.35-46 / 1997
  • Over the last decade, Multiple Target Tracking (MTT) has been the subject of numerous presentations and conferences [1979-1900]. Various approaches have been proposed to solve the problem. Representative approaches are the Nearest Neighbor (NN) method, which is based on non-probabilistic data association (DA), and the Multiple Hypothesis Test (MHT) and Joint Probabilistic Data Association (JPDA) methods, which are probabilistic. These techniques have their own advantages and limitations in computational requirements and in tracking performance. In this paper, three promising algorithms based on the NN standard filter, MHT, and JPDA methods are presented, and their performance against multiple maneuvering targets is compared through numerical simulations.
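
The non-probabilistic NN association step can be sketched as a greedy assignment with a validation gate. This is a simplified illustration under assumptions: plain Euclidean distance stands in for the statistical distance a tracking filter would normally use, the `gate` radius and function name are hypothetical, and the filter update itself is omitted.

```python
import math

def nearest_neighbor_association(predictions, measurements, gate):
    """Greedy NN association: closest (track, measurement) pairs first, one-to-one."""
    pairs = sorted(
        (math.dist(p, m), t, j)
        for t, p in enumerate(predictions)
        for j, m in enumerate(measurements)
    )
    assigned, used = {}, set()
    for d, t, j in pairs:
        if d > gate:
            break  # remaining pairs are even farther apart
        if t not in assigned and j not in used:
            assigned[t] = j
            used.add(j)
    return assigned  # {track index: measurement index}

# Two predicted track positions and three measurements (made-up numbers).
tracks = [(0.0, 0.0), (10.0, 10.0)]
hits = [(9.5, 10.2), (0.3, -0.1), (50.0, 50.0)]
print(nearest_neighbor_association(tracks, hits, gate=2.0))
```

Each track commits to a single measurement, which keeps the computation cheap but makes the method sensitive to clutter and crossing targets; MHT and JPDA instead weigh several association hypotheses.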