• Title/Summary/Keyword: K-Nearest Neighbor

Comparison of Forest Growing Stock Estimates by Distance-Weighting and Stratification in k-Nearest Neighbor Technique (거리 가중치와 층화를 이용한 최근린기반 임목축적 추정치의 정확도 비교)

  • Yim, Jong Su;Yoo, Byung Oh;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.101 no.3 / pp.374-380 / 2012
  • The k-Nearest Neighbor (kNN) technique is widely applied to assess forest resources at the county level and to provide spatial information by combining large-area forest inventory data with remote sensing data. In this study, two approaches, distance weighting and stratification of the training dataset, were compared to improve kNN-based forest growing stock estimates. When five distance weights (0 to 2 in steps of 0.5) were compared, the accuracy of the kNN-based estimates was very similar, ranging within ${\pm}0.6m^3/ha$ in mean deviation. The training dataset was stratified by horizontal reference area (HRA) and by forest cover type, applied both separately and in combination. Although combining forest cover type with HRA-100 km slightly improved the accuracy of the estimates, stratification by forest cover type alone was more efficient given a sufficient number of training data. The mean forest growing stock estimated by kNN with HRA-100 and stratification by forest cover type at k=7 was somewhat underestimated (by $5m^3/ha$) compared with the 2011 Statistical Yearbook of Forestry.
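
The distance weighting compared in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_estimate`, the Euclidean distance, and the inverse-distance weight `1 / d**t` are assumptions, with `t` standing in for the distance weight varied from 0 to 2.

```python
import math


def knn_estimate(query, training, k=7, t=1.0):
    """Distance-weighted kNN estimate (illustrative sketch).

    training: list of (feature_vector, observed_value) pairs.
    t: assumed distance-weight exponent; t=0 gives the unweighted mean.
    """
    # Rank training plots by Euclidean distance in feature space, keep k.
    nearest = sorted(
        ((math.dist(query, x), y) for x, y in training),
        key=lambda pair: pair[0],
    )[:k]
    eps = 1e-12  # avoids division by zero when a plot matches exactly
    weights = [1.0 / (d + eps) ** t for d, _ in nearest]
    total = sum(weights)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total
```

With `t=0` every neighbor counts equally, and larger `t` shifts the estimate toward the closest training plots, which is the effect the five weight settings explore.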

An Improved Algorithm of Searching Neighbor Agents in a Large Flocking Behavior (대규모 무리 짓기에서 이웃 에이전트 탐색의 개선된 알고리즘)

  • Lee, Jae-Moon;Jung, In-Hwan
    • Journal of Korea Multimedia Society / v.13 no.5 / pp.763-770 / 2010
  • This paper proposes an algorithm that enhances the performance of the spatial partitioning method for flocking behavior. One characteristic of flocking behavior is that two agents may share many common neighbors if they are spatially close to each other. This paper improves the spatial partitioning method by exploiting this characteristic. While the conventional spatial partitioning method computes the k-nearest neighbors of each agent one by one, the proposed method computes the k-nearest neighbors of spatially close agents simultaneously. The proposed algorithm was implemented and its performance was experimentally compared with that of the original spatial partitioning method. The comparison showed that the proposed algorithm outperformed the original method by about 33% on average.
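
The idea of sharing neighbor candidates among nearby agents can be sketched with a uniform grid. This is only an illustration under stated assumptions, not the paper's algorithm: the name `grid_knn`, the 2D grid, and the 3x3 cell neighborhood are hypothetical choices, and the result is exact only when the true k nearest neighbors lie within the adjacent cells.

```python
import math
from collections import defaultdict


def grid_knn(points, k, cell_size):
    """Approximate kNN for 2D agents using one candidate list per grid cell."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)

    result = {}
    for (cx, cy), members in grid.items():
        # Candidates are gathered once per cell and shared by all its agents.
        candidates = [
            j
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            for j in grid.get((cx + dx, cy + dy), ())
        ]
        for i in members:
            ranked = sorted(
                (j for j in candidates if j != i),
                key=lambda j: (math.dist(points[i], points[j]), j),
            )
            result[i] = ranked[:k]
    return result
```

The saving comes from building `candidates` once per cell rather than once per agent; ties in distance are broken by index so the output is deterministic.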

A Modified Grey-Based k-NN Approach for Treatment of Missing Value

  • Chun, Young-M.;Lee, Joon-W.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.421-436 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach to accurately predict missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when missing values are imputed. There are four different methods: DU, DW, WU, and WW. The WW method (Wen's GRG with weighted mean) performs best among them. Huang showed that his method was much better than the mean imputation and multiple imputation methods. The performance of our method is far superior to that of Huang's.
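
A GRG-weighted imputation of the kind described here might look like the sketch below. It uses the commonly cited form of Deng's grey relational coefficient with distinguishing coefficient `zeta = 0.5`; the function names, the default `k`, and the assumption that attributes are already normalized are illustrative, and Wen's GRG is not shown.

```python
def deng_grg(reference, candidates, zeta=0.5):
    """Deng's grey relational grade of each candidate row to the reference."""
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def impute_weighted(reference, complete_rows, targets, k=2):
    """Impute a missing value as the GRG-weighted mean of the k closest rows."""
    grades = deng_grg(reference, complete_rows)
    top = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)[:k]
    total = sum(grades[i] for i in top)
    return sum(grades[i] * targets[i] for i in top) / total
```

Here `reference` holds the observed attributes of the incomplete record and `targets` holds the corresponding values of the attribute to be imputed in the complete records; replacing the weights with 1 would give the unweighted variant.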

Neighborhood Selection with Intrinsic Partitions (데이터 분포에 기반한 유사 군집 선택법)

  • Kim, Kye-Hyeon;Choi, Seung-Jin
    • Proceedings of the Korean Information Science Society Conference / 2007.10c / pp.428-432 / 2007
  • We present a novel method for determining the k nearest neighbors that accurately recognizes the underlying clusters in a data set. To this end, we introduce the "tiling neighborhood", which is constructed by tiling a number of small local circles rather than using a single circle as existing neighborhood schemes do. We then formulate the problem of determining the tiling neighborhood as a minimax optimization, leading to an efficient message passing algorithm. On several real data sets, our method outperformed the k-nearest neighbor method. The results suggest that our method can be an alternative to existing methods for general classification tasks, especially for data sets that have many missing values.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate our proposed approach, we have applied it to data from H Bank's non-external auditing companies in Korea, and compared the performances of the classifiers with the proposed under-sampling and random sampling data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers, such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which lead to higher misclassification costs.
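
The outlier-elimination role of k-RNN can be illustrated with a small sketch. It assumes one common reading of the technique: a point's reverse neighbors are the points that list it among their own k nearest neighbors, and a point with too few reverse neighbors is treated as an outlier. The function names and the `min_reverse` threshold are hypothetical, and the OCSVM step is not shown.

```python
import math


def reverse_neighbor_counts(points, k):
    """Count, for each point, how many other points have it in their kNN set."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        for j in ranked[:k]:
            counts[j] += 1
    return counts


def drop_outliers(points, k=2, min_reverse=1):
    """Keep only points with at least min_reverse reverse neighbors (assumed rule)."""
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_reverse]
```

An isolated sample still has k nearest neighbors of its own, but no other sample selects it, so its reverse-neighbor count is zero and it is dropped from the majority class.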

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3714-3732 / 2019
  • With the rapid development of networks, the Intrusion Detection System (IDS) plays an increasingly important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive amounts of data are generated. When dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes the IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. Then, we select representative instances from each cluster to perform data reduction and use the clusters consisting of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters according to the distances between the test sample and the cluster indexes, and obtain the k nearest clusters, in which we find the k nearest neighbors. Experimental results show that searching for neighbors by cluster indexes significantly reduces the computational complexity, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
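
The detection stage described above can be sketched as follows, assuming the clusters and their centers have already been produced in the training stage. The function name `cluster_knn_predict`, the separate `n_clusters` parameter, and the majority vote are illustrative assumptions; the clustering and the selection of representative instances are not shown.

```python
import math
from collections import Counter


def cluster_knn_predict(sample, clusters, k=3, n_clusters=2):
    """Classify a sample by searching only its nearest clusters.

    clusters: list of (center, [(features, label), ...]) pairs, where the
    center acts as the cluster's index.
    """
    # Sort clusters by distance from the sample to each cluster index.
    nearest = sorted(clusters, key=lambda c: math.dist(sample, c[0]))[:n_clusters]
    # Search for neighbors only inside the selected clusters.
    pool = [inst for _, members in nearest for inst in members]
    neighbors = sorted(pool, key=lambda inst: math.dist(sample, inst[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because only the members of the selected clusters are compared with the sample, the number of distance computations depends on the cluster sizes rather than on the full training set.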

SOSiM: Shape-based Object Similarity Matching using Shape Feature Descriptors (SOSiM: 형태 특징 기술자를 사용한 형태 기반 객체 유사성 매칭)

  • Noh, Chung-Ho;Lee, Seok-Lyong;Chung, Chin-Wan;Kim, Sang-Hee;Kim, Deok-Hwan
    • Journal of KIISE:Databases / v.36 no.2 / pp.73-83 / 2009
  • In this paper, we propose an object similarity matching method based on the shape characteristics of an object in an image. The proposed method extracts edge points from the edges of objects and generates a log-polar histogram for each edge point to represent the relative placement of the extracted points. It performs matching by comparing the polar histograms of two edge points sequentially along the edges of the objects, and uses the well-known k-NN (nearest neighbor) approach to retrieve similar objects from a database. To verify the proposed method, we compared it with an existing Shape-Context method. Experimental results show that our method is more accurate in object matching than the existing method: when k=5, the precision of our method is 0.75-0.90 while that of the existing one is 0.37, and when k=10, the precision of our method is 0.61-0.80 while that of the existing one is 0.31. In the rotational transformation experiment, our method is also more robust than the existing one, with a precision of 0.69 compared with 0.30.

Stochastic disaggregation of daily rainfall based on K-Nearest neighbor resampling method (K번째 최근접 표본 재추출 방법에 의한 일 강우량의 추계학적 분해에 대한 연구)

  • Park, HeeSeong;Chung, GunHui
    • Journal of Korea Water Resources Association / v.49 no.4 / pp.283-291 / 2016
  • As infrastructure and population become concentrated in megacities, urban flood management becomes very important because of the severe loss of life and property. For more accurate calculation of runoff from urban catchments, hourly or even minute-scale rainfall data have been used. However, the time steps of measured data or of data forecasted under climate change scenarios are longer than hourly, which makes them difficult to apply. In this study, daily rainfall data were disaggregated into hourly data using a stochastic method. Based on historical hourly precipitation data, the Gram-Schmidt orthonormalization process and the K-Nearest Neighbor Resampling (KNNR) method were applied to disaggregate daily precipitation into hourly precipitation. This method was originally developed to disaggregate yearly runoff data into monthly data. Because precipitation data have a smaller probability density than runoff data, seven different rainfall pattern types that consider the previous and next days were proposed. To improve applicability, disaggregated rainfall was resampled only from days with the same rainfall pattern. The proposed method was applied to rainfall data observed at the Seoul weather station, which has 52 years of hourly rainfall data, and the disaggregated hourly data were compared with the measured data. The proposed method might be applied to disaggregate climate change scenarios.
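
The resampling step can be illustrated with a simplified sketch. It is not the method of this paper: the Gram-Schmidt orthonormalization and the seven rainfall pattern types are omitted, and the function name `knnr_disaggregate`, the rank-based selection weights `1/rank` (a choice common in KNN resampling literature), and the proportional rescaling are all assumptions.

```python
import random


def knnr_disaggregate(daily_total, history, k=3, rng=None):
    """Disaggregate a daily rainfall total into 24 hourly values.

    history: list of 24-value hourly rainfall records from past days.
    """
    rng = rng or random.Random()
    wet_days = [hours for hours in history if sum(hours) > 0]
    # k historical days whose daily totals are closest to the target total.
    ranked = sorted(wet_days, key=lambda hours: abs(sum(hours) - daily_total))[:k]
    # Assumed kernel: closer days are more likely to be resampled.
    weights = [1.0 / rank for rank in range(1, len(ranked) + 1)]
    chosen = rng.choices(ranked, weights=weights, k=1)[0]
    # Rescale the chosen hourly pattern so it sums to the target total.
    scale = daily_total / sum(chosen)
    return [h * scale for h in chosen]
```

Passing a seeded `random.Random` makes the draw reproducible; restricting `history` to days of the same rainfall pattern type would correspond to the pattern-based resampling mentioned in the abstract.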

A Study on the Treatment of Missing Value using Grey Relational Grade and k-NN Approach

  • Chun, Young-Min;Chung, Sung-Suk
    • Proceedings of the Korean Data and Information Science Society Conference / 2006.04a / pp.55-62 / 2006
  • In 2004, Huang proposed a grey-based nearest neighbor approach to accurately predict missing attribute values. Our study proposes a way to decide the number of nearest neighbors using not only Deng's grey relational grade (GRG) but also Wen's grey relational grade. In addition, our study uses a weighted mean rather than an arithmetic (unweighted) mean, with the GRG serving as the weight when missing values are imputed. There are four different methods: DU, DW, WU, and WW. The WW method (Wen's GRG with weighted mean) performs best among them. Huang showed that his method was much better than the mean imputation and multiple imputation methods. The performance of our method is far superior to that of Huang's.

Molecular Dynamics Simulation Studies of Benzene, Toluene, and p-Xylene in a Canonical Ensemble

  • Kim, Ja-Hun;Lee, Song-Hui
    • Bulletin of the Korean Chemical Society / v.23 no.3 / pp.441-446 / 2002
  • We present results for the thermodynamic, structural, and dynamic properties of liquid benzene, toluene, and p-xylene in the canonical (NVT) ensemble at 293.15 K, obtained by molecular dynamics (MD) simulations. The molecular model adopted for these molecules combines a rigid-body treatment of the benzene ring with an atomistically detailed model for the methyl hydrogen atoms. The pressures calculated in the NVT ensemble MD simulations are too low. The various thermodynamic properties indicate that the intermolecular interactions become stronger as the number of methyl groups attached to the benzene ring increases. The pronounced nearest neighbor peak in the center-of-mass g(r) of liquid benzene at 293.15 K suggests that nearest neighbors tend to be perpendicular. The two self-diffusion coefficients of liquid benzene at 293.15 K, calculated from the MSD and the VAC function, are in excellent agreement with experimental measurements. The self-diffusion coefficients of liquid toluene also agree well with the experimental values for toluene in benzene and for toluene in cyclohexane.