• Title/Summary/Keyword: Average nearest neighbor analysis

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.273-282 / 2009
  • Missing data is a common problem when building analysis or prediction models from software project data. For small software project data sets, imputation methods are known to handle missing data more effectively than deletion methods. K nearest neighbor imputation is a suitable imputation method for software project data, but it cannot use the non-missing information in incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data that combines K nearest neighbor and maximum likelihood estimation; we also extend the average absolute error measure by normalization for more accurate evaluation. Our approach overcomes this limitation of K nearest neighbor imputation and achieves better performance on our real data sets.
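
The limitation mentioned in this abstract can be illustrated with a minimal sketch of plain K nearest neighbor imputation. This is a baseline for illustration only, not the combined method the paper proposes; the function name `knn_impute`, the use of `None` for missing values, and the default `k=2` are assumptions made here. Only complete rows serve as donors, so the observed values of other incomplete rows are ignored.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical baseline sketch: only complete rows act as donors, so
    the observed values of incomplete rows are never used as information.
    """
    donors = [r for r in rows if None not in r]
    if not donors:
        raise ValueError("at least one complete row is required")
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        # Distance is computed only over the features observed in this row.
        observed = [i for i, v in enumerate(row) if v is not None]
        nearest = sorted(
            donors,
            key=lambda d: math.sqrt(sum((row[i] - d[i]) ** 2 for i in observed)),
        )[:k]
        filled.append([
            v if v is not None else sum(d[i] for d in nearest) / len(nearest)
            for i, v in enumerate(row)
        ])
    return filled
```

For example, with rows `[1.0, 10.0]`, `[2.0, 20.0]`, `[10.0, 100.0]` and an incomplete row `[1.5, None]`, the two nearest donors are the first two rows, and the missing value is filled with their mean.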

Evaluation of Raingauge Network using Area Average Rainfall Estimation and the Estimation Error (면적평균강우량 산정을 통한 강우관측망 평가 및 추정오차)

  • Lee, Ji Ho;Jun, Hwan Don
    • Journal of Wetlands Research / v.16 no.1 / pp.103-112 / 2014
  • Area average rainfall estimation is important for determining the amount of available water resources, and it provides essential input data for rainfall-runoff analysis. A necessary condition for an accurate area average rainfall estimate is a uniform spatial distribution of the raingauge network. In this study, we propose a methodology for evaluating the spatial distribution of a raingauge network in order to improve area average rainfall estimates, and apply it to the Han River and Geum River basins. The spatial distribution of a raingauge network can be quantified by the nearest neighbor index. To evaluate the effect of the spatial distribution in each basin, area average rainfall was estimated for the 2013 rainfall event by the arithmetic mean method, the Thiessen weighting method, and estimation theory, and the error involved in each case was evaluated. The estimation error in the basin with the best spatial distribution was lower than in the basin with the worst spatial distribution.
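
To show why gauge placement matters for two of the estimators named in this abstract, the following is a rough sketch comparing the arithmetic mean with a Thiessen-weighted mean. It assumes a rectangular basin and approximates the Thiessen polygon areas by assigning grid cells to their nearest gauge; the function names, the grid resolution `steps`, and the rectangular shape are all illustrative assumptions, not details from the paper.

```python
def arithmetic_mean(rain):
    """Unweighted mean of the gauge readings."""
    return sum(rain) / len(rain)

def thiessen_weights(gauges, xmin, xmax, ymin, ymax, steps=100):
    """Approximate Thiessen area fractions on a regular grid.

    Each grid cell center is assigned to its nearest gauge; a gauge's
    weight is the fraction of cells assigned to it.
    """
    counts = [0] * len(gauges)
    for ix in range(steps):
        for iy in range(steps):
            x = xmin + (ix + 0.5) * (xmax - xmin) / steps
            y = ymin + (iy + 0.5) * (ymax - ymin) / steps
            nearest = min(
                range(len(gauges)),
                key=lambda g: (gauges[g][0] - x) ** 2 + (gauges[g][1] - y) ** 2,
            )
            counts[nearest] += 1
    total = steps * steps
    return [c / total for c in counts]

def thiessen_mean(rain, weights):
    """Area-weighted mean of the gauge readings."""
    return sum(r * w for r, w in zip(rain, weights))
```

With two gauges clustered near one edge of the basin, the Thiessen weights become unequal and the two estimates diverge, whereas evenly spread gauges give nearly equal weights and similar estimates.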

Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.10 / pp.73-81 / 2015
  • The paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small set of training data. The nearest-neighbor classifier then uses this new training set to predict the class of test data. By decomposing the mean expected error into bias and variance, we compare the generalization errors of k-nearest neighbor, the Bayesian classifier, prototype selection with a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of nearest-neighbor classifiers that use all training data, and the prototype selection rates are under 27.0% on average.

A Design of HPPS(Hybrid Preference Prediction System) for Customer-Tailored Service (고객 맞춤 서비스를 위한 HPPS(Hybrid Preference Prediction System) 설계)

  • Jeong, Eun-Hee;Lee, Byung-Kwan
    • Journal of Korea Multimedia Society / v.14 no.11 / pp.1467-1477 / 2011
  • This paper proposes the design of HPPS (Hybrid Preference Prediction System), which analyzes user profiles and the similarity among users to predict preferences precisely for customer-tailored service. Unlike the existing NBCFA (Neighborhood Based Collaborative Filtering Algorithm), the design follows three rules. First, if no neighbor has a rating value for a commodity in the preference prediction formula, the formula uses the average rating of that commodity. Second, the formula reflects a weighting value obtained by analyzing the user's characteristics. Finally, when the nearest neighbors are selected, the similarity, the commodity rating, and the rating frequency are considered. As a result, compared with the existing NBCFA, the first and second rules for the preference prediction formula improved the precision of HPPS by 97.24%, and the nearest neighbor selection method improved it by 75%.
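
The first rule in this abstract can be sketched as a fallback inside a generic similarity-weighted prediction. The paper's actual formula is not given here, so the weighted average below is a common neighborhood-based form used as a stand-in; the function name `predict_rating` and the dictionary layouts are hypothetical. The second and third rules (characteristic-based weights and neighbor selection) are not modeled.

```python
def predict_rating(user, item, ratings, similarity, neighbors):
    """Predict a rating from neighbors, falling back to the item average.

    ratings:    {user: {item: rating}}
    similarity: {(user, neighbor): similarity score}
    neighbors:  list of neighbor ids for `user`
    Hypothetical sketch; not the formula from the paper.
    """
    rated = [v for v in neighbors if item in ratings.get(v, {})]
    if not rated:
        # Fallback rule: no neighbor rated the item, so use the item's
        # average rating over all users who rated it.
        item_ratings = [r[item] for r in ratings.values() if item in r]
        return sum(item_ratings) / len(item_ratings) if item_ratings else None
    num = sum(similarity[(user, v)] * ratings[v][item] for v in rated)
    den = sum(abs(similarity[(user, v)]) for v in rated)
    return num / den if den else None
```

Without the fallback, a user whose neighbors never rated the item would get no prediction at all; with it, the prediction degrades to the item's overall average.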

Detection and Analysis of DNA Hybridization Characteristics by using Thermodynamic Method (열역학법을 이용한 DNA hybridization 특성 검출 및 해석)

  • Kim, Do-Gyun;Gwon, Yeong-Su
    • The Transactions of the Korean Institute of Electrical Engineers C / v.51 no.6 / pp.265-270 / 2002
  • The determination of DNA hybridization reactions can be applied to molecular biology research, clinical diagnostics, bioengineering, environmental monitoring, food science, and other application areas. Improving DNA hybridization detection methods is therefore very important for determining this reaction. Several molecular biological techniques, such as PCR, sequencing by hybridization, gene diagnostics, and antisense oligonucleotide probes, require accurate predictions of matched versus mismatched hybridization thermodynamics. In addition, recent developments of oligonucleotide chip arrays as a means for biochemical assays and DNA sequencing require accurate knowledge of hybridization thermodynamics and population ratios at matched and mismatched target sites. In this study, we report the characteristics of the hybridization reaction between the probe and matched and mismatched target oligonucleotides using a thermodynamic method. The thermodynamics of 5 oligonucleotides with central and terminal mismatch sequences were obtained by measuring UV absorbance as a function of temperature. The data show that the nearest-neighbor base-pair model is adequate for predicting the thermodynamics of oligonucleotides with average deviations for $\Delta H^{0}$, $\Delta S^{0}$, $\Delta G^{0}_{37}$ and $T_{m}$, respectively.

Enhancing Classification Performance of Temporal Keyword Data by Using Moving Average-based Dynamic Time Warping Method (이동 평균 기반 동적 시간 와핑 기법을 이용한 시계열 키워드 데이터의 분류 성능 개선 방안)

  • Jeong, Do-Heon
    • Journal of the Korean Society for Information Management / v.36 no.4 / pp.83-105 / 2019
  • This study aims to suggest an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. For this purpose, a large volume of news on the Web was collected, and time series data composed of 120 time segments were built. To create a training data set for testing the performance of the proposed model, 440 representative keywords were manually classified into 8 trend types. This study introduces the Dynamic Time Warping (DTW) method, which has been commonly used in time series analytics, and proposes an application model, MA-DTW, based on the Moving Average (MA) method, which explains the tendency of a trend curve well. In automatic classification with the k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved a best micro-averaged F1 score of 74.3%. In all of the comprehensive experiments, the suggested model outperformed ED and DTW.
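
The general idea of smoothing with a moving average before measuring DTW distance can be sketched as follows. This is one plausible reading of MA-DTW, not the paper's exact model: the window size of 3, the absolute-difference local cost, and the use of a single nearest neighbor (k = 1) are assumptions chosen for brevity, and all function names are hypothetical.

```python
def moving_average(series, window=3):
    """Simple moving average; returns len(series) - window + 1 points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]

def classify_1nn(query, labeled, window=3):
    """Label the query with the class of its nearest smoothed series.

    labeled: list of (series, label) pairs.
    """
    q = moving_average(query, window)
    best = min(labeled,
               key=lambda item: dtw_distance(q, moving_average(item[0], window)))
    return best[1]
```

Smoothing removes short-term spikes so that DTW aligns the overall trend shapes rather than individual noisy points, which matches the abstract's description of the moving average as explaining the tendency of a trend curve.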

A Proposal of Remaining Useful Life Prediction Model for Turbofan Engine based on k-Nearest Neighbor (k-NN을 활용한 터보팬 엔진의 잔여 유효 수명 예측 모델 제안)

  • Kim, Jung-Tae;Seo, Yang-Woo;Lee, Seung-Sang;Kim, So-Jung;Kim, Yong-Geun
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.4 / pp.611-620 / 2021
  • The maintenance industry has progressed from corrective maintenance and preventive maintenance toward condition-based maintenance. In condition-based maintenance, maintenance is performed at the optimum time based on the condition of the equipment. To find the optimal maintenance point, it is important to accurately understand the condition of the equipment, especially its remaining useful life (RUL). Thus, using simulation data (C-MAPSS), a model is proposed to predict the remaining useful life of a turbofan engine. In the modeling process, a C-MAPSS dataset was preprocessed and transformed, and then used for prediction. Data preprocessing was performed through piecewise RUL, moving average filters, and standardization. The remaining useful life was predicted using principal component analysis and the k-NN method. To derive the optimal performance, the number of principal components and the number of neighbors for the k-NN method were determined through 5-fold cross validation. The validity of the prediction results was analyzed through a scoring function that considers the usefulness of early predictions and the unsuitability of late predictions. In addition, the usefulness of the RUL prediction model was demonstrated through comparison with the prediction performance of other, neural network-based algorithms.

A Study of Travel Time Prediction using K-Nearest Neighborhood Method (K 최대근접이웃 방법을 이용한 통행시간 예측에 대한 연구)

  • Lim, Sung-Han;Lee, Hyang-Mi;Park, Seong-Lyong;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.26 no.5 / pp.835-845 / 2013
  • Travel time is considered the most typical and preferred traffic information for intelligent transportation systems (ITS). This paper proposes a real-time travel-time prediction method for a national highway, using the K-nearest neighbor (KNN) method. The KNN method, a nonparametric method, is appropriate for a real-time traffic management system because it needs no additional assumptions or parameter calibration. The performance of various models is compared based on mean absolute percentage error (MAPE) and coefficient of variation (CV). In a real application, analysis of traffic data collected from Korean national highways indicates that the proposed model outperforms other prediction models such as the historical average model and the Kalman filter model. Travel-time reliability is expected to improve when travel times from the proposed model are used flexibly together with travel times from the interval detectors.

The spatial distribution characteristics of Automatic Weather Stations in the mountainous area over South Korea (우리나라 산악기상관측망의 공간분포 특성)

  • Yoon, Sukhee;Jang, Keunchang;Won, Myoungsoo
    • Korean Journal of Agricultural and Forest Meteorology / v.20 no.1 / pp.117-126 / 2018
  • The purpose of this study is to analyze the spatial distribution characteristics and spatial changes of Automatic Weather Stations (AWS) in mountainous areas of South Korea at altitudes above 200 meters. To analyze the spatial distribution patterns, spatial analysis was performed on 203 Automatic Mountain Meteorology Observation Station (AMOS) points from 2012 to 2016 using Euclidean distance analysis, nearest neighbor index analysis, and kernel density analysis. As a result, the average distance decreased up to 16.4 km between 2012 and 2016. The nearest neighbor index ranged from 0.666632 to 0.811237, and the Z-score test results ranged from -4.372239 to -5.145115 (P<0.01). Kernel density analysis showed that the AMOSs covered 129,719 ha per station in 2012 and 50,914 ha per station in 2016. A comparison of the spatial distribution between 2012 and 2016 shows a decrease of about 169,399 ha per station over the past 5 years. Therefore, mountainous regions with low station density need to be considered when selecting AMOS sites.
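
For readers unfamiliar with the nearest neighbor index and Z-score reported in this abstract, the following sketch uses the standard average nearest neighbor formulation: the observed mean nearest-neighbor distance divided by the mean expected under a random pattern, 0.5 / sqrt(n / A). The function name and the planar coordinates are illustrative assumptions; edge corrections and the study's actual software are not modeled.

```python
import math

def nearest_neighbor_index(points, area):
    """Return (index, z_score) for 2-D points in a region of given area.

    index < 1 suggests clustering, index > 1 suggests dispersion, and
    values near 1 are consistent with a random pattern.
    """
    n = len(points)
    # Observed mean distance from each point to its nearest other point.
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    d_obs = sum(nearest) / n
    # Expected mean distance under complete spatial randomness.
    d_exp = 0.5 / math.sqrt(n / area)
    # Standard error of the mean nearest-neighbor distance.
    se = 0.26136 / math.sqrt(n * n / area)
    return d_obs / d_exp, (d_obs - d_exp) / se
```

Under this formulation, index values below 1 together with negative Z-scores, as reported in the abstract, indicate that the stations are more clustered than a random pattern would be.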

Onion yield estimation using spatial panel regression model (공간 패널 회귀모형을 이용한 양파 생산량 추정)

  • Choi, Sungchun;Baek, Jangsun
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.873-885 / 2016
  • Onions are grown in a few specific regions of Korea, depending on the climate and the regional characteristics of the production area. Therefore, when estimating onion yields, it is reasonable to use a statistical model that considers both climate and region simultaneously. In this paper, we use a spatial panel regression model to predict onion yields under the different weather conditions of the regions. We used the spatial autoregressive (SAR) model, which reflects the spatial lag, with panel data of several climate variables for 13 main onion production areas from 2006 to 2015. The spatial weight matrix for the model was constructed by the threshold value method and by the nearest neighbor method. Autocorrelation was found to be significant for the best-fitting model, which used the nearest neighbor method. The random effects model was chosen by the Hausman test, and the significant climate variables of the model were the cumulative sunshine duration (January), the average relative humidity (April), the average minimum temperature (June), and the cumulative precipitation (November).
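
A nearest-neighbor spatial weight matrix of the kind mentioned in this abstract can be sketched as below. The row normalization, the default `k=2`, the planar coordinates, and the function name are assumptions for illustration; the abstract does not state how the paper built or normalized its matrix.

```python
import math

def knn_weight_matrix(coords, k=2):
    """Row-normalized spatial weights linking each region to its k nearest.

    coords: list of (x, y) region centroids (hypothetical planar units).
    Returns an n x n list of lists; each non-empty row sums to 1.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        nearest = others[:k]
        for j in nearest:
            w[i][j] = 1.0 / len(nearest)
    return w
```

Note that the resulting matrix is generally asymmetric: an isolated region counts its closest region as a neighbor, but the reverse need not hold. A distance-threshold matrix, the other method named in the abstract, would instead link all pairs within a fixed distance.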