• Title/Abstract/Keywords: K-nearest neighbor approach

100 search results (processing time: 0.028 s)

Determining the optimal number of cases to combine in a case-based reasoning system for eCRM

  • Hyunchul Ahn;Kim, Kyoung-jae;Ingoo Han
    • The Korea Academia-Industrial cooperation Society: Conference Proceedings / 2003 Proceeding of the Korea Academia-Industrial cooperation Society / pp.178-184 / 2003
  • Case-based reasoning (CBR) often shows significant promise for improving the effectiveness of complex and unstructured decision making. Consequently, it has been applied to various problem-solving areas, including manufacturing, finance, and marketing. However, designing appropriate case indexing and retrieval mechanisms to improve the performance of CBR is still a challenging issue. Most previous studies aimed at improving the effectiveness of CBR have focused on the similarity function or on the optimization of case features and their weights. However, according to some prior research, finding the optimal k parameter for k-nearest neighbor (k-NN) is also crucial to improving the performance of a CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. In this study, we introduce a genetic algorithm (GA) to optimize the number of neighbors to combine. This study applies the new model to a real-world case provided by an online shopping mall in Korea. Experimental results show that a GA-optimized k-NN approach outperforms other AI techniques for purchasing behavior forecasting.
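To make the role of k concrete, here is a minimal sketch of choosing the number of neighbors by leave-one-out accuracy. It deliberately swaps the paper's genetic algorithm for a plain exhaustive search over a few candidate values, and the function names (`knn_predict`, `loo_accuracy`, `best_k`) and the toy data are assumptions made for illustration only.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Majority vote among the k training cases closest to `query`."""
    neighbors = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(cases, k):
    """Leave-one-out accuracy of k-NN for a given k."""
    hits = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]  # hold out case i
        hits += knn_predict(rest, features, k) == label
    return hits / len(cases)

def best_k(cases, candidates):
    """Exhaustive search (not a GA): candidate k with the best accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(cases, k))

# Toy data, made up for illustration: two clusters and one noisy label.
cases = [((0.0, 0.0), "no"), ((0.1, 0.2), "no"), ((0.2, 0.1), "no"),
         ((0.15, 0.15), "buy"),  # noisy label inside the "no" cluster
         ((1.0, 1.0), "buy"), ((0.9, 1.1), "buy"), ((1.1, 0.9), "buy")]

print(best_k(cases, [1, 3, 5]))
```

On this toy set a very small k is misled by the noisy case and a large k reaches into the other cluster, which is the trade-off that motivates tuning k at all. A GA would search the same objective without enumerating every candidate.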

A Gaussian Mixture Model Based Surface Electromyogram Pattern Classification Algorithm for Estimation of Wrist Motions

  • 정의철;유송현;이상민;송영록
    • The Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research / Vol. 33, No. 2 / pp.65-71 / 2012
  • In this paper, the Gaussian Mixture Model (GMM), a very robust modeling approach for pattern classification, is proposed to classify wrist motions using surface electromyograms (EMG). EMG is widely used to recognize wrist motions such as up, down, left, right, and rest. Here it is obtained from two electrodes placed on the flexor carpi ulnaris and extensor carpi ulnaris of 15 subjects under a no-strain condition during wrist motions. In addition, an EMG-based feature is derived from the extracted EMG signals in the time domain for fast processing. The estimated features, based on the difference absolute mean value (DAMV), are used for motion classification through the GMM. The performance of our approach is evaluated by recognition rates, and the proposed GMM-based method is found to yield better results than conventional schemes, including k-Nearest Neighbor (k-NN), Quadratic Discriminant Analysis (QDA), and Linear Discriminant Analysis (LDA).
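The DAMV feature mentioned above is commonly defined as the mean absolute difference between adjacent samples in a window. The sketch below assumes that common definition. The window size, step, and function names are hypothetical and not taken from the paper.

```python
def damv(window):
    """Difference absolute mean value: mean of |x[i+1] - x[i]| over a window."""
    if len(window) < 2:
        raise ValueError("need at least two samples")
    diffs = [abs(b - a) for a, b in zip(window, window[1:])]
    return sum(diffs) / len(diffs)

def damv_features(signal, window_size, step):
    """Slide a window over one EMG channel and return one DAMV per window."""
    return [damv(signal[start:start + window_size])
            for start in range(0, len(signal) - window_size + 1, step)]

# Made-up samples: an active segment followed by a flat (rest-like) segment.
print(damv_features([0, 1, -1, 2, 2, 2], window_size=3, step=3))
```

Because it only needs differences and a sum, the feature is cheap to compute in the time domain, which matches the abstract's emphasis on fast processing.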

Improving minority prediction performance of support vector machine for imbalanced text data via feature selection and SMOTE

  • 김종찬;장성준;손원
    • The Korean Journal of Applied Statistics / Vol. 37, No. 4 / pp.395-410 / 2024
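  • Text data usually consist of a large number of distinct words. Even ordinary text data often contain tens of thousands of different words, and very large text collections can contain hundreds of thousands of unique words. When text data are preprocessed into a document-term matrix, each unique word is treated as one variable, so text data can be regarded as data with a very large number of variables. Meanwhile, text classification problems frequently involve imbalanced data, in which the proportions of the target classes differ greatly. It is well known that the performance of ordinary classification models can deteriorate substantially on such imbalanced data. To improve classification performance on imbalanced data, one can therefore apply algorithms such as the synthetic minority over-sampling technique (SMOTE), which generates new minority-class observations by synthesizing existing minority observations. SMOTE uses the k-nearest neighbor (kNN) algorithm to generate new synthetic data, but for data with many variables, such as text data, errors can accumulate and degrade the performance of kNN. In this paper, we propose a method that uses variable selection to represent imbalanced text data with many variables in a space with reduced error and generates new synthetic observations in that space, thereby improving the predictive performance of an SVM classifier for the minority class in imbalanced text data.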
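The interpolation step at the heart of SMOTE can be sketched as follows: pick a minority point, pick one of its k nearest minority neighbors, and place a synthetic point at a random position on the segment between them. This is a simplified illustration under assumed names (`smote`, `gap`) and made-up data. It omits the paper's feature-selection stage and is not the authors' implementation.

```python
import math
import random

def smote(minority, k, n_new, rng):
    """Create n_new synthetic points between minority points and their neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority-class neighbors of the chosen base point
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Made-up minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, k=2, n_new=10, rng=random.Random(0))
print(len(new_points))
```

The neighbor search is the part that suffers in very high-dimensional spaces, which is why the abstract proposes selecting variables before generating synthetic observations.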

A Study of CBIR (Content-based Image Retrieval) Computer-aided Diagnosis System of Breast Ultrasound Images using Similarity Measures of Distance

  • 김민정;조현종
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 66, No. 8 / pp.1272-1277 / 2017
  • Computer-aided diagnosis (CADx) systems have been studied to assist radiologists in characterizing breast masses. A CADx system can improve the diagnostic accuracy of radiologists by providing objective information about breast masses. Morphological and texture features were extracted from breast ultrasound images. Based on the extracted features, the CADx system retrieves masses that are similar to a query mass from a reference library using a k-nearest neighbor (k-NN) approach. Eight distance-based similarity measures are evaluated by receiver operating characteristic (ROC) analysis: Euclidean and Chebyshev (Minkowski family); Canberra and Lorentzian ($F_2$ family); Wave Hedges and Motyka (Intersection family); and Cosine and Dice (Inner Product family). The Inner Product family measures used with the k-NN classifier provided slightly higher performance for classifying malignant and benign masses than the Minkowski, $F_2$, and Intersection family measures.
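As a rough illustration of how the choice of measure plugs into k-NN retrieval, the sketch below implements four of the eight measures using their textbook definitions and a generic `retrieve` helper. The feature vectors, labels, and function names are invented for the example. The paper's actual features and library are not reproduced here.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def canberra(a, b):
    # Terms where both coordinates are zero are skipped (treated as 0).
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if x or y)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def retrieve(library, query, k, distance):
    """Return the k library entries (features, label) closest to the query."""
    return sorted(library, key=lambda entry: distance(entry[0], query))[:k]

# Hypothetical reference library of (feature vector, diagnosis) pairs.
library = [((1.0, 1.0), "benign"), ((5.0, 5.0), "malignant"),
           ((1.2, 0.9), "benign")]
hits = retrieve(library, (1.1, 1.0), k=2, distance=euclidean)
print([label for _, label in hits])
```

Passing the measure as a function keeps the retrieval code unchanged while measures are compared, which mirrors the kind of side-by-side evaluation the abstract describes.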

Object-oriented Information Extraction and Application in High-resolution Remote Sensing Image

  • WEI Wenxia;Ma Ainai;Chen Xunwan
    • The Korean Society of Remote Sensing: Conference Proceedings / Proceedings of ISRS 2004 / pp.125-127 / 2004
  • High-resolution satellite images offer abundant information about the earth's surface for remote sensing applications. The information includes geometry, texture, and attribute characteristics. Pixel-based image classification cannot achieve the classification precision required for high-resolution satellite images and produces large data redundancy. Object-oriented information extraction depends not only on spectral characteristics but also on geometry and structure information. It can provide an accessible and truly revolutionary approach. Using a Beijing SPOT 5 high-resolution image and object-oriented classification with the eCognition software, we achieve a precise classification of land-cover types. The test areas contain five types: water, vegetation, road, building, and bare land. We use nearest neighbor classification and assess the overall classification accuracy. The average over the five types reaches 0.90, every maximum is 1, and the standard deviation is less than 0.11. The overall accuracy reaches 95.47%. This method offers a new technique for applying high-resolution satellite images to land-cover classification in remote sensing.

Imputation of Medical Data Using Subspace Condition Order Degree Polynomials

  • Silachan, Klaokanlaya;Tantatsanawong, Panjai
    • Journal of Information Processing Systems / Vol. 10, No. 3 / pp.395-411 / 2014
  • Temporal medical data is often collected during patient treatments that require personal analysis. Each observation recorded in the temporal medical data is associated with measurements and treatment times. A major problem in the analysis of temporal medical data is missing values, which are caused, for example, by patients dropping out of a study before completion. Therefore, the imputation of missing data is an important step during pre-processing and can provide useful information before the data is mined. For each patient and each variable, this imputation replaces the missing data with a value drawn from an estimated distribution of that variable. In this paper, we propose a new method, called Newton's finite divided difference polynomial interpolation with condition order degree, for dealing with missing values in temporal medical data related to obesity. We compared the new imputation method with three existing subspace estimation techniques: the k-nearest neighbor, local least squares, and natural cubic spline approaches. The performance of each approach was then evaluated using the normalized root mean square error and statistical significance tests. The experimental results demonstrate that the proposed method provides the best fit with the smallest error and is more accurate than the other methods.
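For readers unfamiliar with the underlying technique, here is a minimal sketch of plain Newton divided-difference interpolation used to fill one missing value. It does not reproduce the paper's "condition order degree" rule for choosing the polynomial order. The function names and the sample measurements are assumptions for illustration.

```python
def divided_differences(xs, ys):
    """Return Newton coefficients f[x0], f[x0,x1], ... for the given nodes."""
    coeffs = list(ys)
    for order in range(1, len(xs)):
        # Update from the end so lower-order values are still available.
        for i in range(len(xs) - 1, order - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - order])
    return coeffs

def newton_eval(xs, coeffs, x):
    """Evaluate the Newton-form polynomial at x by nested multiplication."""
    result = coeffs[-1]
    for i in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[i]) + coeffs[i]
    return result

# Made-up visit times with the observation at time 2 missing.
times = [0.0, 1.0, 3.0]
values = [1.0, 2.0, 10.0]
coeffs = divided_differences(times, values)
imputed = newton_eval(times, coeffs, 2.0)
print(imputed)
```

Using all available points gives the highest-degree polynomial, which can oscillate on noisy data. Limiting the order, as the paper's method name suggests, is one way to control that.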

Measurements of Impervious Surfaces - per-pixel, sub-pixel, and object-oriented classification -

  • Kang, Min Jo;Mesev, Victor;Kim, Won Kyung
    • Korean Journal of Remote Sensing / Vol. 31, No. 4 / pp.303-319 / 2015
  • The objective of this paper is to measure surface imperviousness using three different classification methods: per-pixel, sub-pixel, and object-oriented classification. They are tested on high-spatial-resolution QuickBird data at 2.4 meters (four spectral bands and three principal component bands) as well as a medium-spatial-resolution Landsat TM image at 30 meters. To measure impervious surfaces, we selected 30 sample sites with different land uses and residential densities across an image representing the city of Phoenix, Arizona, USA. For per-pixel classification, an unsupervised classification is first conducted to provide prior knowledge of the possible candidate spectral classes, and then a supervised classification is performed using the maximum-likelihood rule. For sub-pixel classification, Linear Spectral Mixture Analysis (LSMA) is used to disentangle land cover information from mixed pixels. For object-oriented classification, several different sets of scale parameters and expert decision rules are implemented, including a nearest neighbor classifier. The results from these three methods show that the object-oriented approach (accuracy of 91%) provides more accurate results than the per-pixel algorithm (accuracy of 67% and 83% using Landsat TM and QuickBird, respectively). It is also clear that the sub-pixel algorithm gives more accurate results (accuracy of 87%) for intensive and dense urban areas using medium-resolution imagery.
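The idea behind linear spectral mixture analysis can be sketched with the simplest possible case: a pixel modeled as a mix of two endmember spectra whose fractions sum to one. The closed-form least-squares fraction below is only an illustration. The endmember spectra are invented, and the paper's actual endmembers and solver are not known from the abstract.

```python
def impervious_fraction(pixel, impervious, pervious):
    """Least-squares f for pixel ~ f*impervious + (1-f)*pervious, clipped to [0, 1]."""
    direction = [a - b for a, b in zip(impervious, pervious)]
    offset = [p - b for p, b in zip(pixel, pervious)]
    f = sum(o * d for o, d in zip(offset, direction)) / sum(d * d for d in direction)
    return min(1.0, max(0.0, f))

# Hypothetical four-band endmember reflectances (not from the paper).
impervious = [0.30, 0.28, 0.26, 0.25]
pervious = [0.05, 0.08, 0.04, 0.50]

# A mixed pixel that is 25% impervious by construction.
pixel = [0.25 * a + 0.75 * b for a, b in zip(impervious, pervious)]
print(round(impervious_fraction(pixel, impervious, pervious), 3))
```

With more than two endmembers the same model becomes a constrained least-squares problem per pixel, which is where a numerical solver would replace this closed form.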

Statistical Approach to Noisy Band Removal for Enhancement of HIRIS Image Classification

  • Huan, Nguyen Van;Kim, Hak-Il
    • The Korean Society of Remote Sensing: Conference Proceedings / Proceedings of the 2008 Spring Conference of the Korean Society of Remote Sensing / pp.195-200 / 2008
  • The accuracy of classifying pixels in HIRIS images is usually degraded by noisy bands, since noisy bands may deform the typical shape of the spectral reflectance. This paper proposes a statistical method for noisy band removal that mainly makes use of the correlation coefficients between bands. Considering each band as a random variable, the correlation coefficient measures the strength and direction of the linear relationship between two random variables. While the correlation between two signal bands is high, the presence of a noisy band produces a low correlation due to ill-correlativeness and undirectedness. The correlation coefficient is applied as a measure for detecting noisy bands under a two-pass screening scheme. This method does not depend on prior knowledge of the sensor or of the cause of the noise. The classification in this experiment uses the unsupervised k-nearest neighbor algorithm with the well-accepted Euclidean distance measure and the spectral angle mapper measure. This paper also proposes a hierarchical combination of these measures for spectral matching. Finally, a separability assessment based on the between-class and within-class scatter matrices is carried out to evaluate the performance.
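A simplified, single-pass version of correlation-based screening is sketched below, together with the spectral angle used for matching. It is not the paper's two-pass scheme: the threshold, the rule of comparing each band only with its immediate neighbors, and all names and sample values are assumptions.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def noisy_bands(bands, threshold):
    """Flag bands whose best |correlation| with an adjacent band is below threshold."""
    flagged = []
    for i, band in enumerate(bands):
        adjacent = [bands[j] for j in (i - 1, i + 1) if 0 <= j < len(bands)]
        if max(abs(pearson(band, other)) for other in adjacent) < threshold:
            flagged.append(i)
    return flagged

def spectral_angle(a, b):
    """Angle in radians between two spectra (smaller means more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding error

# Made-up data: five bands over five pixels, with band 2 behaving like noise.
bands = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5], [3, -1, 4, -2, 3],
         [1.1, 2, 3.2, 4, 5], [2, 3.9, 6.1, 8, 10]]
print(noisy_bands(bands, threshold=0.5))
```

A signal band whose only neighbor is noisy would be flagged wrongly by this single pass, which hints at why a second screening pass is useful.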

Ride comfort of the bridge-traffic-wind coupled system considering bridge surface deterioration

  • Liu, Yang;Yin, Xinfeng;Deng, Lu;Cai, C.S.
    • Wind and Structures / Vol. 23, No. 1 / pp.19-43 / 2016
  • In the present study, a new methodology is presented to study the ride comfort and bridge responses of a long-span bridge-traffic-wind coupled vibration system, considering the stochastic characteristics of traffic flow and the progressive deterioration of the bridge surface. A three-dimensional vehicle model with 24 degrees of freedom (DOFs), including a three-dimensional non-linear suspension seat model and the longitudinal vibration of the vehicle, is first presented to study the ride comfort. An improved cellular automaton (CA) model considering the influence of the next-nearest neighbor vehicles and a progressive deterioration model for bridge surface roughness are then introduced. Based on the equivalent dynamic vehicle model approach, the bridge-traffic-wind coupled equations are established by combining the equations of motion of both the bridge and the vehicles in traffic, using the displacement relationship and the interaction force relationship at the patch contact. The numerical simulations show that the proposed method can reasonably simulate the ride comfort and bridge responses of the bridge-traffic-wind coupled system, and that the vertical, lateral, and longitudinal vibrations of the driver seat model can significantly affect the driver's comfort, as expected.

An Ontology-Based Labeling of Influential Topics Using Topic Network Analysis

  • Kim, Hyon Hee;Rhee, Hey Young
    • Journal of Information Processing Systems / Vol. 15, No. 5 / pp.1096-1107 / 2019
  • In this paper, we present an ontology-based approach to labeling influential topics of scientific articles. First, to find influential topics in scientific articles, topic modeling is performed, and then social network analysis is applied to the selected topic models. Abstracts of research papers related to data mining published over the 20 years from 1995 to 2015 are collected and analyzed in this research. Second, to interpret and explain the selected influential topics, the UniDM ontology is constructed from Wikipedia and serves as the concept hierarchy for the topic models. Our experimental results show that the subjects of data management and queries are identified in the most interrelated topic, followed by those of recommender systems and text mining. In addition, the subjects of recommender systems and context-aware systems belong to the most influential topic, and the subject of the k-nearest neighbor classifier belongs to the topic closest to other topics. The proposed framework provides a general model for interpreting topics in topic models, which plays an important role in overcoming ambiguous and arbitrary interpretation of topics in topic modeling.