• Title/Summary/Keyword: k-Nearest Neighbor Method

An Evaluation of Category Features in Text Categorization Using Nearest Neighbor Method (Nearest Neighbor 방법을 이용한 문서 범주화에서 범주 자질의 평가)

  • Kwon, Oh-Woog;Lee, Jong-Hyeok;Lee, Geun-Bae
    • Annual Conference on Human and Language Technology / 1997.10a / pp.7-14 / 1997
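  • In text categorization, the problem of finding the appropriate kinds and number of categories for a document's content is best served by a model that performs well when exactly one category is assigned per document. This paper therefore uses the k-nearest neighbor method, which gives good results in the one-category-per-document setting. To improve the performance of k-nearest neighbor text categorization, we propose restricting the words used to represent a document to those that behave as category features. The proposed method achieved a good result, a breakeven point of 82%, on the Reuters-21578 test collection, which consists of one year of Reuters newswire articles.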
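
The idea in this abstract (k-nearest neighbor categorization with one category per document, using a restricted vocabulary) can be sketched roughly as follows. This is only an illustration, not the authors' system: the cosine similarity, the similarity-weighted vote, and the names `vectorize`, `cosine`, `knn_category`, and `allowed_terms` are all assumptions, and the abstract does not say how the category-feature words are selected.

```python
import math
from collections import Counter


def vectorize(text, allowed_terms=None):
    """Bag-of-words term counts, optionally restricted to allowed_terms."""
    tokens = text.lower().split()
    if allowed_terms is not None:
        tokens = [t for t in tokens if t in allowed_terms]
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_category(query, training, k=3, allowed_terms=None):
    """Assign exactly one category by a similarity-weighted vote of the k most similar documents."""
    q = vectorize(query, allowed_terms)
    scored = sorted(
        ((cosine(q, vectorize(text, allowed_terms)), label) for text, label in training),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Hypothetical toy data; allowed_terms plays the role of the restricted vocabulary.
training = [
    ("wheat harvest grain export", "grain"),
    ("grain prices wheat corn", "grain"),
    ("crude oil barrel prices", "oil"),
    ("oil production opec crude", "oil"),
]
category = knn_category(
    "wheat and corn prices rise",
    training,
    k=3,
    allowed_terms={"wheat", "corn", "grain", "oil", "crude"},
)
```

Restricting the vocabulary removes words such as "prices" that occur in several categories, so they no longer contribute to the similarity score.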

VLSI design of a FNNPDS encoder for vector quantization (벡터양자화를 위한 FNNPDS 인코더의 VLSI 설계)

  • Kim Hyeung-Cheol;Shim Jeong-Bo;Jo Je-Hwang
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.2 s.332 / pp.83-88 / 2005
  • We propose a design method for a VLSI architecture of FNNPDS (fast nearest neighbor partial distance search), which combines PDS (partial distance search) and FNNS (fast nearest neighbor search), both of which are used for fast encoding in vector quantization, and we show by simulation that FNNPDS is faster than the conventional methods. In the simulations, we examine timing diagrams that show the time taken to find the nearest codevector for an input vector, and we compare the average number of clock cycles per input vector for the Lena and Peppers images. According to the simulation results, the number of clock cycles of FNNPDS was reduced to between $11.7\%$ and $79.2\%$ of the number required by the conventional techniques.
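
For readers unfamiliar with partial distance search, a minimal software sketch of the PDS component is given below. It is an assumption-laden illustration, not the paper's VLSI design: it omits FNNS entirely, and the function name `pds_nearest` is hypothetical. The point is only that the squared-distance accumulation for a codevector stops as soon as the partial sum can no longer beat the best distance found so far.

```python
def pds_nearest(codebook, x):
    """Return (index, squared_distance) of the codevector nearest to x.

    Uses partial distance search: a codevector is abandoned as soon as its
    partial squared distance reaches the best full distance seen so far.
    """
    best_index, best_dist = -1, float("inf")
    for i, c in enumerate(codebook):
        partial = 0.0
        for ci, xi in zip(c, x):
            partial += (ci - xi) ** 2
            if partial >= best_dist:
                break  # cannot beat the current best; skip remaining dimensions
        else:
            # Loop finished without a break, so this codevector is the new best.
            best_index, best_dist = i, partial
    return best_index, best_dist


codebook = [(0, 0, 0, 0), (1, 1, 1, 1), (5, 5, 5, 5)]
index, dist = pds_nearest(codebook, (0.9, 1.2, 1.0, 0.8))
```

The result is identical to a full search; only the number of arithmetic operations changes.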

Probabilistic K-nearest neighbor classifier for detection of malware in android mobile (안드로이드 모바일 악성 앱 탐지를 위한 확률적 K-인접 이웃 분류기)

  • Kang, Seungjun;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.4 / pp.817-827 / 2015
  • In modern society, people have a close relationship with their smartphones. This makes it easier for hackers to obtain a user's information by installing malware on the smartphone without the user's authorization, which threatens the user's privacy. Malware has different characteristics from general applications, and it requires the user's permission. In this paper, we propose a new method that classifies applications by the user permissions each one requires, using Principal Component Analysis (PCA) and Probabilistic K-Nearest Neighbor (PKNN). Combining these methods improves the classification of malware versus general applications. Under K-fold cross-validation, PKNN achieves higher precision than the conventional K-Nearest Neighbor (KNN). Cases that are difficult to classify with KNN can also be handled by PKNN by optimizing the parameters k and $\beta$. The samples used in the experiment come from Contagio.
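
The abstract does not give the PKNN model itself. One common formulation of probabilistic k-NN, assumed here purely for illustration, turns the neighbor votes into class probabilities proportional to exp(β · votes / k), so that β controls how sharply the neighbors' labels are trusted. The sketch below implements only that assumed formula; it skips PCA and the search for k and β, and the name `pknn_probabilities` and the toy data are hypothetical.

```python
import math
from collections import Counter


def pknn_probabilities(query, points, labels, k, beta):
    """Class probabilities proportional to exp(beta * votes_for_class / k).

    votes_for_class counts how many of the k nearest training points
    (Euclidean distance) carry that class label.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    classes = sorted(set(labels))
    weights = {c: math.exp(beta * votes[c] / k) for c in classes}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical 2-D feature vectors, e.g. after a PCA projection.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["general", "general", "malware", "malware", "malware"]
probs = pknn_probabilities((0.2, 0.2), points, labels, k=3, beta=2.0)
```

With β = 0 every class receives the same probability, and larger β values move the output toward the plain majority vote of KNN.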

Feature Selection for Multiple K-Nearest Neighbor classifiers using GAVaPS (GAVaPS를 이용한 다수 K-Nearest Neighbor classifier들의 Feature 선택)

  • Lee, Hee-Sung;Lee, Jae-Hun;Kim, Eun-Tai
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.871-875 / 2008
  • This paper deals with feature selection for multiple k-nearest neighbor (k-NN) classifiers using the Genetic Algorithm with Varying Population Size (GAVaPS). Because we use multiple k-NN classifiers, the feature selection problem is very hard and has a large search space. To solve this problem, we employ GAVaPS, which outperforms the simple genetic algorithm (SGA). Further, we propose an efficient method for combining multiple k-NN classifiers using GAVaPS. Experiments are performed to demonstrate the efficiency of the proposed method.

A Missing Data Imputation by Combining K Nearest Neighbor with Maximum Likelihood Estimation for Numerical Software Project Data (K-NN과 최대 우도 추정법을 결합한 소프트웨어 프로젝트 수치 데이터용 결측값 대치법)

  • Lee, Dong-Ho;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.273-282 / 2009
  • Missing data is a common problem when building analysis or prediction models from software project data. Imputation methods are known to handle missing data more effectively than deletion methods on small software project data sets. Although K nearest neighbor imputation is a suitable imputation method for software project data, it cannot use the non-missing information of incomplete project instances. In this paper, we propose an approach to missing data imputation for numerical software project data that combines K nearest neighbor and maximum likelihood estimation; we also extend the average absolute error measure by normalization for accurate evaluation. Our approach overcomes the limitation of K nearest neighbor imputation and outperforms it on our real data sets.
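
To make the limitation mentioned above concrete, here is a minimal sketch of plain K nearest neighbor imputation, assuming Euclidean distance over the observed features and mean imputation from the donors. It is not the paper's combined method (the maximum likelihood part is omitted), and the name `knn_impute` is hypothetical. Note that only complete rows can serve as donors, so whatever values an incomplete row does contain are never used to help impute other rows.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Distance uses only the features observed in the incomplete row.
        donors = sorted(
            complete,
            key=lambda d: math.sqrt(sum((d[j] - row[j]) ** 2 for j in observed)),
        )[:k]
        imputed.append([
            v if v is not None else sum(d[j] for d in donors) / len(donors)
            for j, v in enumerate(row)
        ])
    return imputed


# Hypothetical project data: (size, effort); the last project lacks an effort value.
projects = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]]
filled = knn_impute(projects, k=2)
```

The sketch assumes at least one complete row exists; with none, there are no donors to average.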

Model-Based Object Recognition using PCA & Improved k-Nearest Neighbor (PCA와 개선된 k-Nearest Neighbor를 이용한 모델 기반형 물체 인식)

  • Jung Byeong-Soo;Kim Byung-Gi
    • The KIPS Transactions:PartB / v.13B no.1 s.104 / pp.53-62 / 2006
  • Object recognition techniques based on principal component analysis (PCA) tend to suffer a drop in recognition rate when the lighting of an image changes. The purpose of this thesis is to propose an object recognition technique, based on a new PCA method, that can identify an object in the database even when illumination varies across the training images. The proposed algorithm also achieves a higher recognition rate by using an improved k-Nearest Neighbor. The algorithm builds the object space by preprocessing the training images with histogram equalization and a median filter and then learning from them. Spreading the histogram of the test image through histogram equalization reduces the effect of illumination changes. This method is more robust to illumination changes than the basic PCA method and normalization, and it removes most of the effect of illumination, so it maintains an almost constant, good recognition rate. After a representative value is computed for each object in the model images, the method projects the test image into the object space, compares the projected components by their distance to the representative values, and recognizes the object. Because existing methods make many errors in distance calculation, this thesis uses the improved k-Nearest Neighbor so that each model image serves as a recognition unit for a sequence of consecutive input images.

Continuous K-Nearest Neighbor Query Processing Considering Peer Mobilities in Mobile P2P Networks (모바일 P2P 네트워크에서 피어의 이동성을 고려한 연속적인 k-최근접 질의 처리)

  • Bok, Kyoung-Soo;Lee, Hyun-Jung;Park, Young-Hun;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.12 no.8 / pp.47-58 / 2012
  • In this paper, we propose a continuous k-nearest neighbor query processing method that updates query results in real time in mobile peer-to-peer environments. The proposed method disseminates a monitoring region to monitor the k-nearest neighbor peers efficiently. The monitoring region is created, using the movement vectors of neighbor peers, so that at least k peers are guaranteed as the query result within the time range. In the proposed method, the monitoring region remains valid for a long time because it is calculated from the vectors of the query peer's neighbor peers. The proposed method therefore reduces the cost of reprocessing caused by monitoring region invalidation. To show the superiority of the proposed method, we compare it with previous schemes through performance evaluation.

Enhancement of Text Classification Method (텍스트 분류 기법의 발전)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.155-156 / 2019
  • Traditional machine-learning-based emotion analysis methods such as Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor classification (kNN) are less accurate. In this paper, we propose an improved kNN classification method. The improvements, together with data normalization, raise accuracy. We then compare the three classification algorithms with the improved algorithm on experimental data.

Nearest Neighbor Query Processing in the Mobile Environment

  • Choi Hyun Mi;Jung Young Jin;Lee Eung Jae;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2004.10a / pp.677-680 / 2004
  • In a mobile environment, a nearest neighbor query finds the special object or place nearest to an object's position as the object moves. However, because the query object moves continuously, the query requirements change with the direction of the query object. Moreover, when the query object is moving and the result is chosen simply by minimum distance, the result sometimes lies opposite to the query object's direction of travel. In most road conditions in particular, the user must first reach a U-turn area and then come back, which costs additional time and money. To solve these problems, this paper proposes a nearest neighbor method for mobile recommendation systems that considers both the position and the direction of the moving object.
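
One simple way to picture a direction-aware nearest neighbor query is to discard candidates that lie behind the moving object before taking the minimum distance. The sketch below does exactly that under several assumptions that are not in the abstract: straight-line (not road network) distance, a heading measured in degrees counterclockwise from the positive x-axis, and a hypothetical cutoff parameter `max_angle_deg`. The paper's actual method is not described in enough detail here to reproduce.

```python
import math


def nearest_ahead(position, heading_deg, places, max_angle_deg=90.0):
    """Nearest place whose bearing from position is within max_angle_deg of the heading.

    Returns the place name, or None if no place lies within the allowed angle.
    """
    best, best_dist = None, float("inf")
    for name, (x, y) in places.items():
        dx, dy = x - position[0], y - position[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            return name  # already at this place
        bearing = math.degrees(math.atan2(dy, dx))
        # Smallest absolute difference between the two angles, in [0, 180].
        diff = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
        if diff <= max_angle_deg and dist < best_dist:
            best, best_dist = name, dist
    return best


# Hypothetical example: travelling along +x, a closer place lies behind the user.
places = {"behind": (-1.0, 0.0), "ahead": (3.0, 0.0)}
choice = nearest_ahead((0.0, 0.0), 0.0, places)
```

A plain minimum-distance query would pick the place behind the user, which is the situation the abstract describes as forcing a U-turn.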

A KD-Tree-Based Nearest Neighbor Search for Large Quantities of Data

  • Yen, Shwu-Huey;Hsieh, Ya-Ju
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.3 / pp.459-470 / 2013
  • The discovery of nearest neighbors, without training in advance, has many applications, such as the formation of mosaic images, image matching, image retrieval and image stitching. When the quantity of data is huge and the number of dimensions is high, the efficient identification of a nearest neighbor (NN) is very important. This study proposes a variation of the KD-tree, the arbitrary KD-tree (KDA), which is constructed without the need to evaluate variances. When the amount of data is large, multiple KDAs with independent tree structures can be constructed efficiently. In tests on extended synthetic databases and real-world SIFT data, the KDA method increases computational efficiency and produces satisfactory accuracy on NN problems.
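
As a rough illustration of building a KD-tree without evaluating variances, the sketch below draws the split dimension at random at each node and then runs a standard exact nearest neighbor search with backtracking. It should not be read as the authors' KDA: the random choice of dimension, the single-tree exact search, and the names `build` and `nearest` are assumptions, and the abstract does not say how multiple KDAs are searched or combined.

```python
import math
import random


def build(points, rng):
    """Build a KD-tree whose split dimension is drawn at random at every node."""
    if not points:
        return None
    dim = rng.randrange(len(points[0]))
    points = sorted(points, key=lambda p: p[dim])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "dim": dim,
        "left": build(points[:mid], rng),
        "right": build(points[mid + 1:], rng),
    }


def nearest(node, query, best=None):
    """Exact nearest neighbor search with backtracking; returns (distance, point)."""
    if node is None:
        return best
    d = math.dist(query, node["point"])
    if best is None or d < best[0]:
        best = (d, node["point"])
    gap = query[node["dim"]] - node["point"][node["dim"]]
    if gap < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, query, best)
    if abs(gap) < best[0]:  # the far side may still hold a closer point
        best = nearest(far, query, best)
    return best


rng = random.Random(0)
data = [tuple(rng.random() for _ in range(3)) for _ in range(200)]
tree = build(data, rng)
distance, point = nearest(tree, (0.5, 0.5, 0.5))
```

Skipping the variance computation makes construction cheaper, at the cost of splits that may partition the data less evenly along informative dimensions.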