A K-Nearest Neighbor Algorithm for Categorical Sequence Data

Oh Seung-Joon

Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지)

Volume 10, Issue 2 (Serial No. 34), pp. 215-221, 2005
ISSN 1598-849X (print), 2383-9945 (online)

Korean Society of Computer Information (한국컴퓨터정보학회)


Korean title: 범주형 시퀀스 데이터의 K-Nearest Neighbor 알고리즘

Oh Seung-Joon (오승준), Department of Industrial Management Systems, Kyonggi Institute of Technology (경기공업대학 산업경영시스템과)

Published: 2005.05.01


Abstract

Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data, in which the order of the elements is inherent to the data. In this paper, we study how to classify these sequence datasets. Data classification techniques include decision tree induction, Bayesian classification, and K-nearest neighbor (K-NN) classification; in our approach, we use a K-NN algorithm to classify sequences. In addition, we propose a new measure of the similarity between two sequences, together with an efficient method for computing it.
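The abstract does not define the proposed similarity measure or the efficient method for computing it. As a rough illustration of how K-NN classification of categorical sequences works in general, the Python sketch below uses a stand-in measure: the length of the longest common subsequence (LCS) divided by the length of the longer sequence. The function names (`lcs_length`, `lcs_similarity`, `knn_classify`), the tie-breaking rule, and the toy web-log sessions are hypothetical illustrations, not the author's method.

```python
from collections import Counter
from collections.abc import Callable, Hashable, Sequence

# A training example is a (sequence, label) pair; elements are categorical symbols.
Example = tuple[Sequence[Hashable], str]


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, via the standard DP table
    kept one row at a time (O(len(a) * len(b)) time, O(len(b)) memory)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)  # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one element
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Stand-in similarity (NOT the paper's measure): LCS length divided by
    the length of the longer sequence, so the result lies in [0, 1]."""
    if not a and not b:
        return 1.0  # two empty sequences are treated as identical
    return lcs_length(a, b) / max(len(a), len(b))


def knn_classify(
    query: Sequence[Hashable],
    training: Sequence[Example],
    k: int = 3,
    similarity: Callable[[Sequence[Hashable], Sequence[Hashable]], float] = lcs_similarity,
) -> str:
    """Label `query` by majority vote among its k most similar training sequences."""
    if not training or k < 1:
        raise ValueError("need at least one training example and k >= 1")
    # Brute-force scan: score every training sequence, most similar first.
    # sorted() is stable, so equal scores keep their training-set order.
    ranked = sorted(training, key=lambda ex: similarity(query, ex[0]), reverse=True)
    neighbors = ranked[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Tie-break (an arbitrary choice for this sketch): among tied labels,
    # the one whose member ranks highest wins.
    return next(label for _, label in neighbors if votes[label] == top)


if __name__ == "__main__":
    # Hypothetical toy web-log sessions: each is a sequence of page categories.
    sessions: list[Example] = [
        (("home", "search", "product", "cart", "checkout"), "buyer"),
        (("home", "product", "cart", "checkout"), "buyer"),
        (("home", "about", "contact"), "browser"),
        (("home", "blog", "about"), "browser"),
        (("search", "product", "product", "cart"), "buyer"),
    ]
    print(knn_classify(("home", "search", "product", "cart"), sessions, k=3))  # buyer
    print(knn_classify(("home", "about", "blog"), sessions, k=3))  # browser
```

Because `knn_classify` takes the similarity function as a parameter, another sequence measure can be dropped in without changing the voting logic. This sketch scores the query against every training sequence, so each classification costs one similarity computation per training example; the abstract mentions an efficient method for measuring similarity but does not describe it, so no attempt is made to reproduce it here.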

