Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure


Proceedings of the Korea Intelligent Information System Society Conference (한국지능정보시스템학회:학술대회논문집)

2004.11a / Pages 221-229 / 2004

Korea Intelligent Information System Society (한국지능정보시스템학회)


Korean title (translated): Sequence Data Clustering Using Sequence Element-based Similarity

오승준 (Department of Industrial Engineering, Hanyang University) ;
김재련 (Department of Industrial Engineering, Hanyang University)

Published : 2004.11.01


Abstract

The amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs, has grown enormously in recent years. Such datasets consist of sequence data, in which the order of the elements carries information. However, only a few existing clustering algorithms take this sequential order into account. This study presents a method for clustering such sequence datasets. Because the similarity between sequences must be determined before they can be clustered, the study proposes a new similarity measure that compares two sequences on the basis of their sequence elements. Two clustering algorithms built on this measure are presented: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Experiments on a splice dataset and synthetic datasets show that the proposed algorithms generate higher-quality clusters than traditional clustering algorithms.
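The abstract does not define the similarity measure or the details of either algorithm, so the following Python sketch only illustrates the general pipeline it describes. Everything specific here is an assumption for illustration: a "sequence element" is taken to be a contiguous k-gram, the similarity is a plain Jaccard overlap of element sets (a stand-in, not the authors' measure), the hierarchical step uses average linkage, and the function names `elements`, `similarity`, `hierarchical`, and `scalable` are hypothetical.

```python
import random
from collections import Counter


def elements(seq, k=2):
    """Assumed 'sequence elements': the set of contiguous k-grams of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=2):
    """Stand-in similarity: Jaccard overlap of the two element sets."""
    ea, eb = elements(a, k), elements(b, k)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


def hierarchical(seqs, n_clusters, k=2):
    """Average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        total = sum(similarity(seqs[i], seqs[j], k) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the pair of clusters with the highest average similarity.
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = max(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(q)  # q > p, so index p is unaffected
        clusters[p].extend(merged)
    return clusters


def scalable(seqs, n_clusters, sample_size, knn=3, k=2, seed=0):
    """Cluster a random sample, then label the rest by k-nearest-neighbor vote."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(seqs)), sample_size)
    label = {}
    sample_seqs = [seqs[i] for i in sample]
    for cid, members in enumerate(hierarchical(sample_seqs, n_clusters, k)):
        for m in members:
            label[sample[m]] = cid
    labeled = list(label)  # snapshot: only sampled sequences vote
    for i in range(len(seqs)):
        if i in label:
            continue
        nearest = sorted(labeled,
                         key=lambda j: similarity(seqs[i], seqs[j], k),
                         reverse=True)[:knn]
        label[i] = Counter(label[j] for j in nearest).most_common(1)[0][0]
    return [label[i] for i in range(len(seqs))]


if __name__ == "__main__":
    data = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTGGGGG"]
    print(hierarchical(data, n_clusters=2))
    print(scalable(data, n_clusters=2, sample_size=4))
```

The sample-then-assign structure is what makes the second variant scale in this sketch: the quadratic pairwise comparison runs only on the sample, and each remaining sequence is compared against the sampled sequences alone.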
