http://dx.doi.org/10.3745/KIPSTB.2003.10B.6.719

An Improvement Of Efficiency For kNN By Using A Heuristic  

Lee, Jae-Moon (Division of Information and Computer Science, Hansung University)
Abstract
This paper proposes a heuristic that speeds up kNN without loss of accuracy. The heuristic minimizes the number of similarity computations between pairs of documents, which is the dominant cost in kNN. To do this, the paper proposes a method to calculate an upper limit on the similarity and to sort the training documents. The heuristic was implemented on an existing text categorization framework, AI::Categorizer, and compared with conventional kNN on the well-known Reuters-21578 data set. The comparisons show that the proposed heuristic outperforms conventional kNN by about 30 to 40% in execution time.
Keywords
Text Categorization; Training Document; Testing Document; kNN; Naive Bayes; SVM; Document Vector
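
The abstract does not say which upper limit or sort order the paper uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the similarity is a dot product of sparse term-weight vectors, bounds it with the Cauchy-Schwarz inequality (dot(q, d) <= |q| * |d|), and sorts training documents by vector norm so that the bound only decreases during the scan. All names (`build_index`, `knn_pruned`, `knn_brute`, `SLACK`) are hypothetical and are not taken from the paper or from AI::Categorizer.

```python
import heapq
import math

# Small relative slack so floating-point rounding of the bound cannot
# cause a document to be pruned incorrectly (an assumption of this sketch).
SLACK = 1.0 + 1e-12


def dot(a, b):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def norm(v):
    """Euclidean (L2) norm of a sparse vector."""
    return math.sqrt(sum(w * w for w in v.values()))


def build_index(training):
    """Sort (label, vector) pairs by L2 norm, largest first. Done once."""
    items = [(norm(vec), label, vec) for label, vec in training]
    return sorted(items, key=lambda item: item[0], reverse=True)


def knn_pruned(index, query, k):
    """Return (top-k [(similarity, label)], number of similarities computed)."""
    q_norm = norm(query)
    heap = []  # min-heap of (similarity, position, label); root is k-th best
    computed = 0
    for pos, (d_norm, label, vec) in enumerate(index):
        # Cauchy-Schwarz upper limit on dot(query, vec).
        bound = q_norm * d_norm * SLACK
        if len(heap) == k and bound < heap[0][0]:
            break  # every later document has an even smaller bound
        sim = dot(query, vec)
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (sim, pos, label))
        elif sim > heap[0][0]:
            heapq.heapreplace(heap, (sim, pos, label))
    top = sorted(heap, reverse=True)
    return [(sim, label) for sim, _, label in top], computed


def knn_brute(training, query, k):
    """Conventional kNN: compute the similarity to every training document."""
    sims = [(dot(query, vec), label) for label, vec in training]
    return heapq.nlargest(k, sims, key=lambda pair: pair[0])
```

Because a document is skipped only when its upper limit is strictly below the current k-th best similarity, the pruned search returns the same top-k similarities as the exhaustive one; the saving comes from the documents that are never scored. How much is saved depends on the data and on how tight the chosen bound is, so this sketch should not be read as reproducing the reported 30 to 40% figure.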