http://dx.doi.org/10.6109/JKIICE.2007.11.1.163

Semantic Topic Selection Method of Document for Classification  

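Ko, Kwang-Sup (Department of Computer Engineering, Konkuk University)
Kim, Pan-Koo (School of Computer Engineering, Chosun University)
Lee, Chang-Hoon (Department of Computer Engineering, Konkuk University)
Hwang, Myung-Gwon (School of Computer Engineering, Chosun University)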
Abstract
The web, as a global network, contains text documents, video, sound, and other media, and connects distributed information through links. As the web has developed, it has accumulated abundant information, most of it in text-based documents. Most users use the web to retrieve the information they want, so numerous studies have addressed text document retrieval using methods based on probability, statistics, vector similarity, Bayesian approaches, and so on. These studies, however, could not consider both the subject and the semantics of documents. As a result, users have to search through the results again by hand. Korean documents are especially hard to find because research on Korean document classification is insufficient. To overcome these problems, we propose a Korean document classification method for semantic retrieval. The method first extracts the TF value and RV value of the concepts contained in a document, and then maps them onto U-WIN, a Korean vocabulary dictionary, to select the topic of the document. The method can classify documents semantically, and experiments showed its efficiency.
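As a rough illustration of the first step only, the TF (term frequency) value of each concept can be computed as its relative frequency in the document. The sketch below is a minimal example under stated assumptions: the function name `term_frequencies` and the sample tokens are hypothetical, the input is assumed to be already tokenized into concepts, and the RV value and the mapping onto U-WIN are not shown because the abstract does not define them.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return the relative frequency of each token in one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        # An empty document has no terms to score.
        return {}
    return {term: count / total for term, count in counts.items()}


# Hypothetical, already-tokenized document (not taken from the paper).
doc_tokens = ["bank", "loan", "bank", "interest", "river", "bank"]

tf = term_frequencies(doc_tokens)

# The highest-TF concept is only a naive topic candidate; the paper's
# method additionally uses the RV value and the U-WIN dictionary.
top_term = max(tf, key=tf.get)
```

Normalizing by document length keeps TF values comparable across documents of different sizes, which is one common convention; the paper may define TF differently.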
Keywords
TF;