http://dx.doi.org/10.3745/KIPSTB.2009.16-B.1.85

Harmful Document Classification Using the Harmful Word Filtering and SVM  

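Lee, Won-Hee (Department of Computer Engineering, Chonbuk National University)
Chung, Sung-Jong (Division of Electronics and Information Engineering, Chonbuk National University)
An, Dong-Un (Division of Electronics and Information Engineering, Chonbuk National University)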
Abstract
As the World Wide Web has become widespread, web pages now deliver an enormous volume of information. Despite this convenience, the uncontrolled flood of information also creates many problems. Pornographic, violent, and other harmful content that is freely available to young people, whom society must protect, and to other users who lack judgment or self-control, is causing serious social problems. Various methods have been proposed and studied to address these problems. This paper proposes and implements a system that protects young internet users from harmful content. To classify content effectively as harmful or harmless, the system uses a two-step classification scheme consisting of harmful word filtering and SVM learning-based filtering. The system achieved an average precision of 92.1%.
Keywords
Harmful Word; Classification; SVM; Filtering; Web Document
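The two-step scheme named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two stages are combined, so everything below is an assumption made for illustration: the word list `HARMFUL_WORDS`, the threshold `FILTER_THRESHOLD`, the rule that step 1 flags a document outright and passes the rest to step 2, the toy training data, and the small hand-written linear SVM (trained by sub-gradient descent on the hinge loss) that stands in for whatever SVM implementation the authors used.

```python
import re

# Hypothetical word list and threshold, chosen only for illustration.
HARMFUL_WORDS = {"porn", "xxx", "gore"}
FILTER_THRESHOLD = 2  # assumed: this many hits flags a document in step 1


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def harmful_word_hits(tokens):
    """Step 1 feature: number of tokens found in the harmful word list."""
    return sum(1 for t in tokens if t in HARMFUL_WORDS)


def train_linear_svm(docs, labels, epochs=200, lam=0.01, lr=0.1):
    """Fit a linear SVM on term counts by sub-gradient descent on the
    L2-regularized hinge loss. Labels are +1 (harmful) or -1 (harmless)."""
    weights = {t: 0.0 for doc in docs for t in doc}
    bias = 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            score = sum(weights[t] for t in tokens) + bias
            for t in weights:  # regularization: shrink every weight slightly
                weights[t] *= 1.0 - lr * lam
            if y * score < 1.0:  # margin violated: take a hinge-loss step
                for t in tokens:
                    weights[t] += lr * y
                bias += lr * y
    return weights, bias


def svm_score(tokens, model):
    """Signed score of the linear model; unseen tokens contribute nothing."""
    weights, bias = model
    return sum(weights.get(t, 0.0) for t in tokens) + bias


def classify(text, model):
    """Assumed pipeline: harmful word filter first, SVM for the remainder."""
    tokens = tokenize(text)
    if harmful_word_hits(tokens) >= FILTER_THRESHOLD:
        return "harmful"  # step 1: enough listed words to decide directly
    return "harmful" if svm_score(tokens, model) > 0 else "harmless"  # step 2


# Toy training set (invented for this sketch, not the paper's data).
TRAIN = [
    ("free xxx videos adult", 1),
    ("explicit adult content click here", 1),
    ("gore violent graphic video", 1),
    ("weather forecast for tomorrow", -1),
    ("school homework math help", -1),
    ("recipe for chicken soup", -1),
]
MODEL = train_linear_svm([tokenize(t) for t, _ in TRAIN], [y for _, y in TRAIN])

if __name__ == "__main__":
    for doc in ["xxx porn site", "adult explicit videos", "math homework help for school"]:
        print(doc, "->", classify(doc, MODEL))
```

In this sketch the word filter handles documents that contain several listed words, while the SVM handles documents that avoid the listed words but still resemble the harmful training examples. A real system would need a much larger lexicon, a proper term weighting scheme, and a tested SVM library.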