http://dx.doi.org/10.3745/KIPSTC.2006.13C.7.859

A Distinction Technology for Harmful Web Documents by Rates  

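Kim, Yong-Soo (Information Security Research Division, Electronics and Telecommunications Research Institute)
Nam, Taek-Yong (Security Gateway Research Team, Information Security Research Division, Electronics and Telecommunications Research Institute)
Won, Dong-Ho (School of Information and Communication Engineering, Sungkyunkwan University)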
Abstract
The openness of the Web allows any user to access almost any type of information easily, at any time and from anywhere. However, along with easy access to useful information, the Internet has the adverse effect of exposing users to harmful content indiscriminately. Some information, such as adult content, is not appropriate for all users, notably children. Moreover, even for adults, some content found on abnormal porn sites can harm the mental health of ordinary people. Meanwhile, because the Internet is a worldwide open network, there is a limit to how far providers of harmful content can be regulated through each country's national laws or systems. In addition, developing classification technology specific to one particular system is not a desirable approach, because Internet users can encounter harmful content through diverse channels, for example porn sites, harmful spam, or peer-to-peer networks. Therefore, research and development of context-based core technologies for classifying harmful content is being emphasized. In this paper, we propose an efficient text filter that blocks harmful text in web documents using context-based technologies.
Keywords
Text Classification; Harmful Web Text Filtering; Machine Learning; Keyword Matching