http://dx.doi.org/10.5391/JKIIS.2013.23.2.126

Website Classification based on Occurrence Frequency of Medical Terms and Hyperlinks in Webpage  

Lee, In Keun (Department of Medical Informatics, Kyungpook National University)
Kim, Hwa Sun (Department of Medical Information Technology, Daegu Haany University)
Cho, Hune (Department of Medical Informatics, Kyungpook National University)
Publication Information
Journal of the Korean Institute of Intelligent Systems, v.23, no.2, 2013, pp. 126-132
Abstract
This study proposes a method for classifying websites based on the occurrence frequency of medical terms in their webpages and on the website structure formed by webpages and hyperlinks. Classification uses a suitability measure defined by three factors: (1) the occurrence frequency of medical terms among all terms in a webpage, (2) the occurrence frequency of medical terms among the de-duplicated terms in the webpage, and (3) the number of hyperlinks needed to reach a given webpage from the homepage. We evaluated the method on 80 websites registered in medical directories and 127 websites registered in non-medical directories; the classification accuracy was 82.5%.
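The three factors can be illustrated with a minimal Python sketch. The abstract does not state how the factors are combined into the suitability measure, so `page_score` below is a purely hypothetical combination, and the function names, the toy term lists, and the link graph are all assumptions made for illustration rather than the authors' implementation.

```python
from collections import deque


def term_ratios(terms, medical_terms):
    """Return (factor 1, factor 2) for the terms extracted from one page."""
    if not terms:
        return 0.0, 0.0
    # Factor 1: share of medical terms among all term occurrences.
    all_ratio = sum(t in medical_terms for t in terms) / len(terms)
    # Factor 2: share of medical terms among de-duplicated terms.
    unique = set(terms)
    unique_ratio = len(unique & medical_terms) / len(unique)
    return all_ratio, unique_ratio


def link_depths(links, homepage):
    """Factor 3: fewest hyperlinks from the homepage to each reachable page.

    `links` maps a page to the pages it links to; breadth-first search
    visits pages in order of increasing distance from the homepage.
    """
    depth = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def page_score(terms, medical_terms, depth):
    """HYPOTHETICAL combination (not from the paper): average the two
    ratios and discount pages that are further from the homepage."""
    all_ratio, unique_ratio = term_ratios(terms, medical_terms)
    return (all_ratio + unique_ratio) / 2 / (1 + depth)


# Toy data, invented for illustration only.
medical = {"fever", "diabetes", "insulin"}
pages = {
    "/": ["clinic", "fever", "fever", "hours"],
    "/care": ["diabetes", "insulin", "diet", "diet"],
}
links = {"/": ["/care"], "/care": ["/"]}

depths = link_depths(links, "/")
scores = {p: page_score(t, medical, depths[p]) for p, t in pages.items()}
```

In this sketch, repeated terms raise factor 1 but not factor 2, which is why the two ratios can differ for the same page; a site-level decision would then aggregate the per-page scores and compare them against a threshold, both of which are left open here.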
Keywords
Website Classification; Term Frequency; Website Structure; Suitability Measure