http://dx.doi.org/10.5392/IJoC.2016.12.2.024

Semi-Automatic Ontology Construction from HTML Documents: A conversion of Text-formed Information into OWL 2  

Im, Chan jong (Department of Information and Telecommunication Pai Chai University)
Kim, Do wan (Pai Chai University)
Abstract
Ontology is known to be one of the most important technologies for achieving the Semantic Web. It is critical because it represents knowledge in a machine-readable form. The World Wide Web Consortium (W3C) has contributed to the development of ontology technologies over the last several years. However, the W3C recommendations leave out HTML despite the massive amount of information it contains. In addition, keeping up with all the relevant technologies is difficult and time-consuming, especially when constructing an ontology. We therefore propose a module and methods that reuse HTML documents, extract the necessary information from HTML tags, and map it to OWL 2. We combine two approaches: structural refinement, which builds an ontology skeleton, and a linguistic approach, which adds detailed information to the skeleton.
Keywords
Ontology; Semantic Web; HTML; Natural Language Processing;
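
The structural half of the approach described in the abstract (reading HTML tags and mapping them to OWL 2) can be illustrated with a minimal sketch. It is not the authors' module: the choice of heading tags (h1 to h6) as the source of the class skeleton, the CamelCase naming rule, the `:` prefix, and the names `HeadingCollector`, `class_name`, and `headings_to_owl` are all assumptions made for illustration. The output uses OWL 2 functional-style syntax, and the linguistic step that adds detail to the skeleton is not covered.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1..h6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._buffer = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._buffer).split())
            if text:
                self.headings.append((self._level, text))
            self._level = None


def class_name(text):
    """Turn heading text into a CamelCase name (assumed naming rule)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return "".join(w[0].upper() + w[1:] for w in words)


def headings_to_owl(html, prefix=":"):
    """Map nested headings to OWL 2 class declarations and SubClassOf axioms.

    Assumption: a heading nested under a higher-level heading denotes a
    subclass of it. A real page would need the refinement the paper proposes.
    """
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()

    axioms = []
    ancestors = []  # stack of (level, name) for headings still "open"
    for level, text in parser.headings:
        name = class_name(text)
        if not name:
            continue
        # Drop headings at the same or a deeper level: they are not parents.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        axioms.append(f"Declaration(Class({prefix}{name}))")
        if ancestors:
            parent = ancestors[-1][1]
            axioms.append(f"SubClassOf({prefix}{name} {prefix}{parent})")
        ancestors.append((level, name))
    return axioms


if __name__ == "__main__":
    page = "<h1>Department</h1><h2>Graduate school</h2><h2>Research lab</h2>"
    for axiom in headings_to_owl(page):
        print(axiom)
```

Using a stack of open headings keeps the sketch to a single pass over the document; whether heading nesting really expresses a subclass relation depends on the page, which is why the mapping is stated here as an assumption rather than a rule.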