A Methodology for Automatic Hierarchy Definition of Sentences in Engineering Documents  

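Park, Sang-Il (Department of Civil Engineering, Yonsei University)
Kim, Bong-Geun (Department of Civil Engineering, Yonsei University)
Kim, Kyeong-Hwan (Department of Civil Engineering, Yonsei University)
Lee, Sang-Ho (School of Civil and Environmental Engineering, Yonsei University)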
Publication Information
Journal of the Computational Structural Engineering Institute of Korea, v.22, no.4, 2009, pp. 323-330
Abstract
This paper proposes a methodology for automatic hierarchy classification of subtitles in an engineering document, based on the fact that the heading symbols of subtitles represent the hierarchical structure of the document. The proposed methodology is composed of two methods: extracting subtitles from a plain text document and determining the hierarchical structure of the subtitles. The subtitles in a document are extracted by comparing heading symbol patterns with predefined heading symbol groups, and the depth levels of the subtitles are determined by analyzing the relative location of subtitles according to changes in the heading symbol patterns. A prototype module, which transforms a plain text document into a structured XML document in accordance with the hierarchical structure of subtitles, was developed based on the proposed methodology, and its performance was analyzed with 20 engineering documents.
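The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the heading symbol groups in `HEADING_GROUPS`, the function names, and the `document`/`section` XML element names are all assumptions made for illustration, since the abstract does not list them. The depth rule used here (a heading symbol group seen for the first time opens a deeper level; a group seen before returns to the level where it first appeared) is one plausible reading of "relative location of subtitles according to changes in the heading symbol patterns".

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading symbol groups; the paper's actual groups are not
# given in the abstract.
HEADING_GROUPS = [
    ("roman", re.compile(r"^[IVX]+\.\s+(.*)$")),          # I. Title
    ("number_dot", re.compile(r"^\d+\.\s+(.*)$")),        # 1. Title
    ("number_paren", re.compile(r"^\d+\)\s+(.*)$")),      # 1) Title
    ("paren_number", re.compile(r"^\(\d+\)\s+(.*)$")),    # (1) Title
    ("letter_dot", re.compile(r"^[a-z]\.\s+(.*)$")),      # a. Title
]


def match_heading(line):
    """Return (group_name, title) if the line starts with a known heading symbol."""
    stripped = line.strip()
    for name, pattern in HEADING_GROUPS:
        m = pattern.match(stripped)
        if m:
            return name, m.group(1)
    return None


def assign_depths(lines):
    """Extract subtitles and assign each a depth level (1 = top level)."""
    stack = []   # heading groups on the path from the top level to the current subtitle
    result = []
    for line in lines:
        hit = match_heading(line)
        if hit is None:
            continue  # body text, not a subtitle
        group, title = hit
        if group in stack:
            # Group seen before: return to the level where it first appeared.
            del stack[stack.index(group) + 1:]
        else:
            # New group: this subtitle opens a deeper level.
            stack.append(group)
        result.append((len(stack), title))
    return result


def to_xml(headings):
    """Nest (depth, title) pairs into an XML tree of <section> elements."""
    root = ET.Element("document")
    open_sections = [root]  # open_sections[d] is the innermost open element at depth d
    for depth, title in headings:
        del open_sections[depth:]  # close sections at this depth or deeper
        node = ET.SubElement(open_sections[-1], "section", title=title)
        open_sections.append(node)
    return root


sample = [
    "1. Introduction",
    "Some body text.",
    "1) Background",
    "2) Scope",
    "(1) Limits",
    "2. Method",
    "1) Extraction",
]
headings = assign_depths(sample)
xml_text = ET.tostring(to_xml(headings), encoding="unicode")
```

For the sample lines, "Introduction" and "Method" land at depth 1, "Background", "Scope", and "Extraction" at depth 2, and "Limits" at depth 3. A sketch this simple would also treat body sentences that begin with a heading-like symbol as subtitles, so a real extractor would need additional checks.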
Keywords
engineering documents; automatic hierarchy definition; XML document