Search | Korea Science

Transformation of Text Contents of Engineering Documents into an XML Document by using a Technique of Document Structure Extraction (문서구조 추출기법을 이용한 엔지니어링 문서 텍스트 정보의 XML 변환)

Lee, Sang-Ho;Park, Junwon;Park, Sang Il;Kim, Bong-Geun
- KSCE Journal of Civil and Environmental Engineering Research, v.31 no.6D, pp.849-856, 2011
This paper proposes a method for transforming the unstructured text of engineering documents, which have a complex hierarchical structure of subtitles with various heading symbols, into a semi-structured XML document that follows the hierarchical subtitle structure. To extract the hierarchy from plain text, the study employs document structure extraction, a technique for analyzing document structure. In addition, a method for processing enumerative text was developed to increase overall accuracy when extracting subtitles and constructing the hierarchical subtitle structure. An application module was developed based on the proposed method, and its performance was evaluated with 40 test documents containing structural calculation records of bridges. The first test group of 20 documents, which relate to the superstructure of steel girder bridges and were used in a previous study, served to verify the enhanced performance of the proposed method. The test results show that the new module improves accuracy and reliability compared with the results of the previous study. The remaining 20 test documents were used to evaluate the applicability of the method. The final mean accuracy exceeded 99%, with a standard deviation of 1.52. The final results demonstrate that the proposed method can be applied to diverse heading symbols in various types of engineering documents to represent the hierarchical subtitle structure in a semi-structured XML document.
https://doi.org/10.12652/Ksce.2011.31.6D.849
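
The abstract above does not give the extraction rules, so the following is only a minimal sketch of the general idea: recognize subtitle lines by their heading symbols, then nest them into XML by level. The three heading patterns, the element names `document`, `section`, and `text`, and the function names are all assumptions made for illustration. The sketch does not reproduce the paper's handling of enumerative text.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading symbols, ordered from the shallowest to the deepest level.
HEADING_PATTERNS = [
    re.compile(r"^(\d+)\.\s+(.*)$"),      # level 1: "1. Title"
    re.compile(r"^(\d+\.\d+)\s+(.*)$"),   # level 2: "1.1 Title"
    re.compile(r"^\((\d+)\)\s+(.*)$"),    # level 3: "(1) Title"
]


def heading_level(line):
    """Return (level, title) if the line looks like a subtitle, else None."""
    stripped = line.strip()
    for level, pattern in enumerate(HEADING_PATTERNS, start=1):
        match = pattern.match(stripped)
        if match:
            return level, match.group(2)
    return None


def text_to_xml(text):
    """Build an XML tree whose nesting follows the subtitle hierarchy."""
    root = ET.Element("document")
    stack = [(0, root)]  # (level, element) for each currently open section
    for line in text.splitlines():
        if not line.strip():
            continue
        heading = heading_level(line)
        if heading is None:
            # Body text belongs to the innermost open section.
            ET.SubElement(stack[-1][1], "text").text = line.strip()
            continue
        level, title = heading
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section", title=title)
        stack.append((level, section))
    return root


sample = "1. Design\n1.1 Loads\nDead load is 10 kN.\n(1) Notes\n2. Results\nOK"
print(ET.tostring(text_to_xml(sample), encoding="unicode"))
```

A stack of open sections keeps the nesting logic short: a new subtitle closes everything at its own level or deeper before it is attached to the remaining parent.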

Sentiment Classification considering Korean Features (한국어 특성을 고려한 감성 분류)

Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
- Science of Emotion and Sensibility, v.13 no.3, pp.449-458, 2010
As the need grows to obtain information efficiently from the many documents and reviews on the Internet across many fields, automatic classification of opinions is required. This automatic classification is called sentiment classification, and it can be divided into three steps: subjective expression classification, which extracts subjective sentences from documents; sentiment classification, which determines whether the polarity of a document is positive or negative; and strength classification, which determines whether that polarity is weak or strong. Recent studies in opinion mining have used N-gram words, lexical phrase patterns, and syntactic phrase patterns as features rather than single words. Patterns in particular have been used frequently because they are more flexible than N-gram words and more deterministic than single words. These studies mainly concern English; studies using patterns for Korean are still at an early stage. In Korean, a change in the ending ('Eomi') of a declinable word slightly changes the meaning of the predicate, yet earlier studies of Korean opinion classification removed endings from predicates and kept only the stems. This study reviews earlier pattern-based studies and methods for English, extracts sentiment patterns from Korean documents, and classifies the polarity of those documents. It also analyzes the influence of ending changes on the performance of opinion classification.

An Automatic Classification System of Official Documents in Middle Schools Using Term Weighting of Titles (제목의 단어 가중치를 이용한 중등학교 공문서 자동분류시스템)

Kang, Hyun-Hee;Jin, Min
- Journal of The Korean Association of Information Education, v.7 no.2, pp.219-226, 2003
Classifying official documents in schools and educational institutions takes a lot of time. To reduce this overhead, we propose an automatic document classification method that uses the words in document titles. First, meaningful words are extracted from the titles of existing documents, and the Inverse Document Frequency (IDF) weight of each word is calculated for each category. A word weight dictionary is then built from these weights. Using the dictionary, each document is automatically assigned to the category for which the sum of the weights of its title words is highest. We also evaluate the performance of the proposed method on a real dataset from a middle school.
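
The steps in this abstract can be sketched as follows. The abstract does not give the exact weighting formula, so this sketch rests on an assumption: each category is treated as a single "document" made of all its titles, and a word's weight is log(number of categories / number of categories containing the word). Whitespace tokenization stands in for the paper's extraction of meaningful words, and the function names and sample titles are hypothetical.

```python
import math
from collections import defaultdict


def build_weight_dictionary(labeled_titles):
    """Build {category: {word: weight}} from (category, title) pairs.

    Assumed weighting (not taken from the paper): IDF over categories.
    """
    words_by_category = defaultdict(set)
    for category, title in labeled_titles:
        words_by_category[category].update(title.lower().split())

    # Count in how many categories each word occurs.
    category_freq = defaultdict(int)
    for words in words_by_category.values():
        for word in words:
            category_freq[word] += 1

    n_categories = len(words_by_category)
    return {
        category: {w: math.log(n_categories / category_freq[w]) for w in words}
        for category, words in words_by_category.items()
    }


def classify(title, weights):
    """Return the category whose summed title-word weights are highest."""
    words = title.lower().split()
    scores = {
        category: sum(word_weights.get(w, 0.0) for w in words)
        for category, word_weights in weights.items()
    }
    # On a tie, max() keeps the first category in dictionary order.
    return max(scores, key=scores.get)


training = [
    ("budget", "annual budget report"),
    ("budget", "budget request form"),
    ("events", "sports day schedule"),
    ("events", "field trip schedule report"),
]
weights = build_weight_dictionary(training)
print(classify("budget request", weights))
print(classify("trip schedule", weights))
```

Under this weighting, a word that occurs in every category (here "report") gets weight zero, so only category-specific title words influence the decision.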

Experimental Analysis of Correct Answer Characteristics in Question Answering Systems (질의응답시스템에서 정답 특징에 관한 실험적 분석)

Han, Kyoung-Soo
- Journal of Digital Contents Society, v.19 no.5, pp.927-933, 2018
A question answering system finds and provides answers to natural language questions, and one of the steps that most affects its errors is the search for documents or passages that contain correct answers. To improve retrieval performance, it is necessary to understand the characteristics of documents and passages that contain correct answers. Using a corpus composed of questions, documents with correct answers, and documents without correct answers, this paper experimentally analyzes how many question words appear in the correct-answer documents, how the locations of the question words are distributed, and how similar the topic of the question is to that of the correct-answer document. The study explains the causes of previous search results reported for question answering systems and discusses the elements an effective search step requires.
https://doi.org/10.9728/dcs.2018.19.5.927
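
Two of the measurements named in this abstract, the number of question words found in a document and the locations where they occur, can be illustrated with a small sketch. The function name, the whitespace tokenization, and the use of relative positions between 0 and 1 are assumptions for illustration; the abstract does not state how the paper computes these quantities.

```python
def question_word_stats(question, document):
    """Return (coverage, positions) for one question/document pair.

    coverage:  fraction of distinct question words found in the document.
    positions: relative location (0.0 = start) of each matching document word.
    """
    q_words = set(question.lower().split())
    d_words = document.lower().split()
    positions = [i / len(d_words) for i, w in enumerate(d_words) if w in q_words]
    matched = q_words & set(d_words)
    coverage = len(matched) / len(q_words) if q_words else 0.0
    return coverage, positions


coverage, positions = question_word_stats(
    "who built the bridge", "the bridge was built in 1990"
)
print(coverage, positions)
```

Running the same function over documents with and without correct answers would allow the two groups to be compared, which is the kind of contrast the abstract describes.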

Analysis and Implementation of a Web Document Converter for Wireless Internet Use XHTML On Mobile Communication Environment (이동통신환경에서 XHTML을 이용한 무선인터넷 문서변환기 분석 및 구현)

백진영;이종옥;조성언;조경룡
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2001.10a, pp.105-108, 2001
This paper aims to design and implement a device that converts XHTML documents on a web server into WML documents when users access the web from portable devices. When users access an XHTML (so-called HTML) web page and request information, the document converter recognizes the structure of the XHTML documents and reconstructs them into simple WML documents by using

Search Result 1,074, Processing Time 0.033 seconds
