• Title/Summary/Keyword: Text Construction

Construction of Full-text Database by SGML (문서기술언어 SGML에 의한 전문 데이터베이스의 구축)

  • Kim, Chang-Bong
    • Journal of Information Management / v.27 no.4 / pp.35-56 / 1996
  • This article explains SGML (Standard Generalized Markup Language) and its application to full-text databases that include tables, figures, and pictures. The structure of an SGML-based full-text database is defined by a DTD (document type definition) written in SGML, and the full text itself is described with generalized markup that conforms to the DTD. The article shows how to represent document structure: hierarchical structure such as chapters, sections, and paragraphs, and non-hierarchical (referential) structure such as notes, tables, figures, and pictures. The merits of SGML, electronic publishing, retrieval systems, hypertext, and SGML tools are also described.

Construction of Full-Text Database and Implementation of Service Environment for Electronic Theses and Dissertations (학위논문 전문데이터베이스 구축 및 서비스환경 구현)

  • Lee, Kyi-Ho;Kim, Jin-Suk;Yoon, Wha-Muk
    • The Transactions of the Korea Information Processing Society / v.7 no.1 / pp.41-49 / 2000
  • Since the mid-1990s, most universities in Korea have required students to submit not only the printed originals but also Electronic Theses and Dissertations (ETD) for master's and doctoral degrees. The ETDs submitted by students are usually written with various word processors such as MS-Word, LaTeX, and HWP. Because there is not yet a standard ETD format that unifies these different formats, it is difficult to construct an integrated database that provides full-text service. In this paper, we transform three different ETD formats into a unified one, construct a full-text database, and implement a full-text retrieval system for effective search in the Internet environment.

Analysis of Potential Construction Risk Types in Formal Documents Using Text Mining (텍스트 마이닝을 통한 건설공사 공문 잠재적 리스크 유형 분석)

  • Eom, Sae Ho;Cha, Gichun;Park, Sun Kyu;Park, Seunghee;Park, Jongho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.1 / pp.91-98 / 2023
  • Since risks occurring in construction projects can have a significant impact on schedules and costs, there have been many studies on this topic. However, risk analysis is often limited to certain construction situations, and decision-making therefore relies mainly on experience. Data-based analyses have only been partially applied to safety and contract documents. Therefore, in this study, cluster analysis and a Word2Vec algorithm were applied to formal documents that contain important elements for contractors or clients. An initial classification of document content into six types was performed through cluster analysis, and 157 occurrence types were subdivided through application of the Word2Vec algorithm. The derived terms were re-classified into five categories and reviewed as to whether they could develop into potential construction risk factors. Identifying potential construction risk factors will be helpful as basic data for process management in the construction industry.

Fast Construction of Suffix Arrays for DNA Strings (DNA 스트링에 대하여 써픽스 배열을 구축하는 빠른 알고리즘)

  • Jo, Jun-Ha;Kim, Nam-Hee;Kwon, Ki-Ryong;Kim, Dong-Kyue
    • Journal of KIISE:Computer Systems and Theory / v.34 no.8 / pp.319-326 / 2007
  • To search massive data such as DNA strings quickly, the most efficient method is to construct a full-text index data structure over the given strings. The most widely used full-text index structures are suffix trees and suffix arrays. Since the suffix array uses less space than the suffix tree, the suffix array is well suited to DNA strings. Previously developed suffix array construction algorithms are not suitable for DNA strings because they are designed for integer alphabets. We propose a fast algorithm to construct suffix arrays on DNA strings, whose alphabet size is fixed at 4. We reduce the construction time by improving the encoding and merging steps of the algorithm of Kim et al. [1]. Experimental results show that our algorithm constructs suffix arrays on DNA strings 1.3-1.6 times faster than Kim et al.'s algorithm, and faster than other algorithms in most cases.
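
To make the idea of a suffix array as a full-text index concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper (the abstract does not give its encoding and merging steps); it builds the array by naively sorting suffixes, which is only practical for short strings, and then answers pattern queries by binary search. The function names and the sample string are illustrative assumptions.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Naive construction (sort suffixes directly); illustrative only,
    not the fast construction described in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all positions where `pattern` occurs, using binary search on `sa`."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


if __name__ == "__main__":
    dna = "ACGTACGA"  # hypothetical sample over the 4-letter DNA alphabet
    sa = build_suffix_array(dna)
    print(sa)
    print(find_occurrences(dna, sa, "ACG"))
```

Once the array is built, each query costs a logarithmic number of prefix comparisons, which is why the construction time is the step worth optimizing.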

Advanced CBS (Cost Breakdown Structure) Code Search Technology Applying NLP (Natural Language Processing) of Artificial Intelligence (인공지능 자연어 처리 기법을 이용한 개선된 내역코드 탐색방법)

  • Kim, HanDo;Nam, JeongYong
    • KSCE Journal of Civil and Environmental Engineering Research / v.44 no.5 / pp.719-731 / 2024
  • For efficient construction management, linking BIM with schedule and cost is essential, but the application of 5D BIM is limited by the difficulty of breaking down thousands of WBS and CBS items. To solve this problem, a standardized WBS-CBS set is configured in advance; when a new construction project arises, each CBS in the BOQ is automatically linked to the WBS once the most similar text is found among the standard CBS (Public Procurement Service standard construction codes) of the already linked set. Artificial intelligence natural language processing techniques were used to compare CBS text similarity more efficiently. First, we created a civil term dictionary (CTD) that organizes the words used in civil projects and assigns them numerical values, tokenized the text of every CBS into words defined in the dictionary, converted the tokens into TF-IDF vectors, and compared them by cosine similarity. Additionally, the search success rate increased to nearly 70% by considering the hierarchical structure of the CBS and changing keywords. The threshold value for judging similarity was 0.62 (1: perfect match, 0: no match).
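
The matching step described in this abstract (TF-IDF vectors compared by cosine similarity against a threshold) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: whitespace tokenization stands in for the civil term dictionary, the smoothed IDF formula is one common variant, and the item texts and function names are hypothetical. Only the 0.62 threshold comes from the abstract.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: plain whitespace tokenization instead of a domain dictionary.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query: str, standard_items: list[str], threshold: float = 0.62):
    """Return (best item, score), or (None, score) if the score is below threshold."""
    vectors = tfidf_vectors(standard_items + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] >= threshold:
        return standard_items[best], scores[best]
    return None, scores[best]


if __name__ == "__main__":
    # Hypothetical standard item descriptions, for illustration only.
    standard = [
        "concrete pouring for bridge deck",
        "steel rebar assembly",
        "asphalt paving of road surface",
    ]
    print(best_match("concrete pouring bridge deck", standard))
    print(best_match("tunnel excavation", standard))
```

A fixed threshold keeps the linking conservative: items scoring below it are left unmatched rather than linked to a weak candidate.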

Crafting a Quality Performance Evaluation Model Leveraging Unstructured Data (비정형데이터를 활용한 건축현장 품질성과 평가 모델 개발)

  • Lee, Kiseok;Song, Taegeun;Yoo, Wi Sung
    • Journal of the Korea Institute of Building Construction / v.24 no.1 / pp.157-168 / 2024
  • The frequent occurrence of structural failures at building construction sites in Korea has underscored the critical role of rigorous oversight in the inspection and management of construction projects. As mandated by prevailing regulations and standards, onsite supervision by designated supervisors encompasses thorough documentation of construction quality, material standards, and the history of any reconstructions, among other factors. These reports, predominantly consisting of unstructured data, constitute approximately 80% of the data amassed at construction sites and serve as a comprehensive repository of quality-related information. This research introduces the SL-QPA model, which employs text mining techniques to preprocess supervision reports and establish a sentiment dictionary, thereby enabling the quantification of quality performance. The study's findings, demonstrating a statistically significant Pearson correlation between the quality performance scores derived from the SL-QPA model and various legally defined indicators, were substantiated through a one-way analysis of variance of the correlation coefficients. The SL-QPA model, as developed in this study, offers a supplementary approach to evaluating the quality performance of building construction projects. It holds the promise of enhancing quality inspection and management practices by harnessing the wealth of unstructured data generated throughout the lifecycle of construction projects.