• Title/Summary/Keyword: Text Construction


A Study on Construction of Technical Reports Management System Using Optical Technology (광기술을 이용한 연구보고서 관리시스템 구축)

  • 이상헌;김익철
    • Journal of the Korean Society for Information Management / v.9 no.1 / pp.131-164 / 1992
  • This study describes in detail a technical report management system built on optical technology. The system is designed to handle both bibliographic (character) and full-text (image) information. Several optical filing systems already on the Korean market are examined and compared against standard functions in order to build a more efficient management system for technical reports, one that can be easily integrated into the existing KRISS library automation system. To that end, up-to-date technologies such as digital image processing (DIP), MARC standards, and optical character recognition (OCR) are applied to the system.


Implementation of Annotation and Thesaurus for Remote Sensing

  • Chae, Gee-Ju;Yun, Young-Bo;Park, Jong-Hyun
    • Proceedings of the KSRS Conference / 2003.11a / pp.222-224 / 2003
  • Many users want to add their own information to data on the web or on a computer without modifying the data itself. In remote sensing, the results of image classification generally consist of an image and a text file. To overcome the resulting inconveniences, we propose an annotation method based on XML. The method can be applied efficiently both to the web and to the viewing of image classification results made up of image and text files. Thesaurus construction is motivated by the lack of remote sensing and GIS information available through search engines such as Empas, Naver, and Google: a search engine cannot simultaneously retrieve information for a concept that goes by many different names. We select remote sensing data from different sources and establish relations among the terms. To do so, we analyze the meanings of different terms that have similar meanings.


A CTR Prediction Approach for Text Advertising Based on the SAE-LR Deep Neural Network

  • Jiang, Zilong;Gao, Shu;Dai, Wei
    • Journal of Information Processing Systems / v.13 no.5 / pp.1052-1070 / 2017
  • Using the autoencoder (AE) as a building block, this paper applies greedy, unsupervised layer-by-layer pre-training to construct a stacked autoencoder (SAE) that extracts abstract features from the original input data. These features serve as the input to a logistic regression (LR) model, which then yields the click-through rate (CTR) of a user for an advertisement in a given context. Experiments show that, compared with the logistic regression and support vector regression models commonly used in industry for advertising CTR prediction, the SAE-LR model achieves a relatively large improvement in AUC. With more accurate advertising CTR prediction, enterprises can better understand the needs of their customers, which promotes efficient, low-cost, multi-path development in the context of internet finance.
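
The AUC metric used to compare the CTR models can be illustrated with a small sketch. This is not the paper's evaluation code: the function name `auc_score` and the toy click data are assumptions made here for illustration. The function computes AUC as the fraction of (clicked, non-clicked) pairs in which the clicked impression receives the higher predicted CTR, counting ties as half.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one clicked and one non-clicked example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # clicked impression ranked above non-clicked one
            elif p == n:
                wins += 0.5      # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = ad was clicked, 0 = ad was not clicked.
clicks = [1, 0, 1, 0, 0]
predicted_ctr = [0.9, 0.3, 0.4, 0.5, 0.1]
print(auc_score(clicks, predicted_ctr))
```

The pairwise form is quadratic in the number of examples, which is fine for a toy illustration; a rank-based computation would be preferable for realistic log sizes.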

Manipulation of Complex Documents of DVI Format in the Internet Environment and Construction of Full-Text Database (인터넷을 기반으로 한 DVI 포맷의 복합문서 전송 및 전문 데이터베이스 구축 사례 연구)

  • 윤화묵;김진숙;이기호
    • Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.153-155 / 1999
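  • Since the mid-1990s, as the Internet has come into wide use and diverse, powerful document editors have become commonplace, complex documents have been produced in large quantities, and the need for efficient document exchange over the Internet has grown. However, this vast body of electronic compound documents was written with a variety of editors such as Hangul, MS-Word, and LaTeX, and document formats have not been standardized, so the documents are not used efficiently and cause many problems, particularly in document exchange. This paper presents a full-text retrieval system model that converts compound documents existing in various forms into a single unified intermediate format, builds a full-text database from the converted documents, and allows them to be searched efficiently over the Internet.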


Improving Explainability of Generative Pre-trained Transformer Model for Classification of Construction Accident Types: Validation of Saliency Visualization

  • Byunghee YOO;Yuncheul WOO;Jinwoo KIM;Moonseo PARK;Changbum Ryan AHN
    • International conference on construction engineering and project management / 2024.07a / pp.1284-1284 / 2024
  • Leveraging large language models and safety accident report data has unique potential for analyzing construction accidents, including the classification of accident types, injured parts, and work processes, using unstructured free-text accident scenarios. We previously proposed a novel approach that harnesses the power of a fine-tuned Generative Pre-trained Transformer to classify 6 types of construction accidents (caught-in-between, cuts, falls, struck-by, trips, and other) with an accuracy of 82.33%. Furthermore, we proposed a novel methodology, saliency visualization, to discern which words are deemed important by black box models within a sentence associated with construction accidents. It helps understand how individual words in an input sentence affect the final output and seeks to make the model's predictions more understandable and interpretable for users. This involves deliberately altering the position of words within a sentence to reveal their specific roles in shaping the overall output. However, the validation of saliency visualization results remains insufficient and needs further analysis. In this context, this study aims to qualitatively validate the effectiveness of saliency visualization methods. In the exploration of saliency visualization, the elements with the highest importance scores were qualitatively validated against the construction accident risk factors (e.g., "the 4m pipe," "ear," "to extract staircase") emerging from Construction Safety Management's Integrated Information data scenarios provided by the Ministry of Land, Infrastructure, and Transport, Republic of Korea. Additionally, construction accident precursors (e.g., "grinding," "pipe," "slippery floor") identified from existing literature, which are early indicators or warning signs of potential accidents, were compared with the words with the highest importance scores of saliency visualization. We observed that the words from the saliency visualization are included in the pre-identified accident precursors and risk factors. This study highlights how employing saliency visualization enhances the interpretability of models based on large language processing, providing valuable insights into the underlying causes driving accident predictions.

Application of Domain-specific Thesaurus to Construction Documents based on Flow Margin of Semantic Similarity

  • Youmin PARK;Seonghyeon MOON;Jinwoo KIM;Seokho CHI
    • International conference on construction engineering and project management / 2024.07a / pp.375-382 / 2024
  • Large Language Models (LLMs) still encounter challenges in comprehending domain-specific expressions within construction documents. Analogous to humans acquiring unfamiliar expressions from dictionaries, language models could assimilate domain-specific expressions through the use of a thesaurus. Numerous prior studies have developed construction thesauri; however, a practical issue arises in effectively leveraging these resources for instructing language models. Given that the thesaurus primarily outlines relationships between terms without indicating their relative importance, language models may struggle in discerning which terms to retain or replace. This research aims to establish a robust framework for guiding language models using the information from the thesaurus. For instance, a term would be associated with a list of similar terms while also being included in the lists of other related terms. The relative significance among terms could be ascertained by employing similarity scores normalized according to relevance ranks. Consequently, a term exhibiting a positive margin of normalized similarity scores (termed a pivot term) could semantically replace other related terms, thereby enabling LLMs to comprehend domain-specific terms through these pivotal terms. The outcome of this research presents a practical methodology for utilizing domain-specific thesauri to train LLMs and analyze construction documents. Ongoing evaluation involves validating the accuracy of the thesaurus-applied LLM (e.g., S-BERT) in identifying similarities within construction specification provisions. This outcome holds potential for the construction industry by enhancing LLMs' understanding of construction documents and subsequently improving text mining performance and project management efficiency.
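
The pivot-term idea in this abstract can be sketched roughly as follows. The abstract does not give the exact formulas, so everything specific here is an assumption: the normalization (similarity divided by relevance rank), the definition of the margin (normalized score a term receives from other terms' lists minus the normalized score it gives out in its own list), the function name `flow_margins`, and the toy thesaurus entries.

```python
from collections import defaultdict


def flow_margins(thesaurus):
    """Return {term: inflow - outflow} of rank-normalized similarity scores.

    thesaurus maps each term to its similar terms, ordered by relevance rank,
    as (similar_term, similarity) pairs. The normalization and margin used
    here are illustrative assumptions, not the paper's definitions.
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for term, ranked_similars in thesaurus.items():
        for rank, (similar_term, similarity) in enumerate(ranked_similars, start=1):
            normalized = similarity / rank      # assumed rank normalization
            outflow[term] += normalized         # score the term gives out
            inflow[similar_term] += normalized  # score the listed term receives
    terms = set(inflow) | set(outflow)
    return {t: inflow[t] - outflow[t] for t in terms}


# Hypothetical thesaurus fragment for construction terms.
thesaurus = {
    "rebar": [("reinforcing bar", 0.9), ("steel bar", 0.8)],
    "reinforcing bar": [("rebar", 0.95), ("steel bar", 0.7)],
    "steel bar": [("rebar", 0.85), ("reinforcing bar", 0.6)],
}

margins = flow_margins(thesaurus)
# Terms with a positive margin are treated as pivot terms in this sketch.
pivot_terms = sorted(t for t, m in margins.items() if m > 0)
print(pivot_terms)
```

Under these assumptions the margins always sum to zero, because every normalized score leaves one term and enters another, so at least one term ends up with a non-negative margin whenever the thesaurus is non-empty.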

A Suggestion of the Direction of Construction Disaster Document Management through Text Data Classification Model based on Deep Learning (딥러닝 기반 분류 모델의 성능 분석을 통한 건설 재해사례 텍스트 데이터의 효율적 관리방향 제안)

  • Kim, Hayoung;Jang, YeEun;Kang, HyunBin;Son, JeongWook;Yi, June-Seong
    • Korean Journal of Construction Engineering and Management / v.22 no.5 / pp.73-85 / 2021
  • This study proposes an efficient management direction for Korean construction accident cases based on a deep learning text classification model. The model classifies accident cases into five categories, which are representative KOSHA accident types: fall, electric shock, flying object, collapse, and narrowness. In initial tests, classification accuracy for fall accidents was relatively high, while cases of other types tended to be misclassified as falls. Analysis of these results indicated that three factors affected performance: 1) specific accident-causing behavior, 2) similar sentence structure, and 3) complex accidents corresponding to multiple types. Two accuracy improvement experiments were then conducted: 1) reclassification and 2) elimination. Classification performance improved by 185.7% when complex accidents were eliminated, which resolved the multicollinearity introduced by complex accidents that include the contents of multiple accident types. In conclusion, this study suggests that complex accidents need to be managed independently and that a system should be prepared for describing the circumstances of future accidents in detail.

A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.176-182 / 2018
  • To understand and track trends in construction safety accidents, this study analyzes topic trends using an LDA (Latent Dirichlet Allocation)-based topic modeling method. In particular, it identifies the main issues in construction safety accidents through topic modeling of unstructured data, rather than through the various structured data analyses used for preventing safety accidents in the construction industry. To apply this methodology, I randomly collected 540 news articles about construction accidents published from January 2017 to February 2018. Applying LDA-based topic modeling to this unstructured data, I found 10 topics and identified key issues through 10 keywords for each of the 10 topics. I then forecasted topic issues related to construction safety accidents based on a time-series trend analysis of the news data from January 2017 to February 2018. This research suggests ways of using unstructured news article data to anticipate safety policy and research fields and to respond to future construction accident safety issues.
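
As a rough illustration of LDA-based topic modeling on short news-like texts, the following is a minimal collapsed Gibbs sampler in plain Python. The abstract does not say which inference method or software was used, so the sampler, the function name `lda_gibbs`, the hyperparameter values, and the tiny tokenized corpus are all assumptions for illustration, not the study's setup (which used 540 articles, 10 topics, and 10 keywords per topic).

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Fit LDA with collapsed Gibbs sampling; return (top words per topic, doc-topic counts)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens per topic in each doc
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # total tokens per topic
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for w in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    top_words = []
    for t in range(num_topics):
        ranked = sorted(range(vocab_size), key=lambda v: topic_word[t][v], reverse=True)
        top_words.append([vocab[v] for v in ranked[:top_n]])
    return top_words, doc_topic


# Hypothetical, already tokenized news snippets.
docs = [
    ["crane", "collapse", "site", "crane"],
    ["scaffold", "fall", "worker", "fall"],
    ["crane", "collapse", "operator"],
    ["worker", "fall", "scaffold", "harness"],
]
top_words, doc_topic = lda_gibbs(docs, num_topics=2)
for topic_index, words in enumerate(top_words):
    print(topic_index, words)
```

The keyword lists returned per topic correspond to the "keywords for each topic" step in the abstract; tracking how `doc_topic` proportions change across publication dates would be one way to approach the time-series trend analysis.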

Research on the Application of GIS-based Measures in the Advancement of the Construction Project Information System (건설사업정보시스템의 고도화를 위한 공간정보(GIS) 적용방안에 관한 연구)

  • Ok, Hyun;Kim, Seong-Jin
    • Smart Media Journal / v.4 no.4 / pp.70-79 / 2015
  • The Construction Project Information System (CPIS), an information system built as part of the Construction Continuous Acquisition & Life-cycle Support (CALS) of the Ministry of Land, Infrastructure, and Transport (MOLIT), is designed to digitize construction projects across all stages and to enable information sharing, so as to enhance the productivity and efficiency of construction projects and secure their transparent administration. As one of MOLIT's internal work systems, CPIS focuses on work handling and data management. However, more than 10 years after it was built, it still centers on text- and document-based construction project information and cannot be interfaced with visualization-based GIS, which limits the sharing and dissemination of information and makes it difficult to determine overall construction project status. To resolve these limitations and upgrade the system, this study examined domestic and overseas GIS technology trends and relevant information systems and analyzed the current status and problems of CPIS. On that basis, it proposed overall GIS application measures for upgrading CPIS. It also identified detailed CPIS utilization measures and GIS application measures for each unit system and analyzed considerations for GIS application.

Topic modeling and topic change trend analysis for advanced construction technologies (건설신기술에 대한 토픽 모델링 및 토픽 변화추이 분석)

  • Jeong, Seong Yun;Kim, Nam Gon
    • Smart Media Journal / v.10 no.4 / pp.102-110 / 2021
  • An advanced construction technology endorsement system is currently in operation to promote the development of domestic construction technology. We examined the implicit meanings inherent in advanced construction technologies by analyzing the relationships among high-importance emerging vocabularies associated with the technologies endorsed through this system. For this purpose, 918 cases of advanced construction technology information were collected. Based on the endorsement year and summary of each technology, the importance of the emerging vocabularies was measured for each advanced construction technology. Then, based on the LDA model, the degree of influence between related vocabularies was evaluated for each of the four topic areas, and topics were analyzed according to technical application field. Trends in highly influential vocabularies were inferred for each topic over the period from 1990 to 2021. Finally, future changes were predicted in the degree of influence of the topics of environment, machinery, facilities, and maintenance and reinforcement of structures, and of related technology fields.