• Title/Abstract/Keywords: large-language model

Search results: 341

Improving Explainability of Generative Pre-trained Transformer Model for Classification of Construction Accident Types: Validation of Saliency Visualization

  • Byunghee YOO;Yuncheul WOO;Jinwoo KIM;Moonseo PARK;Changbum Ryan AHN
    • 국제학술발표논문집 / The 10th International Conference on Construction Engineering and Project Management / pp.1284-1284 / 2024
  • Leveraging large language models and safety accident report data has unique potential for analyzing construction accidents, including the classification of accident types, injured parts, and work processes, using unstructured free-text accident scenarios. We previously proposed a novel approach that harnesses the power of a fine-tuned Generative Pre-trained Transformer to classify six types of construction accidents (caught-in-between, cuts, falls, struck-by, trips, and other) with an accuracy of 82.33%. Furthermore, we proposed a novel methodology, saliency visualization, to discern which words a black-box model deems important within a sentence describing a construction accident. It helps users understand how individual words in an input sentence affect the final output and seeks to make the model's predictions more understandable and interpretable. This involves deliberately altering the position of words within a sentence to reveal their specific roles in shaping the overall output. However, the validation of saliency visualization results remains insufficient and needs further analysis. In this context, this study aims to qualitatively validate the effectiveness of saliency visualization methods. The elements with the highest importance scores were qualitatively validated against the construction accident risk factors (e.g., "the 4m pipe," "ear," "to extract staircase") emerging from Construction Safety Management's Integrated Information data scenarios provided by the Ministry of Land, Infrastructure, and Transport, Republic of Korea. Additionally, construction accident precursors (e.g., "grinding," "pipe," "slippery floor") identified from the existing literature, which are early indicators or warning signs of potential accidents, were compared with the highest-scoring words from saliency visualization. We observed that the words highlighted by saliency visualization are included in the pre-identified accident precursors and risk factors. This study highlights how employing saliency visualization enhances the interpretability of large language model-based classifiers, providing valuable insights into the underlying causes driving accident predictions.
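
As a rough illustration of the word-importance idea described above (the paper perturbs word positions; the sketch below uses a simpler leave-one-out variant), the following Python sketch scores each word by how much its removal shifts the predicted probability of the target accident type. The `classify_proba` function is a hypothetical stand-in for a fine-tuned classifier, not the authors' model.

```python
# Leave-one-out word-importance sketch for an accident-type classifier.
# `classify_proba` is a hypothetical function returning {label: probability} for a sentence.
from typing import Callable, Dict, List, Tuple

def word_importance(sentence: str,
                    classify_proba: Callable[[str], Dict[str, float]],
                    target_label: str) -> List[Tuple[str, float]]:
    """Score each word by how much removing it lowers the target-class probability."""
    words = sentence.split()
    base = classify_proba(sentence)[target_label]
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])        # drop the i-th word
        delta = base - classify_proba(perturbed)[target_label]
        scores.append((word, delta))                           # larger delta => more important word
    return sorted(scores, key=lambda item: item[1], reverse=True)
```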

거대언어모델 기반 SHAP 분석을 이용한 리튬 이온 배터리 잔존 수명 예측 기법 해석 (Large Language Model-based SHAP Analysis for Interpretation of Remaining Useful Life Prediction of Lithium-ion Battery)

  • 이재승;유제혁
    • 한국산업정보학회논문지 / Vol. 29, No. 5 / pp.51-68 / 2024
  • To safely operate lithium-ion batteries, which supply energy to mobile electronic devices, it is important to accurately predict a battery's remaining useful life (RUL). Recently, with advances in machine learning, AI-based models for predicting battery RUL have been actively studied. However, because the inference process inside existing models cannot be observed, there have been limits to fully trusting and using the values predicted by machine learning. Several explainable AI techniques have been proposed to address this, but they merely visualize results as graphs, leaving users to analyze the graphs themselves. Accordingly, this paper proposes an explainable RUL prediction method for lithium-ion batteries that interprets the prediction model's reasoning in textual form using SHAP analysis based on a large language model. Experiments on a public lithium-ion battery dataset show that LLM-based SHAP analysis makes the model's prediction rationale concrete and understandable in text form.
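
For readers unfamiliar with the pipeline described above, here is a minimal sketch of coupling SHAP attributions with an LLM-generated textual explanation. The RUL regressor, feature names, background data, and `generate_text` (any text-generation call) are assumptions, not the paper's code.

```python
# SHAP + LLM explanation sketch. `rul_model` is any fitted regressor with .predict,
# `X_background`/`X_sample` are NumPy arrays, and `generate_text` is any text-generation call.
import numpy as np
import shap

def explain_rul_prediction(rul_model, X_background, X_sample, feature_names, generate_text):
    explainer = shap.KernelExplainer(rul_model.predict, X_background)
    shap_values = explainer.shap_values(X_sample)              # shape: (1, n_features)
    contributions = sorted(zip(feature_names, shap_values[0]),
                           key=lambda item: abs(item[1]), reverse=True)
    prompt = (
        "A model predicted the remaining useful life of a lithium-ion battery.\n"
        "SHAP feature contributions (positive = longer predicted life):\n"
        + "\n".join(f"- {name}: {value:+.4f}" for name, value in contributions)
        + "\nExplain in plain language which features drove this prediction and why."
    )
    return generate_text(prompt)                               # textual interpretation from the LLM
```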

Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • 말소리와 음성과학 / Vol. 15, No. 3 / pp.83-88 / 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.
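
A minimal sketch of what fine-tuning Whisper on (audio, transcript) pairs looks like with the Hugging Face `transformers` API is given below; the model size, Korean language setting, and single-example training step are illustrative assumptions rather than the authors' exact setup.

```python
# Whisper fine-tuning sketch with Hugging Face `transformers` (illustrative only).
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small",
                                             language="Korean", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def training_step(waveform, transcript):
    """One gradient step on a single (16 kHz audio array, transcript string) pair."""
    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
    labels = processor.tokenizer(transcript, return_tensors="pt").input_ids
    outputs = model(input_features=inputs.input_features, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```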

프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론 (Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning)

  • 정진명;김남규
    • 지능정보연구 / Vol. 29, No. 4 / pp.165-184 / 2023
  • Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has achieved remarkable results in a wide range of applications. Most language models learn general-purpose linguistic information from pre-training data and are then updated for downstream tasks through fine-tuning. In the course of using such language models, however, concerns have been raised that privacy can be violated: when the data owner provides large amounts of data to the model owner to fine-tune the language model, the privacy of the data can be compromised; conversely, if the model owner releases the entire model to the data owner, the model's structure and weights are exposed and the privacy of the model can be compromised. Offsite tuning has recently been proposed to fine-tune language models while preserving privacy in this situation, but that work did not present a concrete way to apply the method to text classification models. Accordingly, this study presents a concrete method that applies offsite tuning with an added classifier to protect the privacy of both the model and the data when fine-tuning a multi-class classifier for Korean documents. To evaluate the proposed method, experiments were conducted on approximately 200,000 Korean documents from AIHub spanning five major categories (ICT, electricity, electronics, machinery, and medicine); the results confirm that the proposed plug-in model outperforms both the zero-shot model and the offsite model in classification accuracy.
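
The sketch below illustrates the general offsite-tuning split with an added classification head: the data owner trains only the first and last blocks plus the classifier, while a frozen, compressed emulator provided by the model owner stands in for the middle layers, so neither full model weights nor raw data change hands. Module names, the layer split, and mean pooling are assumptions, not the paper's implementation.

```python
# Offsite-tuning-with-classifier sketch (assumed layer split and names).
import torch
import torch.nn as nn

class OffsiteTuningClassifier(nn.Module):
    def __init__(self, bottom: nn.Module, emulator: nn.Module, top: nn.Module,
                 hidden_dim: int, num_classes: int):
        super().__init__()
        self.bottom, self.emulator, self.top = bottom, emulator, top
        self.classifier = nn.Linear(hidden_dim, num_classes)
        for p in self.emulator.parameters():       # middle layers stay frozen on the data-owner side
            p.requires_grad = False

    def forward(self, x):
        h = self.top(self.emulator(self.bottom(x)))   # (batch, seq_len, hidden_dim)
        return self.classifier(h.mean(dim=1))         # mean-pool token states, then classify

# After local training, only the bottom/top/classifier weights are returned and plugged into
# the model owner's full network in place of the emulator path.
```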

영상 기반 위치 인식을 위한 대규모 언어-이미지 모델 기반의 Bag-of-Objects 표현 (Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition)

  • 정승운;박병재
    • 센서학회지 / Vol. 33, No. 2 / pp.78-85 / 2024
  • We proposed a method for visual place recognition that represents images using objects as visual words. Visual words represent the various objects present in urban environments. To detect various objects within the images, we implemented and used a zero-shot detector based on a large-scale image language model. This zero-shot detector enables the detection of various objects in urban environments without additional training. In the process of creating histograms using the proposed method, frequency-based weighting was applied to consider the importance of each object. Through experiments with open datasets, the potential of the proposed method was demonstrated by comparing it with another method, even in situations involving environmental or viewpoint changes.
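
A minimal sketch of the bag-of-objects representation follows: detections from a zero-shot detector are turned into frequency-weighted histograms and places are compared by cosine similarity. The detection format and the inverse-document-frequency weighting are assumptions, not the authors' exact weighting scheme.

```python
# Bag-of-objects sketch (assumed detection format and IDF-style weighting).
import math
from collections import Counter

def bag_of_objects(detections, vocabulary, doc_freq, num_images):
    """detections: list of object labels found in one image by the zero-shot detector."""
    counts = Counter(detections)
    vec = []
    for label in vocabulary:
        tf = counts.get(label, 0)
        idf = math.log((1 + num_images) / (1 + doc_freq.get(label, 0))) + 1  # rarer objects weigh more
        vec.append(tf * idf)
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```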

Hybrid CTC-Attention Network-Based End-to-End Speech Recognition System for Korean Language

  • Hosung Park;Changmin Kim;Hyunsoo Son;Soonshin Seo;Ji-Hwan Kim
    • Journal of Web Engineering / Vol. 21, No. 2 / pp.265-284 / 2021
  • In this study, an automatic end-to-end speech recognition system based on a hybrid CTC-attention network is proposed for the Korean language. Deep neural network/hidden Markov model (DNN/HMM)-based speech recognition systems have driven dramatic improvements in this area, but it is difficult for non-experts to develop speech recognition for new applications. End-to-end approaches simplify the speech recognition system into a single-network architecture, making it possible to develop a system without expert knowledge. In this paper, we propose a hybrid CTC-attention network as an end-to-end speech recognition model for Korean. This model effectively utilizes a CTC objective function during attention model training, improving both speech recognition accuracy and training speed. In most languages, end-to-end speech recognition uses characters as output labels. For Korean, however, character-based end-to-end speech recognition is not an efficient approach because the Korean language has 11,172 possible characters, a relatively large number compared to other languages; for example, English has 26 characters and Japanese has 50. To address this problem, we utilize 49 Korean graphemes as output labels. Experimental results show a 10.02% character error rate (CER) when 740 hours of Korean training data are used.
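
The hybrid objective can be written as L = λ·L_CTC + (1 − λ)·L_attention over grapheme labels; a minimal PyTorch sketch is shown below (λ, tensor shapes, and the omission of padding/eos handling are assumptions).

```python
# Hybrid CTC-attention objective sketch (lambda and shapes are assumptions).
import torch
import torch.nn.functional as F

def hybrid_loss(ctc_log_probs, input_lengths, att_logits, targets, target_lengths, lam=0.3):
    """ctc_log_probs: (T, batch, n_graphemes) log-probs from the encoder's CTC head.
    att_logits: (batch, L, n_graphemes) logits from the attention decoder.
    targets: (batch, L) grapheme indices; padding/eos handling is omitted for brevity."""
    ctc = F.ctc_loss(ctc_log_probs, targets, input_lengths, target_lengths, blank=0)
    att = F.cross_entropy(att_logits.transpose(1, 2), targets)   # (batch, C, L) vs (batch, L)
    return lam * ctc + (1.0 - lam) * att
```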

Updated Primer on Generative Artificial Intelligence and Large Language Models in Medical Imaging for Medical Professionals

  • Kiduk Kim;Kyungjin Cho;Ryoungwoo Jang;Sunggu Kyung;Soyoung Lee;Sungwon Ham;Edward Choi;Gil-Sun Hong;Namkug Kim
    • Korean Journal of Radiology / Vol. 25, No. 3 / pp.224-242 / 2024
  • The emergence of Chat Generative Pre-trained Transformer (ChatGPT), a chatbot developed by OpenAI, has garnered interest in the application of generative artificial intelligence (AI) models in the medical field. This review summarizes different generative AI models and their potential applications in the field of medicine and explores the evolving landscape of Generative Adversarial Networks and diffusion models since the introduction of generative AI models. These models have made valuable contributions to the field of radiology. Furthermore, this review explores the significance of synthetic data in addressing privacy concerns and augmenting data diversity and quality within the medical domain, in addition to emphasizing the role of inversion in the investigation of generative models and outlining an approach to replicate this process. We provide an overview of Large Language Models, such as GPTs and bidirectional encoder representations (BERTs), focusing on prominent representatives, and discuss recent initiatives involving language-vision models in radiology, including the innovative large language and vision assistant for biomedicine (LLaVa-Med), to illustrate their practical application. This comprehensive review offers insights into the wide-ranging applications of generative AI models in clinical research and emphasizes their transformative potential.

iSafe Chatbot: Natural Language Processing and Large Language Model Driven Construction Safety Learning through OSHA Rules and Video Content Delivery

  • Syed Farhan Alam ZAIDI;Muhammad Sibtain ABBAS;Rahat HUSSAIN;Aqsa SABIR;Nasrullah KHAN;Jaehun YANG;Chansik PARK
    • 국제학술발표논문집 / The 10th International Conference on Construction Engineering and Project Management / pp.1238-1245 / 2024
  • The construction industry faces the challenge of providing effective, engaging, and rule-specific safety learning. Traditional methodologies exhibit limited adaptability to technological advancement and struggle to deliver optimal learning experiences. Recently, there has been widespread adoption of information retrieval and ontology-based chatbots, as well as content delivery methods, for safety learning and education. However, existing information and content retrieval methods often struggle with accessing and presenting relevant safety learning materials efficiently. Additionally, the rigid and complex structures of ontology-based approaches pose obstacles in accommodating dynamic content and scaling to large datasets, and they require more computational resources for ontology management. To address these limitations, this paper introduces iSafe Chatbot, a novel framework for construction safety learning. Leveraging Natural Language Processing (NLP) and a Large Language Model (LLM), iSafe Chatbot aids safety learning by dynamically retrieving and interpreting relevant Occupational Safety and Health Administration (OSHA) rules from a comprehensive safety regulation database. When a user submits a query, iSafe Chatbot identifies relevant regulations and employs LLM techniques to provide clear explanations with practical examples. Furthermore, based on the user's query and context, iSafe Chatbot recommends training video content from a video database, enhancing comprehension and engagement. Through advanced NLP, LLM, and video content delivery, iSafe Chatbot promises to revolutionize safety learning in construction, providing an effective, engaging, and rule-specific experience. Preliminary tests have demonstrated the potential of iSafe Chatbot. This framework addresses challenges in accessing safety materials and aims to enhance knowledge of and adherence to safety protocols within the industry.
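
A minimal sketch of the retrieve-then-explain flow described above follows: embed the query, pick the most similar OSHA rule texts, prompt an LLM for an explanation with examples, and recommend the closest training video. The `embed` and `generate_text` functions are hypothetical placeholders, not the iSafe Chatbot code.

```python
# Retrieve-then-explain sketch (embedding and generation functions are placeholders).
import numpy as np

def answer_safety_query(query, rules, rule_vecs, videos, video_vecs, embed, generate_text, k=3):
    q = embed(query)                                               # query embedding, shape (d,)
    sims = rule_vecs @ q / (np.linalg.norm(rule_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    top_rules = [rules[i] for i in np.argsort(sims)[::-1][:k]]     # most relevant OSHA rules
    prompt = ("Explain the following OSHA rules with a practical construction-site example, "
              "in response to the question: " + query + "\n\n" + "\n\n".join(top_rules))
    explanation = generate_text(prompt)
    v_sims = video_vecs @ q / (np.linalg.norm(video_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    recommended_video = videos[int(np.argmax(v_sims))]             # closest training video
    return explanation, recommended_video
```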

유아 언어학습에 대한 하이퍼망 메모리 기반 모델 (Hypernetwork Memory-Based Model for Infant's Language Learning)

  • 이지훈;이은석;장병탁
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 / Vol. 15, No. 12 / pp.983-987 / 2009
  • One important factor in infants' language acquisition is the learner's exposure to a linguistic environment. The linguistic environment an infant encounters includes not only humans such as parents but also artificial sources such as various media, and the infant learns language while exploring this vast environment. This study proposes a machine learning approach, grounded in a cognitive mechanism, that flexibly and appropriately simulates infant language learning as shaped by exposure to large amounts of language data. Early infant language learning involves behaviors such as sentence-level learning and generation, which can be simulated through exposure to a language corpus alone. The core of the simulation is a memory-based learning model with a language hypernetwork structure. By representing higher-order relations among linguistic elements, the language hypernetwork allows similar structures to be applied and exploited for new data streams, thereby simulating developmental and incremental learning. In this study, 32,744 sentences extracted from 11 children's videos were learned incrementally through the language hypernetwork, and sentences were generated to simulate the infant's incremental, developmental learning.
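
As a rough illustration of a hypernetwork-style sentence memory, the sketch below samples fixed-size word combinations (hyperedges) from each sentence into a weighted memory and completes a partial sentence by voting over overlapping hyperedges. The hyperedge size, sampling rate, and completion rule are assumptions, not the paper's exact model.

```python
# Hypernetwork-style sentence memory sketch (edge size, sampling, and completion rule assumed).
import random
from collections import Counter
from itertools import combinations

class HypernetworkMemory:
    def __init__(self, edge_size=3, samples_per_sentence=20):
        self.edge_size = edge_size
        self.samples = samples_per_sentence
        self.edges = Counter()                       # hyperedge (word tuple) -> observed frequency

    def learn(self, sentence):
        """Incrementally store sampled word combinations from one sentence."""
        words = sentence.split()
        if len(words) < self.edge_size:
            return
        combos = list(combinations(words, self.edge_size))
        for edge in random.sample(combos, min(self.samples, len(combos))):
            self.edges[edge] += 1                    # memory-based, incremental update

    def predict_next(self, context):
        """Vote for a continuation word using hyperedges that overlap the context words."""
        votes = Counter()
        for edge, freq in self.edges.items():
            overlap = sum(w in edge for w in context)
            if overlap >= self.edge_size - 1:
                for w in edge:
                    if w not in context:
                        votes[w] += freq
        return votes.most_common(1)[0][0] if votes else None
```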

Challenges and Future Directions for Large Language Models in Source Code Vulnerability Detection

  • 윤수빈;김현준;백윤흥
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2024년도 추계학술발표대회 / pp.760-763 / 2024
  • Detecting vulnerabilities in source code is essential for maintaining software security, but traditional methods like static and dynamic analysis often struggle with the complexity of modern software systems. Large Language Models (LLMs), such as GPT-4, have emerged as promising tools due to their ability to learn programming language patterns from extensive datasets. However, their application in vulnerability detection faces significant hurdles. This paper explores the key challenges limiting the effectiveness of LLMs in this domain, including limited understanding of code context, scarcity of high-quality training data, accuracy and reliability issues, constrained context windows, and lack of interpretability. We analyze how these factors impede the models' ability to detect complex vulnerabilities and discuss their implications for security-critical applications. To address these challenges, we propose several directions for improvement: developing specialized and diverse datasets, integrating LLMs with traditional static analysis tools, enhancing model architectures for better code comprehension, fostering collaboration between AI systems and human experts, and improving the interpretability of model outputs. By pursuing these strategies, we aim to enhance the capabilities of LLMs in vulnerability detection, contributing to the development of more secure and robust software systems.
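
One of the directions proposed above, integrating LLMs with traditional static analysis, can be sketched as follows: analyzer findings are paired with the surrounding code (kept small to respect the constrained context window) and passed to an LLM for review. `run_static_analyzer` and `generate_text` are hypothetical placeholders for a real analyzer and a text-generation API.

```python
# LLM-assisted triage of static-analysis findings (analyzer and LLM calls are placeholders).
def llm_assisted_review(source_code, run_static_analyzer, generate_text, context_lines=30):
    lines = source_code.splitlines()
    reports = []
    for finding in run_static_analyzer(source_code):           # e.g. {"line": 42, "message": "..."}
        lo = max(0, finding["line"] - context_lines)
        hi = min(len(lines), finding["line"] + context_lines)
        snippet = "\n".join(lines[lo:hi])                       # keep within the LLM context window
        prompt = (f"A static analyzer reported: {finding['message']}\n"
                  f"Is this a real vulnerability? Explain and suggest a fix.\n\n{snippet}")
        reports.append(generate_text(prompt))
    return reports
```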