• Title/Summary/Keyword: pre-trained models

Probing Sentence Embeddings in L2 Learners' LSTM Neural Language Models Using Adaptation Learning

  • Kim, Euhee
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.13-23 / 2022
  • In this study, we leveraged a probing method to evaluate how a pre-trained L2 LSTM language model represents sentences with relative and coordinate clauses. The probing experiment employed models adapted from the pre-trained L2 language models to trace the syntactic properties of sentence embedding vector representations. The probing dataset was automatically generated from several templates covering different sentence structures. To classify the syntactic properties of sentences in each probing task, we measured the adaptation effects of the language models using syntactic priming. We performed linear mixed-effects model analyses to statistically analyze these adaptation effects and reveal how the L2 language models represent syntactic features of English sentences. When the L2 language models were compared with the baseline L1 Gulordava language models, analogous results were found for each probing task. In addition, it was confirmed that the L2 language models encode the syntactic features of relative and coordinate clauses hierarchically in their sentence embedding representations.
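
A minimal sketch (ours, not the authors' code) of the adaptation measure this abstract describes: a target sentence's surprisal under an LSTM language model is compared before and after briefly fine-tuning the model on a structurally matched prime. The toy model, vocabulary, and random token ids below are placeholders for the paper's L2-trained models and templated sentences.

```python
import torch
import torch.nn as nn

class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):
        hidden, _ = self.lstm(self.embed(ids))
        return self.out(hidden)

def mean_surprisal(model, ids):
    """Average negative log2-probability of each token given its prefix."""
    with torch.no_grad():
        logp = torch.log_softmax(model(ids[:, :-1]), dim=-1)
        tok_logp = logp.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return (-tok_logp).mean().item() / torch.log(torch.tensor(2.0)).item()

def adapt(model, prime_ids, lr=1e-3, steps=1):
    """Briefly fine-tune ("adapt") the LM on a prime sentence."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        logits = model(prime_ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                       prime_ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

# Toy usage: real token ids would come from the L2 training corpus vocabulary.
model = LSTMLanguageModel(vocab_size=1000)
prime = torch.randint(0, 1000, (1, 12))   # relative-clause prime (placeholder)
target = torch.randint(0, 1000, (1, 12))  # same-structure target (placeholder)
before = mean_surprisal(model, target)
adapt(model, prime)
after = mean_surprisal(model, target)
print(f"adaptation effect (bits): {before - after:.3f}")
```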

A Methodology on Estimating the Product Life Cycle Cost using Artificial Neural Networks in the Conceptual Design Phase (개념 설계 단계에서 인공 신경망을 이용한 제품의 Life Cycle Cost평가 방법론)

  • 서광규;박지형
    • Journal of the Korean Society for Precision Engineering / v.21 no.9 / pp.85-94 / 2004
  • As over 70% of the total life cycle cost (LCC) of a product is committed at the early design stage, designers are in an important position to substantially reduce the LCC of the products they design by giving due consideration to the life cycle implications of their design decisions. During early design stages, there may be competing concepts with dramatic differences. In addition, detailed information is scarce and decisions must be made quickly. Thus, both the overhead of developing parametric LCC models for a wide range of concepts and the lack of detailed information make the application of traditional LCC models impractical. A different approach is needed if a traditional LCC method is to be incorporated into the very early design stages. This paper explores an approximate method for providing the preliminary LCC. Learning algorithms trained on the known characteristics of existing products might allow the LCC of new products to be approximated quickly during the conceptual design phase without the overhead of defining new LCC models. Artificial neural networks are trained to generalize product attributes and LCC data from pre-existing LCC studies. Product designers then query the trained network with new high-level product attribute data to quickly obtain an LCC estimate for a new product concept. Foundations for the learning LCC approach are established, and then an application is provided.
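
As a rough illustration of the learning-LCC idea (an assumed sketch, not the paper's model), a neural network can be fit to attribute/LCC pairs from existing products and then queried with a new concept's attributes; the feature names and synthetic data below are placeholders.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Columns: mass (kg), power (W), part count -- stand-ins for product attributes.
X_existing = rng.uniform([1, 10, 5], [50, 500, 200], size=(100, 3))
lcc_existing = 120 * X_existing[:, 0] + 2 * X_existing[:, 1] \
             + 15 * X_existing[:, 2] + rng.normal(0, 50, 100)  # synthetic LCC

scaler = StandardScaler().fit(X_existing)
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_existing), lcc_existing)

# Query the trained network with a new concept's high-level attributes.
new_concept = np.array([[25.0, 120.0, 80.0]])
print("approximate LCC:", model.predict(scaler.transform(new_concept))[0])
```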

Zero-shot Korean Sentiment Analysis with Large Language Models: Comparison with Pre-trained Language Models

  • Soon-Chan Kwon;Dong-Hee Lee;Beak-Cheol Jang
    • Journal of the Korea Society of Computer and Information / v.29 no.2 / pp.43-50 / 2024
  • This paper evaluates the Korean sentiment analysis performance of large language models such as GPT-3.5 and GPT-4 using a zero-shot approach facilitated by the ChatGPT API, comparing them to pre-trained Korean models such as KoBERT. Through experiments on various Korean sentiment analysis datasets in fields such as movies, gaming, and shopping, the effectiveness of these models is validated. The results reveal that the LMKor-ELECTRA model achieved the highest performance based on F1-score, while GPT-4 achieved particularly high accuracy and F1-scores on the movie and shopping datasets. This indicates that large language models can perform effectively in Korean sentiment analysis without prior training on specific datasets, suggesting their potential for zero-shot learning. However, relatively lower performance on some datasets highlights the limitations of the zero-shot methodology. This study explores the feasibility of using large language models for Korean sentiment analysis, providing significant implications for future research in this area.
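
A minimal sketch of the zero-shot setup described above, assuming the current OpenAI Python client; the prompt wording and model name are our placeholders, not the paper's exact configuration.

```python
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY

client = OpenAI()

def zero_shot_sentiment(review: str) -> str:
    """Label a Korean review without any task-specific training."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the Korean review as "
                        "'positive' or 'negative'. Answer with one word."},
            {"role": "user", "content": review},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

print(zero_shot_sentiment("배우들의 연기가 정말 훌륭했어요."))  # expected: positive
```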

Raindrop Removal and Background Information Recovery in Coastal Wave Video Imagery using Generative Adversarial Networks (적대적생성신경망을 이용한 연안 파랑 비디오 영상에서의 빗방울 제거 및 배경 정보 복원)

  • Huh, Dong;Kim, Jaeil;Kim, Jinah
    • Journal of the Korea Computer Graphics Society / v.25 no.5 / pp.1-9 / 2019
  • In this paper, we propose a video enhancement method using generative adversarial networks to remove raindrops and restore the background information in regions of coastal wave video imagery distorted by raindrops during rainfall. Two experimental models are implemented: the Pix2Pix network, widely used for image-to-image translation, and Attentive GAN, which currently performs well for raindrop removal on single images. The models are trained on a public dataset of paired natural images with and without raindrops, and the trained models are evaluated on their performance in raindrop removal and background information recovery for rain-distorted coastal wave video imagery. To improve performance, we acquired a paired video dataset with and without raindrops at a real coast and applied transfer learning to the pre-trained models with this new dataset. The fine-tuned models show improved performance compared with the pre-trained models. Performance is evaluated using the peak signal-to-noise ratio and the structural similarity index, and the Pix2Pix network fine-tuned by transfer learning shows the best performance in reconstructing coastal wave video imagery distorted by raindrops.
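
The evaluation step described above can be sketched as follows (our illustration, assuming scikit-image's metrics): PSNR and SSIM are computed between a restored frame and its clean ground truth. The random arrays stand in for real video frames.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

restored = np.random.rand(256, 256, 3)      # stand-in for a derained frame
ground_truth = np.random.rand(256, 256, 3)  # stand-in for the clean frame

psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=1.0)
ssim = structural_similarity(ground_truth, restored,
                             channel_axis=-1, data_range=1.0)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```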

Development of Tourism Information Named Entity Recognition Datasets for the Fine-tune KoBERT-CRF Model

  • Jwa, Myeong-Cheol;Jwa, Jeong-Woo
    • International Journal of Internet, Broadcasting and Communication / v.14 no.2 / pp.55-62 / 2022
  • A smart tourism chatbot is needed as a user interface to efficiently provide smart tourism services such as recommended travel products, tourist information, my travel itinerary, and tour guide services to tourists. We have developed a smart tourism app and a smart tourism information system that provide smart tourism services to tourists. We also developed a smart tourism chatbot service consisting of the khaiii morpheme analyzer, rule-based intention classification, and a tourism information knowledge base using the Neo4j graph database. In this paper, we develop the Korean and English smart tourism Named Entity (NE) datasets required for developing an NER model based on pre-trained language models (PLMs) for the smart tourism chatbot system. We create the tourism information NER datasets by collecting source data through the smart tourism app, the visitJeju web site of the Jeju Tourism Organization (JTO), and web search, and by preprocessing them using Korean and English tourism information Named Entity dictionaries. We train the KoBERT-CRF NER model on the developed Korean and English tourism information NER datasets. The weight-averaged precision, recall, and F1 scores are 0.94, 0.92, and 0.94 on the Korean and English tourism information NER datasets.
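
A minimal sketch (assumed architecture, not the authors' code) of a BERT-encoder-plus-CRF tagger in the spirit of the KoBERT-CRF model above, using the pytorch-crf package; the checkpoint name and tag count are placeholders.

```python
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class BertCRFTagger(nn.Module):
    def __init__(self, model_name="monologg/kobert", num_tags=13):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids,
                              attention_mask=attention_mask).last_hidden_state
        emissions = self.classifier(hidden)  # per-token tag scores
        mask = attention_mask.bool()
        if tags is not None:  # training: negative CRF log-likelihood as loss
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)  # inference: best tag paths
```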

Encoding Dictionary Feature for Deep Learning-based Named Entity Recognition

  • Ronran, Chirawan;Unankard, Sayan;Lee, Seungwoo
    • International Journal of Contents / v.17 no.4 / pp.1-15 / 2021
  • Named entity recognition (NER) is a crucial NLP task that aims to extract information from texts. To build NER systems, deep learning (DL) models are trained with dictionary features by mapping each word in the dataset to dictionary features and generating a unique index. However, this technique can generate noisy labels, which pose significant challenges for the NER task. In this paper, we proposed DL-dictionary features and evaluated them on two datasets: the OntoNotes 5.0 dataset and our new infectious disease outbreak dataset, named GFID. We used (1) Bidirectional Long Short-Term Memory (BiLSTM) character embeddings and (2) pre-trained word embeddings, concatenated with (3) our proposed features, named the Convolutional Neural Network (CNN), BiLSTM, and self-attention dictionaries, respectively. The combined features (1-3) were fed through a BiLSTM-Conditional Random Field (CRF) layer to predict named entity classes as outputs. We compared these outputs with the predictions of models from previous research that used the BiLSTM character embeddings, pre-trained embeddings, and exact-matching and partial-matching dictionary techniques. The findings showed that the model employing our dictionary features outperformed models that used existing dictionary features. We also computed the F1 score on the GFID dataset to apply this technique to the extraction of medical and healthcare information.
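
A minimal sketch of the feature-concatenation idea described above (our simplification): character, pre-trained word, and encoded dictionary-feature embeddings are joined per token before the BiLSTM-CRF layers. All dimensions are illustrative.

```python
import torch
import torch.nn as nn

batch, seq_len = 2, 10
char_emb = torch.randn(batch, seq_len, 50)   # (1) BiLSTM character embeddings
word_emb = torch.randn(batch, seq_len, 300)  # (2) pre-trained word embeddings
dict_emb = torch.randn(batch, seq_len, 32)   # (3) encoded dictionary features

combined = torch.cat([char_emb, word_emb, dict_emb], dim=-1)  # (2, 10, 382)
bilstm = nn.LSTM(382, 128, batch_first=True, bidirectional=True)
hidden, _ = bilstm(combined)  # fed onward to a CRF layer for tag prediction
print(hidden.shape)           # torch.Size([2, 10, 256])
```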

Deep Learning-based Target Masking Scheme for Understanding Meaning of Newly Coined Words

  • Nam, Gun-Min;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.157-165 / 2021
  • Recently, studies using deep learning to analyze large amounts of text have been actively conducted. In particular, pre-trained language models that apply the results of learning from a large amount of text to the analysis of text in a specific domain are attracting attention. Among the various pre-trained language models, BERT (Bidirectional Encoder Representations from Transformers)-based models are the most widely used. Recently, research has been conducted to improve analysis performance through further pre-training with BERT's MLM (Masked Language Model). However, the traditional MLM has difficulty clearly understanding the meaning of sentences containing new words such as newly coined words. Therefore, in this study, we propose NTM (Newly coined words Target Masking), which performs masking only on new words. Analyzing about 700,000 movie reviews from portal 'N' with the proposed methodology confirmed that NTM shows superior performance in terms of sentiment analysis accuracy compared to the existing random masking.
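
A minimal sketch (assumed, not the paper's code) of target masking: only tokens belonging to a curated list of newly coined words are replaced with the mask token, instead of the MLM's random selection. The word list, example sentence, and checkpoint are hypothetical, and real new words may split into several subword tokens.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
new_words = {"존맛", "갓띵작"}  # hypothetical newly coined Korean words

def target_mask(sentence: str) -> str:
    """Mask only listed new words, leaving all other words intact."""
    masked = [tokenizer.mask_token if word in new_words else word
              for word in sentence.split()]
    return " ".join(masked)

text = target_mask("이 영화 갓띵작 인정")
print(text)                       # "이 영화 [MASK] 인정"
print(tokenizer(text).input_ids)  # ready for MLM further pre-training
```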

A Computer-Aided Diagnosis of Brain Tumors Using a Fine-Tuned YOLO-based Model with Transfer Learning

  • Montalbo, Francis Jesmar P.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.12 / pp.4816-4834 / 2020
  • This paper proposes transfer learning and fine-tuning techniques for a deep learning model to detect three distinct brain tumors from Magnetic Resonance Imaging (MRI) scans. In this work, the recent YOLOv4 model was trained using a collection of 3064 T1-weighted Contrast-Enhanced (CE)-MRI scans that were pre-processed and labeled for the task. This work trained the partial 29-layer YOLOv4-Tiny and fine-tuned it to work optimally and run efficiently on most platforms with reliable performance. With the help of transfer learning, the model had the initial leverage to train faster with pre-trained weights from the COCO dataset, generating a robust set of features required for brain tumor detection. The results yielded the highest mean average precision of 93.14%, a precision of 90.34%, a recall of 88.58%, and an F1-score of 89.45%, outperforming previous versions of the YOLO detection models and other studies that used bounding-box detection for the same task, such as Faster R-CNN. In conclusion, YOLOv4-Tiny can work efficiently to detect brain tumors automatically at a rapid pace with the help of proper fine-tuning and transfer learning. This work mainly contributes to assisting medical experts in the diagnostic process for brain tumors.
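
As a quick arithmetic check (ours, not from the paper), the reported F1-score follows from the reported precision and recall via F1 = 2PR/(P + R):

```python
# Verify the reported F1-score from the reported precision and recall.
precision, recall = 0.9034, 0.8858
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # 0.8945, matching the reported 89.45%
```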

A Study on Algorithm of Life Cycle Cost for Improving Reliability in Product Design (제품설계 신뢰성 제고를 위한 LCC의 알고리즘 연구)

  • Kim Dong-Kwan;Jung Soo-Il
    • Journal of the Korea Safety Management & Science / v.7 no.5 / pp.155-174 / 2005
  • Parametric life-cycle cost (LCC) models have been integrated with traditional design tools and used in prior work to demonstrate the rapid solution of holistic, analytical tradeoffs between detailed design variations. During early design stages there may be competing concepts with dramatic differences. Additionally, detailed information is scarce, and decisions must be made quickly. Both the overhead of developing parametric LCC models for a diverse range of concepts and the lack of detailed information make the integration of traditional LCC models impractical. This paper explores an approximate method for providing preliminary life-cycle cost. Learning algorithms trained using the known characteristics of existing products might allow the LCC of new products to be approximated quickly during conceptual design without the overhead of defining new models. Artificial neural networks are trained to generalize on product attributes and life-cycle cost data from pre-existing LCC studies; product designers then query the trained network with new high-level product attribute data to quickly obtain an LCC for a new product concept. Foundations for the learning LCC approach are established, and then an application is provided. In addition, a statistical method, regression analysis, is suggested to predict the LCC. Tests have shown it is possible to predict the life cycle cost, and comparison results between the learning LCC model and the regression analysis are also shown.
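
A minimal sketch (with assumed synthetic data) of the comparison this abstract describes: a regression model and a neural network are both fit to attribute/LCC pairs from existing products and compared on held-out concepts.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))  # product attributes (placeholders)
y = 100 + 80 * X[:, 0] + 40 * X[:, 1] ** 2 + rng.normal(0, 3, 200)  # LCC

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
for name, model in [("regression", LinearRegression()),
                    ("neural net", MLPRegressor((32, 16), max_iter=3000,
                                                random_state=1))]:
    model.fit(X_tr, y_tr)
    err = mean_absolute_percentage_error(y_te, model.predict(X_te))
    print(f"{name}: MAPE = {err:.2%}")
```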

A Study on Fine-Tuning and Transfer Learning to Construct Binary Sentiment Classification Model in Korean Text (한글 텍스트 감정 이진 분류 모델 생성을 위한 미세 조정과 전이학습에 관한 연구)

  • JongSoo Kim
    • Journal of Korea Society of Industrial Information Systems / v.28 no.5 / pp.15-30 / 2023
  • Recently, generative models based on the Transformer architecture, such as ChatGPT, have been gaining significant attention. The Transformer architecture has been applied to various neural network models, including Google's BERT (Bidirectional Encoder Representations from Transformers) model. In this paper, a method is proposed to create a binary text classification model for determining whether a comment on a Korean movie review is positive or negative. To accomplish this, a pre-trained multilingual BERT model is fine-tuned and transfer-learned using a new Korean training dataset. A pre-trained multilingual BERT-Base model covering 104 languages, with 12 layers, 768 hidden units, 12 attention heads, and 110M parameters, is used. To turn the pre-trained BERT-Base model into a text classification model, the input and output layers were fine-tuned, resulting in a new model with 178 million parameters. Using the fine-tuned model with a maximum sequence length of 128, a batch size of 16, and 5 epochs, transfer learning is conducted with 10,000 training examples and 5,000 test examples. A binary text sentiment classification model for Korean movie reviews with an accuracy of 0.9582, a loss of 0.1177, and an F1 score of 0.81 has been created. Transfer learning with a dataset five times larger produced a model with an accuracy of 0.9562, a loss of 0.1202, and an F1 score of 0.86.
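
A minimal sketch of the fine-tuning setup described above, assuming the Hugging Face transformers API: multilingual BERT-Base with a two-label head, maximum sequence length 128, batch size 16, and 5 epochs. Dataset loading is a placeholder; this is not the author's script.

```python
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # positive / negative

def encode(batch):
    """Tokenize review text to fixed-length inputs (max 128 tokens)."""
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

args = TrainingArguments(output_dir="mbert-korean-sentiment",
                         per_device_train_batch_size=16,
                         num_train_epochs=5)

# train_ds / test_ds would be tokenized Korean movie-review datasets,
# e.g. built with datasets.Dataset and mapped through `encode`:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=test_ds)
# trainer.train()
```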